From 29b3ce4a7209828bd7368fd44a7aad5735cb573b Mon Sep 17 00:00:00 2001
From: Dipika Ranabhat
Date: Mon, 4 May 2026 15:57:32 -0500
Subject: [PATCH] docs: add data lineage section to general-usage/destination-tables
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fixes #3875 — blog post at dlthub.com/blog/dlt-lineage-support linked to general-usage/destination-tables#data-lineage which was a 404.
Added the file with a proper Data lineage section covering load IDs, row-level lineage (_dlt_id/_dlt_parent_id), schema versioning, and a usage example.
---
 general-usage/destination-tables.md | 35 +++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 general-usage/destination-tables.md

diff --git a/general-usage/destination-tables.md b/general-usage/destination-tables.md
new file mode 100644
index 0000000000..7f8b823f6c
--- /dev/null
+++ b/general-usage/destination-tables.md
@@ -0,0 +1,35 @@
+# Destination tables & lineage
+
+> **Full documentation lives at:** [dlthub.com/docs/general-usage/destination-tables](https://dlthub.com/docs/general-usage/destination-tables)
+
+## Data lineage
+
+Data lineage can be super relevant for architectures like the [data vault architecture](https://www.data-vault.co.uk/what-is-data-vault/) or when troubleshooting. The data vault architecture is a data warehouse that large organizations use when representing the same process across multiple systems, which adds data lineage requirements. Using the pipeline name and `load_id` provided out of the box by `dlt`, you are able to identify the source and time of data.
+
+You can save complete lineage info for a particular `load_id` including a list of loaded files, error messages (if any), elapsed times, and schema changes. This can be helpful, for example, when troubleshooting problems.
+
+### Load IDs
+
+Each pipeline run produces a unique `load_id` (a Unix timestamp). This ID appears in every top-level table row and in the `_dlt_loads` system table, letting you trace exactly when and from which source each record was loaded.
+
+### Row-level lineage
+
+Every row in every table gets a `_dlt_id` column — a unique, stable identifier. Child (nested) tables reference their parent rows via `_dlt_parent_id`, forming a complete audit trail from source to destination.
+
+### Schema versioning
+
+dlt tracks schema changes using a content-based `version_hash`. You can correlate a `load_id` to the schema version active at that time, enabling column-level lineage: you can assign the origin of any column to a specific load package, identified by source and time.
+
+### Saving lineage info
+
+```py
+import dlt
+
+pipeline = dlt.pipeline(pipeline_name="my_pipeline", destination="duckdb")
+load_info = pipeline.run(my_source())
+
+# Persist load info back into the destination for lineage tracking
+pipeline.run([load_info], write_disposition="append", table_name="load_info")
+```
+
+For full details see the [hosted documentation](https://dlthub.com/docs/general-usage/destination-tables#data-lineage).