Creating historized table - [model_name]_snapshot.sql
#
In our example, we will use dbt snapshot functionality to create Slowly Changing Dimensions (SCD).
type-2 Slowly Changing Dimensions
A Slowly Changing Dimension is a dimension table that includes historical data, allowing you to track changes to dimension data over time. Type 2 SCDs add a new row to the dimension table whenever a change occurs. This new row contains the updated data while the old row is marked as inactive. This allows you to keep a historical record of changes to the dimension data over time.
How dbt snapshot works#
We have an orders table where the status field can be overwritten as the order is processed.
order_id |
status |
order_date |
---|---|---|
1 |
placed |
2018-01-01 |
Now, the order goes from “placed” to “shipped”. That same record will now look like:
order_id |
status |
order_date |
---|---|---|
1 |
shipped |
2018-01-01 |
To prevent losing the primary information about placed state, we use dbt snapshots which will add a new row containing the most recent state together with two new columns dbt_valid_from and dbt_valid_to:
order_id |
status |
order_date |
dbt_valid_from |
dbt_valid_to |
---|---|---|---|---|
1 |
placed |
2018-01-01 |
2018-01-01 |
2018-01-04 |
1 |
shipped |
2018-01-01 |
2018-01-04 |
null |
Creating snapshots#
Create a
[model_name]_snapshot.sql
file insidesnapshots
folder.Use a snapshot block:
{% snapshot orders_snapshot %} {% endsnapshot %}
Insert select statement which defined what you want to snapshot. Don’t forget to use
source
orref
function here.{% snapshot orders_snapshot %} select * from {{ source('jaffle_shop', 'orders') }} {% endsnapshot %}
Add snapshot configuration using config block:
{% snapshot orders_snapshot %} {{ config( target_schema='snapshots', unique_key='order_id', strategy='check', check_cols=['status'], ) }} select * from {{ source('jaffle_shop', 'orders') }} {% endsnapshot %}
Config block:
target_schema (required)
: where to store snapshot tablesunique_key (required)
: primary key columnstrategy (required)
: either check or timestampcheck_cols
: array of columns to check for changes or ‘all’ to check all columnsupdated_at
: If using the timestamp strategy, the timestamp column to compare
Run command
dbt snapshot
in the terminal to create snapshots