Documentation & tests - _stg_[sourcename].yml

To properly document and test our newly created models, we will create a _stg_[sourcename].yml file for each data source (e.g. one file for the jaffle_shop source data and one file for the google_analytics data). These files are stored in the models/staging/[sourcename] folder and define the properties of dbt models (such as descriptions and tests) in a structured, human-readable format.

For example, a typical YAML configuration file for a model might look something like this:

version: 2

models:
 - name: my_model
   description: This is my model.
   columns:
     - name: column_1
       description: This is column 1.
       tests:
         - unique
         - not_null
     - name: column_2
       description: This is column 2.
       tests:
         - not_null

Defining descriptions

We can define a description for each model and column. It can be a simple one-line description or, if the description is long, a docs block kept in an additional markdown file in the same folder:

  • _stg_[sourcename].yml

version: 2

models:
  - name: orders
    description: This table has basic information about orders, as well as some derived facts based on payments

    columns:
      - name: status
        description: '{{ doc("orders_status") }}'

  • docs.md

{% docs orders_status %}

Orders can be one of the following statuses:

| status         | description                                                               |
|----------------|---------------------------------------------------------------------------|
| placed         | The order has been placed but has not yet left the warehouse              |
| shipped        | The order has been shipped to the customer and is currently in transit    |
| completed      | The order has been received by the customer                               |
| returned       | The order has been returned by the customer and received at the warehouse |


{% enddocs %}
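Docs blocks are not limited to columns; the same doc() call also works for a model's own description. A minimal sketch, assuming docs.md additionally contains a block opened with {% docs orders_overview %} and closed with {% enddocs %} (the name orders_overview is hypothetical):

```yaml
version: 2

models:
  - name: orders
    # model-level description taken from the hypothetical "orders_overview" docs block
    description: '{{ doc("orders_overview") }}'
    columns:
      - name: status
        # column-level description taken from the "orders_status" docs block above
        description: '{{ doc("orders_status") }}'
```

One markdown file can hold several docs blocks, so both blocks can live in the same docs.md.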

Defining tests

Additionally, in _stg_[sourcename].yml we can add tests, for example to ensure that a column contains no duplicates (unique) or no null values (not_null). Once these tests are defined, running dbt test from the command line checks the data against them.

version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
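Besides unique and not_null, dbt ships two more built-in generic tests: accepted_values and relationships. The sketch below extends the orders example with both. The allowed status values are taken from the orders_status docs block above; the customer_id column on orders and the customers model it points to are assumptions for illustration, so adjust the names to your project.

```yaml
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          # fails if any row has a status outside this list
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id  # assumed column
        tests:
          # fails if a customer_id has no matching row in the (hypothetical) customers model
          - relationships:
              to: ref('customers')
              field: customer_id
```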

To run your tests, execute the dbt test command in a terminal.


Summary

📝 Example: _stg_jaffle_shop.yml

version: 2

models:
 - name: base_jaffle_shop_customers
   description: Basic information about customers.
   columns:
     - name: customer_id
       tests:
         - unique
         - not_null
 - name: base_jaffle_shop_orders
   description: Basic information about orders.
   columns:
     - name: order_id
       tests:
         - unique
         - not_null
 - name: base_jaffle_shop_payments
   description: Basic information about payments.
   columns:
     - name: payment_id
       tests:
         - unique
         - not_null
 - name: stg_jaffle_shop_customers
   description: Contains information about customers.
   columns:
     - name: customer_id
       tests:
         - unique
         - not_null
   ...