dbt Python model
To incorporate Python in our workflow, we will use Fal. For more information on Fal, please refer to the Fal documentation. Recent versions of dbt support Python models natively, but those models run directly inside the vendor's platform, e.g. Snowpark or Databricks. Given the constraints of this workshop, we run the Python model locally, and Fal is a good fit for that purpose.
Set up everything needed
fal python models:
Create a new folder called ml in the models/marts folder. This is where we will store our ML models.
Inside the ml folder, create a new file called _ml.yml with the model and source definitions.
_ml.yml
version: 2

sources:
  - name: ml
    schema: dbt_ml
    tables:
      - name: customer_registration_prediction

models:
  - name: customer_registration_prediction
    meta:
      fal:
        scripts:
          after:
            - customer_registration_prediction.py
Also add the dbt model itself, customer_registration_prediction.sql, with a simple query inside.
customer_registration_prediction.sql
select 1
fal python scripts:
Navigate to the scripts folder in the dbt_demo folder and add a prediction script called customer_registration_prediction.py to it.
customer_registration_prediction.py
import pandas as pd
from sklearn.linear_model import LogisticRegression

# `ref` and `write_to_model` are not imported; fal makes them available when it runs the script
ref_df = ref('customers')

# Fill missing values with 0
ref_df.fillna(0, inplace=True)

# Extract the input and output variables
X = ref_df[["no_of_orders", "total_amount"]]
y = ref_df["is_registered"]

# Create a logistic regression model
model = LogisticRegression()

# Fit the model to the data
model.fit(X, y)

# Print the intercept and coefficients
print('Intercept:', model.intercept_)
print('Coefficient:', model.coef_)

ref_df['is_predicted_to_register'] = model.predict(X)

# Upload a `pandas.DataFrame` back to the data warehouse
write_to_model(ref_df[['customer_id', 'is_predicted_to_register']])
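To make the printed intercept and coefficients easier to interpret, here is a minimal sketch of how a fitted binary logistic regression turns them into a 0/1 prediction. It uses only the standard library and assumes is_registered is encoded as 0/1. The function name predict_is_registered and the example numbers are hypothetical, not values from the workshop data. With scikit-learn you would pass model.intercept_[0] and model.coef_[0].

```python
import math


def predict_is_registered(no_of_orders, total_amount, intercept, coef):
    """Return 1 if the modelled probability of registration exceeds 0.5, else 0."""
    # Linear score: the intercept plus one weight per input feature.
    score = intercept + coef[0] * no_of_orders + coef[1] * total_amount
    # The logistic (sigmoid) function maps the score to a probability in (0, 1).
    probability = 1.0 / (1.0 + math.exp(-score))
    return 1 if probability > 0.5 else 0


# Hypothetical intercept and coefficients, chosen only for illustration.
example_intercept = -2.0
example_coef = [0.5, 0.01]

print(predict_is_registered(1, 20.0, example_intercept, example_coef))
print(predict_is_registered(6, 150.0, example_intercept, example_coef))
```

A positive coefficient means that a larger value of that feature pushes the prediction towards 1, which is a quick sanity check on the numbers the script prints.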
Append the Fal variables to dbt_project.yml so Fal knows where to find the models and scripts.
dbt_project.yml
vars:
  fal-models-paths: "models/marts/ml"
  fal-scripts-path: "scripts"
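Before running anything, it can help to confirm that the files from the setup steps are where the configuration expects them. The following is a small sketch of such a check. It assumes the models and scripts folders sit directly under the project root (dbt_demo); the helper name missing_files is hypothetical and not part of dbt or Fal.

```python
from pathlib import Path

# Files created in the setup steps, relative to the dbt project root.
EXPECTED_FILES = [
    "dbt_project.yml",
    "models/marts/ml/_ml.yml",
    "models/marts/ml/customer_registration_prediction.sql",
    "scripts/customer_registration_prediction.py",
]


def missing_files(project_root):
    """Return the expected files that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling missing_files("dbt_demo") from the parent folder returns an empty list when everything is in place, and otherwise lists the paths that still need to be created.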
fal python models: Ready to run it
First run the selected new dbt models
dbt run --select "marts.ml.*"
Take a look at the created table dbt_ml.customer_registration_prediction
Then run the Python script itself
fal run
Now you can take a look at the table again.
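The order of the two commands matters: the dbt model has to exist before the script writes to it. As a rough sketch, the sequence could be wrapped in a small Python helper like the one below. The name run_workflow and the injectable runner parameter are assumptions for illustration. The commands themselves are the ones shown above, passed as argument lists so the shell does not expand the * in the selector.

```python
import subprocess

# The two commands from the steps above, in the order they must run.
COMMANDS = [
    ["dbt", "run", "--select", "marts.ml.*"],
    ["fal", "run"],
]


def run_workflow(runner=subprocess.run):
    """Run each command in turn and stop at the first one that fails."""
    completed = []
    for cmd in COMMANDS:
        result = runner(cmd)
        if result.returncode != 0:
            raise RuntimeError(f"command failed: {' '.join(cmd)}")
        completed.append(cmd)
    return completed
```

Calling run_workflow() from inside the dbt_demo folder runs both steps. Passing a different runner makes it possible to try the helper without dbt or Fal installed.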
[BONUS] Add a staging ml folder and incorporate the prediction all the way up into the customers table so your manager can see it in their dashboard!