dbt Python model#
To incorporate Python into our workflow, we will use Fal. For more information, refer to the Fal documentation. Recent versions of dbt support Python models natively, but those models run inside the vendor's platform, e.g. Snowpark or Databricks. Given the limits of this workshop, we run the Python model locally, and Fal is a good fit for that purpose.
Set up everything needed#
fal python models:
Create a new folder called ml in the models/marts folder. This is where we will store our ML models.

Inside the ml folder, create a new file called _ml.yml with the model and source definitions.

_ml.yml

    version: 2

    sources:
      - name: ml
        schema: dbt_ml
        tables:
          - name: customer_registration_prediction

    models:
      - name: customer_registration_prediction
        meta:
          fal:
            scripts:
              after:
                - customer_registration_prediction.py
Also add the dbt model itself, customer_registration_prediction.sql, with a simple query inside.

customer_registration_prediction.sql

    select 1
fal python scripts:
Navigate to the scripts folder in the dbt_demo folder and add a prediction script called customer_registration_prediction.py.

customer_registration_prediction.py

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    # `ref` and `write_to_model` are provided by fal at run time
    ref_df = ref('customers')

    # Fill missing values with 0
    ref_df.fillna(0, inplace=True)

    # Extract the input and output variables
    X = ref_df[["no_of_orders", "total_amount"]]
    y = ref_df["is_registered"]

    # Create a logistic regression model
    model = LogisticRegression()

    # Fit the model to the data
    model.fit(X, y)

    # Print the intercept and coefficients
    print('Intercept:', model.intercept_)
    print('Coefficient:', model.coef_)

    ref_df['is_predicted_to_register'] = model.predict(X)

    # Upload a `pandas.DataFrame` back to the data warehouse
    write_to_model(ref_df[['customer_id', 'is_predicted_to_register']])
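Because `ref` and `write_to_model` are supplied by Fal only at run time, the script cannot be started with plain Python as it stands. The following is a minimal sketch for trying the modelling steps locally, assuming pandas and scikit-learn are installed. The two functions are replaced by hypothetical stand-ins, and the customer rows are made-up sample data, not the workshop data.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Made-up sample rows standing in for the `customers` model (assumption).
_SAMPLE = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5, 6],
        "no_of_orders": [0, 1, 5, 7, None, 2],
        "total_amount": [0.0, 20.0, 310.0, 450.0, None, 35.0],
        "is_registered": [0, 0, 1, 1, 0, 0],
    }
)

written = []  # collects whatever the script would upload


def ref(model_name):
    """Hypothetical stand-in for fal's `ref`: returns the sample rows."""
    return _SAMPLE.copy()


def write_to_model(df):
    """Hypothetical stand-in for fal's `write_to_model`: keeps the frame in memory."""
    written.append(df)


# Same steps as customer_registration_prediction.py
ref_df = ref("customers")
ref_df = ref_df.fillna(0)
X = ref_df[["no_of_orders", "total_amount"]]
y = ref_df["is_registered"]
model = LogisticRegression()
model.fit(X, y)
ref_df["is_predicted_to_register"] = model.predict(X)
write_to_model(ref_df[["customer_id", "is_predicted_to_register"]])

print(written[0])
```

The printed frame has one row per sample customer with the two columns that the real script would write back to the warehouse.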
Append the Fal variables to dbt_project.yml so that Fal knows where to find the models and scripts.

dbt_project.yml

    vars:
      fal-models-paths: "models/marts/ml"
      fal-scripts-path: "scripts"
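The paths in these variables have to match where the files actually live. A small sketch for checking the layout before running anything, assuming the folder and file names used in this workshop. `missing_files` is a hypothetical helper, not part of dbt or Fal.

```python
from pathlib import Path

# Files this walkthrough expects, relative to the dbt project root.
# The names come from the workshop steps; the helper itself is hypothetical.
EXPECTED_FILES = [
    "dbt_project.yml",
    "models/marts/ml/_ml.yml",
    "models/marts/ml/customer_registration_prediction.sql",
    "scripts/customer_registration_prediction.py",
]


def missing_files(project_root):
    """Return the expected files that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Project layout looks complete.")
```

Run from the dbt_demo folder, it lists any file that is not where the configuration expects it.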
fal python models: Ready to run it
First, run the newly added dbt models:

    dbt run --select marts.ml.*

Take a look at the created table dbt_ml.customer_registration_prediction. Then run the Python script itself:

    fal run

Now you can take a look at the table again.
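The two commands are run in a fixed order: dbt first, then Fal. A minimal sketch that wraps both in one Python entry point, assuming `dbt` and `fal` are on the PATH and the code is started from the project root. `run_pipeline` and its `runner` parameter are hypothetical names introduced for illustration.

```python
import subprocess

# The two workshop commands, in the order they are run.
COMMANDS = [
    ["dbt", "run", "--select", "marts.ml.*"],
    ["fal", "run"],
]


def run_pipeline(runner=subprocess.run):
    """Run each command in order and stop at the first failure.

    `runner` is injectable (a hypothetical design choice) so the sequence
    can be exercised without dbt or fal installed.
    """
    executed = []
    for cmd in COMMANDS:
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit
        runner(cmd, check=True)
        executed.append(cmd)
    return executed
```

Calling `run_pipeline()` executes both commands and raises as soon as one of them fails, so `fal run` is never attempted after a failed `dbt run`. Passing the arguments as a list also keeps the `marts.ml.*` selector away from shell wildcard expansion.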
[BONUS] Add a staging ml folder and carry the prediction all the way up into the customers table so your manager can see it in their dashboard!