A Step-by-Step Walkthrough: Implementing PSM in Python
In the previous article, we learned the magic of Propensity Score Matching (PSM): how it finds “statistical twins” to mimic a randomized experiment. Now it’s time to open the toolbox and perform the magic ourselves.
This hands-on tutorial will walk you through implementing PSM in Python to solve our running business problem: did the in-store flyer campaign for a credit card cause higher customer churn?
We’ll use a synthetic dataset for clarity and reproducibility. By the end, you’ll have a template you can adapt for your own causal inference problems.
The complete code is available on my GitHub: propensity_score
Step 0: Setup and Importing Libraries
We’ll use a simple stack: pandas for data handling, sklearn for modeling, and the excellent causalinference library for the heavy lifting. You can install it via pip install causalinference.
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
import statsmodels.api as sm
import matplotlib.pyplot as plt
import logging
from causalinference import CausalModel
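# Configure logging so the logger.info calls below produce output
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# For reproducibility
np.random.seed(42)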
Step 1: Create a Synthetic Dataset
Let’s simulate data for 5000 customers. We’ll create:
- age: Millennials (18-38)
- credit_score: A score around 650
- income: Annual income
- region: A region code (1-3)
- saw_flyer: Our treatment (1 if they used the flyer promo code, 0 otherwise)
- churn: Our outcome (1 if they churned within 12 months, 0 otherwise)
Crucially, we will design the data so that seeing the flyer is influenced by age, credit_score, and region (confounders), and churn is influenced by those same confounders AND the treatment.
data/generate_data.py
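The generator script lives in the repo and is not reproduced here. As a rough sketch of what such a script could look like, the snippet below builds a dataset with the columns listed above. The function name `make_synthetic_customers`, every coefficient, the income distribution, and the default `true_effect` are assumptions chosen for illustration, not the values used in data/generate_data.py.

```python
import numpy as np
import pandas as pd


def make_synthetic_customers(n=5000, true_effect=-0.07, seed=42):
    """Simulate customers whose treatment and outcome share confounders.

    All coefficients below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    age = rng.integers(18, 39, size=n)            # 18-38 inclusive
    credit_score = rng.normal(650, 50, size=n).round()
    income = np.clip(rng.normal(55_000, 15_000, size=n), 15_000, None).round()
    region = rng.integers(1, 4, size=n)           # region codes 1-3

    # Treatment assignment depends on age, credit_score and region (confounders)
    logit_t = -0.05 * (age - 28) + 0.004 * (credit_score - 650) + 0.3 * (region - 2)
    saw_flyer = rng.binomial(1, 1 / (1 + np.exp(-logit_t)))

    # Churn depends on the same confounders plus the treatment itself
    p_churn = (0.30 - 0.004 * (age - 28) - 0.0008 * (credit_score - 650)
               + 0.02 * (region - 2) + true_effect * saw_flyer)
    churn = rng.binomial(1, np.clip(p_churn, 0.01, 0.99))

    return pd.DataFrame({"age": age, "credit_score": credit_score, "income": income,
                         "region": region, "saw_flyer": saw_flyer, "churn": churn})


df_demo = make_synthetic_customers()
print(df_demo.groupby("saw_flyer")["churn"].mean())
```

Because the true effect is a parameter here, you can compare any estimate against a known answer, which is the main advantage of working with synthetic data.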
Step 2: Estimate the Propensity Score
We use a logistic regression to predict saw_flyer based on the confounders. This gives each customer their propensity score.
# main
logger.info("Loading data...")
df = pd.read_csv('../data/syntetic_data.csv')
# Define covariates and treatment
X = df[['age', 'credit_score', 'income', 'region']]
X = sm.add_constant(X) # add intercept
treatment = df['saw_flyer']
outcome = df['churn']
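# Fit logistic regression model
# Note: PropensityScoreModel is not part of the libraries imported above;
# it appears to be a helper class from the accompanying repo.
logger.info("Fit Logistic Regression...")
ps_model = PropensityScoreModel(y=outcome, d=treatment, x=X, model=LogisticRegression(random_state=42))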
# Predict propensity scores
df['propensity_score'] = ps_model.predict_propensity_score()
# Let's see the distribution
df.groupby('saw_flyer')['propensity_score'].hist(alpha=0.7, bins=20)
plt.legend(['Control', 'Treated'])
plt.title("Propensity Score Distribution")
plt.xlabel("Propensity Score")
plt.ylabel("Frequency")
plt.show()
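This plot shows the overlap (common support) between the two groups. When the treated and control distributions cover a similar range of propensity scores, each treated customer has comparable control customers to match against, which gives us a good foundation for matching!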
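Since PropensityScoreModel is not shown in this article, here is a minimal sketch of the same step using scikit-learn directly, followed by a numeric version of the overlap check. The toy data, its coefficients, and the `overlap_range` helper are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the article's dataset (assumed values, for illustration only)
rng = np.random.default_rng(0)
n = 2000
age = rng.integers(18, 39, size=n)
credit_score = rng.normal(650, 50, size=n)
logit = -0.05 * (age - 28) + 0.004 * (credit_score - 650)
toy = pd.DataFrame({
    "age": age,
    "credit_score": credit_score,
    "income": rng.normal(55_000, 15_000, size=n),
    "region": rng.integers(1, 4, size=n),
    "saw_flyer": rng.binomial(1, 1 / (1 + np.exp(-logit))),
})

covariates = ["age", "credit_score", "income", "region"]

# Scaling helps the solver converge when covariates have very different units
ps_pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
ps_pipeline.fit(toy[covariates], toy["saw_flyer"])

# predict_proba returns one column per class; column 1 is P(saw_flyer = 1 | X)
toy["propensity_score"] = ps_pipeline.predict_proba(toy[covariates])[:, 1]


def overlap_range(scores, treatment):
    """Return the (low, high) propensity range covered by BOTH groups."""
    treated, control = scores[treatment == 1], scores[treatment == 0]
    return max(treated.min(), control.min()), min(treated.max(), control.max())


low, high = overlap_range(toy["propensity_score"], toy["saw_flyer"])
inside = toy["propensity_score"].between(low, high).mean()
print(f"Common support: [{low:.3f}, {high:.3f}], share of customers inside: {inside:.1%}")
```

If a large share of customers falls outside the common range, matches for them will be poor, and trimming those units before matching is worth considering.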
Step 3: Perform the Matching
Now for the core of PSM. We’ll compute the estimate twice: first with ps_model, then with the CausalModel class from the causalinference library, which simplifies the process.
# Matching
match_pred = ps_model.get_neighbors(df)
# Calculate ATE
logger.info(f"PSM ATE: {ps_model.ate()}")
# Calculate with the causalinference library
causal = CausalModel(
    Y=df['churn'].values,
    D=df['saw_flyer'].values,
    X=df[['age', 'credit_score', 'income', 'region']].values,
)
causal.reset()
causal.est_propensity_s()
# Note: est_via_matching matches on the covariates X (Mahalanobis
# distance with weights="maha"), not on the estimated propensity score
causal.est_via_matching(matches=1, weights="maha", bias_adj=True)
logger.info(f"{causal.estimates}")
Output:
INFO PSM ATE: -0.0592
Treatment Effect Estimates: Matching
                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE     -0.071      0.022     -3.193      0.001     -0.115     -0.028
           ATC     -0.076      0.025     -3.085      0.002     -0.124     -0.028
           ATT     -0.069      0.026     -2.705      0.007     -0.119     -0.019
The library calculates the Average Treatment Effect (ATE) for us: -0.071. But we MUST check balance before trusting this number.
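To make the matching step less of a black box, here is a hedged sketch of 1:1 nearest-neighbor matching on the propensity score (with replacement), which estimates the ATT. It uses scikit-learn's NearestNeighbors on toy data. The helper name `att_nearest_neighbor` and the simulated numbers are assumptions, and this is not the implementation behind PropensityScoreModel or CausalModel.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors


def att_nearest_neighbor(y, d, ps):
    """ATT via 1:1 nearest-neighbor matching on the propensity score (with replacement)."""
    y, d, ps = np.asarray(y, float), np.asarray(d), np.asarray(ps, float)
    treated, control = np.flatnonzero(d == 1), np.flatnonzero(d == 0)

    # Index control units by propensity score, then look up each treated unit
    nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
    _, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
    matched_control = control[idx.ravel()]

    # Average outcome gap between each treated unit and its matched control
    return float(np.mean(y[treated] - y[matched_control]))


# Toy data with one confounder x and an assumed true effect of -0.07
rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=n)
d = rng.binomial(1, 1 / (1 + np.exp(-0.8 * x)))
y = rng.binomial(1, np.clip(0.30 + 0.08 * x - 0.07 * d, 0.01, 0.99))

features = x.reshape(-1, 1)
ps = LogisticRegression().fit(features, d).predict_proba(features)[:, 1]

naive = y[d == 1].mean() - y[d == 0].mean()
att = att_nearest_neighbor(y, d, ps)
print(f"Naive difference: {naive:.3f}, matched ATT: {att:.3f}")
```

In this toy setup the confounder pushes the naive difference toward zero, while the matched estimate should land much closer to the assumed effect.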
Step 4: The Critical Step - Checking Balance
Did the matching actually make our groups comparable? We must check. This is the most important step.
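# Summary statistics, including the normalized difference (Nor-diff) for each covariate
logger.info(f"{causal.summary_stats}")
The output shows a table of summary statistics, including the normalized difference (Nor-diff) between treated and control means for each covariate. Note that summary_stats describes the sample the model currently holds, so to confirm balance after matching you need to recompute these differences on the matched sample. A common rule of thumb accepts absolute differences below 0.1; here we aim for well below 0.05 (5%). Differences that small indicate that matching produced balanced groups.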
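Standardized mean differences are easy to compute by hand, which is handy for checking balance on a matched sample. A minimal sketch follows. The function name `standardized_mean_diff` and the toy numbers are assumptions, and the pooled standard deviation used here is one common convention among several.

```python
import numpy as np
import pandas as pd


def standardized_mean_diff(df, covariates, treatment_col):
    """SMD per covariate: (mean_treated - mean_control) / sqrt((var_t + var_c) / 2)."""
    treated = df[df[treatment_col] == 1]
    control = df[df[treatment_col] == 0]
    pooled_sd = np.sqrt((treated[covariates].var() + control[covariates].var()) / 2)
    return (treated[covariates].mean() - control[covariates].mean()) / pooled_sd


# Toy example: older customers are less likely to be treated (assumed relationship),
# while income is unrelated to treatment
rng = np.random.default_rng(2)
n = 5000
age = rng.integers(18, 39, size=n)
income = rng.normal(55_000, 15_000, size=n)
treated_flag = rng.binomial(1, 1 / (1 + np.exp(0.1 * (age - 28))))
toy = pd.DataFrame({"age": age, "income": income, "saw_flyer": treated_flag})

smd = standardized_mean_diff(toy, ["age", "income"], "saw_flyer")
print(smd.round(3))
print("Imbalanced covariates (|SMD| > 0.1):", list(smd[smd.abs() > 0.1].index))
```

Run it once on the full sample and again on only the matched rows to compare balance before and after matching.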
Step 5: Estimate the Treatment Effect
Once we’ve verified balance, we can put more trust in our estimate. Our model reports an ATE of -0.071.
Business Interpretation: After controlling for age, credit score, income, and region, acquiring a customer through the in-store flyer campaign is estimated to cause a 7.1 percentage point decrease in the probability of churn (95% confidence interval: 2.8 to 11.5 points).
This is a large, statistically significant (p ≈ 0.001) effect. It suggests the campaign is attracting high-quality customers. The business decision might be to roll out this marketing campaign across the country.
Conclusion and Next Steps
You’ve just completed a full causal inference analysis!
- You defined a business problem.
- You identified confounders.
- You estimated propensity scores.
- You matched treated and control units.
- You validated your model with a balance check.
- You interpreted the causal effect.
Remember, PSM only controls for observed confounders. An unmeasured variable (e.g., “financial literacy”) could still bias our results.
As discussed in our previous article, a robustness check would be to also implement Inverse Probability Weighting (IPW) and see if the estimate is similar.
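As a rough illustration of that robustness check, the sketch below estimates the ATE with normalized inverse probability weights on toy data. The function name `ipw_ate`, the clipping threshold, and the simulated effect of -0.07 are assumptions made for this example, not part of the repo.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def ipw_ate(y, d, ps, clip=0.01):
    """ATE via normalized (Hajek) inverse probability weighting."""
    y, d = np.asarray(y, float), np.asarray(d, float)
    ps = np.clip(np.asarray(ps, float), clip, 1 - clip)  # avoid extreme weights

    w_treated = d / ps               # treated units weighted by 1 / e(x)
    w_control = (1 - d) / (1 - ps)   # control units weighted by 1 / (1 - e(x))

    mean_treated = np.sum(w_treated * y) / np.sum(w_treated)
    mean_control = np.sum(w_control * y) / np.sum(w_control)
    return float(mean_treated - mean_control)


# Toy data with one confounder x and an assumed true effect of -0.07
rng = np.random.default_rng(3)
n = 20_000
x = rng.normal(size=n)
d = rng.binomial(1, 1 / (1 + np.exp(-0.8 * x)))
y = rng.binomial(1, np.clip(0.30 + 0.08 * x - 0.07 * d, 0.01, 0.99))

features = x.reshape(-1, 1)
ps = LogisticRegression().fit(features, d).predict_proba(features)[:, 1]

ate = ipw_ate(y, d, ps)
print(f"IPW ATE: {ate:.3f}")
```

If the IPW and matching estimates on your own data disagree noticeably, that is a signal to revisit the propensity model or the overlap between groups.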
In the next article, we’ll complete our toolkit with a tour of other powerful methods like Difference-in-Differences and Instrumental Variables.
Try it yourself! Clone the repo, run the code, and change the parameters. What happens if you change the true treatment effect in the synthetic data? [propensity_score](https://github.com/jonatanmendez29/practicalCausalInference-project/tree/main/propensity_score)