A Step-by-Step Walkthrough: Implementing PSM in Python

In the previous article, we learned the magic of Propensity Score Matching (PSM): how it finds “statistical twins” to mimic a randomized experiment. Now it’s time to open the toolbox and perform the magic ourselves.

This hands-on tutorial will walk you through implementing PSM in Python to solve our running business problem: did the in-store flyer campaign for a credit card cause higher customer churn?

We’ll use a synthetic dataset for clarity and reproducibility. By the end, you’ll have a template you can adapt for your own causal inference problems.

The complete code is available on my GitHub: propensity_score

Step 0: Setup and Importing Libraries

We’ll use a simple stack: pandas for data handling, sklearn for modeling, and the excellent causalinference library for the heavy lifting. You can install it via pip install causalinference.

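import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
import statsmodels.api as sm
import matplotlib.pyplot as plt
import logging
from causalinference import CausalModel

# For reproducibility
np.random.seed(42)

# Logger used by the snippets below
logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger(__name__)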

Step 1: Create a Synthetic Dataset

Let’s simulate data for 5000 customers. We’ll create:

age: Millennials (18-38)
credit_score: A score around 650
income: Annual income
region: A region code (1-3)
saw_flyer: Our treatment (1 if they used the flyer promo code, 0 otherwise)
churn: Our outcome (1 if they churned within 12 months, 0 otherwise)

Crucially, we will design the data so that seeing the flyer is influenced by age, credit_score, and region (confounders), and churn is influenced by those same confounders AND the treatment.

data/generate_data.py
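
The generator script itself lives in the repo and is not reproduced here. As a rough illustration only, a data-generating process with the properties described above could look like the sketch below. The function name make_synthetic_data, every coefficient, and the true effect of -0.07 are assumptions made for this example, not the values used in data/generate_data.py.

```python
import numpy as np
import pandas as pd


def make_synthetic_data(n=5000, true_effect=-0.07, seed=42):
    """Simulate customers whose treatment and outcome share confounders.

    Hypothetical sketch: every coefficient below is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)

    age = rng.integers(18, 39, size=n)              # 18 to 38 inclusive
    credit_score = rng.normal(650, 50, size=n)      # centered around 650
    income = rng.normal(55_000, 15_000, size=n).clip(min=15_000)
    region = rng.integers(1, 4, size=n)             # region codes 1 to 3

    # Treatment assignment depends on the confounders (logistic model).
    logit_t = (-0.5 - 0.05 * (age - 28)
               + 0.01 * (credit_score - 650) + 0.3 * (region - 2))
    saw_flyer = rng.binomial(1, 1 / (1 + np.exp(-logit_t)))

    # Churn depends on the same confounders AND on the treatment.
    p_churn = (0.30 + 0.005 * (age - 28) - 0.001 * (credit_score - 650)
               + 0.03 * (region - 2) + true_effect * saw_flyer)
    churn = rng.binomial(1, np.clip(p_churn, 0.01, 0.99))

    return pd.DataFrame({
        "age": age, "credit_score": credit_score, "income": income,
        "region": region, "saw_flyer": saw_flyer, "churn": churn,
    })


df_demo = make_synthetic_data()
print(df_demo.groupby("saw_flyer")["churn"].mean())  # naive, confounded comparison
```

Because the true effect is a parameter here, you can change it and check whether the matching estimate later in the tutorial moves with it.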

Step 2: Estimate the Propensity Score

We use a logistic regression to predict saw_flyer based on the confounders. This gives each customer their propensity score.

# main
logger.info("Loading data...")  
df = pd.read_csv('../data/syntetic_data.csv')  
# Define covariates and treatment  
X = df[['age', 'credit_score', 'income', 'region']]  
X = sm.add_constant(X)  # add intercept  
treatment = df['saw_flyer']  
outcome = df['churn']  
  
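# Fit logistic regression model
# (PropensityScoreModel is not a library class; it is assumed to be the custom helper from the accompanying repo)
logger.info("Fit Logistic Regression...")
ps_model = PropensityScoreModel(y=outcome, d=treatment, x=X, model=LogisticRegression(random_state=42))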
  
# Predict propensity scores  
df['propensity_score'] = ps_model.predict_propensity_score()  
  
# Let's see the distribution  
df.groupby('saw_flyer')['propensity_score'].hist(alpha=0.7, bins=20)  
plt.legend(['Control', 'Treated'])  
plt.title("Propensity Score Distribution")  
plt.xlabel("Propensity Score")  
plt.ylabel("Frequency")  
plt.show()

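This plot shows the overlap between the propensity score distributions of the treated and control groups. Matching only works in the range where both groups contain customers with similar scores. Here the two distributions overlap well, so we have a good foundation for matching!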
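
PropensityScoreModel above is a helper class, so its internals are not visible in this article. A minimal sketch of the same idea with plain scikit-learn, assuming the column names from Step 1, might look like this. The function name estimate_propensity and the small demo data frame are made up for illustration and are not taken from the repo.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def estimate_propensity(df, covariates, treatment_col="saw_flyer"):
    """Return the estimated P(treatment = 1 | covariates) for every row."""
    # Scaling helps the solver converge when covariates have very different units.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(df[covariates], df[treatment_col])
    return model.predict_proba(df[covariates])[:, 1]


# Tiny made-up data frame, only so the sketch runs on its own.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "age": rng.integers(18, 39, 500),
    "credit_score": rng.normal(650, 50, 500),
})
p_true = 1 / (1 + np.exp(0.05 * (demo["age"] - 28)))  # younger customers see the flyer more often
demo["saw_flyer"] = rng.binomial(1, p_true)

demo["propensity_score"] = estimate_propensity(demo, ["age", "credit_score"])

# Numeric overlap check: compare the range of scores in each group.
print(demo.groupby("saw_flyer")["propensity_score"].agg(["min", "max"]))
```

If one group's score range sticks out far beyond the other's, the customers in that region have no comparable counterparts and are candidates for trimming. Note also that region is a code rather than a quantity, so one-hot encoding it may be more appropriate than feeding it in as a number.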

Step 3: Perform the Matching

Now for the core of PSM. We first match on the propensity score with our own model, then cross-check the result with the CausalModel class from the causalinference library, which simplifies the process.

# Matching
match_pred = ps_model.get_neighbors(df)
# Calculate ATE
logger.info(f"PSM ATE: {ps_model.ate()}")

# Calculate with the causalinference library
# (est_via_matching matches on the covariates in X, here with Mahalanobis weights)
causal = CausalModel(Y=df['churn'].values, D=df['saw_flyer'].values,  
                     X=df[['age', 'credit_score', 'income', 'region']].values )  
causal.reset()  
  
causal.est_propensity_s()  
causal.est_via_matching(matches=1, weights="maha", bias_adj=True)  
logger.info(f"{causal.estimates}")

Output:

INFO PSM ATE: -0.0592
Treatment Effect Estimates: Matching

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE     -0.071      0.022     -3.193      0.001     -0.115     -0.028
           ATC     -0.076      0.025     -3.085      0.002     -0.124     -0.028
           ATT     -0.069      0.026     -2.705      0.007     -0.119     -0.019

The library calculates the Average Treatment Effect (ATE) for us: -0.071. But we MUST check balance before trusting this number.
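
The matching step itself is hidden inside get_neighbors and the library call. As a rough sketch of what one-to-one nearest-neighbor matching on the propensity score involves, the function below pairs each treated unit with the control unit whose score is closest (with replacement) and averages the outcome differences, which gives an ATT estimate. The function name att_nearest_neighbor, the optional caliper, and the six-row example are assumptions for illustration, not the repo's implementation.

```python
import numpy as np


def att_nearest_neighbor(y, d, ps, caliper=None):
    """ATT from 1-to-1 nearest-neighbor matching on the propensity score.

    Matching is done with replacement: one control can serve several treated units.
    """
    y, d, ps = (np.asarray(a) for a in (y, d, ps))
    treated = np.flatnonzero(d == 1)
    control = np.flatnonzero(d == 0)

    # Distance between every treated score and every control score.
    # This builds an (n_treated x n_control) matrix, fine for a few thousand rows.
    dist = np.abs(ps[treated][:, None] - ps[control][None, :])
    nearest = dist.argmin(axis=1)
    matched_control = control[nearest]

    keep = np.ones(len(treated), dtype=bool)
    if caliper is not None:
        # Drop treated units whose best match is farther away than the caliper.
        keep = dist[np.arange(len(treated)), nearest] <= caliper

    effects = y[treated][keep] - y[matched_control][keep]
    return effects.mean(), matched_control[keep]


# Made-up example: two treated customers, four controls.
y = [0, 0, 1, 0, 1, 1]                     # churn
d = [1, 1, 0, 0, 0, 0]                     # saw_flyer
ps = [0.80, 0.30, 0.75, 0.35, 0.10, 0.50]  # propensity scores

att, matches = att_nearest_neighbor(y, d, ps)
print(att, matches)
```

In this toy example the treated customer with score 0.80 is paired with the control at 0.75, and the one at 0.30 with the control at 0.35. A caliper (a maximum allowed score distance) is a common safeguard against poor matches.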

Step 4: The Critical Step - Checking Balance

Did the matching actually make our groups comparable? We must check. This is the most important step.

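# Summary statistics, including raw and normalized covariate differences between the groups
logger.info(f"{causal.summary_stats}")

The output is a table with the mean and standard deviation of each covariate in the treated and control groups, plus their raw and normalized (standardized) differences. Note that summary_stats describes the sample the model currently holds, so to confirm balance after matching you need to compute the same differences on the matched sample. After matching, the absolute standardized differences should be well below 0.05 (5%), a stricter target than the commonly cited 0.1 rule of thumb. Only then can we say the matching created balanced groups.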
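
To make the balance check concrete, here is a small sketch of the standardized mean difference for a single covariate, computed once on the full sample and once on a matched sample. The function name and the toy age values are assumptions for illustration; in practice you would loop over every covariate and use the control rows selected by your matching step.

```python
import numpy as np


def standardized_mean_difference(x_treated, x_control):
    """(mean_t - mean_c) / sqrt((var_t + var_c) / 2) for one covariate."""
    x_treated = np.asarray(x_treated, dtype=float)
    x_control = np.asarray(x_control, dtype=float)
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return (x_treated.mean() - x_control.mean()) / pooled_sd


# Made-up ages: the treated group is older than the full control group.
age_treated = [30, 32, 34]
age_control_all = [20, 22, 30, 32, 34, 36]
age_control_matched = [30, 32, 34]   # controls picked by a (hypothetical) matching step

smd_before = standardized_mean_difference(age_treated, age_control_all)
smd_after = standardized_mean_difference(age_treated, age_control_matched)
print(f"before: {smd_before:.3f}, after: {smd_after:.3f}")
```

A large value before matching and a value near zero after matching is the pattern you want to see for every covariate.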

Step 5: Estimate the Treatment Effect

Once we’ve verified balance, we can trust our estimate. Our model produced an ATE of -0.071.

Business Interpretation: After controlling for age, credit score, income, and region, acquiring a customer through the in-store flyer campaign caused a 7.1 percentage point decrease in the probability of churn.

This is a large, statistically significant (p ≈ 0.001) effect. This campaign is attracting high-quality customers. The business decision might be to roll this marketing campaign out across the country.
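
The p-value and confidence interval in the ATE row follow from the estimate and its standard error under a normal approximation. A quick check, using the rounded values printed in the table (so results may differ slightly in the third decimal):

```python
import math

# ATE row of the output table (rounded values as printed).
est, se, z = -0.071, 0.022, -3.193

# Two-sided p-value for a standard normal test statistic.
p_value = math.erfc(abs(z) / math.sqrt(2))

# 95% confidence interval from the estimate and its standard error.
ci_low, ci_high = est - 1.96 * se, est + 1.96 * se

print(f"p = {p_value:.4f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

The interval excludes zero, which is what makes the effect statistically significant at the 5% level.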

Conclusion and Next Steps

You’ve just completed a full causal inference analysis!

You defined a business problem.
You identified confounders.
You estimated propensity scores.
You matched treated and control units.
You validated your model with a balance check.
You interpreted the causal effect.

Remember, PSM only controls for observed confounders. An unmeasured variable (e.g., “financial literacy”) could still bias our results.

As discussed in our previous article, a robustness check would be to also implement Inverse Probability Weighting (IPW) and see if the estimate is similar.
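
As a rough sketch of that robustness check, the function below computes a normalized inverse-probability-weighted estimate of the ATE from outcomes, treatment flags, and propensity scores. The function name ipw_ate, the clipping threshold, and the simulated data (one confounder, a true effect of -0.07, and the true propensity scores instead of estimated ones) are all assumptions for illustration.

```python
import numpy as np


def ipw_ate(y, d, ps, clip=0.01):
    """Normalized inverse-probability-weighted estimate of the ATE."""
    y, d, ps = (np.asarray(a, dtype=float) for a in (y, d, ps))
    ps = np.clip(ps, clip, 1 - clip)      # guard against extreme weights

    w_treated = d / ps                    # treated units weighted by 1 / e(x)
    w_control = (1 - d) / (1 - ps)        # controls weighted by 1 / (1 - e(x))

    mean_treated = np.sum(w_treated * y) / np.sum(w_treated)
    mean_control = np.sum(w_control * y) / np.sum(w_control)
    return mean_treated - mean_control


# Simulated example with a single confounder x.
rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)
true_ps = 1 / (1 + np.exp(-x))                        # x raises the chance of treatment
d = rng.binomial(1, true_ps)
p_churn = np.clip(0.30 + 0.08 * x - 0.07 * d, 0, 1)   # x also raises churn
y = rng.binomial(1, p_churn)

naive = y[d == 1].mean() - y[d == 0].mean()
ipw = ipw_ate(y, d, true_ps)
print(f"naive difference: {naive:.3f}, IPW estimate: {ipw:.3f}")
```

In this simulation the naive comparison is pulled toward zero by the confounder, while the weighted estimate should land close to the true effect. On the tutorial data you would pass df['churn'], df['saw_flyer'], and df['propensity_score'] and compare the result with the matching estimate.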

In the next article, we’ll complete our toolkit with a tour of other powerful methods like Difference-in-Differences and Instrumental Variables.

Try it yourself! Clone the repo, run the code, and change the parameters. What happens if you change the true treatment effect in the synthetic data? [propensity_score](https://github.com/jonatanmendez29/practicalCausalInference-project/tree/main/propensity_score)