PSM vs. IPW: A Practical Guide to Choosing Your Causal Method
So, you’ve decided to use propensity scores to find causal answers in your observational data. You set out to implement Propensity Score Matching (PSM), and then you hit a common roadblock. You read something like this:
PSM is biased and it is difficult to derive its variance… a better approach is to use Inverse Probability Weighting (IPW). - Matheus Facure, Causal Inference in Python
This is a fantastic and valid point. It leaves many analysts wondering: “Did I choose the wrong method? Should I always use IPW instead?”
The answer, like most things in data science, is “it depends.” Today, we’ll break down the philosophical difference between these two sister methods, their strengths, their weaknesses, and when you might choose one over the other.
The Common Ground: The Propensity Score
First, remember the goal. Both PSM and IPW use the propensity score — the probability of receiving treatment given observed covariates — to adjust for confounding and estimate a causal effect.
Their shared purpose is to control for pre-treatment differences between groups to mimic randomization. Where they differ is their strategy for achieving this.
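As a minimal sketch of what "estimating the propensity score" means, the snippet below fits a logistic regression of treatment on a single covariate. Everything in it is an assumption made for illustration: the data are simulated, the covariate name `income` is hypothetical, and the model is fit with a small hand-written Newton loop only to keep the example dependency-free. In practice you would more likely reach for a library implementation.

```python
import numpy as np

def fit_propensity(X, t, n_iter=25):
    """Fit a logistic regression for P(T=1 | X) by Newton's method; return the fitted scores."""
    Xb = np.column_stack([np.ones(len(X)), X])          # add an intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        grad = Xb.T @ (t - p)                           # gradient of the log-likelihood
        hess = Xb.T @ (Xb * (p * (1 - p))[:, None])     # negative Hessian of the log-likelihood
        beta = beta + np.linalg.solve(hess, grad)       # Newton step
    return 1.0 / (1.0 + np.exp(-Xb @ beta))

# Simulated data (made up): one standardized covariate drives who receives the flyer.
rng = np.random.default_rng(0)
income = rng.normal(size=2000)
true_ps = 1.0 / (1.0 + np.exp(-0.8 * income))
flyer = rng.binomial(1, true_ps)                        # 1 = treated, 0 = control

ps = fit_propensity(income.reshape(-1, 1), flyer)
print(f"estimated scores range from {ps.min():.2f} to {ps.max():.2f}")
```

Both PSM and IPW start from a vector like `ps`; they differ only in what they do with it next.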
The Core Difference: Throwing Away vs. Re-weighting
Imagine our dataset of credit card customers, some acquired via flyer (treated) and some not (control).
Propensity Score Matching (PSM) takes a subset-based approach.
- Its Philosophy: “I will find a control customer who is a twin for every treated customer. I will then throw away any data points that don’t have a good twin.”
- The Result: You analyze a smaller, but presumably well-balanced, dataset of matched pairs.
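The "find a twin, discard the rest" idea can be sketched as one-nearest-neighbor matching on the propensity score. This is only one of several matching variants, and the choices here are assumptions for illustration: matching with replacement, a caliper of 0.05, simulated data, and a propensity score that is known by construction rather than estimated. The function name `match_att` is hypothetical.

```python
import numpy as np

def match_att(ps, t, y, caliper=0.05):
    """1-nearest-neighbor matching on the propensity score, with replacement.

    Treated units whose closest control is farther than `caliper` are dropped.
    Returns the mean treated-minus-control difference and the number of pairs kept.
    """
    treated = np.flatnonzero(t == 1)
    control = np.flatnonzero(t == 0)
    # Absolute score distance between every treated unit and every control unit.
    dist = np.abs(ps[treated][:, None] - ps[control][None, :])
    nearest = dist.argmin(axis=1)                        # closest control per treated unit
    keep = dist[np.arange(len(treated)), nearest] <= caliper
    diffs = y[treated[keep]] - y[control[nearest[keep]]]
    return diffs.mean(), int(keep.sum())

# Simulated stand-in for the flyer example (all numbers are made up).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)                                   # a single confounder
ps = 1.0 / (1.0 + np.exp(-0.8 * x))                      # true propensity score
t = rng.binomial(1, ps)                                  # 1 = received the flyer
y = 2.0 * t + 1.5 * x + rng.normal(size=n)               # true effect is 2.0

naive = y[t == 1].mean() - y[t == 0].mean()
att, n_matched = match_att(ps, t, y)
print(f"naive: {naive:.2f}  matched: {att:.2f}  treated kept: {n_matched}/{t.sum()}")
```

The naive difference absorbs the confounder, while the matched difference should land much closer to the simulated effect. The `treated kept` count shows how many treated units survived the caliper.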
Inverse Probability Weighting (IPW) takes a weighting-based approach.
- Its Philosophy: “I will keep every single data point. However, I will re-weight them to create a synthetic population where the treatment is independent of the covariates.”
- How? It gives higher weight to control units that look like the treated group (i.e., those with a high propensity score) and lower weight to control units that are very different. It does the inverse for the treatment group.
- The Result: You analyze the entire dataset, but each row is weighted by 1 / propensity_score for treated units and 1 / (1 - propensity_score) for control units.
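The weighting scheme above can be sketched in a few lines. The data are simulated and the propensity score is known by construction, both assumptions made for illustration. The estimator shown normalizes the weights within each group (often called the Hájek or stabilized form), which is one common choice rather than the only one. The function name `ipw_ate` is hypothetical.

```python
import numpy as np

def ipw_ate(ps, t, y):
    """Weighted difference in means using inverse probability weights."""
    # 1 / ps for treated rows, 1 / (1 - ps) for control rows.
    w = np.where(t == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    treated_mean = np.sum(w * t * y) / np.sum(w * t)
    control_mean = np.sum(w * (1 - t) * y) / np.sum(w * (1 - t))
    return treated_mean - control_mean

# Simulated stand-in for the flyer example (all numbers are made up).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)                                   # a single confounder
ps = 1.0 / (1.0 + np.exp(-0.8 * x))                      # true propensity score
t = rng.binomial(1, ps)                                  # 1 = received the flyer
y = 2.0 * t + 1.5 * x + rng.normal(size=n)               # true effect is 2.0

naive = y[t == 1].mean() - y[t == 0].mean()
ate = ipw_ate(ps, t, y)
print(f"naive: {naive:.2f}  IPW: {ate:.2f}")
```

No rows are dropped here. Every unit contributes, but units that were unlikely to end up in their observed group count for more.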
Addressing the Criticism: Why PSM Can Be Problematic
Facure’s criticism is well-founded. Let’s break it down:
- Bias: PSM can be biased if the “common support” condition is violated (i.e., if we try to match treated and control units that are too different). By throwing away data, we might be changing the population we are making inferences about. The estimated effect from the matched sample might not generalize to the entire treated group.
- Variance: Deriving the statistical variance (and thus confidence intervals) for a PSM estimate is complex because the matching process itself introduces uncertainty. The standard errors you get from simply running a regression on the matched sample can be misleading because they treat the matched sample as fixed, ignoring that both the propensity scores and the matches were estimated from the data.
So, Is IPW Always the Answer? Not Quite
While IPW doesn’t throw away data and has more straightforward variance estimation, it has a notorious Achilles’ heel:
Extreme Propensity Scores: If your propensity model produces scores very close to 0 or 1, the weights can explode toward infinity. A single control unit with a propensity score of 0.999 would get a weight of 1 / (1 - 0.999) = 1000. This gives that one point an enormous influence over the final result, leading to highly unstable and erratic estimates.
IPW requires very careful propensity score model specification and often needs techniques like weight trimming to mitigate this issue.
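To make the problem concrete, here is a small sketch of how one extreme score dominates the weights and how clipping the scores limits the damage. The four scores are made up, and the clipping bounds of 0.05 and 0.95 are an assumption for illustration, not a recommended default. Clipping trades some bias for lower variance, so the bounds deserve a sensitivity check.

```python
import numpy as np

# Propensity scores for four hypothetical control units; the last one is extreme.
ps = np.array([0.20, 0.50, 0.80, 0.999])

raw_w = 1.0 / (1.0 - ps)                      # control weights: 1 / (1 - ps)
share_raw = raw_w[-1] / raw_w.sum()           # fraction of total weight on the extreme unit

# Clip the scores to [0.05, 0.95] before inverting (bounds are an assumption).
ps_clipped = np.clip(ps, 0.05, 0.95)
clipped_w = 1.0 / (1.0 - ps_clipped)
share_clipped = clipped_w[-1] / clipped_w.sum()

print("raw weights:    ", np.round(raw_w, 2))
print("clipped weights:", np.round(clipped_w, 2))
print(f"weight share of the extreme unit: {share_raw:.1%} raw, {share_clipped:.1%} clipped")
```

Before clipping, the single extreme unit carries almost all of the control-group weight. After clipping it is still the most influential unit, but it no longer drowns out the others.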
The Practical Verdict: Which One Should You Use?
Here’s a simple decision framework based on your dataset:
| | Use Propensity Score Matching (PSM) when… | Use Inverse Probability Weighting (IPW) when… |
|---|---|---|
| Goal | You want to analyze a clearly defined, comparable subpopulation. | You want to estimate an effect for the entire original population. |
| Data | You have a large control pool and are confident in finding good matches. | Your data has good overlap and no extreme propensity scores. |
| Audience | You need to present intuitive, visually convincing results (balance tables are very easy to understand). | Your audience is technically sophisticated and understands weighting. |
| Pros | Intuitive, easy to check balance, creates a clean cohort. | Uses all data, variance estimation is more straightforward. |
| Cons | Can discard data, variance estimation is tricky. | Highly sensitive to model misspecification and extreme scores. |
The Best Practice: In many advanced applications, the answer is to use both and see if they give similar results. If they do, you can be more confident in your findings. If they don’t, it’s a sign you need to check your propensity model or the overlap of your data.
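When the two estimates disagree, a balance diagnostic is a reasonable first check. The sketch below computes the standardized mean difference (SMD) of a covariate between treated and control units, before and after applying inverse probability weights. The data are simulated, the propensity score is known by construction, and the function name `smd` is hypothetical. An absolute SMD below roughly 0.1 is a commonly used rule of thumb for acceptable balance, not a hard threshold.

```python
import numpy as np

def smd(x, t, w=None):
    """Standardized mean difference of covariate x between treated and control units."""
    w = np.ones(len(x)) if w is None else w
    m1 = np.average(x[t == 1], weights=w[t == 1])
    m0 = np.average(x[t == 0], weights=w[t == 0])
    # Pool the unweighted group variances so the scale is the same before and after weighting.
    pooled_sd = np.sqrt((x[t == 1].var(ddof=1) + x[t == 0].var(ddof=1)) / 2)
    return (m1 - m0) / pooled_sd

# Simulated stand-in for the flyer example (all numbers are made up).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                                   # a single confounder
ps = 1.0 / (1.0 + np.exp(-0.8 * x))                      # true propensity score
t = rng.binomial(1, ps)                                  # 1 = received the flyer

ipw = np.where(t == 1, 1.0 / ps, 1.0 / (1.0 - ps))
smd_before = smd(x, t)
smd_after = smd(x, t, w=ipw)
print(f"SMD before weighting: {smd_before:.2f}  after weighting: {smd_after:.2f}")
```

The same function can be applied to a matched sample by passing only the matched rows, which gives a like-for-like balance comparison between the two methods.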
Up Next: The PSM Tutorial
For the hands-on tutorial in the next article, I will focus on implementing Propensity Score Matching in Python. I chose PSM for the tutorial because its output, a clean balance table showing how matching improved the covariate distributions, is intuitive and visual, which makes it well suited to learning the core concepts.
However, a complete analysis would often include IPW estimates as a robustness check. Remember, no single method is a silver bullet. The true skill is understanding the toolbox and knowing which wrench to grab.