Improve matching approach #13
Comments
Another solution is to follow the SPC approach: do categorical matching iteratively, and after each iteration relax the constraints slightly. This should result in better matching at the household level. Example:
This should provide better results than the current match_categorical implementation, where we match only once and have to sacrifice some variables to improve matching (as shown here). In the final round, all households that are yet to be matched can be matched either randomly or to a household with values close to the mean.
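A minimal sketch of the iterative idea, assuming households are plain dicts and that "relaxing the constraints" means dropping the least important matching variable after each round. The function name, the variable names and the toy data are all hypothetical; this is not the existing match_categorical code.

```python
import random

def match_iteratively(targets, donors, keys, seed=0):
    """Match target households to donors on `keys` (most important first),
    dropping the last key after every round to relax the constraint."""
    rng = random.Random(seed)
    matches = {}                  # target id -> list of matching donor ids
    unmatched = dict(targets)     # households still waiting for a match
    active = list(keys)
    while unmatched and active:
        # Index donors by their values on the currently active keys.
        index = {}
        for donor_id, attrs in donors.items():
            index.setdefault(tuple(attrs[k] for k in active), []).append(donor_id)
        for target_id, attrs in list(unmatched.items()):
            found = index.get(tuple(attrs[k] for k in active))
            if found:
                matches[target_id] = found
                del unmatched[target_id]
        active.pop()              # relax: drop the least important key
    # Final round: anything still unmatched gets one random donor.
    donor_ids = list(donors)
    for target_id in unmatched:
        matches[target_id] = [rng.choice(donor_ids)]
    return matches

# Toy data (made up for illustration only).
spc = {
    "s1": {"size": 2, "cars": 1, "tenure": "own"},
    "s2": {"size": 4, "cars": 2, "tenure": "rent"},
    "s3": {"size": 1, "cars": 0, "tenure": "rent"},
}
nts = {
    "n1": {"size": 2, "cars": 1, "tenure": "own"},
    "n2": {"size": 4, "cars": 1, "tenure": "own"},
    "n3": {"size": 4, "cars": 2, "tenure": "own"},
}
matches = match_iteratively(spc, nts, keys=["size", "cars", "tenure"])
```

Here s1 matches in the first round on all three variables, s2 matches once tenure is dropped, and s3 never finds a categorical match, so it falls through to the random final round.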
@sgreenbury the statistical matching approach I mentioned is in the ile-de-france project: link. I haven't tried the pipeline in that project, but it's well documented. Maybe it's something we should explore.
I've just found the following description in this paper:
I like the step taken to avoid overfitting. They do statistical matching, but the same step can be applied to categorical matching at the household level, where we would set a threshold for the minimum number of matches.
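One way to read the threshold idea, sketched under assumptions: for a single household, try progressively looser sets of matching variables and accept the first level that yields at least `min_matches` candidates. The function, its parameters and the data below are hypothetical and are not taken from the paper or from the ile-de-france pipeline.

```python
def candidates_with_threshold(target, donors, key_levels, min_matches=2):
    """Return (level, donor ids) for the strictest level with enough donors.

    `key_levels` lists the variable sets to try, strictest first.
    Returns (None, []) if no level reaches the threshold.
    """
    for level, keys in enumerate(key_levels):
        found = [
            donor_id
            for donor_id, attrs in donors.items()
            if all(attrs[k] == target[k] for k in keys)
        ]
        if len(found) >= min_matches:
            return level, found
    return None, []

# Toy data (made up for illustration only).
target = {"size": 2, "cars": 1}
donors = {
    "n1": {"size": 2, "cars": 1},
    "n2": {"size": 2, "cars": 0},
    "n3": {"size": 3, "cars": 1},
}
level, found = candidates_with_threshold(
    target, donors, key_levels=[["size", "cars"], ["size"]], min_matches=2
)
```

With these inputs the strict level has only one candidate (n1), so the household is matched at the relaxed level instead, where two donors share its size.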
Notes on implementation of
New approach - Sample of 15,000 households (columns used here):
The current approach to matching the SPC to the NTS is:
Categorical matching is inflexible, and some households in the SPC don't have any exact matches in the NTS (see here for matching results). It would be better to do Propensity Score Matching at the household level from the beginning. This would ensure that each household in the SPC is matched to at least one household in the NTS.
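A rough sketch of what household-level propensity score matching could look like, assuming the score is the modelled probability that a household comes from the SPC rather than the NTS, and that each SPC household takes the NTS household with the closest score. The tiny hand-written logistic regression keeps the example dependency-free; the function names, the pre-scaled feature vectors and the data are all hypothetical.

```python
import math

def predict(w, x):
    """Logistic model: probability that a household belongs to the SPC."""
    z = w[0] + sum(wj * v for wj, v in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Plain gradient-descent logistic regression; returns [bias, w1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def match_on_score(spc, nts):
    """Match every SPC household to the NTS household with the nearest score."""
    X = list(spc.values()) + list(nts.values())
    y = [1] * len(spc) + [0] * len(nts)   # 1 = SPC, 0 = NTS
    w = fit_logistic(X, y)
    nts_scores = {k: predict(w, x) for k, x in nts.items()}
    matches = {}
    for k, x in spc.items():
        score = predict(w, x)
        matches[k] = min(nts_scores, key=lambda d: abs(nts_scores[d] - score))
    return matches

# Toy data: features already scaled to [0, 1] (made up for illustration).
spc = {"s1": [0.2, 0.3], "s2": [0.8, 0.9], "s3": [0.5, 0.1]}
nts = {"n1": [0.25, 0.35], "n2": [0.7, 0.8], "n3": [0.4, 0.2], "n4": [0.9, 0.5]}
psm_matches = match_on_score(spc, nts)
```

Because nearest-neighbour matching without a caliper always returns the closest donor, every SPC household ends up with a match, which is the property exact categorical matching cannot guarantee.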
Tools
The MatchIt R package is very comprehensive. It has different matching algorithms and also allows you to specify a different caliper for each covariate. This is very handy because we might want to be stricter on some covariates than others (e.g. for households, we may want the household size to match exactly but be more forgiving on household income).
I didn't find a Python library with the same functionality as MatchIt. In psmpy, you can only provide one caliper, based on the overall distance.
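To make the per-covariate caliper idea concrete, here is a small hand-rolled sketch in plain Python. It does not use or imitate the MatchIt or psmpy APIs; the function name, its arguments and the data are hypothetical. It requires exact agreement on some covariates, rejects donors outside the caliper on others, and picks the closest remaining donor.

```python
def match_with_calipers(target, donors, exact=(), calipers=None):
    """Return the id of the closest donor that satisfies every constraint.

    exact:    covariates that must be identical (e.g. household size)
    calipers: {covariate: maximum absolute difference}; widths must be > 0
    Returns None if no donor passes all constraints.
    """
    calipers = calipers or {}
    best_id, best_dist = None, float("inf")
    for donor_id, attrs in donors.items():
        if any(attrs[k] != target[k] for k in exact):
            continue
        diffs = {k: abs(attrs[k] - target[k]) for k in calipers}
        if any(diffs[k] > width for k, width in calipers.items()):
            continue
        # Distance: differences expressed as a fraction of each caliper width.
        dist = sum(diffs[k] / calipers[k] for k in calipers)
        if dist < best_dist:
            best_id, best_dist = donor_id, dist
    return best_id

# Toy data (made up for illustration only).
target = {"size": 3, "income": 30000}
donors = {
    "n1": {"size": 3, "income": 34000},
    "n2": {"size": 3, "income": 31000},
    "n3": {"size": 4, "income": 30000},   # fails the exact size constraint
    "n4": {"size": 3, "income": 60000},   # outside the income caliper
}
best = match_with_calipers(target, donors, exact=("size",), calipers={"income": 5000})
none_found = match_with_calipers(target, donors, exact=("size",), calipers={"income": 500})
```

Tightening the income caliper to 500 leaves no eligible donor, which shows the trade-off: stricter calipers give closer matches but leave more households unmatched.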