Introduction
In this project we discuss matching for observational data, as well as the optimization of matching using graph theory.
Three Modes of Statistical Inference
Descriptive Inference: summarizing and exploring data
Predictive Inference : forecasting out-of-sample data points
Causal Inference: predicting counterfactuals
Matching in Causal Inference
Consider an observed sample of $n$ units taken from a finite population of $N$ units.
In particular, the causal effect for individual $i$ is the comparison of individual $i$'s outcome if individual $i$ receives the treatment (the potential outcome under treatment), $Y_i(1)$, and individual $i$'s outcome if individual $i$ receives the control (the potential outcome under control), $Y_i(0)$.
Treatment effect: $TE_i = Y_i(1) - Y_i(0)$.
For each individual, we can observe only one of these potential outcomes, because each unit (each individual at a particular point in time) will receive either treatment or control, not both.
To estimate the effect for the $i$th treated individual, we do matching: we find a control unit whose covariates are close to those of unit $i$.
If there is no imbalance between the treatment and control groups, then the simple difference in mean observed outcomes, $\bar{Y}_1 - \bar{Y}_0$, estimates the average treatment effect.
Regression Model: $Y_i = \beta_0 + \beta T_i + \epsilon_i$,
where $T_i$ = causal variable (treatment indicator) and $\beta$ = its weight.
The Problem Matching Solves:
Without matching: imbalance between the treated and control groups leads to model dependence, researcher discretion, and bias.
With matching: balance between the groups reduces model dependence and bias.
Distance Measures
Exact: $D_{ij} = 0$ if $X_i = X_j$, and $D_{ij} = \infty$ if $X_i \ne X_j$.
Mahalanobis: $D_{ij} = (X_i - X_j)' \Sigma^{-1} (X_i - X_j)$.
If interest is in the ATT, $\Sigma$ is the variance-covariance matrix of $X$ in the full control group; if interest is in the ATE then $\Sigma$ is the variance-covariance matrix of $X$ in the pooled treatment and full control groups. If $X$ contains categorical variables they should be converted to a series of binary indicators, although the distance works best with continuous variables.
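As a sketch, the Mahalanobis distance from one treated unit to every control can be computed with NumPy; the covariate data here are made up, and the covariance is taken in the control group, as is appropriate for the ATT:

```python
import numpy as np

rng = np.random.default_rng(0)
X_control = rng.normal(size=(50, 2))   # hypothetical control-group covariates
x_treated = np.array([0.5, -0.2])      # hypothetical treated unit

# quadratic form (X_i - X_j)' Sigma^{-1} (X_i - X_j), one value per control
S_inv = np.linalg.inv(np.cov(X_control, rowvar=False))
diffs = X_control - x_treated
d = np.einsum('ij,jk,ik->i', diffs, S_inv, diffs)
```

The `einsum` call evaluates the quadratic form for all controls at once; the nearest control is then simply `np.argmin(d)`.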
Propensity score: $D_{ij} = |e_i - e_j|$, where $e_i = P(T_i = 1 \mid X_i)$ is the propensity score for individual $i$.
Propensity score matching (PSM) is a statistical matching technique that attempts to estimate the effect of a treatment, policy, or other intervention by accounting for the covariates that predict receiving the treatment.
Linear propensity score: $D_{ij} = |\mathrm{logit}(e_i) - \mathrm{logit}(e_j)|$.
Stuart (2010) notes that matching on the linear propensity score can be particularly effective in terms of reducing bias.
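A minimal sketch of the propensity-score and linear-propensity-score distances, assuming a known logistic model with hypothetical coefficients `b0` and `b1` (in practice these would be estimated by a logistic regression of treatment on the covariates):

```python
import numpy as np

b0, b1 = -0.5, 1.2                 # hypothetical logistic-model coefficients
x = np.array([0.1, 0.8, 1.5, -0.3])

e = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))   # propensity scores e_i
logit = np.log(e / (1.0 - e))              # linear propensity scores logit(e_i)
D = np.abs(logit[:, None] - logit[None, :])  # D_ij = |logit(e_i) - logit(e_j)|
```

Note that the logit simply recovers the linear predictor `b0 + b1*x`, which is why the linear propensity score avoids the compression of `e` near 0 and 1.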
Goals in matching
Sample Average Treatment Effect (SATE): $\frac{1}{n} \sum_{i=1}^{n} [Y_i(1) - Y_i(0)]$
Population Average Treatment Effect (PATE): $\frac{1}{N} \sum_{i=1}^{N} [Y_i(1) - Y_i(0)]$
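A quick simulated illustration of the SATE, using hypothetical potential outcomes; in real data the two potential outcomes are never jointly observed, which is exactly why matching is needed:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
Y0 = rng.normal(0.0, 1.0, size=n)   # potential outcomes under control
Y1 = Y0 + 2.0                        # assumed constant unit-level effect of 2

sate = np.mean(Y1 - Y0)              # sample average treatment effect
```

The PATE is the same average taken over the full population of $N$ units rather than the observed sample.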
```python
# Libraries used throughout this notebook
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
from pandas import DataFrame
```
Steps in implementing matching methods
Matching methods have four key steps (Stuart, 2010), with the first three representing the “design” and the fourth the “analysis”:
Defining “closeness”: the distance measure used to determine whether an individual is a good match for another,
Implementing a matching method, given that measure of closeness,
Assessing the quality of the resulting matched samples, and perhaps iterating with Steps (1) and (2) until well-matched samples result, and
Analysis of the outcome and estimation of the treatment effect, given the matching done in Step (3).
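As an illustration of Step (3), a common balance diagnostic is the standardized mean difference of each covariate between treated units and matched controls. The data below are simulated, and the "matching" rule is a deliberately crude stand-in (keep the controls nearest the treated mean) just to show the diagnostic:

```python
import numpy as np

def std_diff(x_treated, x_control):
    # standardized mean difference, a common balance diagnostic
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return (x_treated.mean() - x_control.mean()) / pooled_sd

rng = np.random.default_rng(2)
x_t = rng.normal(1.0, 1.0, 200)      # treated covariate values
x_c = rng.normal(0.0, 1.0, 200)      # all controls: clearly imbalanced
# crude stand-in for matching: keep the 50 controls closest to the treated mean
x_c_matched = x_c[np.argsort(np.abs(x_c - x_t.mean()))[:50]]

before = std_diff(x_t, x_c)
after = std_diff(x_t, x_c_matched)
```

A common rule of thumb is to iterate Steps (1) and (2) until the absolute standardized difference of each covariate is small (e.g. below 0.1).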
Matching in Randomized experiment
Randomized experiments use a known randomized assignment mechanism to ensure “balance” of the covariates between the treated and control groups: The groups will be only randomly different from one another on all covariates, observed and unobserved.
Matching problem in observational data
An observational study is an attempt to estimate the effects of a treatment when, for ethical or practical reasons, it is not possible to randomly assign units to treatment or control.
Let us consider we have $n$ treated units and $m \ge n$ control units. A matched pair is an ordered pair $(i, j)$, indicating that the $i$th treated unit is matched with the $j$th control.
Different types of matching: *complete matched-pair samples* and *incomplete matched-pair samples*.
A pair $(i, j)$ is closely matched if $X_i$ is close to $X_j$ in some sense, for instance close in terms of distance or rank.
In a matched sample $M$, the total distance between matched pairs, $\sum_{(i,j) \in M} D(X_i, X_j)$, is one measure of the quality of the matched sample.
Example: Optimal Matching Versus a Greedy Heuristic
The example uses the now-familiar data on 26 U.S. light water nuclear power plants.
The distance between two plants is defined in terms of two covariates: the date the construction permit was issued and the capacity of the power plant.
Distances between treated plants (columns) and control plants (rows):

| | 3 | 5 | 9 | 18 | 20 | 22 | 24 |
|---|---|---|---|---|---|---|---|
| 1 | 28 | 24 | 10 | 7 | 17 | 20 | 14 |
| 2 | 0 | 3 | 18 | 28 | 20 | 31 | 32 |
| 4 | 3 | 0 | 14 | 24 | 16 | 28 | 29 |
| 6 | 22 | 22 | 18 | 8 | 32 | 35 | 30 |
| 7 | 14 | 1 | 4 | 14 | 18 | 20 | 18 |
| 8 | 30 | 27 | 12 | 2 | 26 | 29 | 24 |
| 10 | 17 | 14 | 5 | 10 | 20 | 22 | 17 |
| 11 | 28 | 26 | 11 | 6 | 18 | 20 | 16 |
| 12 | 26 | 24 | 9 | 12 | 12 | 14 | 9 |
| 13 | 28 | 24 | 10 | 0 | 24 | 26 | 22 |
| 14 | 20 | 16 | 14 | 24 | 0 | 12 | 12 |
| 15 | 22 | 19 | 12 | 22 | 2 | 9 | 10 |
| 16 | 23 | 20 | 5 | 4 | 20 | 22 | 17 |
| 17 | 26 | 23 | 14 | 24 | 6 | 5 | 6 |
| 19 | 21 | 18 | 22 | 32 | 7 | 15 | 16 |
| 21 | 18 | 16 | 10 | 20 | 4 | 12 | 14 |
| 23 | 34 | 31 | 16 | 18 | 14 | 9 | 4 |
| 25 | 40 | 37 | 22 | 16 | 20 | 11 | 5 |
| 26 | 28 | 25 | 28 | 38 | 14 | 12 | 17 |
Comparing Greedy and Optimal Matching with an example
Suppose there are N treated units having covariate values 2, 4, ..., 2N and an equal number of potential controls having covariate values 1−ε, 3−ε, ..., 2N−1−ε, where ε > 0 is vanishingly small. Suppose that the absolute difference in the covariate values is used as the measure of distance. Then 2 is slightly closer to 3−ε than to 1−ε, and so forth.
Greedy Matching: pairs 2 with 3−ε, 4 with 5−ε, and so on, and is finally forced to pair 2N with the only unmatched control, namely 1−ε. If the covariate were age, this would mean matching the oldest treated unit to the youngest control. The total absolute difference within the N pairs is approximately (N−1) + (2N−1) = 3N − 2.
Optimal Matching: The optimal procedure pairs 2 with 1−ε, 4 with 3−ε, and so on, for a total distance of approximately N.
So the percent increase in distance due to using greedy rather than optimal matching approaches 200% as N goes to infinity; greedy can be quite poor in large problems as well as small ones.
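This comparison can be reproduced numerically. The sketch below builds the cost matrix for N = 10 and ε = 0.01, runs the globally-closest-pair greedy heuristic, and gets the optimal pairing from SciPy's assignment solver:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

N, eps = 10, 0.01
treated = np.arange(2, 2 * N + 1, 2, dtype=float)     # 2, 4, ..., 2N
controls = np.arange(1, 2 * N, 2, dtype=float) - eps  # 1-eps, 3-eps, ..., 2N-1-eps
cost = np.abs(treated[:, None] - controls[None, :])

# Greedy: repeatedly take the globally closest available pair
c = cost.copy()
greedy_total = 0.0
for _ in range(N):
    i, j = np.unravel_index(np.argmin(c), c.shape)
    greedy_total += c[i, j]
    c[i, :] = np.inf     # treated unit i is used up
    c[:, j] = np.inf     # control j is used up

# Optimal: solve the assignment problem exactly
row, col = linear_sum_assignment(cost)
optimal_total = cost[row, col].sum()
```

With N = 10 the greedy total is about 27.9 against an optimal total of about 10.1, and the gap approaches the 200% increase as N grows.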
### A paper by Paul R. Rosenbaum, Journal of the American Statistical Association, Vol. 84, No. 408 (Dec., 1989), pp. 1024–1032
```python
from IPython.display import Image

# Figure: greedy versus optimal matching
Image(filename='Greedy and optimal3.jpg')
```
Comparing Greedy and Optimal Matching
First, there is the obvious point that the optimal match is always as good as and often better than the greedy match.
The second point, though distinct, is closely related to the first. Although a greedy algorithm (like forward stepwise regression) may provide a tolerable answer, it rarely comes with a guarantee that the answer is in fact tolerable. Greedy matching can be arbitrarily poor compared to optimal matching:
consider two treated units and two controls where the (1, 1) distance is 0, the (1, 2) and (2, 1) distances are small, and the (2, 2) distance is large. Greedy grabs the (1, 1) match at a cost of 0, can never reconsider, and is forced to pay the large cost for the (2, 2) match. Of course, the optimal match is (1, 2) and (2, 1), with a small total cost.
The third reason for preferring an optimal match is relevant when there are calipers, such as the propensity score caliper. Calipers forbid the matching of certain treated units to certain controls. When there are calipers, a complete pair matching may or may not exist. Even if a complete pair matching exists, greedy may not find it, whereas the optimal matching procedure will always find a complete pair matching if it exists.
Modeling Matchings Using Bipartite Graphs
We use bipartite graphs to model matchings. Bipartite graphs very often arise naturally, and many real-world problems can be formulated as bipartite matching. Consider the following examples:
Real-life Examples of Bipartite Matching
1-Online dating:
Assume match.com receives applications from 4 men ['Adam','Bob','Alex','Jeff'] and 4 women ['Sarah','Rita','Ema','Ana']; after they fill in their details, the match.com algorithm works out the following:
-Jeff appears to be compatible with all 4 women
-Alex appears to be compatible with Rita
-Bob appears to be compatible with Rita and Sarah
-Adam appears to be compatible with Sarah and Ana
Model the situation by drawing a Bipartite graph
2-Employment:
There are 4 job applicants (['Adam','Bob','Alex','Jeff']) and 4 jobs (['Baker','Chef','Bartender','Cashier']). Each applicant has a subset of jobs that he is interested in.
Each job opening can only accept one applicant and a job applicant can be appointed for only one job.
3-Talent show:
Bipartite Matching:
A Bipartite Graph G = (V, E) is a graph in which the vertex set V can be divided into two disjoint subsets L and R such that every edge e ∈ E has one end point in L and the other end point in R, so no edge connects two vertices of the same set. A matching M is a subset of edges such that each node in V appears in at most one edge in M.
A balanced bipartite graph is one that has an equal number of left and right vertices.
```python
from IPython.display import Image

# Figure: a bipartite graph
Image(filename='bipartite.jpg')
```
Maximum Bipartite Matching
A matching in a bipartite graph is a set of edges chosen in such a way that no two edges share an endpoint.
A maximum matching is a matching of maximum size (maximum number of edges).
In a maximum matching, if any edge is added to it, it is no longer a matching.
There can be more than one maximum matching for a given bipartite graph.
In the first example, we assign men to women so that the maximum number of marriages occurs. In the second example, we find an assignment of jobs to applicants such that as many applicants as possible get jobs.
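Using the online-dating example above, NetworkX's bipartite matching routine (an implementation of the Hopcroft-Karp algorithm) finds a maximum matching directly; this sketch encodes the compatibility list as an undirected bipartite graph:

```python
import networkx as nx
from networkx.algorithms import bipartite

men = ['Adam', 'Bob', 'Alex', 'Jeff']
women = ['Sarah', 'Rita', 'Ema', 'Ana']

G = nx.Graph()
G.add_nodes_from(men, bipartite=0)
G.add_nodes_from(women, bipartite=1)
G.add_edges_from([('Jeff', w) for w in women])      # Jeff is compatible with all 4
G.add_edges_from([('Alex', 'Rita'), ('Bob', 'Rita'), ('Bob', 'Sarah'),
                  ('Adam', 'Sarah'), ('Adam', 'Ana')])

# dict mapping each matched node to its partner (both directions included)
matching = bipartite.maximum_matching(G, top_nodes=men)
```

Here a perfect matching exists, and it is forced: Alex is only compatible with Rita, so Bob must take Sarah, Adam must take Ana, and Jeff takes Ema.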
Maximum weighted Matching in Bipartite graphs
In a weighted bipartite graph, each edge has an associated value.
A maximum weighted bipartite matching is defined as a matching where the sum of the values of the edges in the matching has a maximum value.
So we want to match those pairs of nodes in a way that the total weight of matched edges is maximum.
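A tiny sketch with hypothetical weights (two workers A, B and two jobs X, Y): `nx.max_weight_matching` implements a general-graph algorithm, but it applies directly to bipartite graphs as well:

```python
import networkx as nx

G = nx.Graph()
G.add_edge('A', 'X', weight=1)
G.add_edge('A', 'Y', weight=5)
G.add_edge('B', 'X', weight=4)
G.add_edge('B', 'Y', weight=6)

M = nx.max_weight_matching(G)                      # set of matched pairs
total = sum(G[u][v]['weight'] for u, v in M)
```

Note the greedy temptation here: taking the heaviest edge (B, Y) with weight 6 leaves only (A, X) with weight 1, for a total of 7, while the maximum weighted matching {(A, Y), (B, X)} has total weight 9.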
Let's get back to our problem
An Example in observational studies: "control and treated power plant"
" The example uses the now-familiar data on 26 U.S. light water nuclear power plants, as collected by W. E. Mooz and as reported by Cox and Snell (1981). (Excluded are the six "partial turnkey" plants, whose costs may contain hidden subsidies.) Seven of the plants were constructed on a site at which a light water reactor had previously existed; they are the treated units. Each such unit will be matched with two controls from among the remaining 19 plants. A comparison of the costs of the treated and control plants might be the basis for thinking about the advantages or disadvantages of building a new plant at an existing site."
In what follows we plot the graph in a bipartite layout
(by assigning coordinates to each node: treated nodes in the right column and control nodes in the left).
```python
import numpy as np

# Treated and control plant labels, read off the distance table above
T = {3, 5, 9, 18, 20, 22, 24}
C = set(range(1, 27)) - T

T_list = list(T)
C_list = list(C)
One = np.ones(len(T_list))
Z = np.zeros(len(C_list))
# coordinates: treated nodes in the right column
pos1 = list(zip(10 * One, range(len(T_list))))
```
Problems with solving max weighted matching
Maximum weighted matching in a general graph is hard to solve with elementary methods (a polynomial-time algorithm exists, but it is considerably more involved). Fortunately, for a bipartite graph we can solve the problem in polynomial time with standard techniques!
A well-known Joke!
To tell the difference between a mathematician and an engineer, perform this experiment. Put an empty kettle in the middle of the kitchen floor and tell your subjects to boil some water.
The engineer will fill the kettle with water, put it on the stove, and turn the flame on. The mathematician will do the same thing.
Next, put the kettle already filled with water on the stove, and ask the subjects to boil the water. The engineer will turn the flame on. The mathematician will empty the kettle and put it in the middle of the kitchen floor... thereby reducing the problem to one that has already been solved!
We do the same trick: we transform the problem of maximum weighted matching into a well-known problem that is already solved, the max flow problem.
Transforming Max Weighted Matching to Max Flow
The Maximum Bipartite Matching (MBP) problem can be solved by converting it into a flow network.
Following are the steps.
Build a Flow Network
There must be a source and a sink in a flow network.
So we add a source $s$ with an edge from $s$ to every applicant, and a sink $t$ with an edge from every job to $t$.
The capacity of every edge is 1, because each edge can carry at most one matching.
```python
import networkx as nx

B = nx.DiGraph()
B.add_nodes_from(['Sarah', 'Rita', 'Ema', 'Ana'], bipartite=0)  # add the node attribute "bipartite"
B.add_nodes_from(['Adam', 'Bob', 'Alex', 'Jeff'], bipartite=1)
B.add_edges_from([('Sarah', 'Adam'), ('Sarah', 'Bob'), ('Rita', 'Bob'), ('Rita', 'Alex'),
                  ('Ana', 'Adam'), ('Sarah', 'Jeff'), ('Rita', 'Jeff'),
                  ('Ema', 'Jeff'), ('Ana', 'Jeff')])
B = B.reverse()  # orient edges from men to women
```
Solving our problem
NetworkX has built-in algorithms to find maximum weighted matchings and maximum cardinality matchings.
Find the maximum flow.
We use Ford-Fulkerson algorithm to find the maximum flow in the flow network built in step 1.
The maximum flow is actually the MBP we are looking for.
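For instance, the dating example can be encoded as a flow network with unit capacities ($s$ and $t$ are the added source and sink); the max-flow value then equals the size of a maximum bipartite matching:

```python
import networkx as nx

F = nx.DiGraph()
for m in ['Adam', 'Bob', 'Alex', 'Jeff']:
    F.add_edge('s', m, capacity=1)          # source -> each man
for w in ['Sarah', 'Rita', 'Ema', 'Ana']:
    F.add_edge(w, 't', capacity=1)          # each woman -> sink
for m, w in [('Jeff', 'Sarah'), ('Jeff', 'Rita'), ('Jeff', 'Ema'), ('Jeff', 'Ana'),
             ('Alex', 'Rita'), ('Bob', 'Rita'), ('Bob', 'Sarah'),
             ('Adam', 'Sarah'), ('Adam', 'Ana')]:
    F.add_edge(m, w, capacity=1)            # compatibility edges

flow_value, flow_dict = nx.maximum_flow(F, 's', 't')
```

Since a perfect matching exists in this example, the max-flow value is 4, and the saturated man-to-woman edges in `flow_dict` give the matched pairs.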
◮A simple and practical max-flow algorithm
◮ Main idea: find valid flow paths until there is none left, and add them up
◮ How do we know if this gives a maximum flow? – Proof sketch: Suppose not. Take a maximum flow f⋆ and “subtract” our flow f. The difference is a valid flow of positive total value. By flow decomposition, it can be decomposed into flow paths and circulations. These flow paths must have been found by Ford-Fulkerson. Contradiction.
Back Edges
We don’t need to maintain the amount of flow on each edge; we can work with the (residual) capacity values directly.
◮ If f amount of flow goes through u → v, then:
– Decrease c(u → v) by f
– Increase c(v → u) by f
◮ Why do we need to do this?
– Sending flow to both directions is equivalent to canceling flow
Ford-Fulkerson Pseudocode
◮ Set ftotal = 0
◮ Repeat until there is no path from s to t:
– Run DFS from s to find a flow path to t
– Let f be the minimum capacity value on the path
– Add f to ftotal
– For each edge u → v on the path:
◮ Decrease c(u → v) by f
◮ Increase c(v → u) by f
Analysis
◮ Assumption: capacities are integer-valued
◮ Finding a flow path takes Θ(n + m) time
◮ We send at least 1 unit of flow through the path
◮ If the max-flow is f⋆ , the time complexity is O((n + m)f⋆)
– “Bad” in that it depends on the output of the algorithm
– Nonetheless, easy to code and works well in practice
Computing Min-Cut
◮ We know that max-flow is equal to min-cut
◮ And we now know how to find the max-flow
◮ Question: how do we find the min-cut?
◮ Answer: use the residual graph
◮ “Subtract” the max-flow from the original graph
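Concretely, NetworkX exposes this directly: `minimum_cut` runs a max-flow computation and returns the cut value together with the two sides of the partition (the source side is exactly the set of nodes still reachable from s in the residual graph). The small network here is the same made-up example used for Ford-Fulkerson above:

```python
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([('s', 'a', 3), ('s', 'b', 2), ('a', 'b', 1),
                           ('a', 't', 2), ('b', 't', 3)],
                          weight='capacity')

cut_value, (source_side, sink_side) = nx.minimum_cut(G, 's', 't')
```

As the max-flow min-cut theorem requires, `cut_value` equals the max-flow value of this network (5), and every edge from `source_side` to `sink_side` is saturated.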