Tuesday, July 16, 2013

Some more evidence that Florida's 'Stand Your Ground' law increased firearm homicide rates

The acquittal of George Zimmerman in the shooting death of Trayvon Martin has rightly turned attention to the permissiveness of Florida's self-defense laws. Although the state's 2005 "Stand Your Ground" law was not used by the defense, it nevertheless framed the Zimmerman case from the very beginning. References to "Stand Your Ground" and self-defense were included in the judge's instructions to the jury, and in a post-verdict interview, one of the jurors admitted that the law factored into their decision.

Under the common law "Castle doctrine" principle, individuals facing an imminent threat of death or bodily harm in their own home do not have a duty to retreat and may respond with force. Stand Your Ground (SYG) laws generally extend this principle to any location where a person has a legal right to be and allow the use of deadly force in self-defense when an individual is presumed to have a "reasonable fear" of death or severe bodily injury. Since the passage of Florida's law in 2005, over thirty states have followed suit and adopted similar expansions of the Castle doctrine.

By definition, SYG laws make homicide less costly by providing the attacker with an additional legal defense. Indeed, as expected, these laws are associated with greater numbers of homicides that are ruled "justifiable." More troubling is that determinations of "justifiability" exhibit a stark racial bias in both SYG and non-SYG states - white-on-black killings are the most likely to be ruled justifiable, while black-on-white killings are the least likely.

Defenders of SYG laws argue that, although homicides are more likely to be ruled justified, SYG can be expected to reduce the overall rate of homicide and violent crime. By permitting persons being attacked to retaliate in full force rather than retreating, SYG laws theoretically increase the costs of committing a violent offense. Even if justifiable homicides increase, defenders would argue that these homicides substitute for otherwise non-justifiable homicides. The net homicide and violent crime rates, in the presence of SYG laws, should decrease.

Two recent studies find the opposite. Far from deterring homicide, SYG laws increase its incidence. Moreover, the laws have no appreciable deterrent effect on violent crime. Analyzing data from the FBI's Uniform Crime Reporting system, Cheng and Hoekstra find that SYG laws lead to roughly an 8% increase in reported murders and non-negligent manslaughters. McClellan and Tekin find a similar effect on firearm homicides and firearm accidents using monthly data from the CDC. These findings are consistent with a different understanding of the incentives generated by Stand Your Ground. Rather than increase the costs of violence, SYG laws decrease them by expanding the range of legal defenses available to an attacker. Because of the vagueness of the "presumption of reasonable fear," and the absence of many third-party witnesses, SYG laws stack the deck in favor of an assailant by raising the prosecution's evidentiary burden (as was made clear in the Zimmerman trial).

Whether SYG laws increase or reduce murder rates is an important policy question deserving of further study. The Cheng/Hoekstra and McClellan/Tekin papers provide convincing evidence, but it is always valuable to re-examine any scientific finding using different approaches and methods. Both of these studies use standard panel regression techniques to estimate the causal effect of Stand Your Ground laws while controlling for other potential confounders. While parametric regression is a ubiquitous and powerful tool for causal inference, it is a very model-dependent approach. This can sometimes lead to misleading conclusions when the model gets too far away from the data.

Any approach to figuring out whether some "treatment" T causes Y relies on comparing the factual (what actually happened) to the counterfactual (what would have happened had T been different). The fundamental problem of causal inference is that we can never observe the counterfactual - we only see what happened. Statistical approaches to determining causality rely on estimating an appropriate counterfactual from the data. The ideal counterfactual is a case that is identical to the "factual" one on all relevant characteristics except for T. However, such cases are often lacking. There is no exact copy of Florida somewhere in the U.S. that did not pass Stand Your Ground. Ideally we would like to pick the closest case possible, but even then such a case may be nonexistent, particularly when potential confounding variables for which we would like to control are highly correlated with our treatment. The counterfactual may be a case that has never been seen before. When regression techniques are used to estimate these "extreme" counterfactuals, they rely on extrapolation outside of the scope of the observed data. As Gary King and Langche Zeng show, such extrapolations are highly dependent on often indefensible modeling assumptions that become more and more tenuous as one gets further and further away from the data. Slight alterations to the model can yield drastically different results. Moreover, typical ways of presenting regression results (tables of coefficients) rarely make the counterfactual apparent. It is very difficult to get a sense of the extent to which the results in an empirical paper are based on extrapolation. While robustness checks help, basic regression papers often obscure the factual/counterfactual comparison on which a causal claim is based. This is not to say that regression is useless or that the Cheng/Hoekstra and McClellan/Tekin results are fundamentally flawed. However, it is worthwhile to see whether the finding holds when using a different approach to causal inference.

Instead of regression, I use the Synthetic Control method developed by Abadie, Diamond and Hainmueller to estimate the effect of Florida's 2005 Stand Your Ground law on firearm homicide rates. This method has been used to evaluate comparable state-level interventions. Abadie and Gardeazabal (2003) use it to measure the effect of terrorism on economic growth in the Basque Country, while Abadie et al. (2010) assess the impact of California's Proposition 99 on cigarette sales. Synthetic control methods compare the factual time series of the outcome variable in a unit exposed to the treatment (Florida) with a "synthetic" counterfactual constructed by weighting a set of "donor" units not exposed to the treatment (states without SYG) such that the synthetic control matches the factual unit as closely as possible on potential confounding variables and pre-treatment outcomes. By forcing the weights to be non-negative and sum to one, this method ensures that the estimated counterfactual stays within the bounds of the data, thereby guarding against extrapolation. The intuition is that a combination of control states can approximate the counterfactual of "Florida without Stand Your Ground" better than any one state. The "synthetic" Florida provides a baseline for comparing homicide rates after SYG was implemented in 2005. It would certainly be possible to use the synthetic control approach to evaluate the effect of SYG in other states. However, I focus here on Florida because it was the earliest to enact such a law and has the most years for which the effects of SYG can be observed.
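To make the weighting idea concrete, here is a minimal sketch in Python. It is not the procedure used for the results in this post: it matches only on pre-treatment outcomes (no covariates), it swaps in a coarse grid search for a proper constrained optimizer, and every number and donor label is invented for illustration. The function names `mspe`, `combine` and `fit_weights` are hypothetical.

```python
from itertools import product

def mspe(actual, synthetic):
    """Mean squared prediction error between two equal-length series."""
    return sum((a - s) ** 2 for a, s in zip(actual, synthetic)) / len(actual)

def combine(donors, weights):
    """Weighted average of the donor series, period by period."""
    periods = len(donors[0])
    return [sum(w * d[t] for w, d in zip(weights, donors)) for t in range(periods)]

def fit_weights(treated_pre, donors_pre, steps=20):
    """Grid search over weights that are non-negative and sum to one."""
    best_w, best_err = None, float("inf")
    n = len(donors_pre)
    # Enumerate the first n-1 weights on a grid; the last one is the remainder.
    for combo in product(range(steps + 1), repeat=n - 1):
        if sum(combo) > steps:
            continue  # would force the last weight below zero
        w = [c / steps for c in combo] + [(steps - sum(combo)) / steps]
        err = mspe(treated_pre, combine(donors_pre, w))
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err

# Toy pre-treatment outcome series, invented for illustration (not CDC data).
treated_pre = [4.0, 4.2, 4.1, 4.3]
donors_pre = [
    [3.0, 3.2, 3.1, 3.3],  # hypothetical donor A
    [5.0, 5.2, 5.1, 5.3],  # hypothetical donor B
    [6.0, 5.0, 7.0, 4.0],  # hypothetical donor C
]
weights, err = fit_weights(treated_pre, donors_pre)
print(weights, err)
```

In this toy case the treated series sits exactly halfway between donors A and B, so the search puts half the weight on each and none on donor C. Because the weights form a convex combination, the synthetic series can never leave the range spanned by the donors, which is the sense in which the method avoids extrapolation.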

I use state-level mortality data from the CDC's WONDER database to construct a measure of per-capita firearm homicides for each state in years 2000 to 2010. Following the lists in Cheng/Hoekstra and McClellan/Tekin, I also obtain a set of state-level covariates from Census, BLS and DOJ data sources related to age and racial composition of the population, poverty, median income, urbanization, unemployment, incarceration, and federal police presence. All of the covariates are measured in 2000 - prior to the start of the time-series.

The rapid adoption of SYG laws after 2005 unfortunately limits the set of "donor" states available for constructing the synthetic control. Only 22 states do not have a "Stand Your Ground"-equivalent law in force during the 2000-2010 period: Arkansas, California, Colorado, Connecticut, Delaware, Hawaii, Iowa, Maine, Maryland, Massachusetts, Minnesota, Nebraska, Nevada, New Jersey, New Mexico, New York, North Carolina, Oregon, Pennsylvania, Rhode Island, Vermont, Wisconsin. Nevada, North Carolina and Pennsylvania passed SYG laws in 2011. Additionally, because of data privacy concerns, the CDC does not report data for regions where a sufficiently small number of events occurred, which further constrains the total set of viable donor states. Nevertheless, the pool of donors is able to provide a reasonable synthetic counterfactual for Florida.

Florida's SYG law was passed in October 2005, meaning that it really only affected the years from 2006 onward. Matching Florida to the pool of controls on the set of covariates and on firearm homicide rates from 2000 to 2005 yields a synthetic counterfactual that reasonably approximates Florida's pre-SYG homicide patterns. The figure below plots the actual trajectory of Florida's firearm homicide rate relative to the path followed by the synthetic Florida sans-SYG. Homicide rates in actual and synthetic Florida match up rather well in the 2000-2005 period. However, from 2006-2010, the factual and counterfactual diverge dramatically. Florida's firearm homicide rate sees a huge increase from 2006 to 2007, while the synthetic rate begins to decline. Although rates drop from 2007 to 2010, they remain significantly higher than they would have been had SYG not been in place. The results suggest that Florida experienced about 1-1.5 more annual homicides from 2006-2010 than it would have had Stand Your Ground not been implemented.

Firearm homicide rates in Florida - Actual vs. Synthetic Control

As with all statistical techniques, it's important to evaluate how unlikely it is that the observed pattern was generated purely by randomness. That is, how significant is this result? Although there are no specific parameters and standard errors to estimate, one can get a sense of the "statistical significance" of the apparent effect of SYG using placebo tests on our donor pool. A placebo test applies the same synthetic control techniques to cases known to be unaffected by the treatment. The resulting distribution of "placebo effects" gives a sense of the types of patterns that we would see under the hypothesis of no effect - that is, pure randomness. If the pattern exhibited by Florida appears unusual relative to this placebo distribution, then one can be relatively confident that it is not due to chance.

Firearm homicide rates in Florida - Placebo tests
(Discards states with pre-2006 MSPE five times higher than Florida's)

The figure above plots the gap in firearm homicide rates between the actual time series and the estimated synthetic control for Florida and for each of the control states. Relative to the distribution of relevant placebos, the Florida effect stands out post-2006. Florida's is the most unusual line in the set and from 2007-2010 shows a positive deviation from the control greater than any of the placebo tests. Although the pool of control states is somewhat small, limiting the number of possible placebo tests, the trajectory of Florida's homicide rate is certainly unusual and difficult to attribute to pure chance.
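The logic of the placebo comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gap series below are invented, the unit labels are hypothetical, and the "pseudo p-value" (the share of retained units whose average post-period gap is at least as large as the treated unit's) is one common way to summarize such a figure, not a statistic reported in this post. The cutoff of five times the treated unit's pre-period MSPE mirrors the rule noted under the figure.

```python
def mean_sq(gaps):
    """Mean squared gap (MSPE) between actual and synthetic series."""
    return sum(g * g for g in gaps) / len(gaps)

def placebo_rank(treated, placebos, max_ratio=5.0):
    """treated and each placebo value are (pre_gaps, post_gaps) tuples.

    Drops placebos whose pre-period MSPE exceeds max_ratio times the
    treated unit's, then returns the retained names and the share of
    units (treated included) with a mean post-period gap at least as
    large as the treated unit's.
    """
    treated_pre, treated_post = treated
    cutoff = max_ratio * mean_sq(treated_pre)
    kept = {name: gaps for name, gaps in placebos.items()
            if mean_sq(gaps[0]) <= cutoff}
    treated_effect = sum(treated_post) / len(treated_post)
    at_least_as_large = 1  # the treated unit counts itself
    for pre, post in kept.values():
        if sum(post) / len(post) >= treated_effect:
            at_least_as_large += 1
    return sorted(kept), at_least_as_large / (len(kept) + 1)

# Invented gap series (actual minus synthetic), for illustration only.
treated = ([0.1, -0.1, 0.05], [1.0, 1.4, 1.2])
placebos = {
    "A": ([0.1, -0.05, 0.0], [0.2, -0.1, 0.1]),
    "B": ([1.0, -1.2, 0.9], [2.0, 2.5, 1.8]),   # poor pre-period fit
    "C": ([-0.1, 0.1, 0.1], [-0.3, 0.2, 0.0]),
}
kept, p_value = placebo_rank(treated, placebos)
print(kept, p_value)
```

Here placebo "B" is discarded because its synthetic control never fit well before the treatment date, so its large post-period gap says little about chance variation. With a small donor pool the smallest attainable pseudo p-value is 1 / (number of retained placebos + 1), which is why a limited set of control states caps how strong the evidence can look.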

Supporters of Florida's law point to reductions in the violent crime rate since 2005 as evidence that the law's deterrent effect is working. However, just looking at a trend as evidence of causation makes no sense - in order to assign causality, one needs to make a comparison with some counterfactual case. Violent crime rates in Florida have been declining overall since 2000, so the downward trend would most likely have continued even if SYG had not been passed.

Unfortunately, it is difficult to evaluate whether SYG reduced violent crime rates using a synthetic control approach because no good counterfactual exists in the data. Florida generally has some of the highest violent crime rates in the country and they are consistently higher than those of any of the states in the donor pool (New Mexico is close, but still lower). As a consequence, it is impossible to find any combination of control states that consistently match Florida's pre-2005 trend. Any counterfactual for Florida's overall violent crime rate would rely heavily on extrapolation outside of the data.

While these results are certainly not definitive (the relative novelty of SYG laws limits the number of periods under observation), they corroborate existing findings. Florida's Stand Your Ground law did not have a deterrent effect on homicide, and may in fact have increased the state's murder rate. This and other evidence strongly suggests that state governments should re-think their approach to self-defense laws. While politically appealing from a "tough on crime" perspective, Stand Your Ground laws likely do much more harm than good.

Edit 7/17 - Fixed broken links

Sunday, June 23, 2013

I guessed wrong (kind of)

In a recent post, I argued that Edward Snowden's extradition from Hong Kong was likely. Now this happens:
US whistle-blower Edward Snowden has left Hong Kong and is on a commercial flight to Russia, but Moscow will not be his final destination. 
The fugitive whistle-blower boarded the Moscow-bound flight earlier on Sunday and would continue on to another country, possibly Cuba then Venezuela, according to media reports. 
The Hong Kong government said in a statement that Snowden had departed "on his own accord for a third country through a lawful and normal channel".
The Hong Kong government played the politics of this case very well. From their press release:
The US Government earlier on made a request to the HKSAR Government for the issue of a provisional warrant of arrest against Mr Snowden. Since the documents provided by the US Government did not fully comply with the legal requirements under Hong Kong law, the HKSAR Government has requested the US Government to provide additional information so that the Department of Justice could consider whether the US Government's request can meet the relevant legal conditions. As the HKSAR Government has yet to have sufficient information to process the request for provisional warrant of arrest, there is no legal basis to restrict Mr Snowden from leaving Hong Kong.
Provisions that allow for requests of "additional information" are common in many extradition treaties. Certainly it's not known what was in the documents provided by the United States government and precisely how they did not comply with Hong Kong law, but it is very clear that this was the easiest way to deny extradition without explicitly refusing it. Snowden's case is a particularly challenging one, given that the U.S. chose to indict Snowden under the Espionage Act. The Hong Kong government may have a strong argument that the initial documents were insufficient, even if it's unlikely that the United States will believe it. The novelty of this case makes a request for additional information perfectly legitimate, even if convenient given Snowden's subsequent departure. While H.K.-U.S. legal cooperation may have been somewhat slighted, the HK government's decision is unlikely to affect its relationships with other states since it held to the letter and intent of the treaty and, more importantly, China did not appear to overtly intervene.

So, I was wrong...somewhat. In the original post, I included the caveat that my prediction assumed Snowden would not choose to flee Hong Kong, since a long and drawn-out extradition process would also give him ample time to escape. I thought that for Snowden, transit to a non-extraditable third country was undesirable, otherwise he would have simply gone there in the first place. However, it appears that self-preservation ultimately won out and Venezuela is Snowden's new destination. Although the United States does have an extradition treaty with Venezuela, it dates back to 1923. Moreover, the U.S. and Venezuela historically have had an extremely rocky relationship over legal cooperation, particularly after the high-profile refusal of the U.S. to extradite accused airline bomber Luis Posada Carriles. The current Venezuelan government certainly would oppose Snowden's transfer, but even if it chose to dust off the old agreement and comply with its exact provisions, the United States would have an extremely weak case. Unlike the U.S.-HK agreement, it does not have a mutual criminality clause, meaning that the only offenses that are extraditable are the ones explicitly listed in the agreement. The list is very short and extremely dated ("willfull and unlawful destruction or obstruction of railroads" is the 6th offense on the list). Espionage would certainly be treated as a political offense, and under Article 3, even offenses connected to a political offense (in Snowden's case, "theft of government property") are non-extraditable. Despite the treaty, Snowden seems to be untouchable while in Venezuela. [EDIT: Apparently Snowden is instead seeking asylum in Ecuador]

While it is impossible to know what would have happened had Snowden stayed in HK, his flight does suggest that he did not believe that his defense against extradition would have been successful. The HKSAR's hands are more tied than are Venezuela's. All things being equal, Snowden would certainly have preferred to stay in HK rather than Venezuela. However, the Hong Kong government seemed to have made it clear that it could not hold out against extradition for much longer without putting its legal arrangements into much more serious jeopardy. That China chose not to explicitly intervene at the outset does illustrate that international law does operate as a constraint, even if states can strategically use it to their advantage. Delaying extradition while offloading Snowden to a less constrained third party was an inexpensive way of satisfying the Chinese government's preference against extradition while minimizing damage to Hong Kong's international legal standing.

The closing paragraph of the press release is also absolutely perfect from a political standpoint:
Meanwhile, the HKSAR Government has formally written to the US Government requesting clarification on earlier reports about the hacking of computer systems in Hong Kong by US government agencies. The HKSAR Government will continue to follow up on the matter so as to protect the legal rights of the people of Hong Kong.
Translation: "I am altering the deal, pray I don't alter it any further"

Monday, June 17, 2013

Marginal Effect Plots for Interaction Models in R

Political scientists often want to test hypotheses regarding interactive relationships. Typically, a theory might imply that the effect of one variable on another depends on the value of some third quantity. For example, political structures like institutional rules might mediate the effect of individual preferences on political behavior. Scholars using regression to test these types of hypotheses will include interaction terms in their models. These models take on the basic form (in the linear case)

$$y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2 + \epsilon$$

where $\beta_3$ is the coefficient on the "interaction term" $x_1x_2$. However, interaction terms are often tricky to work with. Bear Braumoeller's 2004 article in International Organization illustrated how published quantitative papers often made basic mistakes in interpreting interaction models. Scholars frequently would misinterpret the lower-order coefficients, $\beta_1$ and $\beta_2$. Published research articles would argue that a significant coefficient on $x_1$ suggests a meaningful relationship between $x_1$ and $y$. In a model with an interaction term, this is not necessarily the case. The marginal effect of $x_1$ on $y$ in a linear model is not equal to $\beta_1$; it is actually $\beta_1 + \beta_3x_2$. That is, $\beta_1$ is the relationship between $x_1$ and $y$ when $x_2$ is zero. Often this is a meaningless quantity since some variables (for example, the age of a registered voter) cannot possibly equal zero.

The correct way to interpret an interaction model is to plot out the relationship between $x_1$ and $y$ for the possible values of $x_2$. It's not a matter of simply looking at a single coefficient and declaring a positive or negative effect. Even if the interaction coefficient $\beta_3$ is significant, the actual meaning of the interaction can differ. One interpretation may be that $x_1$ is always positively related with $y$, but the effect is greater for some values of $x_2$. Another is that $x_1$ is sometimes positively associated with $y$ and sometimes negatively associated with $y$, depending on the value of $x_2$. Looking only at the coefficients does not capture these two different types of relationships.

Luckily, figuring out the marginal effect of $x_1$ on $y$ is rather easy. In a linear model, the point estimate for how much $y$ increases when $x_1$ is increased by 1, $\hat{\delta_1}$, is equal to

$$\hat{\delta_1} = \hat{\beta_1} + \hat{\beta_3}x_2$$

The variance of the estimator $\hat{\delta_1}$ is

$$Var(\hat{\delta_1}) = Var(\hat{\beta_1} + \hat{\beta_3}x_2)$$
$$Var(\hat{\delta_1}) = Var(\hat{\beta_1}) + Var(\hat{\beta_3}x_2) + 2Cov(\hat{\beta_1}, \hat{\beta_3}x_2)$$
$$Var(\hat{\delta_1}) = Var(\hat{\beta_1}) + x_2^2Var(\hat{\beta_3}) + 2x_2Cov(\hat{\beta_1}, \hat{\beta_3})$$

Note that when $x_2 = 0$, $\hat{\delta_1} = \hat{\beta_1}$ and $Var(\hat{\delta_1}) = Var(\hat{\beta_1})$. The standard deviation or standard error of $\hat{\delta_1}$ is equal to the square root of this variance. Extending these formulae to the non-linear case is easy - the coefficient estimates and variances are computed the same way, and from there one can simulate relevant quantities of interest (probabilities, predicted counts).
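As a quick check on the formulas above, here is a small Python sketch (Python only to keep the example self-contained; it is not the R code this post refers to). The function name `marginal_effect` and all of the coefficient, variance and covariance values are made up for illustration; in practice they would come from the fitted model's coefficient vector and variance-covariance matrix.

```python
import math

def marginal_effect(b1, b3, var_b1, var_b3, cov_b1_b3, x2):
    """Point estimate and standard error of the marginal effect of x1 at x2."""
    effect = b1 + b3 * x2
    variance = var_b1 + x2 ** 2 * var_b3 + 2 * x2 * cov_b1_b3
    return effect, math.sqrt(variance)

# Hypothetical estimates, chosen only to illustrate the arithmetic.
b1, b3 = -1.0, 0.5
var_b1, var_b3, cov_b1_b3 = 0.04, 0.01, -0.015

for x2 in [0.0, 1.0, 2.0, 3.0, 4.0]:
    effect, se = marginal_effect(b1, b3, var_b1, var_b3, cov_b1_b3, x2)
    # Approximate 95% interval using the normal critical value 1.96.
    lower, upper = effect - 1.96 * se, effect + 1.96 * se
    print(f"x2={x2:.1f}  effect={effect:+.3f}  se={se:.3f}  "
          f"95% CI=({lower:+.3f}, {upper:+.3f})")
```

At $x_2 = 0$ the function simply returns $\hat{\beta_1}$ and its standard error, as noted above. In this made-up example the effect changes sign as $x_2$ grows, and the negative covariance term makes the standard error smallest at intermediate values of $x_2$ rather than at zero. Evaluating the function over a grid of $x_2$ values gives exactly the line and confidence band that a marginal effect plot displays.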

An even simpler way to calculate the marginal effect of $x_1$ for an arbitrary value of $x_2$ is to re-center $x_2$ by subtracting from it some value $k$ and re-estimating the regression model. The coefficient and standard error of $x_1$ will be the marginal effect of $x_1$ on $y$ when $x_2 = k$. A handy trick is to mean-center $x_2$ (subtract the mean of $x_2$ from each value of $x_2$). Then, the coefficient on $x_1$ (in a linear model) is equal to the average effect of $x_1$ on $y$ over all of the values of $x_2$.*
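To see why re-centering works, write $\tilde{x}_2 = x_2 - k$ (the tilde notation is introduced here only for this derivation) and substitute $x_2 = \tilde{x}_2 + k$ into the original model:

$$y = \beta_0 + \beta_1x_1 + \beta_2(\tilde{x}_2 + k) + \beta_3x_1(\tilde{x}_2 + k) + \epsilon$$
$$y = (\beta_0 + \beta_2k) + (\beta_1 + \beta_3k)x_1 + \beta_2\tilde{x}_2 + \beta_3x_1\tilde{x}_2 + \epsilon$$

The coefficient on $x_1$ in the re-centered model is $\beta_1 + \beta_3k$, which is exactly the marginal effect of $x_1$ evaluated at $x_2 = k$, so its reported standard error is the standard error of that marginal effect. The coefficients on $\tilde{x}_2$ and on the interaction term are unchanged.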

Braumoeller's article came with Stata code to make interaction plots (though I can't seem to find it online anymore). In 2011, Stata 12 added the marginsplot command, making these sorts of figures even easier to create. Quantitative political scientists appear to have taken notice. I could not find a single article in the 2012/2013 issues of the American Political Science Review, American Journal of Political Science, and International Organization that used an interaction model without including a corresponding marginal effects plot. Correctly interpreting interaction effects is now about as easy as running the regression itself.

This is all well and good for Stata users, but what about R? Coding up these sorts of plots from scratch can get a little tedious, and no canned function (to my knowledge) exists on CRAN. Moreover, the availability of easy-to-use functions for statistical methods seems to encourage wider use among applied quantitative researchers. 

So here's my code for quickly making decent-looking two-variable interaction plots in R. The first function, interaction_plot_continuous(), plots the estimated marginal effect for one variable that is interacted with a continuous "moderator." In simple terms, it plots $\delta_1$ for the range of values of $x_2$.

Below is an example of the output. For the sake of demonstration, I took the built-in R dataset airquality, which contains daily air quality measurements in New York taken from May to September 1973, and regressed maximum daily temperature on ozone content, wind speed and an interaction of ozone and wind. The plot below shows the marginal effect of wind speed moderated by ozone content:

Note that just interpreting the main effect of wind speed at zero (the regression coefficient) gives a misleading picture of the actual relationship. At 0 parts per billion of ozone, wind speed is negatively associated with temperature. But for higher values of ozone content wind speed becomes positively associated with temperature (I have no idea why this is the case, or why there would even be an interaction - my guess is there's some omitted variable). For the average value of ozone concentration (the red-dashed line), wind speed is not significantly associated with temperature.

Sometimes the moderating variable is a binary indicator. In these cases, a continuous interaction plot like the one above is probably less useful - we just want the effect when the moderator is "0" and when it's "1". The second function in the file, interaction_plot_binary(), handles this case. Again to demonstrate, I took the classic LaLonde job training experiment dataset and fitted a simple (and very much wrong) regression model. The model predicted real 1978 wages using assignment to a job training program (treatment), marital status and an interaction of the two. I then estimated the marginal "effect" of treatment assignment on wages for each of the two marital status levels. In this case, the interaction was not statistically significant.

So hopefully these two functions will save R users some time. Note that these functions also work perfectly fine with non-linear models, but the quantity plotted will be the regression coefficient and not necessarily something with substantive meaning. Unlike simple OLS, the coefficients of most non-linear models do not have a clear interpretation. You'll have to do a little bit of work to convert the coefficient estimates into something actually meaningful.

Feel free to copy and share this code, and let me know if there are any bugs. If there's enough demand, I might clean it up more and put together an R package (time permitting of course).

* I'm using the word "effect" here loosely and as shorthand for "relationship between." Assigning a causal relationship between two variables requires further conditional independence assumptions that may or may not hold.

Edit: Thanks to Patrick Lam for pointing out a typo in the variance formula (missed the 2) - fixed above and in the code.

Edit 6/18: Forgot to also include a link to Brambor, Clark, and Golder's excellent 2006 paper in Political Analysis which discussed similar issues regarding interpretation of interaction terms.

Monday, June 10, 2013

Will Edward Snowden be Extradited?

On Sunday, 29-year-old Booz Allen Hamilton employee Edward Snowden was revealed to be the source behind the recent disclosure of highly classified documents describing top secret National Security Agency surveillance programs of telephone and internet data. Snowden disclosed his identity to the Guardian in an interview from Hong Kong, where he currently remains. His choice to leave the U.S. for Hong Kong was, in his words, driven by the belief that the Chinese-administered region has a "spirited commitment to free speech and the right of political dissent." But Snowden is not completely free from the reach of U.S. law. The United States and Hong Kong have in force an extradition treaty under which the U.S. could obtain his return for prosecution.

Why did Snowden escape to Hong Kong, likely knowing that he could be extradited? My sense is that it isn't because he was poorly informed, but rather because his no-extradition alternatives were not particularly great. The overlap between countries with "spirited commitments" to freedom and countries with which the United States does not have an extradition treaty is virtually nil. Even Iceland, Snowden's asylum target, has an extradition treaty with the U.S. in force since the 1900s.

Countries the US has extradition treaties with (light blue; US shown here in dark blue). From Wikimedia Commons.

But will Snowden be extradited back to the United States? I would argue that yes, extradition is probable*, but the process will be very tedious simply because the offense is so different from previous extradition cases. Whereas most extradition requests concern explicitly criminal matters, Snowden's may be a "political offense" for which extradition is not permitted. Working out this question will take time, but Hong Kong's track record of typically approving requests suggests that the odds are in the U.S. government's favor. Moreover, I do not think that Beijing will be able to tip the legal scales towards rejection if it so desires, despite its influence in Hong Kong.

China's preferences in this matter are certainly relevant, but the law is much more of a constraint in this process than some commentators have suggested. The security stakes are actually rather low. Snowden is not particularly useful as an intelligence asset. He no longer has access to NSA databases - all he has are whatever documents or files he could bring with him to Hong Kong, documents that were apparently selectively chosen. This is not a Cablegate-style data-dump and PRISM is hardly China's greatest intelligence fear. Certainly Beijing might want to obtain anything that Snowden still has in his possession, but it's unclear how cooperative he would be given his political leanings.

Influencing the extradition proceedings is also not costless for Beijing. Despite China's sovereignty over Hong Kong, the SAR has significant political autonomy in most areas. Indeed, it can conclude, and has concluded, a number of extradition and mutual legal assistance treaties with other states. Hong Kong is a global financial hub and devotes significant legal and administrative resources to combating money laundering, financial crimes, trafficking and other such offenses. The high cross-border mobility of these types of offenders gives the Government of Hong Kong significant incentives to maintain the integrity of its extradition agreements in order to prosecute financial criminals who flee its territory. Undue influence by Beijing in the process might jeopardize the credibility of Hong Kong's other agreements. While China's ability to weigh in on extradition is clear, the decision to refuse extradition ultimately lies with the Chief Executive of Hong Kong.

Moreover, Hong Kong is also much more limited in its ability to reject extradition than some news reports have suggested. This South China Morning Post article, quoted by Doug Mataconis at OTB, gives the impression that China has de-facto veto power over any extradition request. This is not the case.

The SCMP article states that, according to the 1996 treaty, Hong Kong has the "right of refusal when surrender implicates the 'defense, foreign affairs or essential public interest or policy'." However, this provision is irrelevant to the Snowden case since, according to Article 3, it only applies when the subject of the extradition request is a national of the PRC.
...(3) The executive authority of the Government of Hong Kong reserves the right to refuse the surrender of nationals of the State whose government is responsible for the foreign affairs relating to Hong Kong in cases in which: 
(a) The requested surrender relates to the defence, foreign affairs or essential public interest or policy of the State whose government is responsible for the foreign affairs relating to Hong Kong, or...
The article also notes that Hong Kong could reject a request if it determines that the extradition is "politically motivated." However, the article omits the fact that this determination is to be made by the "competent authority of the requested Party" which, according to the committee report that accompanied the treaty's ratification, is interpreted as the judiciary and not the executive of Hong Kong.
...Notwithstanding the terms of paragraph (2) of this Article, surrender shall not be granted if the competent authority of the requested Party, which for the United States shall be the executive authority, determines: 
(a) that the request was politically motivated...
Meddling with Hong Kong's independent judiciary would be politically costly for Beijing. In fact, it could jeopardize the extradition agreement itself. When the U.S. Senate ratified the treaty, it attached an understanding emphasizing the continued independence of Hong Kong's judiciary:
"Any attempt by the Government of Hong Kong or the Government of the People's Republic of China to curtail the jurisdiction and power of final adjudication of the Hong Kong courts may be considered grounds for withdrawal from the Agreement."
It's unlikely that Snowden could win a political persecution argument. Regardless of whether his actions are justified, they very clearly violated U.S. statute - he is not being arbitrarily singled out. A potentially more persuasive case for refusal might be made in light of Bradley Manning's treatment in pre-trial detention. Article 7 permits the refusal of extradition "when such surrender is likely to entail exceptionally serious consequences related to age or health." However, concerns over treatment are typically resolved by bilateral legal assurances that the extradited person will not be mistreated. Indeed, U.S. authorities have strong incentives not to mistreat Snowden if he is extradited, as such actions would likely jeopardize future legal cooperation with Hong Kong.

This is not to say that extradition is a sure thing, but the challenges facing Snowden's extradition are currently more legal than political. Pretty much all previous extradition proceedings between the U.S. and Hong Kong have concerned clear-cut criminal offenses - violent and white-collar crimes in particular. Hong Kong has consistently accepted U.S. requests for extradition in these cases. But the leaking of classified information might fall under the category of "political offenses" for which extradition is prohibited. Extradition treaties have, for centuries, contained provisions that refuse extradition for "offenses of a political character." The exception emerged in treaties during the 1800s as a way of limiting the ability of states to pursue dissenters and political opponents. However, what constitutes a "political offense" has always been unclear and open to interpretation. Fearing that a vague interpretation of "political offense" would unduly burden states, most treaties since then have included provisions delineating offenses that cannot be considered "political." Compared to most modern treaties, the U.S.-Hong Kong treaty has relatively few political offense exceptions: murder or other crimes committed against a head of state are exempt as are offenses criminalized by a multilateral international agreement. This leaves a lot of grey area.

Most treaties signed in the last half-century do not explicitly enumerate offenses for which extradition is granted, typically defining an extraditable offense as anything criminalized under the laws of both parties. However, the Hong Kong treaty has both a list and a provision for extradition for "dual criminality." A request for extradition under the Espionage Act would likely fall under the scope of extraditable offenses since Hong Kong's Official Secrets Ordinance has similar provisions criminalizing the release of classified information, but could potentially be ruled a political offense. Law professor Julian Ku suggests that the U.S. might choose to pursue an alternative route and seek extradition under an offense explicitly enumerated in the treaty.
The Snowden leaks have now been referred to the Justice Department, and U.S. prosecutors have several options available to them. According to Ku, prosecutors may avoid charging Snowden under the Espionage Act -- which could be considered a political prosecution by courts in Hong Kong -- and indict him under a different statute. 
Among the crimes listed on the U.S.-Hong Kong agreement as within the bounds of extradition, one offense in particular stands out: "the unlawful use of computers."
Requesting extradition for an explicitly enumerated offense might help support an argument that the offense is not a political one. The patchwork of U.S. classification law also appears to have provisions relating to the "unlawful use of computers" - the Computer Fraud and Abuse Act. According to a CRS report,
18 U.S.C. Section 1030(a)(1) punishes the willful retention, communication, or transmission, etc., of classified information retrieved by means of knowingly accessing a computer without (or in excess of) authorization, with reason to believe that such information “could be used to the injury of the United States, or to the advantage of any foreign nation.”...The provision imposes a fine or imprisonment for not more than 10 years, or both, in the case of a first offense or attempted violation. Repeat offenses or attempts can incur a prison sentence of up to 20 years.
Moreover, the sentences under the Computer Fraud and Abuse Act are comparable to those in the relevant Espionage Act provisions. Assuming U.S. prosecutors are strategic, I would expect the forthcoming request to be for offenses under the CFAA and not the Espionage Act.

The base rate for extradition approval is high so the easiest bet is that Snowden will likely be extradited (assuming he does not successfully flee Hong Kong as well). But this case is markedly different from previous ones. There is not much "data" for extradition when the offense involves the leaking of classified information. It has certainly occurred in other contexts, but is entirely new ground for the U.S.-Hong Kong legal cooperation relationship (I'd love it if anyone could point me to more examples). Snowden may have some basis for challenging a request and extradition proceedings are likely to drag on for some time. Indeed, it's not entirely impossible for Hong Kong to deny extradition, though it would generate significant political friction with the U.S. The legal deck is stacked in the United States' favor, despite the novelty of this case. So be wary of pop-Realist foreign policy commentaries that claim it's all about China - Chinese influence in Hong Kong's affairs is neither infinite, nor cost-free.

*Sadly, I don't have a model for making a more precise claim/prediction here, though an interesting project might involve using GDELT to identify successful/failed extradition cases and model P(approval) using a combination of political and legal variables. Certainly more than a blog post can carry.

Thursday, April 18, 2013

Trading with Democracies

The topic of international trade has received some heightened attention in the news over the past month. Japan will soon be joining negotiations over the Trans-Pacific Partnership (TPP) and the U.S. and European Union are moving forward with plans for a possible transatlantic FTA.

Trade is one of the few international issues that has a significant domestic political dimension. Consider the ongoing political debates over the NAFTA agreement throughout the 1990s or the recent battle over the Korea-U.S. FTA. The distribution of voters' preferences for and against trade influences the extent of trade liberalization.

As a result, a number of political scientists have studied the question of what determines individuals' preferences toward international trade. Economic self-interest is one intuitive explanation. By the classic Heckscher-Ohlin model of trade, factors that are abundant in a country relative to the rest of the world gain from trade while scarce factors lose out. If we assume that losers from trade will oppose liberalization and winners will support it, then we would expect that an individual's factor endowment would be associated with their trade attitudes. The U.S. is relatively abundant in high-skilled labor but relatively scarce in low-skilled labor. The theory suggests that, all else equal, high-skilled U.S. workers will be pro-trade while low-skilled workers will oppose barrier reductions. Kenneth Scheve and Matthew Slaughter (2001) found support for this relationship using survey data for the U.S. (ungated paper). Anna Maria Mayda and Dani Rodrik (2005) provide further evidence from a cross-national dataset of trade attitudes (ungated).

However, there is also strong evidence that factors beyond both self-interest and economics influence individuals' trade preferences. For example, Edward Mansfield and Diana Mutz (2009) (ungated draft) show that support and opposition to trade are influenced more by individuals' perceptions of trade's impact on the overall U.S. economy than by how trade affects them personally. Additionally, ethnocentric and isolationist views are strongly connected to hostility toward trade. That is, openness to trade depends on more than individuals' perceptions of economic gains and losses; it reflects, in part, their fundamental attitudes towards other nations.

My most recent research is motivated by this last question: how do the characteristics of potential trading partners influence support for trade? To date, research on trade preferences has looked at support for trade in general. Given that most trade liberalization in recent years has been conducted through preferential trade agreements (PTAs), trade agreements between specific countries, it is important to also understand support for trade with specific countries. Indeed, empirically, there is some evidence of meaningful variation in support for trade when respondents are asked to consider different countries. A 2008 survey by the Chicago Council on Global Affairs showed that while 59% of American respondents said they would support a trade agreement with Japan, only 41% supported a trade agreement with China.

Specifically, I'm interested in the role of trading partner democracy on support for trade. I look at whether Americans prefer to trade with other democracies, how large the effect is compared to other relevant variables, and why democracies are favored. Regime type is a particularly interesting variable since it's associated with patterns of PTA formation at the global level. Empirically, democracies tend to sign significantly more preferential trade agreements with other democracies (Mansfield, Milner and Rosendorff, 2002) (ungated). While this research project certainly does not aim to provide an explanation for these macro-level patterns (since diffuse public attitudes are only one of many inputs into trade policy decisions), it does suggest that domestic debates over trade are not simply about the distribution of gains and losses. Trading partners matter.

In the vein of forthcoming research by Mike Tomz and Jessica Weeks on the democratic peace, I use a survey experiment to test whether Americans have a preference for democracies in trade. For my pilot survey, I recruited a sample of U.S.-located respondents through Amazon's Mechanical Turk platform. Each respondent was asked to evaluate a set of hypothetical trade agreements. Each question presented the respondents with a brief vignette describing a bilateral trade agreement that would reduce barriers such as tariffs. The vignette provided information on the trading partner's regime type, level of economic development, existing trade with the United States, and number of trade agreements signed with other countries. Additionally, it provided information on estimated job losses/gains from the trade agreement (in agriculture, manufacturing and service sectors), expected welfare gains, and whether the trade agreement contains a labor rights clause. Each piece of information or "attribute" of the agreement was randomly drawn from a set of pre-defined levels (with 2 to 5 levels per attribute). Respondents were then asked whether they thought the United States government should or should not sign the trade agreement and to briefly explain why. The figure below provides an example of such a question (in this case, a rather unfavorable agreement).
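The randomization step can be sketched in a few lines of Python. This is only a minimal illustration: the attribute names loosely follow the vignette description, but the specific levels and the function names are placeholders invented for the example, not the exact wording or code used in the survey.

```python
import random

# Attribute names loosely follow the vignette description; the levels listed
# here are illustrative placeholders (assumptions), not the survey wording.
ATTRIBUTES = {
    "regime_type": ["democracy", "autocracy"],
    "development": ["low income", "middle income", "high income"],
    "existing_trade": ["low", "moderate", "high"],
    "other_agreements": ["none", "a few", "many"],
    "manufacturing_jobs": ["major losses", "minor losses", "no change",
                           "minor gains", "major gains"],
    "labor_rights_clause": ["yes", "no"],
}


def draw_profile(rng):
    """Draw one hypothetical trade-agreement profile, one level per attribute."""
    return {name: rng.choice(levels) for name, levels in ATTRIBUTES.items()}


def draw_questions(rng, n_questions=3):
    """Draw the set of profiles shown to a single respondent."""
    return [draw_profile(rng) for _ in range(n_questions)]


rng = random.Random(2013)  # fixed seed so the example is reproducible
for profile in draw_questions(rng):
    print(profile)
```

Because every level is drawn independently of the others, regime type is uncorrelated in expectation with job gains, development and the rest, which is what lets the effect of each attribute be estimated separately.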

The survey is a type of experiment known as a "conjoint" design. Conjoint experiments ask individuals to evaluate profiles composed of multiple, randomly assigned attributes. Respondents are asked to rate individual profiles (as in this experiment) or choose a preferred profile from a random set (a "choice-based conjoint" design). The advantage of this design is that it allows researchers to efficiently run multiple experiments simultaneously and to test competing hypotheses. For example, Jens Hainmueller and Dan Hopkins have recently used such conjoint experiments to study voter preferences over immigration.

My sample contained 363 respondents, each answering 3 questions. Three questions were unanswered by respondents, giving a sample of 1086 approve/disapprove responses. I estimate a binary logistic regression model on the 0-1 outcome variable with all of the attribute levels as dummy variable predictors. The figure below shows the estimated effect of each level on the probability that an individual will support the trade agreement relative to the baseline level (denoted by an asterisk). Lines denote 95% cluster-bootstrapped confidence intervals, clustering on individual. Note that because confidence intervals are bootstrapped rather than computed via normal approximation, they are not necessarily symmetric around the point estimate.
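For readers unfamiliar with cluster bootstrapping, here is a minimal sketch of the idea in Python. It makes two loud simplifications: the data are synthetic (generated with an assumed 9-point effect, not the survey responses), and the statistic is a raw difference in approval rates rather than the logit model described above. The function names are hypothetical.

```python
import random


def diff_in_approval(responses):
    """Approval rate under democracy minus approval rate under autocracy.

    Each response is a (is_democracy, approved) pair of 0/1 values.
    """
    dem = [a for d, a in responses if d == 1]
    aut = [a for d, a in responses if d == 0]
    if not dem or not aut:
        return None  # this resample happens to contain only one regime type
    return sum(dem) / len(dem) - sum(aut) / len(aut)


def cluster_bootstrap_ci(by_respondent, n_boot=2000, alpha=0.05, seed=0):
    """Percentile interval that resamples whole respondents, not single answers."""
    rng = random.Random(seed)
    ids = list(by_respondent)
    estimates = []
    for _ in range(n_boot):
        sampled_ids = [rng.choice(ids) for _ in ids]  # respondents, with replacement
        pooled = [r for i in sampled_ids for r in by_respondent[i]]
        est = diff_in_approval(pooled)
        if est is not None:
            estimates.append(est)
    estimates.sort()
    lo = estimates[int((alpha / 2) * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi


# Synthetic data only: 363 made-up respondents with 3 answers each and an
# assumed 9-point true effect of democracy. These are NOT the survey responses.
rng = random.Random(1)
data = {}
for i in range(363):
    answers = []
    for _ in range(3):
        d = rng.randint(0, 1)
        approved = 1 if rng.random() < 0.45 + 0.09 * d else 0
        answers.append((d, approved))
    data[i] = answers

all_answers = [r for answers in data.values() for r in answers]
print("point estimate:", diff_in_approval(all_answers))
print("95% interval:", cluster_bootstrap_ci(data))
```

Resampling respondents rather than individual answers keeps each person's three responses together, so any within-person correlation carries through to the interval. Since the interval is read off the percentiles of the resampled estimates, nothing forces it to be symmetric around the point estimate.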

Job gains/losses have the largest effect on approval, which makes sense given that the political debate in the U.S. surrounding trade policy (and foreign economic policy in general) focuses so heavily on prospective employment gains. However, even when accounting for these economic variables, trading partner democracy has a statistically non-zero and rather sizable effect. Democracy increases probability of support by, on average, 9 percentage points, with the simulated 95% confidence interval running from about 4 to 15 percentage points. That is, respondents in the sample were 9 percentage points more likely to support an agreement with a democracy than with an otherwise equivalent autocracy. This is, interestingly, roughly comparable to the estimated effect of minor job gains/losses. So while trading partner democracy is not voters' overriding concern, it does appear to have a meaningful impact.

But the more interesting question is not whether voters prefer democracies, but why. Do voters use democracy as a signal or "heuristic" to infer other favorable characteristics about a trading partner, or is the observed preference simply a taste for other, similar, states? How much of the effect of democracy is transmitted through some mediating variable? I posit three possible mediators for which democracy serves as a proxy - democracies are perceived as trustworthy or "fair" trading partners, democracies are perceived as politically closer to the United States, and democracies are perceived as respectful of citizens' rights. To estimate these mediator effects, I use a "crossover" experimental design suggested by Kosuke Imai, Dustin Tingley and Teppei Yamamoto (2013). For each of the first three questions (used in estimating the main effects above), I measure the value of one of the mediator variables in addition to the outcome. The figure below shows the estimated effect of democracy on each of the mediators. Democracies are perceived as less likely to engage in unfair trading practices, more likely to have friendly relations with the U.S. and more likely to respect their citizens' rights.

After a brief break, respondents are then asked to evaluate three more trade agreements. The second set of profiles is identical to the first in all respects except that the regime type variable is assigned the opposite value. So if the first question asked about a democracy, the fourth will ask about an autocracy. In addition, I "force" the value of one of the mediators to take on the value measured in the first experiment. For example, if the respondent says that they believe the democracy in the first experiment is a trustworthy and fair trader, they are told, as part of the profile in the fourth experiment, that "experts say this country does not engage in unfair trading practices." This design allows me to identify how much of the effect of democracy can be explained by one of these mediator variables. That is, it identifies the counterfactual level of support for an autocracy when the mediator is held at the value for a democracy and vice-versa. The mediator effect is the effect of a change in the mediator variable from the level measured under autocracy to the level measured under democracy.
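As a rough sketch of how a second-round profile is derived from a first-round one, consider the following Python snippet. The dictionary keys, the mediator label and the function name are hypothetical and chosen only for illustration; the actual survey instrument is not shown here.

```python
def crossover_profile(first_profile, mediator, measured_value):
    """Build the second-round profile from a first-round profile.

    Regime type is flipped, every other attribute is kept, and the chosen
    mediator is fixed at the value the respondent reported in the first round.
    """
    flipped = {"democracy": "autocracy", "autocracy": "democracy"}
    second = dict(first_profile)  # copy so the first-round profile is untouched
    second["regime_type"] = flipped[first_profile["regime_type"]]
    second["forced_mediator"] = (mediator, measured_value)
    return second


first = {
    "regime_type": "democracy",
    "development": "high income",
    "labor_rights_clause": "no",
}
# Suppose the respondent said this democracy does not trade unfairly.
second = crossover_profile(first, "unfair_trade_practices", False)
print(second)
```

Comparing support for the second profile with support for the first then holds the mediator fixed while regime type changes, which is the comparison the crossover design relies on.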

The figure above shows the estimates and bootstrapped 95% confidence intervals for these mediator effects. I use logit regression to control for the other attributes since there is likely some imbalance due to smaller sample size and imperfect randomization. There are two mediation effects for each mediating variable - one when the "treatment" is held at democracy and one when it is held at autocracy. A significant difference in mediator effects suggests a possible interaction between the treatment and mediator. That is, the effect of a variable like perceived trade fairness differs when the country in question is a democracy rather than an autocracy.

Indeed, this appears to be the case for some of the mediators. While the estimated mediator effect of trade fairness for democracy is positive and statistically significant at the .05 level, the estimated effect appears to be negative for autocracy, although the 95% confidence band crosses 0. Democracies that are perceived as "unfair" trading partners receive a penalty in support, but autocracies do not receive a corresponding "boost" for being trustworthy. That the estimate for autocracies is so negative may just be a function of sampling variability, but it may also hint at possible violations of mediator identification assumptions or profile order effects. These are issues I'll try to address in the next iteration of the experiment.

The most interesting mediator effect appears to be political affinity with the United States. Democracies that have unfriendly relations with the U.S. receive a statistically significant penalty to support while friendly autocracies receive a boost. It's difficult to infer the exact magnitude of the relations bonus because the precision of these estimates is much lower: each respondent only answered one question for each mediator. That is, it's still unclear whether friendly autocracies are, on average, equivalent to democracies as far as voters' trade attitudes are concerned. But the evidence is suggestive that some portion of the estimated democracy effect is transmitted through perceptions of political proximity with the United States. Democracy functions as a heuristic for "ally."

Surprisingly, human rights issues, despite their prominence in domestic trade debates, don't seem to influence voters' trade preferences on average. Neither the presence of a labor rights clause nor perceptions of the human rights record of a trading partner have a statistically discernible effect on the probability of support. It may be that labor rights clauses are not perceived as credible constraints, or that rights concerns are important to specific constituencies. The effect may be heterogeneous and averaging across the full sample misses interesting cross-individual differences.

This is of course all very preliminary research so none of the results should be taken as anything more than suggestive. The sample from Mechanical Turk is not perfectly representative of the U.S. population (MTurk samples tend to skew younger and more liberal). While research suggests that experimental effects estimated on MTurk samples tend to be comparable to effects estimated on nationally representative samples, a better (and larger) sample would be preferred. Indeed, the preliminary results do not have much power in estimating the mediation effects - the confidence bounds are large for all of the estimates. In future revisions of this design, I will likely test the mediation effects jointly (as Tomz and Weeks do for the democratic peace) rather than separately. Additionally, other trading partner characteristics omitted from the descriptions might have even larger effects when included in the profiles. I certainly welcome any suggestions for improving the design.

The preliminary evidence suggests that the type of trading partner matters for domestic debates over liberalization. In my sample, I estimate that trade agreements with democracies receive approximately a 9 +/- 5 percentage point boost in support compared to otherwise comparable agreements with non-democracies. Furthermore, I find that some of the effect of democracy is transmitted through perceptions that democracies are more trustworthy trading partners and more likely to have close relations with the United States. Regime type functions partly as a heuristic for inferring other traits about a prospective trading partner. I certainly cannot make specific statements about how these preferences translate into government actions, but the results do imply that democratic governments (or at least the United States government) are not only constrained by the distributional economic consequences of trade, but also by the characteristics of prospective trading partners.

Sunday, February 17, 2013

Is "skin in the game" the only way to solve principal-agent problems?

Constantine Sandis and Nassim Nicholas Taleb recently invited comments on a working paper titled "Ethics and Asymmetry: Skin in the Game as a Required Heuristic for Acting Under Uncertainty." This opened up a rather heated discussion on Twitter, so I felt it might be useful to comment in longer-form. And while not articulated explicitly as such, it discusses an issue important to many political economists - the principal-agent problem.

From the abstract:
We propose a global and mandatory heuristic that anyone involved in an action that can possibly generate harm for others, even probabilistically, should be required to be exposed to some damage, regardless of context. We link the rule to various philosophical approaches to ethics and moral luck.
This heuristic is, bluntly speaking, the mandate that actors have "skin in the game."

Reading the paper, I was reminded of a classic scene in the last episode of Joss Whedon's Firefly, "Objects in Space," where the philosophical bounty hunter Jubal Early converses with his hostage, doctor Simon Tam, about the merits of experience in expert decision-making.
Jubal Early: You ever been shot? 
Simon: No. 
Jubal Early: You oughta be shot. Or stabbed, lose a leg. To be a surgeon, you know? Know what kind of pain you're dealing with. They make psychiatrists get psychoanalyzed before they can get certified, but they don't make a surgeon get cut on. That seem right to you? 
Individuals who act for others should be exposed to the same risks. Bankers who manage others' money should know the pain of loss. Politicians who fail to act on behalf of their constituents should be voted out of office. Only when the agent and the principal share the same preferences is the timeless principal-agent problem solved.

Although a bit simplified, this is the overarching idea that I took from the paper.

I think the article obfuscates a pretty basic concept with a great deal of unrelated discussion of ethics, axiology and morality. Not to say that these are not worth considering, but the core problem that Sandis and Taleb identify is a practical one: when people are tasked with making decisions that have repercussions for others, how do we ensure that they make "good" decisions on behalf of those that they affect? Additionally, how do we limit agents' ability to enrich themselves at the expense of their principal(s)?

Sandis and Taleb's argument is uncompromising, which perhaps makes it more appealing as an ethical claim than as a practical one. By arguing that agents are only justified in acting on behalf of principals when they have "skin-in-the-game," they have assumed away the entire principal-agent problem. If the agent has the exact same preferences as the principal (i.e. they are exposed to the same risks), then there is no problem. The agent will always behave in the manner that the principal prescribes.

This is a nice thought exercise, but agents almost never have preferences identical to their principals. They are rarely exposed to identical risks. So "skin-in-the-game" ends up as a kind of aspirational goal for how principal-agent relations should be managed. Even then, Sandis and Taleb's discussion is still much too simple to be of practical value. According to the paper's logic, the best policy is the one that alters the agent's incentives such that they become aligned with those of the principal. 

This is obvious.

But yet again, there would be no principal-agent problem if the principal always had the means to perfectly discipline the agent. 

In the real world, agents rarely share the same preferences as their principals and principals are almost never in perfect control of their agents. Power is shared and relationships are tense. Yet delegation is a necessary aspect of nearly all human institutions.

Moreover, there is rarely a single principal. Agents face conflicting pressures from a myriad of sources. Politicians do not respond to a unified "constituency" but to a diverse array of "constituents." So when Sandis and Taleb argue that decision-makers need "skin-in-the-game," they raise the question of "whose game are we talking about?" 

The paper provides a first-best solution to the principal-agent problem, one where the agent is fully attuned to the risks suffered by the principal. But there are costs to such a solution. Depending on the viability of a "skin in the game" solution, a second-best approach may be more desirable. "Skin in the game" is certainly neither global nor mandatory.

Take one example in the paper:
The ancients were fully aware of this incentive to hide risks, and implemented very simple but potent heuristics (for the effectiveness and applicability of fast and frugal heuristics, see Gigerenzer, 2010). About 3,800 years ago, Hammurabi’s code specified that if a builder builds a house and the house collapses and causes the death of the owner of the house, that builder shall be put to death. This is the best risk-management rule ever. The ancients understood that the builder will always know more about the risks than the client, and can hide sources of fragility and improve his profitability by cutting corners. The foundation is the best place to hide such things. The builder can also fool the inspector, for the person hiding risk has a large informational advantage over the one who has to find it. The same absence of personal risk is what motivates people to only appear to be doing good, rather than to actually do it.
As a risk management rule, Hammurabi's code ensures that builders suffer the same costs as the owners. However, it also probably ensures an under-provision of houses. Suppose that even a house built by an expert builder has some risk of collapsing despite the builder's best efforts. Knowing this, a builder suffers an additional lifetime cost of possible death every time he/she constructs a house. If the builder places some non-zero value on his/her life, he/she will choose to constrain the number of houses that he/she builds even if there is demand for more. In economic terms, the death risk is an additional "cost" to production.
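A toy calculation makes the under-provision point concrete. The sketch below assumes a risk-neutral builder who takes a job only if the payment exceeds the construction cost plus the expected cost of the penalty. Every number is made up, the function name is hypothetical, and `collapse_prob` stands in for the chance of a fatal collapse despite the builder's best effort.

```python
def builder_accepts(payment, build_cost, collapse_prob, value_of_life, death_penalty):
    """Return True if the expected payoff from building one house is positive."""
    # Under the death-penalty rule the builder bears an extra expected cost:
    # the probability of a fatal collapse times the value placed on his/her life.
    expected_penalty = collapse_prob * value_of_life if death_penalty else 0.0
    return payment - build_cost - expected_penalty > 0


# Made-up numbers: a job paying 100 with a construction cost of 80, a 1-in-1000
# chance of collapse despite best effort, and a life valued at 50,000.
job = dict(payment=100, build_cost=80, collapse_prob=0.001, value_of_life=50_000)
print(builder_accepts(death_penalty=False, **job))  # True: profit of 20
print(builder_accepts(death_penalty=True, **job))   # False: 20 - 50 = -30
```

With these illustrative numbers, a house that would be worth building without the rule no longer gets built once the expected penalty is added, even though the builder is doing everything right.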

Now, if the builder valued the life of an owner to the same extent as his/her own life, then the law would simply enforce an already incentive-compatible relationship - there would be no need to force the builder to have "skin in the game," it already would be. But this is almost never the case for obvious reasons. Indeed, the reason why we care about institutions is that rarely do individuals have incentives aligned with what is "optimal" behavior.

But institutional designs impose costs of their own.

The point of this stylized example is that there are constraints in every principal-agent relationship. How do we weigh the benefit of fewer deaths from shoddy houses against the costs of an under-supply of houses? Should we tolerate some shirking on the part of builders if there is an imminent need for more houses? Sandis and Taleb's heuristic gives no answers.

Moreover, even if "skin in the game" is a "first-best" solution to controlling agent behavior, it is not a necessary condition.

In Sandis and Taleb's article, "bad luck" plays a crucial role in justifying their heuristic. But "bad luck" creates its own issues that make perfect enforcement inexorably costly. To get an agent's "skin in the game," the principal needs to be able to punish the agent for bad behavior. But rarely is behavior observed. Rather, it is some outcome on which the principal conditions their punishment. The outcome is to some extent a function of the agent's effort, but it's also subject to unknown randomness. In the builder example, the probability that a building will fail is related to the builder's effort, but it is not inconceivable that an expertly constructed structure might collapse.

Principals get noisy signals of agent behavior. It is unclear whether an outcome is the result of poor decision-making or bad luck. This distinction may or may not matter, depending on the case. However, in many instances where it is difficult to observe the agent's behavior, the optimal solution to the principal-agent problem still leaves the agent somewhat insulated from the costs of their actions.

Consider a political example, the relationship between a government (agent) and the citizens (principal) of a country.

This example draws on an extensive body of game theoretic work by Ferejohn, Barro, Acemoglu, Robinson, Przeworski and recently Fearon (among many, many others). It is very much a simplification.

The citizens, if collectively powerful enough, can overthrow a government that does not behave in a manner that reflects their interests. Assuming that they are perfectly able to see what the government does, they will always overthrow a government that deviates from their desires. The government, knowing this, and preferring not to get overthrown, will comply perfectly with the wishes of the citizens. The government's "skin is in the game" in that the government faces costs (rebellion) concurrently with the citizens (reduced welfare from government shirking).

But typically, citizens cannot perfectly observe government action. The relationship between policy and outcome is complex, particularly in the economic realm. Citizens only observe their and their comrades' welfare, which is a function of both government policy and unforeseen events (like a war in a neighboring country that leads to reduced export revenue). More generally, citizens are unsure how much to blame government action for their current predicament.

If they adopt the same rule as in the perfect observation case - overthrow the government if we observe outcomes different from our ideal outcome - then they are likely to overthrow governments that did nothing wrong, a costly scenario.

Citizens may therefore tolerate some deviation from perfection, that is, weaken the extent to which the government has "skin in the game," based on the fact that they can't perfectly observe the government's behavior. Rejection becomes a question of expectation - small deviations from perfection might just be bad luck, but a massive downturn in welfare can likely be blamed on government incompetence.
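To put rough numbers on this, here is a small Python sketch of a threshold "overthrow rule" under noisy observation. It is a deliberately crude stand-in for the formal models mentioned above: I assume observed welfare equals the ideal outcome minus any government shortfall, plus normally distributed luck, and that citizens overthrow whenever welfare falls more than some tolerance below the ideal. The function names and all parameter values are assumptions made for the example.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def overthrow_prob(policy_shortfall, tolerance, noise_sd=1.0):
    """P(observed welfare falls more than `tolerance` below the ideal).

    Observed welfare = -policy_shortfall + noise, with noise ~ Normal(0, noise_sd),
    so overthrow happens when noise < policy_shortfall - tolerance.
    """
    return normal_cdf((policy_shortfall - tolerance) / noise_sd)


for tolerance in (0.0, 1.0, 2.0):
    wrongful = overthrow_prob(0.0, tolerance)  # government did nothing wrong
    caught = overthrow_prob(1.5, tolerance)    # government shirked by 1.5
    print(f"tolerance={tolerance}: wrongful={wrongful:.2f}, shirker removed={caught:.2f}")
```

Under these assumptions a zero-tolerance rule removes a blameless government half the time, since luck is as likely to be bad as good. Raising the tolerance cuts wrongful overthrows quickly, but it also lowers the chance that a shirking government is removed, which is exactly the trade-off citizens face.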

Indeed, having too rigid of an "overthrow rule" may even lead to perverse outcomes. Suppose a government knows that, no matter what it does, it is going to be blamed for an economic downturn beyond its control. It now has strong incentives to ignore the will of the public and be as predatory as possible, knowing that it has no hope of appeasing the public either way. These sorts of twisted incentives are at the heart of much game theoretic work on democratization and why leaders choose to give up power.

Is it worth putting the agent's "skin in the game" if that leads the agent toward predatory behavior?

Returning to the example of ancient Babylon, instead of killing builders of failed houses and running the risk that some good builders will get killed by accident, Hammurabi could have developed building codes. This is different from a "skin in the game" solution. The builder's "skin" is no longer in the "game" since he/she does not suffer any costs after the house is built. However, failing to build to code is a clear and observable signal that the builder is shirking their responsibilities. The builder still has more private knowledge and can cut corners, but only within limits. Clear deviations from the code get punished. Some non-compliance is tolerated in order to avoid mistaken executions.

What Sandis and Taleb miss in their brief discussion is that there is a continuum of possible mechanisms for controlling an agent's behavior - "skin in the game" is one extreme, but there are other viable solutions depending on the context.

Importantly, there are trade-offs. Conditional on some level of imperfect observation of the agent's behavior (which is made smaller by increasing quality of monitoring), the principal must weigh the benefits of tight control over the agent's behavior against the costs of punishing a "good" agent. If standards are high, the principal might accidentally punish a compliant agent. If standards are low, the "false positive" risk drops, but the incentive to shirk rises. Sandis and Taleb's argument provides no method of resolving this question.
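The trade-off can be made concrete with a small sketch. It assumes, purely for illustration, that the principal observes an outcome equal to the agent's effort plus normally distributed noise and punishes whenever the outcome falls below a threshold. The function name, effort levels and noise scale are all hypothetical; none of them come from Sandis and Taleb's paper.

```python
from statistics import NormalDist


def punish_probability(effort, threshold, noise_sd=1.0):
    """Probability that outcome = effort + noise falls below the threshold.

    Assumes noise ~ Normal(0, noise_sd); this is an illustrative modeling
    choice, not something taken from the paper under discussion.
    """
    return NormalDist(mu=effort, sigma=noise_sd).cdf(threshold)


GOOD_EFFORT = 1.0   # hypothetical mean outcome when the agent complies
SHIRK_EFFORT = 0.0  # hypothetical mean outcome when the agent shirks

for threshold in (-1.0, 0.0, 1.0):
    # Chance of punishing a compliant agent (the "false positive" risk).
    false_positive = punish_probability(GOOD_EFFORT, threshold)
    # Chance that a shirking agent is actually punished.
    shirker_punished = punish_probability(SHIRK_EFFORT, threshold)
    print(f"threshold={threshold:+.1f}  "
          f"punish good agent: {false_positive:.3f}  "
          f"punish shirker: {shirker_punished:.3f}")
```

Raising the threshold increases both probabilities at once: shirking becomes riskier for the agent, but a compliant agent is also punished more often. Where to set the threshold depends on how the principal weighs those two errors, which is exactly the question left open.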

Moreover, there are a myriad of cases where a principal actually benefits by delegating decisions to an agent who is somewhat insulated from the costs facing the principal. This is often labeled a dynamic or time-inconsistent preference problem. Governments often find it in their best long-term interests to delegate decision-making to someone who has absolutely no "skin in the game." International courts are a great example of this, as are domestic courts (imagine the prospects for civil rights in 1960s America if Supreme Court judges faced periodic elections).

My ultimate issue with the paper is that it simply lacks sufficient nuance to be useful or informative. By focusing on what should be the ideal-typical relationship between principal and agent, it ignores the trade-offs that plague real world principal-agent problems. Either "skin in the game" is such a broad standard that it is absolutely meaningless, or it represents a specific solution to principal-agent problems, namely subjecting the agent and principal to identical rewards/punishments, that is neither feasible nor, in many instances, desirable.

Thursday, January 31, 2013

Why Inference Matters

This is a post inspired by a recent exchange on twitter between myself and friend, colleague, and TNR writer Nate Cohn (@electionate). The initial exchange was pretty superfluous, but it got me thinking about a broader question: how should writers about politics approach quantitative data? That spiraled into this rather long-winded post. Indeed, the original conversation gets lost, but I think a more important broader point gets made. As political journalists and analysts incorporate more data into their writing, they would benefit by thinking in terms of statistical inference. It's just not possible to meaningfully talk about "real-world" data - electoral, or otherwise - without also talking about probability.

First, a quick summary of the discussion that led me to write this: Gary King tweeted a link to this post by Andrew C. Thomas covering a paper written by himself, King, Andrew Gelman and Jonathan Katz that examined "bias" in the Electoral College system. The authors found that despite the winner-take-all allocation of votes, there was no statistically significant evidence of bias in recent election years. Essentially, they modeled the vote share of each candidate at the district level and estimated the Electoral College results for a hypothetical election where the popular vote was tied. Andrew made a graph of their "bias" estimates - all of the 95% confidence intervals in recent years contain 0.

Nate shot back a quick tweet response asking why the results suggest no bias in 2012 when the electorally "decisive" states were 1.6 percentage points more Democratic than the national popular vote.

This led to a longer and rather interesting discussion between Nate and myself on how to evaluate Electoral College bias (for what it's worth, my twitter arguments were pretty bad). Nate made some good points about differences in Obama's margin-of-victory in the states needed to win the Electoral College versus his overall popular vote margin-of-victory. He notes that Obama won the states needed to reach 272 by a minimum of 5.4% while his popular vote margin was only 3.9%. If the existence of the Electoral College exaggerated Obama's vote advantage relative to a popular vote system, then it's possible to conclude that the college was "biased" in Obama's favor.

Nate has since turned this into part of an article, arguing:
The easiest way to judge the Democrats’ newfound Electoral College advantage is by comparing individual states to the popular vote. Last November, Obama won states worth 285 electoral votes by a larger margin than the country as a whole, suggesting that Obama would have had the advantage if the popular vote were tied. But the current system appears even more troubling for Republicans when you consider the wide gap between the “tipping point” state and the national popular vote. Obama’s 270th electoral vote came from Colorado, which voted for the president by nearly 5.4 points—almost 1.8 points more than his popular vote victory. Simply put, the GOP is probably better off trying to win the national popular vote the state contests in Pennsylvania or Colorado, since the national popular vote was much closer in 2012 than the vote in those tipping point states. Obama enjoyed a similar Electoral College advantage in 2008. 
Here the wording is slightly different - the term is Electoral College "advantage" rather than Electoral College "bias," but the argument is essentially the same - currently the electoral geographic landscape is such that the Democrats benefit disproportionately - a shift to a popular vote system would help the Republicans.

The argument is interesting (and could in fact be true), but the evidence presented doesn't really say anything. The big problem is that it ignores probability. You can't make a credible argument about the "nature" of an election cycle just by comparing election results data points without a discussion of uncertainty in the data. The results of an election are not the election itself - they are data. The data are what we use to make inferences about certain aspects of one election (or many elections). This distinction is essential. We don't observe whether the Democrats have an Electoral College advantage in 2012 or whether Colorado was more favorable to the Democrats than Virginia or Ohio. We can't observe these things because all we have is the output - the results.

Studying elections is like standing outside an automotive plant and watching cars roll off the assembly line.  We never get to see how the car is put together; we only see the final product.

If we knew exactly what went into the final vote tally - that is, if we were the plant managers and not just passive observers - then we wouldn't need statistics. But reality is complicated and we're not omniscient. This is what makes statistical inference so valuable - it lets us quantify our uncertainty about complex processes.

Just for curiosity, I decided to grab the 2012 election data to take a closer look at Nate's argument. I estimated the number of electoral votes that each candidate would receive for a given share of the two-party popular vote. I altered the state-level vote shares assuming a uniform swing - each state was shifted by the difference between the "true" and "hypothesized" popular vote - and computed the corresponding "electoral vote" that the candidate would receive. For time/simplicity, I ignored the Maine/Nebraska allocation rules (including them doesn't affect the conclusions).

To clarify, I measure the Democratic candidate's share of the two-party vote as opposed to share of the total vote. That is, %Dem/(%Dem + %Rep). This measure is common in the political science literature on elections and allows us to make better comparisons by accounting for uneven variations in the third-party vote. 
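As a rough sketch of the uniform-swing calculation described above: the function names and the four-state map below are hypothetical and are not the 2012 data. Like the analysis in the post, the sketch treats every state as winner-take-all.

```python
def two_party_share(dem_votes, rep_votes):
    """Democratic share of the two-party vote: Dem / (Dem + Rep)."""
    return dem_votes / (dem_votes + rep_votes)


def electoral_votes_under_swing(states, observed_national, hypothetical_national):
    """Electoral votes won by the Democrat after a uniform swing.

    `states` is a list of (dem_two_party_share, electoral_votes) pairs.
    Every state is shifted by the same amount: the gap between the
    hypothesized and observed national two-party share.
    Exact 50-50 ties are counted as losses for simplicity.
    """
    swing = hypothetical_national - observed_national
    return sum(ev for share, ev in states if share + swing > 0.5)


# Hypothetical toy map, NOT real election data.
toy_states = [(0.60, 10), (0.53, 9), (0.51, 6), (0.45, 8)]
observed = 0.52  # hypothetical national two-party share

# No swing: the Democrat carries the first three states.
print(electoral_votes_under_swing(toy_states, observed, observed))
# Hypothetical tied popular vote: every state shifts by -2 points.
print(electoral_votes_under_swing(toy_states, observed, 0.50))
```

Sweeping `hypothetical_national` over a grid of values and plotting the result against the electoral vote count gives the kind of step-function curve discussed next.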

Here's what the 2012 election looks like for President Obama. The dotted vertical line represents the observed vote total, the blue line marks the "tie" point and the horizontal red line represents 270 electoral votes.

This is consistent with Nate's argument. There is a space of potential popular vote outcomes where Obama loses the two-party vote but still wins the Electoral College - the "272" firewall.

How about 2008?

Same thing - about a 1 percentage point loss of the popular vote would still have meant an Electoral College victory for Barack Obama - again Colorado is key here. Indeed, the advantage appears to be even larger here than in 2012.


Here the advantage is less perceptible - barely a fifth of a percentage point. Compared to 2008 and 2012, this would suggest a "trend" towards greater pro-Democratic bias in the Electoral College.


Bush obviously has the advantage here (and as we know, he lost the popular vote).

How about jumping to 1992?

Again the elder Bush has a slight advantage.

I could keep going. One could infer a story of a growing Democratic advantage in the Electoral College from these five data points - it's there in the most recent two cycles and it wasn't there before. But in the end, these graphs are not at all meaningful.

The problem with what I've done above is that it's at best a convoluted summary of the data - the implied inferences about "advantage" are absurd absent a discussion of probability. Consider the assumptions behind the argument. It implicitly assumes that if we re-ran the 2012 election from the beginning, we would get the exact same results. It follows that if we re-ran the election and posited that Pres. Obama received only 50% or 49% of the two-party vote, then the results in each state would shift exactly by 1-2%.

Of course this is crazy. We easily observe that from year to year, changes in two-party vote shares are not constant across all states (this is why the uniform swing assumption is a statistical modeling assumption and not a statement of fact). Without probability our counterfactual observations of the electoral vote are nonsense, since we implicitly assume that there is zero variation in the vote share. This is certainly not the case. If we could hypothetically re-run the election, we would not expect the vote share to be exactly the same. There are a host of elements, from the campaign to the weather on election day, that could shift the results. We would expect them to be close, but we are inherently uncertain about any counterfactual scenario.

However, we cannot reason about the 2012 election without considering counterfactuals - what would the result have been had A happened instead of B. The problem is that we only get to observe the election once - we have to estimate the counterfactual, and all estimates are uncertain.

This is where thinking in terms of inference becomes useful. Political analysts want to move beyond summarizing the data (election returns) and make some meaningful explanatory argument about the election itself. It's difficult to do this in a quantitative sense without accounting for uncertainty in the counterfactuals.

Here is one way of evaluating Nate's argument using the same election result data that incorporates probability. It's very much constrained by the data, but that's kind of the point - looking just at a couple of election results doesn't tell us much.

The core question is: could the vote share gap that Nate identifies be due to chance?

Let's imagine that the Democratic party candidate's two-party vote share in any given state is modeled by the following process

$$v_i = \mu + \delta_i + \epsilon_i$$

$v_i$ represents the two-party vote share in state $i$ received by the Democratic party candidate. $\mu$ is the 'fixed' "national" component of the vote - the component of the vote accounted for by national-level factors like economic growth. It does not vary from state-to-state. $\delta_i$ is the 'fixed' state component of the vote share - it reflects the static attributes of the state like demographics. For a state like California, it would be positive. For Wyoming, negative. This is the attribute that we're interested in. In particular, can we say with confidence from the evidence that Colorado, Obama's "firewall" state, structurally favors the Democrats more than a state like Virginia where the President's electoral performance roughly matched the popular vote? Or is the gap that we observe attributable to random variation?

That's where $\epsilon_i$ comes in. $\epsilon_i$ represents the component of error that's unique to the election year. That is, if we were to re-run the election, the differences in the observed $v_i$ would be a function of $\epsilon_i$ taking on a different value. This represents the "idiosyncrasies" of the election - weather, turnout, etc. $\epsilon_i$ is what introduces probability into the analysis.

For the sake of the model, we assume that $\epsilon_i$ is distributed normally with mean $0$ and variance $\sigma_i^2$. I'll allow each $\sigma_i$ to be different - that is, some states might exhibit more random variation than others.
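To make the data-generating process concrete, here is a minimal sketch that draws repeated "re-runs" of a single state's result from $v_i = \mu + \delta_i + \epsilon_i$. The parameter values are placeholders chosen for illustration, not estimates from any election.

```python
import random


def simulate_vote_share(mu, delta, sigma, rng):
    """One draw of v_i = mu + delta_i + epsilon_i, epsilon_i ~ Normal(0, sigma^2)."""
    return mu + delta + rng.gauss(0.0, sigma)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

MU = 0.52      # hypothetical national component
DELTA = 0.0    # hypothetical state component (no structural lean)
SIGMA = 0.01   # hypothetical state-level error standard deviation

# 10,000 hypothetical re-runs of the same election in one state.
draws = [simulate_vote_share(MU, DELTA, SIGMA, rng) for _ in range(10_000)]
mean_share = sum(draws) / len(draws)
print(f"average simulated share: {mean_share:.4f}")
print(f"lowest / highest draw:   {min(draws):.4f} / {max(draws):.4f}")
```

Each individual draw differs from $\mu + \delta_i$ even though nothing structural about the state has changed, which is the point of including $\epsilon_i$.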

Normally the analyst would then estimate the parameters of the model. However, I have neither the data nor the time to gather it (if you're interested in data, see the Gelman et al. paper). My goal with this toy example is to show that the observed difference in the 2012 vote share between the "270th" state, Colorado, and a comparable state like Virginia, which has mirrored the popular vote share, might be reasonably attributed to random fluctuations within each state. To do so, I generate rough estimates of some parameters while making sensible assumptions about others.

As an aside, I could also have done a comparison between Colorado and a national popular vote statistic. However, since I'm only working with vote shares, I would have to make even more assumptions about the distribution of voters across states in order to correctly weight the national vote. Additionally, I would have to make even more assumptions about the data generating process in the other 48 states + D.C. This approach is a bit easier and demonstrates the same point.

I assume that $\mu$ is equal to President Obama's share of the two-party vote in 2012. The estimate of $\mu$ itself is irrelevant, since we're interested in differences in $\delta_i$ between Colorado and a comparable state (the $\mu$s cancel). But for the sake of the model it's helpful since it allows me to use $0$ as a reasonable "null" estimate of $\delta_i$.

The question is whether Colorado's $\delta$ is statistically different from that of Virginia. If it is, then it might make sense to talk about an electoral "advantage" in 2012. Obviously this assumes that the $\delta$ values for all of the other states in the 272 elector coalition are at least as large as Colorado's, which I'll grant is true for the sake of the model (it only works against me). Colorado is the "weakest link."

The way to answer this question is to test the null hypothesis of no difference. Suppose that $\delta_i$ equals $0$ for Virginia and Colorado - that there is no substantive difference between Virginia (the baseline state) and Colorado (the "firewall" state). What's the probability that we would observe a gap of $.78\%$ between their state-level electoral results?
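Under this null the national component cancels and the gap between the two states is pure noise. As a sketch, and assuming additionally that the two error terms are independent (an assumption made here for illustration), the gap is itself normally distributed:

$$v_{CO} - v_{VA} = \epsilon_{CO} - \epsilon_{VA} \sim N(0, \sigma_{CO}^2 + \sigma_{VA}^2)$$

so the probability of a gap at least as large as an observed value $d$ in either direction is

$$P(|v_{CO} - v_{VA}| \geq d) = 2\left[1 - \Phi\left(\frac{d}{\sqrt{\sigma_{CO}^2 + \sigma_{VA}^2}}\right)\right]$$

where $\Phi$ is the standard normal CDF. Everything therefore hinges on the size of the error variances.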

Estimating the probability requires making a reasonable estimate for the variance of $\epsilon_i$. How much variation in the electoral results can we attribute to random error and how much can we attribute to substantive features of the electoral map?

Unfortunately, we can't re-play the 2012 election, so we have to look at history to calculate variance - in this case the 2008 election. As Nate notes in the post, the party coalitions are relatively stable and demographic changes/realignments typically take many election cycles to complete. As such, we may be able to assume that the state-level "structural" effects are the same in 2012 as they are in 2008. That is, the year-to-year variation in the gap between the popular vote and the state-level vote can be used to estimate the variance of the error terms $\epsilon_{CO}$ and $\epsilon_{VA}$.

But it's hard to get a good estimate of a variance with only two data points. Four is slightly better, but to do so we have to assume that the error terms of Colorado and Virginia's vote shares are identically distributed - that  $\sigma_{CO}^2 = \sigma_{VA}^2$. That's not to say that the observed "error" values are the same, just that they're drawn from the same distribution. Is this a reasonable assumption? The residuals don't appear to be dramatically different, but we just don't know - again, another statistical trade-off. Ultimately the point of this exercise is to demonstrate how one-election-cycle observations can be reasonably explained by random variation, so the aim is plausibility versus absolute precision.

So I estimate the pooled error variance for each state as the sample variance of the 2008 and 2012 Democratic two-party share in each state subtracted from the two-party share of the popular vote. In theory, we could add more electoral cycles to the estimate, but the assumption that $\sigma_i$ does not change from year to year becomes weaker. If I restrict myself to just looking at electoral results, then I have to accept data limitations. This is a general problem with any statistical study of elections - if we look only at percentage aggregates at high levels, there just isn't a lot of data to work with.
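One possible reading of that pooled estimate, sketched in code: take the state-minus-national residual for each state in each year, pool all of them, and compute the sample variance. This is an interpretation of the description above and may differ from the calculation actually used. The state labels and vote shares are made-up placeholders, not real 2008 or 2012 results.

```python
from statistics import variance


def pooled_error_variance(state_shares, national_shares):
    """Sample variance of all (state - national) residuals, pooled across states.

    `state_shares` maps a state label to its two-party shares, one per
    election year; `national_shares` lists the national two-party share
    for the same years, in the same order.
    """
    residuals = [
        state - national
        for shares in state_shares.values()
        for state, national in zip(shares, national_shares)
    ]
    return variance(residuals)  # divides by n - 1


# Hypothetical placeholder numbers for two states over two elections.
national = [0.54, 0.52]
states = {
    "A": [0.55, 0.53],
    "B": [0.53, 0.52],
}

sigma_sq = pooled_error_variance(states, national)
print(f"pooled variance: {sigma_sq:.6f}, std dev: {sigma_sq ** 0.5:.4f}")
```

With only four residuals the estimate is extremely noisy, which is the data limitation the paragraph above concedes.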

The next step is simulation. Suppose we repeated the 2012 election a large number of times on the basis of this model. How often would we see a gap of at least .78% between the vote shares in Colorado and Virginia? The histogram below plots the simulated distribution of that difference. The x-axis gives the percentage differences (.01 = 1%). The red dotted line represents a difference of .78 percentage points. Also relevant is the blue dotted line, which represents a difference of -.78 percentage points (Virginia's vote share is greater than Colorado's).

What's the probability of seeing a difference as extreme as the gap seen in 2012? Turns out, it's about 40% - certainly not incredibly high, but also not unlikely.

Suppose we only care about positive differences, that is, we are absolutely sure that it's impossible for Virginia to be structurally more favorable to the Democrats than Colorado. There is either no difference or $\delta_{CO} > \delta_{VA}$. What's the probability of seeing a difference equal to or greater than .78 percentage points? Well, it's still roughly 20%. Statisticians (as a norm, not as a hard rule) tend to use 5% as the cut-off for rejecting the "null hypothesis" and accepting that there is likely some underlying difference in the parameters not attributable to random variation - in this case, we fail to reject.
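A minimal sketch of this kind of simulation follows. It assumes independent, identically distributed normal errors in the two states and no structural difference between them. The value of `SIGMA` is a hypothetical stand-in picked so that the output lands in the same neighborhood as the figures quoted above; it is not the variance estimate used in this post.

```python
import random


def simulate_gap_probabilities(sigma, gap, n_sims=100_000, seed=0):
    """Share of simulated elections with a CO - VA gap at least as large as `gap`.

    Under the null, both states have delta = 0 and the national component
    cancels, so the simulated gap is just epsilon_CO - epsilon_VA.
    Returns (two_sided, one_sided) proportions.
    """
    rng = random.Random(seed)
    two_sided = 0
    one_sided = 0
    for _ in range(n_sims):
        diff = rng.gauss(0.0, sigma) - rng.gauss(0.0, sigma)
        if abs(diff) >= gap:
            two_sided += 1
        if diff >= gap:
            one_sided += 1
    return two_sided / n_sims, one_sided / n_sims


SIGMA = 0.0065  # hypothetical error standard deviation (assumption)
GAP = 0.0078    # the .78 percentage point gap, as a proportion

two_sided_p, one_sided_p = simulate_gap_probabilities(SIGMA, GAP)
print(f"P(|gap| >= .78 pts) = {two_sided_p:.3f}")
print(f"P(gap >= .78 pts)   = {one_sided_p:.3f}")
```

Because the simulated difference is symmetric around zero, the one-sided probability is about half the two-sided one, which is why the figure drops from roughly 40% to roughly 20%.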

If the error variance were smaller, that is, if the amount of year-to-year variation that can be ascribed to randomness were lowered, it's possible that a gap of .78% would be surprising. This would lead us to conclude that there indeed may be a structural difference - that Colorado is more advantageous to the Democratic candidate relative to the baseline. The details of the model parameters are really not the point of this exercise. Any estimate of the variance from the sparse data used here is not very reliable. The fact that we are using two elections where the same candidate stood for office suggests non-independence and error correlation which would likely downwardly bias our variance estimates. We could look at the gap in 2008 - two observations are better than one, but in that case, why not include the whole of electoral history - the data exist. Moreover, to make our counterfactual predictions more precise, we need some covariates - independent variables that are good predictors of vote share. In short, we need real statistical models.

This is what the Gelman et al. paper does. While a quick comparison of the data points hints at a structural bias in 2012, a more in-depth modeling approach suggests that difference is not statistically significant. That is, it's likely that any apparent structural advantage in the Electoral College in 2012 is nothing more than noise.

The problem with Nate's argument, ultimately, is that it posits a counterfactual (Romney winning the popular vote by a slight margin) without describing the uncertainty around that counterfactual. Without talking about uncertainty, it is impossible to discern whether the observed phenomenon is a result of chance or something substantively interesting.

It does appear that I'm spending a lot of time dissecting a rather trivial point, which is true. My goal in this post was not to settle whether the "Electoral College advantage" is real - the 2012/2008 election results alone aren't enough data to make that determination. Rather, I wanted to walk through a simple example of inference to demonstrate why it's important to pay attention to probability and randomness when talking about data - to think about data in a statistical manner.

We cannot, just by looking at a set of data points, immediately explain which differences are due to structural factors and which ones are due to randomness. No matter what, we make assumptions to draw inferences about the underlying structure. Ignoring randomness doesn't make it go away - it just means making extremely tenuous assumptions about the data (namely, zero variance).

To summarize, if you get anything out of this post, it should be these three points:
1) The data are not the parameters (the things we're interested in).
2) To infer the parameters from the data, you need to think about probability.
3) Statistical models are a helpful way of understanding the uncertainty inherent in the data.

I'm not suggesting that political writers need to become statisticians. Journalism isn't academia. It doesn't have the luxury of time spent carefully analyzing the data. I'm not expecting to see regressions in every blog post I read, nor do I want to. The Atlantic, WaPo, TNR, NYT are neither Political Analysis nor the Journal of the Royal Statistical Society.

But conversely, if there is increasingly a demand for "quantitative" or "numbers-oriented" analysis of political events, then writers should make some effort to use those numbers correctly. At the very least, it's valuable to think of any empirical claim - whether retrospective or predictive - in terms of inference. We have things we know and we want to make arguments about things we don't know or cannot observe. At its core, argumentation is about reasoning from counterfactuals and counterfactuals always carry uncertainty with them.

Even if the goal is just to describe one election cycle, one cannot get very far just comparing electoral returns at various levels of geographic aggregation. Again, election returns are just data - using them to make substantive statements, even if it's only about a single election, relies on implicit inferences which typically ignore the role of uncertainty. And if we want to go beyond just describing an election and identifying trends or features of the electoral landscape, quantitative "inference" without probability is just dart-throwing.