The topic of international trade has received some heightened attention in the news over the past month. Japan will soon be joining negotiations over the Trans-Pacific Partnership (TPP) and the U.S. and European Union are moving forward with plans for a possible transatlantic FTA.
Trade is one of the few international issues that has a significant domestic political dimension. Consider the ongoing political debates over the NAFTA agreement throughout the 1990s or the recent battle over the Korea-U.S. FTA. The distribution of voters' preferences for and against trade influences the extent of trade liberalization.
As a result, a number of political scientists have studied the question of what determines individuals' preferences toward international trade. Economic self-interest is one intuitive explanation. By the classic Heckscher-Ohlin model of trade, factors that are abundant in a country relative to the rest of the world gain from trade while scarce factors lose out. If we assume that losers from trade will oppose liberalization and winners will support it, then we would expect that an individual's factor endowment would be associated with their trade attitudes. The U.S. is relatively abundant in high-skilled labor but relatively scarce in low-skilled labor. The theory suggests that, all else equal, high-skilled U.S. workers will be pro-trade while low-skilled workers will oppose barrier reductions. Kenneth Scheve and Matthew Slaughter (2001) found support for this relationship using survey data for the U.S. (ungated paper). Anna Maria Mayda and Dani Rodrik (2005) provide further evidence from a cross-national dataset of trade attitudes (ungated).
However, there is also strong evidence that factors beyond both self-interest and economics influence individuals' trade preferences. For example, Edward Mansfield and Diana Mutz (2009) (ungated draft) show that support for and opposition to trade are influenced more by individuals' perceptions of trade's impact on the overall U.S. economy than by how trade affects them personally. Additionally, ethnocentric and isolationist views are strongly connected to hostility toward trade. That is, openness to trade depends on more than individuals' perceptions of economic gains and losses; it reflects, in part, their fundamental attitudes towards other nations.
My most recent research is motivated by this last question: how do the characteristics of potential trading partners influence support for trade? To date, research on trade preferences has looked at support for trade in general. Given that most trade liberalization in recent years has been conducted through preferential trade agreements (PTAs), trade agreements between specific countries, it is important to also understand support for trade with specific countries. Indeed, empirically, there is some evidence of meaningful variation in support for trade when respondents are asked to consider different countries. A 2008 survey by the Chicago Council on Global Affairs showed that while 59% of American respondents said they would support a trade agreement with Japan, only 41% supported a trade agreement with China.
Specifically, I'm interested in the role of trading partner democracy on support for trade. I look at whether Americans prefer to trade with other democracies, how large the effect is compared to other relevant variables, and why democracies are favored. Regime type is a particularly interesting variable since it's associated with patterns of PTA formation at the global level. Empirically, democracies tend to sign significantly more preferential trade agreements with other democracies (Mansfield, Milner and Rosendorff, 2002) (ungated). While this research project certainly does not aim to provide an explanation for these macro-level patterns (since diffuse public attitudes are only one of many inputs into trade policy decisions), it does suggest that domestic debates over trade are not simply about the distribution of gains and losses. Trading partners matter.
In the vein of forthcoming research by Mike Tomz and Jessica Weeks on the democratic peace, I use a survey experiment to test whether Americans have a preference for democracies in trade. For my pilot survey, I recruited a sample of U.S.-located respondents through Amazon's Mechanical Turk platform. Each respondent was asked to evaluate a set of hypothetical trade agreements. Each question presented the respondents with a brief vignette describing a bilateral trade agreement that would reduce barriers such as tariffs. The vignette provided information on the trading partner's regime type, level of economic development, existing trade with the United States, and number of trade agreements signed with other countries. Additionally, it provided information on estimated job losses/gains from the trade agreement (in agriculture, manufacturing and service sectors), expected welfare gains, and whether the trade agreement contains a labor rights clause. Each piece of information or "attribute" of the agreement was randomly drawn from a set of pre-defined levels (with 2 to 5 levels per attribute). Respondents were then asked whether they thought the United States government should or should not sign the trade agreement and to briefly explain why. The figure below provides an example of such a question (in this case, a rather unfavorable agreement).
The survey is a type of experiment known as a "conjoint" design. Conjoint experiments ask individuals to evaluate profiles composed of multiple, randomly assigned attributes. Respondents are asked to rate individual profiles (as in this experiment) or choose a preferred profile from a random set (a "choice-based conjoint" design). The advantage of this design is that it allows researchers to efficiently run multiple experiments simultaneously and to test competing hypotheses. For example, Jens Hainmueller and Dan Hopkins have recently used such conjoint experiments to study voter preferences over immigration.
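To make the randomization concrete, here is a minimal Python sketch of how rating-based conjoint profiles could be generated. The attribute names and levels are hypothetical stand-ins for the ones described above (the survey's exact wording is not given here), and each level is drawn independently and uniformly, which is the simplest randomization scheme and an assumption of this sketch.

```python
import random

# Hypothetical attributes and levels, loosely following those described in the post
# (2 to 5 levels each); the survey's actual wording and levels are not reproduced here.
ATTRIBUTES = {
    "regime_type": ["democracy", "autocracy"],
    "development": ["low income", "middle income", "high income"],
    "existing_trade_with_us": ["low", "moderate", "high"],
    "other_trade_agreements": ["few", "several", "many"],
    "manufacturing_jobs": ["major losses", "minor losses", "no change", "minor gains", "major gains"],
    "labor_rights_clause": ["included", "not included"],
}


def draw_profile(rng: random.Random) -> dict[str, str]:
    """Draw one agreement profile: each attribute level is sampled independently and uniformly."""
    return {name: rng.choice(levels) for name, levels in ATTRIBUTES.items()}


def draw_questionnaire(rng: random.Random, n_questions: int = 3) -> list[dict[str, str]]:
    """Draw the independent profiles shown to one respondent."""
    return [draw_profile(rng) for _ in range(n_questions)]


if __name__ == "__main__":
    rng = random.Random(2013)  # fixed seed so the draw is reproducible
    for profile in draw_questionnaire(rng):
        print(profile)
```

Because every attribute is randomized independently of the others, the effect of any one attribute can be estimated by comparing profiles that differ on it while averaging over the rest.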
My sample contained 363 respondents, each answering 3 questions. Three questions were unanswered by respondents, giving a sample of 1086 approve/disapprove responses. I estimate a binary logistic regression model on the 0-1 outcome variable with all of the attribute levels as dummy variable predictors. The figure below shows the estimated effect of each level on the probability that an individual will support the trade agreement relative to the baseline level (denoted by an asterisk). Lines denote 95% cluster-bootstrapped confidence intervals, clustering on individual. Note that because confidence intervals are bootstrapped rather than computed via normal approximation, they are not necessarily symmetric around the point estimate.
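The clustering step can be illustrated with a simplified sketch. Rather than refitting the full logit with every attribute dummy, this hypothetical version resamples whole respondents (clusters) with replacement, recomputes only the raw difference in approval rates between democracy and autocracy profiles, and reads off percentile bounds, which is why the interval need not be symmetric. The data are simulated, not the survey responses; the design of 363 respondents answering 3 questions each gives 1,089 prompts, and dropping the 3 unanswered ones leaves the 1,086 responses analyzed. Function names and the simulated effect size are illustrative assumptions.

```python
import random
from collections import defaultdict


def diff_in_approval(rows):
    """Approval rate for democracy profiles minus approval rate for autocracy profiles."""
    dem = [y for _, d, y in rows if d == 1]
    aut = [y for _, d, y in rows if d == 0]
    return sum(dem) / len(dem) - sum(aut) / len(aut)


def cluster_bootstrap_ci(rows, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI from resampling whole respondents (clusters) with replacement."""
    by_resp = defaultdict(list)
    for row in rows:
        by_resp[row[0]].append(row)  # row = (respondent_id, democracy, approve)
    ids = list(by_resp)
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = []
        for rid in rng.choices(ids, k=len(ids)):
            sample.extend(by_resp[rid])  # keep all of a respondent's answers together
        # Skip the (very unlikely) resample that lacks one of the two regime types.
        if any(r[1] == 1 for r in sample) and any(r[1] == 0 for r in sample):
            stats.append(diff_in_approval(sample))
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi


def simulate(n_resp=363, n_q=3, effect=0.09, seed=1):
    """Hypothetical data: approval probability 0.5, plus `effect` for democracies."""
    rng = random.Random(seed)
    rows = []
    for rid in range(n_resp):
        for _ in range(n_q):
            d = rng.randint(0, 1)
            y = 1 if rng.random() < 0.5 + effect * d else 0
            rows.append((rid, d, y))
    return rows


if __name__ == "__main__":
    data = simulate()
    print(diff_in_approval(data), cluster_bootstrap_ci(data))
```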
Job gains/losses have the largest effect on approval, which makes sense given that the political debate in the U.S. surrounding trade policy (and foreign economic policy in general) focuses so heavily on prospective employment gains. However, even when accounting for these economic variables, trading partner democracy has a statistically significant and rather sizable effect. Democracy increases the probability of support by, on average, 9 percentage points, with the bootstrapped 95% confidence interval running from about 4 to 15 percentage points. That is, respondents in the sample were 9 percentage points more likely to support an agreement with a democracy than with an otherwise equivalent autocracy. Interestingly, this is roughly comparable to the estimated effect of minor job gains/losses. So while trading partner democracy is not voters' overriding concern, it does appear to have a meaningful impact.
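Because the logit is nonlinear, a single coefficient does not translate into a fixed change in probability; an "on average" effect like the 9-point figure is typically obtained by switching the democracy dummy on and off for each profile and averaging the change in predicted support. The sketch below assumes that approach (an average marginal effect); the coefficient values and dummy names are hypothetical, not the estimates from this survey.

```python
import math


def inv_logit(x: float) -> float:
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def average_marginal_effect(intercept: float, coefs: dict[str, float],
                            profiles: list[dict[str, int]], focal: str) -> float:
    """Average change in predicted P(support) when the `focal` dummy switches from 0 to 1.

    Each profile maps dummy names to 0/1; every other dummy is held at its observed value.
    """
    total = 0.0
    for prof in profiles:
        # Linear predictor with the focal dummy switched off.
        base = intercept + sum(coefs[k] * v for k, v in prof.items() if k != focal)
        total += inv_logit(base + coefs[focal]) - inv_logit(base)
    return total / len(profiles)


if __name__ == "__main__":
    # Hypothetical coefficients on the log-odds scale, not estimates from the survey.
    coefs = {"democracy": 0.4, "minor_job_gains": 0.5}
    profiles = [
        {"democracy": 0, "minor_job_gains": 0},
        {"democracy": 1, "minor_job_gains": 1},
        {"democracy": 0, "minor_job_gains": 1},
    ]
    print(average_marginal_effect(-0.2, coefs, profiles, "democracy"))
```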
But the more interesting question is not whether voters prefer democracies, but why. Do voters use democracy as a signal or "heuristic" to infer other favorable characteristics about a trading partner, or is the observed preference simply a taste for other, similar, states? How much of the effect of democracy is transmitted through some mediating variable? I posit three possible mediators for which democracy serves as a proxy - democracies are perceived as trustworthy or "fair" trading partners, democracies are perceived as politically closer to the United States, and democracies are perceived as respectful of citizens' rights. To estimate these mediator effects, I use a "crossover" experimental design suggested by Kosuke Imai, Dustin Tingley and Teppei Yamamoto (2013). For each of the first three questions (used in estimating the main effects above), I measure the value of one of the mediator variables in addition to the outcome. The figure below shows the estimated effect of democracy on each of the mediators. Democracies are perceived as less likely to engage in unfair trading practices, more likely to have friendly relations with the U.S. and more likely to respect their citizens' rights.
After a brief break, respondents are then asked to evaluate three more trade agreements. The second set of profiles is identical to the first in all respects except that the regime type variable is assigned the opposite value. So if the first question asked about a democracy, the fourth will ask about an autocracy. In addition, I "force" the value of one of the mediators to take on the value measured in the first experiment. For example, if the respondent says that they believe the democracy in the first experiment is a trustworthy and fair trader, they are told, as part of the profile in the fourth experiment, that "experts say this country does not engage in unfair trading practices." This design allows me to identify how much of the effect of democracy can be explained by one of these mediator variables. That is, it identifies the counterfactual level of support for an autocracy when the mediator is held at the value for a democracy and vice-versa. The mediator effect is the effect of a change in the mediator variable from the level measured under autocracy to the level measured under democracy.
The figure above shows the estimates and bootstrapped 95% confidence intervals for these mediator effects. I use logit regression to control for the other attributes since there is likely some imbalance due to smaller sample size and imperfect randomization. There are two mediation effects for each mediating variable - one when the "treatment" is held at democracy and one when it is held at autocracy. A significant difference in mediator effects suggests a possible interaction between the treatment and mediator. That is, the effect of a variable like perceived trade fairness differs when the country in question is a democracy rather than an autocracy.
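A stripped-down version of the crossover estimator fits in a few lines. This sketch assumes each first-round profile is paired with its crossover profile, takes raw differences in mean support (the post additionally adjusts for the other attributes with a logit), and relies on the assumption that answering the first profile does not change how the respondent answers the second (no carryover). The record layout and function name are hypothetical. Writing Y(t, m) for support under regime type t and mediator value m, the mediation effect with treatment held at democracy is Y(1, M(1)) minus Y(1, M(0)), and with treatment held at autocracy it is Y(0, M(1)) minus Y(0, M(0)).

```python
from statistics import mean


def crossover_mediation(records):
    """records: (t1, y1, y2) for each first-round profile and its crossover profile.

    t1: regime type in the first profile (1 = democracy, 0 = autocracy)
    y1: support (0/1) for the first profile, mediator at its natural value M(t1)
    y2: support for the crossover profile: regime flipped to 1 - t1,
        mediator forced to the value M(t1) measured in the first round
    """
    dem_first_y1 = [y1 for t, y1, _ in records if t == 1]  # Y(1, M(1))
    dem_first_y2 = [y2 for t, _, y2 in records if t == 1]  # Y(0, M(1))
    aut_first_y1 = [y1 for t, y1, _ in records if t == 0]  # Y(0, M(0))
    aut_first_y2 = [y2 for t, _, y2 in records if t == 0]  # Y(1, M(0))

    total = mean(dem_first_y1) - mean(aut_first_y1)          # Y(1,M(1)) - Y(0,M(0))
    mediation_dem = mean(dem_first_y1) - mean(aut_first_y2)  # Y(1,M(1)) - Y(1,M(0))
    mediation_aut = mean(dem_first_y2) - mean(aut_first_y1)  # Y(0,M(1)) - Y(0,M(0))
    return {
        "total": total,
        "mediation_dem": mediation_dem,
        "mediation_aut": mediation_aut,
        # A nonzero gap hints at a treatment-mediator interaction.
        "difference": mediation_dem - mediation_aut,
    }


if __name__ == "__main__":
    # Hypothetical toy records, not survey data.
    toy = [(1, 1, 1), (1, 1, 0), (1, 0, 0), (1, 1, 1),
           (0, 0, 1), (0, 1, 1), (0, 0, 0), (0, 0, 1)]
    print(crossover_mediation(toy))
```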
Indeed, this appears to be the case for some of the mediators. While the estimated mediator effect of trade fairness for democracy is positive and statistically significant at the .05 level, the estimated effect appears to be negative for autocracy, although the 95% confidence band crosses 0. Democracies that are perceived as "unfair" trading partners receive a penalty in support, but autocracies do not receive a corresponding "boost" for being trustworthy. That the estimate for autocracies is so negative may just be a function of sampling variability, but it may also hint at possible violations of mediator identification assumptions or profile order effects. These are issues I'll try to address in the next iteration of the experiment.
The most interesting mediator effect appears to be political affinity with the United States. Democracies that have unfriendly relations with the U.S. receive a statistically significant penalty to support, while friendly autocracies receive a boost. It's difficult to infer the exact magnitude of the relations bonus because these estimates are much less precise: each respondent only answered one question for each mediator. That is, it's still unclear whether friendly autocracies are, on average, equivalent to democracies as far as voters' trade attitudes are concerned. But the evidence is suggestive that some portion of the estimated democracy effect is transmitted through perceptions of political proximity with the United States. Democracy functions as a heuristic for "ally."
Surprisingly, human rights issues, despite their prominence in domestic trade debates, don't seem to influence voters' trade preferences on average. Neither the presence of a labor rights clause nor perceptions of the human rights record of a trading partner have a statistically discernible effect on the probability of support. It may be that labor rights clauses are not perceived as credible constraints, or that rights concerns matter mainly to specific constituencies. The effect may be heterogeneous, and averaging across the full sample may miss interesting cross-individual differences.
This is of course all very preliminary research so none of the results should be taken as anything more than suggestive. The sample from Mechanical Turk is not perfectly representative of the U.S. population (MTurk samples tend to skew younger and more liberal). While research suggests that experimental effects estimated on MTurk samples tend to be comparable to effects estimated on nationally representative samples, a better (and larger) sample would be preferred. Indeed, the preliminary results do not have much power in estimating the mediation effects - the confidence bounds are large for all of the estimates. In future revisions of this design, I will likely test the mediation effects jointly (as Tomz and Weeks do for the democratic peace) rather than separately. Additionally, other trading partner characteristics omitted from the descriptions might have even larger effects when included in the profiles. I certainly welcome any suggestions for improving the design.
The preliminary evidence suggests that the type of trading partner matters for domestic debates over liberalization. In my sample, I estimate that trade agreements with democracies receive approximately a 9 +/- 5 percentage point boost in support compared to otherwise comparable agreements with non-democracies. Furthermore, I find that some of the effect of democracy is transmitted through perceptions that democracies are more trustworthy trading partners and more likely to have close relations with the United States. Regime type functions partly as a heuristic for inferring other traits about a prospective trading partner. I certainly cannot make specific statements about how these preferences translate into government actions, but the results do imply that democratic governments (or at least the United States government) are constrained not only by the distributional economic consequences of trade but also by the characteristics of prospective trading partners.
Thursday, April 18, 2013
Sunday, February 17, 2013
Is "skin in the game" the only way to solve principal-agent problems?
Constantine Sandis and Nassim Nicholas Taleb recently invited comments on a working paper titled "Ethics and Asymmetry: Skin in the Game as a Required Heuristic for Acting Under Uncertainty." This opened up a rather heated discussion on Twitter, so I felt it might be useful to comment in longer-form. And while not articulated explicitly as such, it discusses an issue important to many political economists - the principal-agent problem.
From the abstract:
We propose a global and mandatory heuristic that anyone involved in an action that can possibly generate harm for others, even probabilistically, should be required to be exposed to some damage, regardless of context. We link the rule to various philosophical approaches to ethics and moral luck.
This heuristic is, bluntly speaking, the mandate that actors have "skin in the game."
Reading the paper, I was reminded of a classic scene in the last episode of Joss Whedon's Firefly, "Objects in Space," where the philosophical bounty hunter Jubal Early converses with his hostage, doctor Simon Tam, about the merits of experience in expert decision-making.
Jubal Early: You ever been shot?
Simon: No.
Jubal Early: You oughta be shot. Or stabbed, lose a leg. To be a surgeon, you know? Know what kind of pain you're dealing with. They make psychiatrists get psychoanalyzed before they can get certified, but they don't make a surgeon get cut on. That seem right to you?
Individuals who act for others should be exposed to the same risks. Bankers who manage others' money should know the pain of loss. Politicians who fail to act on behalf of their constituents should be voted out of office. Only when the agent and the principal share the same preferences is the timeless principal-agent problem solved.
Although a bit simplified, this is the overarching idea that I took from the paper.
I think the article obfuscates a pretty basic concept with a great deal of unrelated discussion of ethics, axiology and morality. Not to say that these are not worth considering, but the core problem that Sandis and Taleb identify is a practical one: when people are tasked with making decisions that have repercussions for others, how do we ensure that they make "good" decisions on behalf of those that they affect? Additionally, how do we limit agents' ability to enrich themselves at the expense of their principal(s)?
Sandis and Taleb's argument is uncompromising, which perhaps makes it more appealing as an ethical claim than as a practical one. By arguing that agents are only justified in acting on behalf of principals when they have "skin-in-the-game," they have assumed away the entire principal-agent problem. If the agent has the exact same preferences as the principal (i.e. they are exposed to the same risks), then there is no problem. The agent will always behave in the manner that the principal prescribes.
This is a nice thought exercise, but agents almost never have preferences identical to their principals. They are rarely exposed to identical risks. So "skin-in-the-game" ends up as a kind of aspirational goal for how principal-agent relations should be managed. Even then, Sandis and Taleb's discussion is still much too simple to be of practical value. According to the paper's logic, the best policy is the one that alters the agent's incentives such that they become aligned with those of the principal.
This is obvious.
But then again, there would be no principal-agent problem if the principal always had the means to perfectly discipline the agent.
In the real world, agents rarely share the same preferences as their principals and principals are almost never in perfect control of their agents. Power is shared and relationships are tense. Yet delegation is a necessary aspect of nearly all human institutions.
Moreover, there is rarely a single principal. Agents face conflicting pressures from a myriad of sources. Politicians do not respond to a unified "constituency" but to a diverse array of "constituents." So when Sandis and Taleb argue that decision-makers need "skin-in-the-game," they raise the question of "whose game are we talking about?"
The paper provides a first-best solution to the principal-agent problem, one where the agent is fully attuned to the risks suffered by the principal. But there are costs to such a solution. Depending on the viability of a "skin in the game" solution, a second-best approach may be more desirable. "Skin in the game" is certainly neither global nor mandatory.
Take one example from the paper:
The ancients were fully aware of this incentive to hide risks, and implemented very simple but potent heuristics (for the effectiveness and applicability of fast and frugal heuristics, see Gigerenzer, 2010). About 3,800 years ago, Hammurabi’s code specified that if a builder builds a house and the house collapses and causes the death of the owner of the house, that builder shall be put to death. This is the best risk-management rule ever. The ancients understood that the builder will always know more about the risks than the client, and can hide sources of fragility and improve his profitability by cutting corners. The foundation is the best place to hide such things. The builder can also fool the inspector, for the person hiding risk has a large informational advantage over the one who has to find it. The same absence of personal risk is what motivates people to only appear to be doing good, rather than to actually do it.
As a risk management rule, Hammurabi's code ensures that builders suffer the same costs as the owners. However, it also probably ensures an under-provision of houses. Suppose that even a house built by an expert builder has some risk of collapsing despite the builder's best efforts. Knowing this, a builder takes on an additional risk of death every time he/she constructs a house. If the builder places some non-zero value on his/her life, he/she will choose to limit the number of houses he/she builds even if there is demand for more. In economic terms, the death risk is an additional "cost" of production.
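The under-provision argument can be made concrete with a toy calculation. In this sketch every number is hypothetical: each house earns a fixed margin, production costs rise with volume, each house collapses with a small probability no matter how carefully it is built, and the builder treats execution as a cost equal to the value he/she places on his/her own life, weighted by the chance that at least one house collapses.

```python
def builder_payoff(n: int, penalty: float, margin: float = 10.0,
                   cost_slope: float = 0.5, p_collapse: float = 0.01) -> float:
    """Expected payoff from building n houses (all parameter values hypothetical).

    Revenue is margin * n, production cost is cost_slope * n**2, and the expected
    death penalty is `penalty` times the probability that at least one of the
    n houses collapses (collapse risk is assumed independent of effort here).
    """
    p_any_collapse = 1.0 - (1.0 - p_collapse) ** n
    return margin * n - cost_slope * n ** 2 - penalty * p_any_collapse


def optimal_houses(penalty: float, max_houses: int = 20) -> int:
    """Number of houses (0..max_houses) that maximizes the builder's expected payoff."""
    return max(range(max_houses + 1), key=lambda n: builder_payoff(n, penalty))


if __name__ == "__main__":
    print(optimal_houses(0.0), optimal_houses(500.0))
```

With these made-up parameters, the builder chooses 10 houses when there is no death penalty and only 5 when the penalty is valued at 500, even though buyers would take more.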
Now, if the builder valued the life of an owner to the same extent as his/her own life, then the law would simply enforce an already incentive-compatible relationship - there would be no need to force the builder to have "skin in the game," it already would be. But this is almost never the case for obvious reasons. Indeed, the reason why we care about institutions is that rarely do individuals have incentives aligned with what is "optimal" behavior.
But institutional designs impose costs of their own.
The point of this stylized example is that there are constraints in every principal-agent relationship. How do we weigh the benefit of fewer deaths from shoddy houses against the costs of an under-supply of houses? Should we tolerate some shirking on the part of builders if there is an imminent need for more houses? Sandis and Taleb's heuristic gives no answers.
Moreover, even if "skin in the game" is a "first-best" solution to controlling agent behavior, it is not a necessary condition.
In Sandis and Taleb's article, "bad luck" plays a crucial role in justifying their heuristic. But "bad luck" creates its own issues that make perfect enforcement inexorably costly. To get an agent's "skin in the game," the principal needs to be able to punish the agent for bad behavior. But rarely is behavior observed. Rather, it is some outcome on which the principal conditions their punishment. The outcome is to some extent a function of the agent's effort, but it's also subject to unknown randomness. In the builder example, the probability that a building will fail is related to the builder's effort, but it is not inconceivable that an expertly constructed structure might collapse.
Principals get noisy signals of agent behavior. It is unclear whether an outcome is the result of poor decision-making or bad luck. This distinction may or may not matter, depending on the case. However, in many instances where it is difficult to observe the agent's behavior, the optimal solution to the principal-agent problem still leaves the agent somewhat insulated from the costs of their actions.
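Bayes' rule shows how little a single bad outcome can reveal. In this hypothetical sketch, some share of builders shirk, shirking raises the collapse probability, and a diligent builder's house can still collapse; the function computes the probability that a builder actually shirked given that the house fell down. The prior and collapse probabilities are illustrative assumptions, not figures from the paper.

```python
def prob_shirked_given_collapse(prior_shirk: float, p_collapse_shirk: float,
                                p_collapse_diligent: float) -> float:
    """Posterior probability that the builder shirked, given an observed collapse."""
    joint_shirk = prior_shirk * p_collapse_shirk                 # P(shirk and collapse)
    joint_diligent = (1.0 - prior_shirk) * p_collapse_diligent   # P(diligent and collapse)
    return joint_shirk / (joint_shirk + joint_diligent)


if __name__ == "__main__":
    # Hypothetical: 20% of builders shirk; collapse risk 5% if shirking, 1% if diligent.
    print(prob_shirked_given_collapse(0.2, 0.05, 0.01))
```

With these numbers, a collapse implies only about a 56% chance that the builder shirked, so a rule that executes every builder whose house collapses would punish a diligent builder roughly 44% of the time.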
Consider a political example, the relationship between a government (agent) and the citizens (principal) of a country.
This example draws on an extensive body of game theoretic work by Ferejohn, Barro, Acemoglu, Robinson, Przeworski and recently Fearon (among many, many others). It is very much a simplification.
The citizens, if collectively powerful enough, can overthrow a government that does not behave in a manner that reflects their interests. Assuming that they are perfectly able to see what the government does, they will always overthrow a government that deviates from their desires. The government, knowing this, and preferring not to get overthrown, will comply perfectly with the wishes of the citizens. The government's "skin is in the game" in that the government faces costs (rebellion) concurrently with the citizens (reduced welfare from government shirking).
But typically, citizens cannot perfectly observe government action. The relationship between policy and outcome is complex, particularly in the economic realm. Citizens only observe their and their comrades' welfare, which is a function of both government policy and unforeseen events (like a war in a neighboring country that leads to reduced export revenue). More generally, citizens are unsure how much to blame government action for their current predicament.
If they adopt the same rule as in the perfect observation case - overthrow the government if we observe outcomes different from our ideal outcome - then they are likely to overthrow governments that did nothing wrong, a costly scenario.
Citizens may therefore tolerate some deviation from perfection, that is, weaken the amount to which the government has "skin in the game," based on the fact that they can't perfectly observe the government's behavior. Rejection becomes a question of expectation - small deviations from perfection might just be bad luck, but a massive downturn in welfare can likely be blamed on government incompetence.
Indeed, having too rigid of an "overthrow rule" may even lead to perverse outcomes. Suppose a government knows that, no matter what it does, it is going to be blamed for an economic downturn beyond its control. It now has strong incentives to ignore the will of the public and be as predatory as possible, knowing that it has no hope of appeasing the public either way. These sorts of twisted incentives are at the heart of much game theoretic work on democratization and why leaders choose to give up power.
Is it worth putting the agent's "skin in the game" if that leads the agent toward predatory behavior?
Returning to the example of ancient Babylon, instead of killing builders of failed houses and running the risk that some good builders will get killed by accident, Hammurabi could have developed building codes. This is different from a "skin in the game" solution. The builder's "skin" is no longer in the "game" since he/she does not suffer any costs after the house is built. However, failing to build to code is a clear and observable signal that the builder is shirking their responsibilities. The builder still has more private knowledge and can cut corners, but only within limits. Clear deviations from the code get punished. Some non-compliance is tolerated in order to avoid mistaken executions.
What Sandis and Taleb miss in their brief discussion is that there is a continuum of possible mechanisms for controlling an agent's behavior - "skin in the game" is one extreme, but there are other viable solutions depending on the context.
Importantly, there are trade-offs. Given some degree of imperfect observation of the agent's behavior (which shrinks as monitoring improves), the principal must weigh the benefits of tight control over the agent's behavior against the costs of punishing a "good" agent. If standards are high, the principal might accidentally punish a compliant agent. If standards are low, the "false positive" risk drops, but the incentive to shirk rises. Sandis and Taleb's argument provides no method of resolving this question.
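A stylized version of this trade-off can be computed directly. The sketch assumes (hypothetically) that the principal observes the agent's effort plus normally distributed noise, that shirking lowers the expected outcome by a fixed shortfall, and that the principal punishes whenever the observed outcome falls more than a tolerance k below the ideal. For each k it reports the false-positive rate and whether the extra expected punishment from shirking outweighs the private gain from shirking. All parameter values are illustrative.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def threshold_tradeoff(k: float, shortfall: float = 2.0, sigma: float = 1.0,
                       punishment: float = 10.0, shirk_gain: float = 3.0) -> tuple[float, bool]:
    """Punish whenever the observed outcome falls more than k below the ideal (0).

    Observed outcome = -shortfall if the agent shirks (0 if it complies) + N(0, sigma) noise.
    Returns (probability of punishing a compliant agent, whether shirking is deterred).
    """
    p_punish_comply = normal_cdf(-k / sigma)               # noise alone falls below -k
    p_punish_shirk = normal_cdf((shortfall - k) / sigma)   # -shortfall + noise below -k
    deters = punishment * (p_punish_shirk - p_punish_comply) >= shirk_gain
    return p_punish_comply, deters


if __name__ == "__main__":
    for k in (0.0, 1.0, 2.0, 3.0):
        false_positive, deters = threshold_tradeoff(k)
        print(f"k={k}: false positive rate={false_positive:.4f}, deters shirking={deters}")
```

With these made-up numbers, a lenient tolerance (k = 3) almost never punishes a compliant agent but no longer deters shirking, while the strictest rule (k = 0) punishes a compliant agent half the time and actually deters less than k = 1, because compliance buys little protection when punishment is likely anyway.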
Moreover, there are myriad cases where a principal actually benefits by delegating decisions to an agent who is somewhat insulated from the costs facing the principal. This is often labeled the dynamic inconsistency (or time-inconsistent preferences) problem. Governments often find it in their best long-term interests to delegate decision-making to someone who has absolutely no "skin in the game." International courts are a great example of this, as are domestic courts (imagine the prospects for civil rights in 1960s America if Supreme Court judges faced periodic elections).
Now, if the builder valued the life of an owner to the same extent as his/her own life, then the law would simply enforce an already incentive-compatible relationship - there would be no need to force the builder to have "skin in the game," it already would be. But this is almost never the case for obvious reasons. Indeed, the reason why we care about institutions is that rarely do individuals have incentives aligned with what is "optimal" behavior.
But institutional designs impose costs of their own.
The point of this stylized example is that there are constraints in every principal-agent relationship. How do we weigh the benefit of fewer deaths from shoddy houses against the costs of an under-supply of houses? Should we tolerate some shirking on the part of builders if there is an imminent need for more houses? Sandis and Taleb's heuristic gives no answers.
Moreover, even if "skin in the game" is a "first-best" solution to controlling agent behavior, it is not a necessary condition.
In Sandis and Taleb's article, "bad luck" plays a crucial role in justifying their heuristic. But "bad luck" creates its own issues that make perfect enforcement inexorably costly. To get an agent's "skin in the game," the principal needs to be able to punish the agent for bad behavior. But rarely is behavior observed. Rather, it is some outcome on which the principal conditions their punishment. The outcome is to some extent a function of the agent's effort, but it's also subject to unknown randomness. In the builder example, the probability that a building will fail is related to the builder's effort, but it is not inconceivable that an expertly constructed structure might collapse.
Principals get noisy signals of agent behavior. It is unclear whether an outcome is the result of poor decision-making or bad luck. This distinction may or may not matter, depending on the case. However, in many instances where it is difficult to observe the agent's behavior, the optimal solution to the principal-agent problem still leaves the agent somewhat insulated from the costs of their actions.
Consider a political example, the relationship between a government (agent) and the citizens (principal) of a country.
This example draws on an extensive body of game theoretic work by Ferejohn, Barro, Acemoglu, Robinson, Przeworski and recently Fearon (among many, many others). It is very much a simplification.
The citizens, if collectively powerful enough, can overthrow a government that does not behave in a manner that reflects their interests. Assuming that they are perfectly able to see what the government does, they will always overthrow a government that deviates from their desires. The government, knowing this, and preferring not to get overthrown, will comply perfectly with the wishes of the citizens. The government's "skin is in the game" in that the government faces costs (rebellion) concurrently with the citizens (reduced welfare from government shirking).
But typically, citizens cannot perfectly observe government action. The relationship between policy and outcome is complex, particularly in the economic realm. Citizens only observe their and their comrades' welfare, which is a function of both government policy and unforeseen events (like a war in a neighboring country that leads to reduced export revenue). More generally, citizens are unsure how much to blame government action for their current predicament.
If they adopt the same rule as in the perfect observation case - overthrow the government if we observe outcomes different from our ideal outcome - then they are likely to overthrow governments that did nothing wrong, a costly scenario.
Citizens may therefore tolerate some deviation from perfection, that is, weaken the extent to which the government has "skin in the game," because they can't perfectly observe the government's behavior. Rejection becomes a question of expectation - small deviations from perfection might just be bad luck, but a massive downturn in welfare can likely be blamed on government incompetence.
Indeed, having too rigid an "overthrow rule" may even lead to perverse outcomes. Suppose a government knows that, no matter what it does, it is going to be blamed for an economic downturn beyond its control. It now has strong incentives to ignore the will of the public and be as predatory as possible, knowing that it has no hope of appeasing the public either way. These sorts of twisted incentives are at the heart of much game theoretic work on democratization and why leaders choose to give up power.
Is it worth putting the agent's "skin in the game" if that leads the agent toward predatory behavior?
Returning to the example of ancient Babylon, instead of killing builders of failed houses and running the risk that some good builders will get killed by accident, Hammurabi could have developed building codes. This is different from a "skin in the game" solution. The builder's "skin" is no longer in the "game," since he/she does not suffer any costs after the house is built. However, failing to build to code is a clear and observable signal that the builder is shirking their responsibilities. The builder still has more private knowledge and can cut corners, but only within limits. Clear deviations from the code get punished. Some non-compliance is tolerated in order to avoid mistaken executions.
What Sandis and Taleb miss in their brief discussion is that there is a continuum of possible mechanisms for controlling an agent's behavior - "skin in the game" is one extreme, but there are other viable solutions depending on the context.
Importantly, there are trade-offs. Conditional on some level of imperfect observation of the agent's behavior (which shrinks as monitoring improves), the principal must weigh the benefits of tight control over the agent's behavior against the costs of punishing a "good" agent. If standards are high, the principal might accidentally punish a compliant agent. If standards are low, the "false positive" risk drops, but the incentive to shirk rises. Sandis and Taleb's argument provides no method of resolving this question.
Moreover, there are a myriad of cases where a principal actually benefits by delegating decisions to an agent who is somewhat insulated from the costs facing the principal. This is often labeled a dynamic-inconsistency or time-inconsistent preference problem. Governments often find it in their best long-term interests to delegate decision-making to someone who has absolutely no "skin in the game." International courts are a great example of this, as are domestic courts (imagine the prospects for civil rights in 1960s America if Supreme Court judges faced periodic elections).
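To make the trade-off concrete, here is a minimal sketch under a hypothetical model that is not taken from Sandis and Taleb or from the game-theoretic literature cited above: the principal observes outcome = effort + noise, with normally distributed noise, and punishes whenever the outcome falls below a threshold. The function name `punishment_rates` and all parameter values are illustrative assumptions.

```python
from statistics import NormalDist


def punishment_rates(threshold, effort_good, effort_shirk, noise_sd):
    """Return (false_positive, detection) for the rule 'punish if outcome < threshold'.

    Hypothetical model: outcome = effort + noise, noise ~ Normal(0, noise_sd**2).
    """
    noise = NormalDist(mu=0.0, sigma=noise_sd)
    # Chance that a compliant agent is punished purely because of bad luck.
    false_positive = noise.cdf(threshold - effort_good)
    # Chance that a shirking agent's outcome falls below the threshold (caught).
    detection = noise.cdf(threshold - effort_shirk)
    return false_positive, detection


# Illustrative values only: compliant effort 1.0, shirking effort 0.6.
for t in (0.5, 0.8, 0.95):
    fp, det = punishment_rates(t, effort_good=1.0, effort_shirk=0.6, noise_sd=0.2)
    print(f"threshold={t:.2f}  false positive={fp:.3f}  detection={det:.3f}")
```

Under these made-up numbers, raising the threshold catches more shirkers but also punishes more compliant agents by mistake; more precise monitoring (a smaller `noise_sd`) lowers the false-positive rate at any threshold below the compliant agent's expected outcome.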
My ultimate issue with the paper is that it simply lacks sufficient nuance to be useful or informative. By focusing on what should be the ideal-typical relationship between principal and agent, it ignores the trade-offs that plague real world principal-agent problems. Either "skin in the game" is such a broad standard that it is absolutely meaningless, or it represents a specific solution to principal-agent problems, namely subjecting the agent and principal to identical rewards/punishments, that is neither feasible nor, in many instances, desirable.
Thursday, January 31, 2013
Why Inference Matters
This is a post inspired by a recent exchange on Twitter between me and my friend, colleague, and TNR writer Nate Cohn (@electionate). The initial exchange was pretty superfluous, but it got me thinking about a broader question: how should writers about politics approach quantitative data? That spiraled into this rather long-winded post. The original conversation gets lost along the way, but I think a more important, broader point gets made. As political journalists and analysts incorporate more data into their writing, they would benefit from thinking in terms of statistical inference. It's just not possible to meaningfully talk about "real-world" data - electoral or otherwise - without also talking about probability.
First, a quick summary of the discussion that led me to write this: Gary King tweeted a link to this post by Andrew C. Thomas covering a paper written by himself, King, Andrew Gelman and Jonathan Katz that examined "bias" in the Electoral College system. The authors found that despite the winner-take-all allocation of votes, there was no statistically significant evidence of bias in recent election years. Essentially, they modeled the vote share of each candidate at the district level and estimated the Electoral College results for a hypothetical election where the popular vote was tied. Andrew made a graph of their "bias" estimates - all of the 95% confidence intervals in recent years contain 0.
Nate shot back a quick tweet asking why the results suggest no bias in 2012 when the electorally "decisive" states were 1.6 percentage points more Democratic than the national popular vote.
This led to a longer and rather interesting discussion between Nate and myself on how to evaluate Electoral College bias (for what it's worth, my twitter arguments were pretty bad). Nate made some good points about differences in Obama's margin-of-victory in the states needed to win the Electoral College versus his overall popular vote margin-of-victory. He notes that Obama won the states needed to reach 272 by a minimum of 5.4% while his popular vote margin was only 3.9%. If the existence of the Electoral College exaggerated Obama's vote advantage relative to a popular vote system, then it's possible to conclude that the college was "biased" in Obama's favor.
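Nate has since turned this into part of an article, arguing:
The easiest way to judge the Democrats’ newfound Electoral College advantage is by comparing individual states to the popular vote. Last November, Obama won states worth 285 electoral votes by a larger margin than the country as a whole, suggesting that Obama would have had the advantage if the popular vote were tied. But the current system appears even more troubling for Republicans when you consider the wide gap between the “tipping point” state and the national popular vote. Obama’s 270th electoral vote came from Colorado, which voted for the president by nearly 5.4 points—almost 1.8 points more than his popular vote victory. Simply put, the GOP is probably better off trying to win the national popular vote [than] the state contests in Pennsylvania or Colorado, since the national popular vote was much closer in 2012 than the vote in those tipping point states. Obama enjoyed a similar Electoral College advantage in 2008.
Here the wording is slightly different - the term is Electoral College "advantage" rather than Electoral College "bias" - but the argument is essentially the same: the current electoral geography benefits the Democrats disproportionately, so a shift to a popular vote system would help the Republicans.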
The argument is interesting (and could in fact be true), but the evidence presented doesn't really say anything. The big problem is that it ignores probability. You can't make a credible argument about the "nature" of an election cycle just by comparing election results data points without a discussion of uncertainty in the data. The results of an election are not the election itself - they are data. The data are what we use to make inferences about certain aspects of one election (or many elections). This distinction is essential. We don't observe whether the Democrats have an Electoral College advantage in 2012 or whether Colorado was more favorable to the Democrats than Virginia or Ohio. We can't observe these things because all we have are the output - the results.
Studying elections is like standing outside an automotive plant and watching cars roll off the assembly line. We never get to see how the car is put together; we only see the final product.
If we knew exactly what went into the final vote tally - that is, if we were the plant managers and not just passive observers - then we wouldn't need statistics. But reality is complicated and we're not omniscient. This is what makes statistical inference so valuable - it lets us quantify our uncertainty about complex processes.
Out of curiosity, I decided to grab the 2012 election data to take a closer look at Nate's argument. I estimated the number of electoral votes that each candidate would receive for a given share of the two-party popular vote. I altered the state-level vote shares assuming a uniform swing - each state was shifted by the difference between the "true" and "hypothesized" popular vote - and computed the corresponding "electoral vote" that the candidate would receive. For time/simplicity, I ignored the Maine/Nebraska allocation rules (including them doesn't affect the conclusions).
To clarify, I measure the Democratic candidate's share of the two-party vote as opposed to share of the total vote. That is, %Dem/(%Dem + %Rep). This measure is common in the political science literature on elections and allows us to make better comparisons by accounting for uneven variations in the third-party vote.
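The two steps above (the two-party share and the uniform-swing electoral vote count) can be sketched in a few lines. This is a minimal illustration, not the code used for the post's graphs; the function names and the three-state map are hypothetical, and every state is treated as winner-take-all, as in the post.

```python
def two_party_share(dem_pct, rep_pct):
    """Democratic share of the two-party vote: %Dem / (%Dem + %Rep)."""
    return dem_pct / (dem_pct + rep_pct)


def dem_electoral_votes(state_shares, electoral_votes, observed_national, hypothetical_national):
    """Democratic electoral votes under a uniform swing.

    Every state's two-party share moves by the same amount: the difference
    between the hypothetical and the observed national two-party share.
    Each state is winner-take-all (Maine/Nebraska district rules ignored).
    """
    swing = hypothetical_national - observed_national
    return sum(
        electoral_votes[state]
        for state, share in state_shares.items()
        if share + swing > 0.5
    )


# Hypothetical three-state map, for illustration only (not real returns).
shares = {"A": 0.55, "B": 0.51, "C": 0.45}
evs = {"A": 10, "B": 5, "C": 8}
for national in (0.48, 0.50, 0.52):
    print(national, dem_electoral_votes(shares, evs, observed_national=0.52,
                                        hypothetical_national=national))
```

Sweeping `hypothetical_national` over a fine grid and recording the result traces out the kind of step curve plotted for each election in this post.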
Here's what the 2012 election looks like for President Obama. The dotted vertical line represents the observed vote total, the blue line marks the "tie" point and the horizontal red line represents 270 electoral votes.
This is consistent with Nate's argument. There is a space of potential popular vote outcomes where Obama loses the two-party vote but still wins the Electoral College - the "272" firewall.
How about 2008?
Same thing - about a 1 percentage point loss of the popular vote would still have meant an Electoral College victory for Barack Obama - again Colorado is key here. Indeed, the advantage appears to be even larger here than in 2012.
2004?
Here the advantage is less perceptible - barely a fifth of a percentage point. Compared to 2008 and 2012, this would suggest a "trend" towards greater pro-Democratic bias in the Electoral College.
2000?
Bush obviously has the advantage here (and, as we know, he lost the popular vote).
How about jumping to 1992?
Again the elder Bush has a slight advantage.
I could keep going. One could infer from these five data points a story of a growing Democratic advantage in the Electoral College - it's there in the most recent two cycles and it wasn't there before. But in the end, these graphs are not at all meaningful.
The problem with what I've done above is that it's at best a convoluted summary of the data - the implied inferences about "advantage" are absurd absent a discussion of probability. Consider the assumptions behind the argument. It implicitly assumes that if we re-ran the 2012 election from the beginning, we would get exactly the same results. It follows that if we re-ran the election and posited that Pres. Obama received only 50% or 49% of the two-party vote, then the result in every state would shift by exactly the same amount.
Of course this is crazy. We can easily observe that from year to year, changes in two-party vote shares are not constant across all states (this is why the uniform swing assumption is a statistical modeling assumption and not a statement of fact). Without probability, our counterfactual estimates of the electoral vote are nonsense, since we implicitly assume that there is zero variation in the vote share. This is certainly not the case. If we could hypothetically re-run the election, we would not expect the vote share to be exactly the same. There are a host of elements, from the campaign to the weather on election day, that could shift the results. We would expect them to be close, but we are inherently uncertain about any counterfactual scenario.
However, we cannot reason about the 2012 election without considering counterfactuals - what would the result have been had A happened instead of B. The problem is that we only get to observe the election once - we have to estimate the counterfactual, and all estimates are uncertain.
This is where thinking in terms of inference becomes useful. Political analysts want to move beyond summarizing the data (election returns) and make some meaningful explanatory argument about the election itself. It's difficult to do this in a quantitative sense without accounting for uncertainty in the counterfactuals.
Here is one way of evaluating Nate's argument with the same election result data, but incorporating probability. It's very much constrained by the data, but that's kind of the point - looking at just a couple of election results doesn't tell us much.
The core question is: could the vote share gap that Nate identifies be due to chance?
Let's imagine that the Democratic party candidate's two-party vote share in any given state is modeled by the following process:
$$v_i = \mu + \delta_i + \epsilon_i$$
$v_i$ represents the two-party vote share in state $i$ received by the Democratic party candidate. $\mu$ is the 'fixed' "national" component of the vote - the component of the vote accounted for by national-level factors like economic growth. It does not vary from state to state. $\delta_i$ is the 'fixed' state component of the vote share - it reflects the static attributes of the state like demographics. For a state like California, it would be positive. For Wyoming, negative. This is the attribute that we're interested in. In particular, can we say with confidence from the evidence that Colorado, Obama's "firewall" state, structurally favors the Democrats more than a state like Virginia, where the President's electoral performance roughly matched the popular vote? Or is the gap that we observe attributable to random variation?
That's where $\epsilon_i$ comes in. $\epsilon_i$ represents the component of the vote share that is specific to a particular election. That is, if we were to re-run the election, the observed $v_i$ would differ because $\epsilon_i$ would take on a different value. This represents the "idiosyncrasies" of the election - weather, turnout, etc. $\epsilon_i$ is what introduces probability into the analysis.
For the sake of the model, we assume that $\epsilon_i$ is distributed normally with mean $0$ and variance $\sigma_i^2$. I'll allow each $\sigma_i$ to be different - that is, some states might exhibit more random variation than others.
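A minimal sketch of this data-generating process: each call produces one hypothetical "re-run" of the election. The function name and parameter values are illustrative assumptions, and drawing the errors independently across states is an extra assumption of this sketch rather than something the model above states.

```python
import random


def simulate_rerun(mu, deltas, sigmas, rng):
    """One hypothetical re-run of the election under v_i = mu + delta_i + eps_i.

    mu: national component, identical for every state
    deltas: dict state -> fixed state component delta_i
    sigmas: dict state -> standard deviation sigma_i of eps_i
    eps_i ~ Normal(0, sigma_i**2), drawn independently across states
    (independence is an assumption of this sketch).
    """
    return {state: mu + deltas[state] + rng.gauss(0.0, sigmas[state]) for state in deltas}


# Hypothetical parameters: two states with no structural difference but
# different amounts of random variation.
rng = random.Random(2012)
deltas = {"A": 0.0, "B": 0.0}
sigmas = {"A": 0.01, "B": 0.02}
for _ in range(3):
    print(simulate_rerun(0.52, deltas, sigmas, rng))
```

Even with identical $\delta_i$, the two states almost never come out exactly equal in any single run, which is the point of introducing $\epsilon_i$.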
Normally the analyst would then estimate the parameters of the model. However, I have neither the data nor the time to gather it (if you're interested in data, see the Gelman et al. paper). My goal with this toy example is to show that the observed difference in the 2012 vote share between the "270th" state, Colorado, and a comparable state like Virginia, which has mirrored the popular vote share, might reasonably be attributed to random fluctuations within each state. To do so, I generate rough estimates of some parameters while making sensible assumptions about others.
As an aside, I could also have compared Colorado with a national popular vote statistic. However, since I'm only working with vote shares, I would have to make even more assumptions about the distribution of voters across states in order to correctly weight the national vote, as well as about the data-generating process in the other 48 states + D.C. This approach is a bit easier and demonstrates the same point.
I assume that $\mu$ is equal to President Obama's share of the two-party vote in 2012. The estimate of $\mu$ itself is irrelevant, since we're interested in differences in $\delta_i$ between Colorado and a comparable state (the $\mu$s cancel). But it's helpful for the sake of the model, since it lets me use $0$ as a reasonable "null" value for $\delta_i$: under the null, each state's expected vote share is simply $\mu$.
The question is whether Colorado's $\delta$ is statistically different from Virginia's. If it is, then it might make sense to talk about an electoral "advantage" in 2012. Obviously this assumes that the $\delta$ values for all of the other states in the 272-elector coalition are at least as large as Colorado's, which I'll grant for the sake of the model (the assumption only works against me). Colorado is the "weakest link."
The way to answer this question is to test the null hypothesis of no difference. Suppose that $\delta_i$ equals $0$ for both Virginia and Colorado - that there is no substantive difference between Virginia (the baseline state) and Colorado (the "firewall" state). What's the probability that we would observe a gap of .78 percentage points between their state-level electoral results?
Estimating the probability requires making a reasonable estimate for the variance of $\epsilon_i$. How much variation in the electoral results can we attribute to random error, and how much can we attribute to substantive features of the electoral map?
Unfortunately, we can't re-play the 2012 election, so we have to look at history to calculate the variance - in this case, the 2008 election. As Nate notes in the post, the party coalitions are relatively stable and demographic changes/realignments typically take many election cycles to complete. As such, we may be able to assume that the state-level "structural" effects are the same in 2012 as they were in 2008. That is, the year-to-year variation in the gap between the popular vote and the state-level vote can be used to estimate the variance of the error terms $\epsilon_{CO}$ and $\epsilon_{VA}$.
But it's hard to get a good estimate of a variance with only two data points. Four is slightly better, but to do so we have to assume that the error terms of Colorado's and Virginia's vote shares are identically distributed - that $\sigma_{CO}^2 = \sigma_{VA}^2$. That's not to say that the observed "error" values are the same, just that they're drawn from the same distribution. Is this a reasonable assumption? The residuals don't appear to be dramatically different, but we just don't know - again, another statistical trade-off. Ultimately, the point of this exercise is to demonstrate how one-election-cycle observations can be reasonably explained by random variation, so the aim is plausibility rather than absolute precision.
So I estimate a single pooled error variance, used for both states, as the sample variance of four residuals: the Democratic two-party share in Colorado and in Virginia, in 2008 and in 2012, each subtracted from the two-party share of the popular vote in the same year. In theory, we could add more electoral cycles to the estimate, but the assumption that $\sigma_i$ does not change from year to year becomes weaker. If I restrict myself to looking only at electoral results, then I have to accept data limitations. This is a general problem with any statistical study of elections - if we look only at percentage aggregates at high levels, there just isn't a lot of data to work with.
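Under the null, the gap reduces to a difference of error terms, because $\mu$ is common to both states and both $\delta$s are zero. A short derivation, assuming (beyond what the model itself states) that $\epsilon_{CO}$ and $\epsilon_{VA}$ are independent:

$$v_{CO} - v_{VA} = (\mu + \delta_{CO} + \epsilon_{CO}) - (\mu + \delta_{VA} + \epsilon_{VA}) = \epsilon_{CO} - \epsilon_{VA} \sim N(0, \sigma_{CO}^2 + \sigma_{VA}^2)$$

If the two error variances share a common value $\sigma^2$, the difference is $N(0, 2\sigma^2)$, and the probability of a gap at least as large as $d$ in either direction is

$$\Pr(|v_{CO} - v_{VA}| \geq d) = 2\left[1 - \Phi\left(\frac{d}{\sqrt{2}\,\sigma}\right)\right]$$

where $\Phi$ is the standard normal CDF and $d = 0.0078$ for the 2012 gap. The probability of a gap of at least $d$ in one specified direction is half of this.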
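The pooled estimate described above can be sketched as follows. This is a hypothetical helper, not the post's code; it takes the 2008 and 2012 two-party shares as inputs (none are filled in here) and relies on the pooling assumptions stated above.

```python
from statistics import variance


def pooled_error_variance(state_shares, national_shares):
    """Pooled sample variance of state-level residuals.

    state_shares: dict mapping (state, year) -> Democratic two-party share
    national_shares: dict mapping year -> national Democratic two-party share
    Each residual is the national share minus the state share (the sign does
    not affect the variance). Pooling treats every residual as a draw from one
    common error distribution: sigma_CO**2 == sigma_VA**2, constant across years.
    statistics.variance centres on the sample mean and divides by n - 1.
    """
    residuals = [national_shares[year] - share
                 for (_state, year), share in state_shares.items()]
    return variance(residuals)
```

With Colorado and Virginia in 2008 and 2012 this yields four residuals; the square root of the result is the $\sigma$ that feeds the simulation step.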
The next step is simulation. Suppose we repeated the 2012 election a large number of times on the basis of this model. How often would we see a gap of at least .78% between the vote shares in Colorado and Virginia? The histogram below plots the simulated distribution of that difference. The x-axis gives the percentage differences (.01 = 1 percentage point). The red dotted line represents a difference of .78 percentage points. Also relevant is the blue dotted line, which represents a difference of -.78 percentage points (Virginia's vote share is greater than Colorado's).
What's the probability of seeing a difference as extreme as the gap seen in 2012? It turns out it's about 40% - certainly not incredibly high, but also not unlikely.
Suppose we only care about positive differences, that is, we are absolutely sure that it's impossible for Virginia to be structurally more favorable to the Democrats than Colorado. There is either no difference or $\delta_{CO} > \delta_{VA}$. What's the probability of seeing a difference equal to or greater than .78 percentage points? Well, it's still roughly 20%. Statisticians (as a norm, not as a hard rule) tend to use 5% as the cut-off for rejecting the "null hypothesis" and accepting that there is likely some underlying difference in the parameters not attributable to random variation - in this case, we fail to reject.
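A minimal sketch of this simulation, assuming independent normal errors with a common $\sigma$ (the pooled-variance assumption). The post does not report its variance estimate, so the $\sigma$ values in the loop are hypothetical placeholders; the closed-form check uses the fact that the difference of two such errors is $N(0, 2\sigma^2)$.

```python
import math
import random


def gap_tail_probabilities(gap, sigma, n_sims=100_000, seed=0):
    """Monte Carlo estimates of P(|diff| >= gap) and P(diff >= gap).

    Under the null delta_CO == delta_VA == 0 the gap is diff = eps_CO - eps_VA,
    with both errors drawn independently from Normal(0, sigma**2).
    """
    rng = random.Random(seed)
    two_sided = one_sided = 0
    for _ in range(n_sims):
        diff = rng.gauss(0.0, sigma) - rng.gauss(0.0, sigma)
        if abs(diff) >= gap:
            two_sided += 1
        if diff >= gap:
            one_sided += 1
    return two_sided / n_sims, one_sided / n_sims


def gap_two_sided_exact(gap, sigma):
    """Closed form: diff ~ Normal(0, 2 * sigma**2), so
    P(|diff| >= gap) = 2 * (1 - Phi(gap / (sqrt(2) * sigma))) = erfc(gap / (2 * sigma))."""
    return math.erfc(gap / (2.0 * sigma))


gap = 0.0078  # the .78-point Colorado-Virginia gap discussed above
for sigma in (0.002, 0.005, 0.01):  # hypothetical error SDs, not estimates
    two, one = gap_tail_probabilities(gap, sigma)
    print(sigma, round(two, 3), round(one, 3), round(gap_two_sided_exact(gap, sigma), 3))
```

The loop makes the dependence on the variance explicit: the same .78-point gap is unremarkable when $\sigma$ is large and becomes surprising as $\sigma$ shrinks.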
If the error variance were smaller, that is, if the amount of year-to-year variation that can be ascribed to randomness were lowered, it's possible that a gap of .78% would be surprising. This would lead us to conclude that there indeed may be a structural difference - that Colorado is more advantageous to the Democratic candidate relative to the baseline. The details of the model parameters are really not the point of this exercise. Any estimate of the variance from the sparse data used here is not very reliable. The fact that we are using two elections where the same candidate stood for office suggests non-independence and error correlation which would likely downwardly bias our variance estimates. We could look at the gap in 2008 - two observations are better than one, but in that case, why not include the whole of electoral history - the data exist. Moreover, to make our counterfactual predictions more precise, we need some covariates - independent variables that are good predictors of vote share. In short, we need real statistical models.
This is what the Gelman et al. paper does. While a quick comparison of the data points hints at a structural bias in 2012, a more in-depth modeling approach suggests that the difference is not statistically significant. That is, it's likely that any apparent structural advantage in the Electoral College in 2012 is nothing more than noise.
The problem with Nate's argument, ultimately, is that it posits a counterfactual (Romney winning the popular vote by a slight margin) without describing the uncertainty around that counterfactual. Without talking about uncertainty, it is impossible to discern whether the observed phenomenon is a result of chance or something substantively interesting.
Admittedly, I'm spending a lot of time dissecting a rather trivial point. My goal in this post was not to settle whether the "Electoral College advantage" is real - the 2012/2008 election results alone aren't enough data to make that determination. Rather, I wanted to walk through a simple example of inference to demonstrate why it's important to pay attention to probability and randomness when talking about data - to think about data in a statistical manner.
We cannot, just by looking at a set of data points, immediately explain which differences are due to structural factors and which ones are due to randomness. No matter what, we make assumptions to draw inferences about the underlying structure. Ignoring randomness doesn't make it go away - it just means making extremely tenuous assumptions about the data (namely, zero variance).
To summarize, if you get anything out of this post, it should be these three points:
1) The data are not the parameters (the things we're interested in).
2) To infer the parameters from the data, you need to think about probability.
3) Statistical models are a helpful way of understanding the uncertainty inherent in the data.
I'm not suggesting that political writers need to become statisticians. Journalism isn't academia. It doesn't have the luxury of time spent carefully analyzing the data. I'm not expecting to see regressions in every blog post I read, nor do I want to. The Atlantic, WaPo, TNR, NYT are neither Political Analysis nor the Journal of the Royal Statistical Society.
But conversely, if there is increasingly a demand for "quantitative" or "numbers-oriented" analysis of political events, then writers should make some effort to use those numbers correctly. At the very least, it's valuable to think of any empirical claim - whether retrospective or predictive - in terms of inference. We have things we know, and we want to make arguments about things we don't know or cannot observe. At its core, argumentation is about reasoning from counterfactuals, and counterfactuals always carry uncertainty with them.
Even if the goal is just to describe one election cycle, one cannot get very far just comparing electoral returns at various levels of geographic aggregation. Again, election returns are just data - using them to make substantive statements, even about a single election, relies on implicit inferences that typically ignore the role of uncertainty. And if we want to go beyond describing an election to identifying trends or features of the electoral landscape, quantitative "inference" without probability is just dart-throwing.
First, a quick summary of the discussion that led me to write this: Gary King tweeted a link to this post by Andrew C. Thomas covering a paper written by himself, King, Andrew Gelman and Jonathan Katz that examined "bias" in the Electoral College system. The authors found that despite the winner-take-all allocation of votes, there was no statistically significant evidence of bias in recent election years. Essentially, they modeled the vote share of each candidate at the district level and estimated the Electoral College results for a hypothetical election where the popular vote was tied. Andrew made a graph of their "bias" estimates - all of the 95% confidence intervals in recent years contain 0.
Nate shot back a quick tweet response asking why the results suggest no bias in 2012 when the electorally "decisive" states were 1.6 percentage points more democratic than the national popular vote.
This led to a longer and rather interesting discussion between Nate and myself on how to evaluate Electoral College bias (for what it's worth, my twitter arguments were pretty bad). Nate made some good points about differences in Obama's margin-of-victory in the states needed to win the Electoral College versus his overall popular vote margin-of-victory. He notes that Obama won the states needed to reach 272 by a minimum of 5.4% while his popular vote margin was only 3.9%. If the existence of the Electoral College exaggerated Obama's vote advantage relative to a popular vote system, then it's possible to conclude that the college was "biased" in Obama's favor.
Nate has since turned this into part of an article, arguing:
The easiest way to judge the Democrats’ newfound Electoral College advantage is by comparing individual states to the popular vote. Last November, Obama won states worth 285 electoral votes by a larger margin than the country as a whole, suggesting that Obama would have had the advantage if the popular vote were tied. But the current system appears even more troubling for Republicans when you consider the wide gap between the “tipping point” state and the national popular vote. Obama’s 270th electoral vote came from Colorado, which voted for the president by nearly 5.4 points—almost 1.8 points more than his popular vote victory. Simply put, the GOP is probably better off trying to win the national popular vote the state contests in Pennsylvania or Colorado, since the national popular vote was much closer in 2012 than the vote in those tipping point states. Obama enjoyed a similar Electoral College advantage in 2008.Here the wording is slightly different - the term is Electoral College "advantage" rather than Electoral College "bias," but the argument is essentially the same - currently the electoral geographic landscape is such that the Democrats benefit disproportionately - a shift to a popular vote system would help the Republicans.
The argument is interesting (and could in fact be true), but the evidence presented doesn't really say anything. The big problem is that it ignores probability. You can't make a credible argument about the "nature" of an election cycle just by comparing election results data points without a discussion of uncertainty in the data. The results of an election are not the election itself - they are data. The data are what we use to make inferences about certain aspects of one election (or many elections). This distinction is essential. We don't observe whether the Democrats have an Electoral College advantage in 2012 or whether Colorado was more favorable to the Democrats than Virginia or Ohio. We can't observe these things because all we have are the output - the results.
Studying elections is like standing outside an automotive plant and watching cars roll off the assembly line. We never get to see how the car is put together; we only see the final product.
If we knew exactly what went into the final vote tally - that is, if we were the plant managers and not just passive observers - then we wouldn't need statistics. But reality is complicated and we're not omniscient. This is what makes statistical inference so valuable - it lets us quantify our uncertainty about complex processes.
Just for curiosity, I decided to grab the 2012 election data to take a closer look at Nate's argument. I estimated the number of electoral votes that each candidate would receive for a given share of the two-party popular vote. I altered the state-level vote shares assuming a uniform swing - each state was shifted by the difference between the "true" and "hypothesized" popular vote - and computed the corresponding "electoral vote" that the candidate would receive. For time/simplicity, I ignored the Maine/Nebraska allocation rules (including them doesn't affect the conclusions).
To clarify, I measure the Democratic candidate's share of the two-party vote as opposed to share of the total vote. That is, %Dem/(%Dem + %Rep). This measure is common in the political science literature on elections and allows us to make better comparisons by accounting for uneven variations in the third-party vote.
Here's what the 2012 election looks like for President Obama. The dotted vertical line represents the observed vote total, the blue line marks the "tie" point and the horizontal red line represents 270 electoral votes.
This is consistent with Nate's argument. There is a space of potential popular vote outcomes where Obama loses the two-party vote but still wins the Electoral College - the "272" firewall.
How about 2008?
Same thing - about a 1 percentage point loss of the popular vote would have still mean an Electoral College victory for Barack Obama - again Colorado is key here. Indeed, the advantage appears to be even larger here than in 2012.
2004?
Here the advantage is less perceptible - barely a fifth of a percentage point. Compared to 2008 and 2012, this would suggest a "trend" towards greater pro-Democratic bias in the Electoral College
2000?
Bush obviously has the advantage here (and as we know, he lost the popular vote)
How about jumping to 1992?
Again the elder Bush has a slight advantage.
I could keep going. One could infer a story of a growing democratic advantage in the Electoral College from these five data points - it's there in the most recent two cycles and it wasn't there before. But in the end, these graphs are not at all meaningful.
The problem with what I've done above is that it's at best a convoluted summary of the data - the implied inferences about "advantage" are absurd absent a discussion of probability. Consider the assumptions behind the argument. It implicitly assumes that if we re-ran the 2012 election from the beginning, we would get the exact same results. It follows that if we re-ran the election and posited that Pres. Obama received only 50% or 49% of the two-party vote, then the results in each state would shift exactly by 1-2%.
Of course this is crazy. We easily observe that from year to year, changes in two-party vote shares are not constant across all states (this is why the uniform swing assumption is a statistical modeling assumption and not a statement of fact). Without probability our counterfactual observations of the electoral vote are nonsense, since we implicitly assume that there is zero variation in the vote share. This is certainly not the case. If we could hypothetically re-run the election, we would not expect the vote share to be exactly the same. There are a host of elements, from the campaign to the weather on election day, that could shift the results. We would expect them to be close, but we are inherently uncertain about any counterfactual scenario.
However, we cannot reason about the 2012 election without considering counterfactuals - what would the result have been had A happened instead of B. The problem is that we only get to observe the election once - we have to estimate the counterfactual, and all estimates are uncertain.
This is where thinking in terms of inference becomes useful. Political analysts want to move beyond summarizing the data (election returns) and make some meaningful explanatory argument about the election itself. It's difficult to do this in a quantitative sense without accounting for uncertainty in the counterfactuals.
Here is one way of evaluating Nate's argument using the same election result data that incorporates probability. It's very much constrained by the data, but that's kind of the point - looking just at a couple of election results doesn't tell us much.
The core question is: could the vote share gap that Nate identifies be due to chance?
Let's imagine that the Democratic party candidate's two-party vote share in any given state is generated by the following process:
$$v_i = \mu + \delta_i + \epsilon_i$$
$v_i$ represents the two-party vote share received by the Democratic party candidate in state $i$. $\mu$ is the fixed "national" component of the vote: the part accounted for by national-level factors like economic growth. It does not vary from state to state. $\delta_i$ is the fixed state component of the vote share; it reflects static attributes of the state like demographics. For a state like California, it would be positive. For Wyoming, negative. This is the attribute that we're interested in. In particular, can we say with confidence from the evidence that Colorado, Obama's "firewall" state, structurally favors the Democrats more than a state like Virginia, where the President's electoral performance roughly matched the popular vote? Or is the gap that we observe attributable to random variation?
That's where $\epsilon_i$ comes in. $\epsilon_i$ represents the component of the vote share that is unique to a given election. That is, if we were to re-run the election, differences in the observed $v_i$ would come from $\epsilon_i$ taking on a different value. It represents the "idiosyncrasies" of the election: weather, turnout, and so on. $\epsilon_i$ is what introduces probability into the analysis.
For the sake of the model, we assume that $\epsilon_i$ is distributed normally with mean $0$ and variance $\sigma_i^2$. I'll allow each $\sigma_i$ to be different - that is, some states might exhibit more random variation than others.
Normally the analyst would then estimate the parameters of the model. However, I have neither the data nor the time to gather it (if you're interested in data, see the Gelman et al. paper). My goal with this toy example is to show that the observed difference in the 2012 vote share between the "270th" state, Colorado, and a comparable state like Virginia, which mirrored the popular vote share, might reasonably be attributed to random fluctuations within each state. To do so, I generate rough estimates of some parameters while making sensible assumptions about others.
As an aside, I could also have done a comparison between Colorado and a national popular vote statistic. However, since I'm only working with vote shares, I would have to make even more assumptions about the distribution of voters across states in order to correctly weight the national vote. Additionally, I would have to make even more assumptions about the data generating process in the other 48 states + D.C. This approach is a bit easier and demonstrates the same point.
I assume that $\mu$ is equal to President Obama's share of the two-party vote in 2012. The estimate of $\mu$ itself is irrelevant, since we're interested in differences in $\delta_i$ between Colorado and a comparable state (the $\mu$s cancel). But it's helpful for the sake of the model, since it allows me to use $0$ as a reasonable "null" estimate of $\delta_i$.
The question is whether Colorado's $\delta$ is statistically different from Virginia's. If it is, then it might make sense to talk about an electoral "advantage" in 2012. Obviously this assumes that the $\delta$ values for all of the other states in the 272-elector coalition are at least as large as Colorado's, which I'll grant for the sake of the model (it only works against me). Colorado is the "weakest link."
The way to answer this question is to test the null hypothesis of no difference. Suppose that $\delta_i$ equals $0$ for both Virginia and Colorado, so that there is no substantive difference between Virginia (the baseline state) and Colorado (the "firewall" state). What's the probability that we would observe a gap of .78 percentage points between their state-level electoral results?
Estimating that probability requires a reasonable estimate of the variance of $\epsilon_i$. How much of the variation in the electoral results can we attribute to random error, and how much to substantive features of the electoral map?
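To make the null hypothesis concrete, here is the distribution of the Colorado-Virginia gap that the model implies. This adds one assumption the post leaves implicit: the two states' error terms are independent of each other. Subtracting the two state equations, $\mu$ cancels:

$$v_{CO} - v_{VA} = (\delta_{CO} - \delta_{VA}) + (\epsilon_{CO} - \epsilon_{VA})$$

Under the null $\delta_{CO} = \delta_{VA} = 0$, with independent normal errors,

$$v_{CO} - v_{VA} \sim N\left(0,\; \sigma_{CO}^2 + \sigma_{VA}^2\right)$$

and if the two variances are equal ($\sigma_{CO} = \sigma_{VA} = \sigma$), the probability of a gap at least as extreme as .78 points in either direction is

$$\Pr\left(|v_{CO} - v_{VA}| \geq 0.0078\right) = 2\left[1 - \Phi\left(\frac{0.0078}{\sqrt{2}\,\sigma}\right)\right]$$

where $\Phi$ is the standard normal CDF. Dropping the factor of 2 gives the one-sided probability of Colorado leading Virginia by at least that much.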
Unfortunately, we can't re-play the 2012 election, so we have to look to history to estimate the variance - in this case, the 2008 election. As Nate notes in the post, party coalitions are relatively stable, and demographic changes and realignments typically take many election cycles to complete. As such, we may be able to assume that the state-level "structural" effects were the same in 2012 as in 2008. That is, the year-to-year variation in the gap between the state-level vote and the national popular vote can be used to estimate the variance of the error terms $\epsilon_{CO}$ and $\epsilon_{VA}$.
But it's hard to get a good estimate of a variance with only two data points. Four is slightly better, but to use four we have to assume that the error terms of Colorado's and Virginia's vote shares are identically distributed, that is, $\sigma_{CO}^2 = \sigma_{VA}^2$. That's not to say that the observed "error" values are the same, just that they're drawn from the same distribution. Is this a reasonable assumption? The residuals don't appear to be dramatically different, but we just don't know - again, another statistical trade-off. Ultimately, the point of this exercise is to demonstrate how one-election-cycle observations can be reasonably explained by random variation, so the aim is plausibility rather than precision.
So I estimate a pooled error variance for the two states from the year-to-year variation in each state's residual: the gap between the state's 2008 and 2012 Democratic two-party share and the Democratic two-party share of the national popular vote. In theory, we could add more electoral cycles to the estimate, but then the assumption that $\sigma_i$ does not change from year to year becomes harder to defend. If I restrict myself to electoral results alone, then I have to accept data limitations. This is a general problem with any statistical study of elections: if we look only at percentage aggregates at high levels, there just isn't much data to work with.
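A sketch of the pooled variance estimate, under one reading of the description above: compute each state's sample variance of its 2008 and 2012 residuals (state Democratic two-party share minus national Democratic two-party share), then pool the two states, weighting by degrees of freedom. Because the fixed $\delta_i$ is constant across years, it drops out of a within-state variance. The function name and input layout are my own illustration, and since the post doesn't list the actual residuals, none are filled in here.

```python
import statistics


def pooled_error_variance(residuals_by_state):
    """Pooled within-state sample variance of year-to-year residuals.

    residuals_by_state maps a state name to a list of residuals (state
    Democratic two-party share minus national two-party share), one per
    election. Assumes every state shares the same error variance.
    """
    weighted_sum = 0.0
    dof = 0
    for residuals in residuals_by_state.values():
        n = len(residuals)
        # statistics.variance is the (n - 1) sample variance; it needs n >= 2
        weighted_sum += (n - 1) * statistics.variance(residuals)
        dof += n - 1
    return weighted_sum / dof
```

With two elections per state, each state contributes one degree of freedom, so the pooled estimate is simply the average of the two within-state variances. A state whose residual is the same in both years (a pure $\delta_i$ offset) contributes zero variance.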
The next step is simulation. Suppose we repeated the 2012 election a large number of times on the basis of this model. How often would we see a gap of at least .78 percentage points between the vote shares in Colorado and Virginia? The histogram below plots the simulated distribution of that difference. The x-axis gives the differences as proportions (.01 = 1 percentage point). The red dotted line represents a difference of .78 percentage points. Also relevant is the blue dotted line, which represents a difference of -.78 percentage points (Virginia's vote share is greater than Colorado's).
What's the probability of seeing a difference as extreme as the gap seen in 2012? Turns out, it's about 40% - certainly not incredibly high, but also not unlikely.
Suppose we only care about positive differences, that is, we are absolutely sure that it's impossible for Virginia to be structurally more favorable to the Democrats than Colorado. There is either no difference or $\delta_{CO} > \delta_{VA}$. What's the probability of seeing a difference equal to or greater than .78 percentage points? Well, it's still roughly 20%. Statisticians (as a norm, not as a hard rule) tend to use 5% as the cut-off for rejecting the "null hypothesis" and accepting that there is likely some underlying difference in the parameters not attributable to random variation - in this case, we fail to reject.
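The simulation can be sketched as follows, assuming independent normal errors with a common standard deviation `sigma` (the square root of the pooled variance) and the .78-point gap written as a proportion, 0.0078. `simulate_gap` re-runs the "election" many times under the null and counts how often the gap is at least as extreme as observed, both in absolute value and in Colorado's favor only. `analytic_gap_probability` is a closed-form check: under these assumptions the gap is normal with variance $2\sigma^2$. Both function names are hypothetical, and since the residuals aren't reproduced in the post, no numerical output is shown; you would plug in your own estimate of `sigma`.

```python
import math
import random
from statistics import NormalDist


def simulate_gap(sigma, observed_gap, n_sims=100_000, seed=1):
    """Simulate the CO - VA gap under the null delta_CO = delta_VA = 0.

    Returns (two_sided, one_sided): the share of simulated gaps with
    |gap| >= observed_gap, and the share with gap >= observed_gap.
    """
    rng = random.Random(seed)
    two_sided = one_sided = 0
    for _ in range(n_sims):
        # mu cancels and both deltas are zero, so the gap is eps_CO - eps_VA
        gap = rng.gauss(0.0, sigma) - rng.gauss(0.0, sigma)
        two_sided += abs(gap) >= observed_gap
        one_sided += gap >= observed_gap
    return two_sided / n_sims, one_sided / n_sims


def analytic_gap_probability(sigma, observed_gap):
    """Closed-form two-sided tail probability; the gap is N(0, 2 * sigma**2)."""
    gap_dist = NormalDist(0.0, sigma * math.sqrt(2))
    return 2 * (1 - gap_dist.cdf(observed_gap))
```

Because the gap is symmetric around zero under the null, the one-sided share should come out at roughly half the two-sided one, which mirrors the drop from the two-sided to the one-sided figure discussed above.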
If the error variance were smaller, that is, if the amount of year-to-year variation that can be ascribed to randomness were lowered, it's possible that a gap of .78% would be surprising. This would lead us to conclude that there indeed may be a structural difference - that Colorado is more advantageous to the Democratic candidate relative to the baseline. The details of the model parameters are really not the point of this exercise. Any estimate of the variance from the sparse data used here is not very reliable. The fact that we are using two elections where the same candidate stood for office suggests non-independence and error correlation which would likely downwardly bias our variance estimates. We could look at the gap in 2008 - two observations are better than one, but in that case, why not include the whole of electoral history - the data exist. Moreover, to make our counterfactual predictions more precise, we need some covariates - independent variables that are good predictors of vote share. In short, we need real statistical models.
This is what the Gelman et al. paper does. While a quick comparison of the data points hints at a structural bias in 2012, a more in-depth modeling approach suggests that the difference is not statistically significant. That is, any apparent structural advantage in the Electoral College in 2012 is likely nothing more than noise.
The problem with Nate's argument, ultimately, is that it posits a counterfactual (Romney winning the popular vote by a slight margin) without describing the uncertainty around that counterfactual. Without talking about uncertainty, it is impossible to discern whether the observed phenomenon is a result of chance or something substantively interesting.
It does appear that I'm spending a lot of time dissecting a rather trivial point, and that's true. My goal in this post was not to settle whether the "Electoral College advantage" is real; the 2012 and 2008 election results alone aren't enough data to make that determination. Rather, I wanted to walk through a simple example of inference to demonstrate why it's important to pay attention to probability and randomness when talking about data - to think about data in a statistical manner.
We cannot, just by looking at a set of data points, immediately explain which differences are due to structural factors and which ones are due to randomness. No matter what, we make assumptions to draw inferences about the underlying structure. Ignoring randomness doesn't make it go away - it just means making extremely tenuous assumptions about the data (namely, zero variance).
To summarize, if you get anything out of this post, it should be these three points:
1) The data are not the parameters (the things we're interested in).
2) To infer the parameters from the data, you need to think about probability.
3) Statistical models are a helpful way of understanding the uncertainty inherent in the data.
I'm not suggesting that political writers need to become statisticians. Journalism isn't academia. It doesn't have the luxury of time spent carefully analyzing the data. I'm not expecting to see regressions in every blog post I read, nor do I want to. The Atlantic, WaPo, TNR, NYT are neither Political Analysis nor the Journal of the Royal Statistical Society.
But conversely, if there is increasingly a demand for "quantitative" or "numbers-oriented" analysis of political events, then writers should make some effort to use those numbers correctly. At the very least, it's valuable to think of any empirical claim, whether retrospective or predictive, in terms of inference. We have things we know, and we want to make arguments about things we don't know or cannot observe. At its core, argumentation is about reasoning from counterfactuals, and counterfactuals always carry uncertainty with them.
Even if the goal is just to describe one election cycle, one cannot get very far just comparing electoral returns at various levels of geographic aggregation. Again, election returns are just data; using them to make substantive statements, even about a single election, relies on implicit inferences that typically ignore the role of uncertainty. And if we want to go beyond describing an election to identifying trends or features of the electoral landscape, quantitative "inference" without probability is just dart-throwing.
Wednesday, January 23, 2013
Public Opposition to Drones in Pakistan - A Question of Wording
Professors C. Christine Fair, Karl Kaltenthaler, and William J. Miller have a new article in the Atlantic on public attitudes toward U.S. drone attacks in Pakistan. The piece is a shortened version of a longer working paper that looks at the factors affecting knowledge of and opposition to the drone program among Pakistanis. They argue that Pakistani citizens are not as universally opposed to the drone program as is commonly believed and that public opinion is fragmented. According to Pew Research, only a slim majority report that they know anything about the drone program, and of those who do know "a lot" or "a little," only 44% say that they oppose the attacks.
Fair, Kaltenthaler and Miller valuably point out that Pakistani attitudes are not homogeneous and that only a minority of Pakistanis even know about the drone program. Making broad and sweeping claims about Pakistani opinion on the drone program is difficult because the average citizen tends to know little about foreign policy issues (this is just as true in the United States as it is in Pakistan).
However, I think Fair, Kaltenthaler and Miller may be going too far in asserting that "the conventional wisdom is wrong." Their argument that only 44% of Pakistanis oppose the drone program is highly sensitive to the choice of survey question. While Pew asks a series of questions on attitudes toward drones, Fair et al. choose to focus on only one of these questions. In doing so, they place a lot of faith in the reliability of that question as an indicator of respondents' opposition to the drone program. This is a persistent issue in all forms of survey research. Scholars are interested in some unobservable quantity (public opinion) and have to use proxies (survey responses) to infer what cannot be directly seen. They must assume that their proxy is a good one.
Looking at the other survey questions suggests that this faith may be misplaced. Although only 44% of respondents say that they "oppose" drone attacks, 74% think that the drone attacks are "very bad" and 97% think that they are either "bad" or "very bad". If both questions were proxying for the same latent variable, we would not expect such an extreme gap. If 17% of respondents "support" drone strikes and only 2% think that drone attacks are a good thing, then a significant majority of those who say they "support" strikes also think that they are "bad" or "very bad" - a strange puzzle. While it's not inconceivable for people to say that they support policies that they think are bad, a more likely explanation is that respondents' answers are strongly affected by the way the questions are worded, and that the question used by Fair et al. may not be a good proxy for the quantity of interest.
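The "strange puzzle" can be made a little more precise with a back-of-the-envelope bound. Assuming (the post doesn't confirm this) that the 17% who support the strikes and the 97% who rate them "bad" or "very bad" are percentages of the same set of respondents, at least 17 + 97 - 100 = 14 percentage points of respondents must fall in both groups. So at least 14/17, or roughly 82%, of "supporters" would also call the attacks bad. The hypothetical helper below just encodes that arithmetic, a Fréchet-style lower bound on the overlap of two groups.

```python
def min_share_supporters_rating_bad(pct_support, pct_bad):
    """Lower bound on the share of 'supporters' who also rated the attacks
    'bad' or 'very bad', assuming both percentages share one respondent base."""
    # Two groups covering pct_support% and pct_bad% of the same 100% must
    # overlap by at least pct_support + pct_bad - 100 points (never below 0).
    overlap_floor = max(0.0, pct_support + pct_bad - 100.0)
    return overlap_floor / pct_support
```

For example, `min_share_supporters_rating_bad(17, 97)` gives 14/17, about 0.82.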
So what's the problem with the question? The main issue is that it asks respondents to evaluate a hypothetical future scenario rather than reflect on the existing drone program.
"I'm going to read you a list of things the US might do to combat extremist groups in Pakistan. Please tell me whether you would support or oppose...conducting drone attacks in conjunction with the Pakistani government against leaders of extremist groups."
The question is not about what the US is currently doing to combat extremist groups; it asks instead whether respondents would support a course of action that the US might take. Importantly, this course is framed in a way that is likely more appealing than the status quo.
First, it states that these drone attacks will be conducted in "conjunction" with the Pakistani government. While I don't know how this term was translated, it certainly suggests a lot more involvement by the Pakistani government than currently exists. The Pakistani government may "tacitly approve" the existing US drone program, but it is difficult to characterize drone attacks as being conducted in "conjunction" with Islamabad. Pew's survey respondents seem to agree - a significant majority (69%) believe that the U.S. is exclusively "conducting" the drone attacks, but a plurality (47%) believe that the attacks carry the approval of the government of Pakistan. Given that the unilateral nature of the strikes is often cited as a reason for their unpopularity, a respondent may support (or at least not oppose) the proposal in the question while still opposing the drone program as it is currently conducted.
Second, the question makes no mention of possible civilian casualties or other drawbacks while highlighting the benefits of combating extremists, a threat that many Pakistanis are concerned about. This may seem minor, but it matters a lot. As Fair et al. point out, the drone debate is a low-information environment. When respondents' attitudes about a policy are not well crystallized, subtle differences in question wording that highlight different costs or benefits can have a major impact on the responses given. For example, Michael Hiscox found [gated/ungated draft] that question framing had a sizable effect on Americans' support or opposition to international trade liberalization - another case where the average respondent is typically not well-informed. Respondents exposed to a brief anti-trade introduction were 17% less likely to support free trade.
Certainly, the question that Fair et al. use does not explicitly present respondents with a pro-drone viewpoint. However, it is still very likely that framing effects are skewing the responses. Consider another question asked by Pew that is simpler and more direct:
"Do you think these drone attacks are a very good thing, good thing, bad thing, or very bad thing?"
1% said "very good", 1% said "good", 23% said "bad" and 74% said "very bad."
I'm not arguing that this question is a superior measure. For example, it may not be measuring approval of drone strikes themselves, but rather a general sentiment toward the US (among the heavily male, internet-savvy subset interviewed). It may overstate "true" opposition if respondents are unhappy with the way the program has been conducted but support the idea of drone attacks. It might even reflect social desirability bias. The point is that we don't know from the data that we have. Question framing has a substantial impact on survey responses, and researchers should be careful about drawing conclusions without clarifying their assumptions about what the survey questions are measuring.
The question used in the study by Fair et al. to measure support for or opposition to the US drone program is not simply asking whether Pakistanis support or oppose the US drone program. It is asking whether Pakistanis would support a hypothetical drone program coordinated with the Pakistani government. Moreover, it highlights only the benefits of such a program, not the costs. This would not be a problem if support were invariant to question wording, but this is decidedly not the case. And even with the rather favorable framing, only 17% of respondents said that they would approve of drone strikes against extremists.
So I'm a little skeptical of the claim that commentators have been grossly overestimating the level of opposition to drone strikes. I certainly would not call Pakistani opposition to the drone program a "vocal plurality." This isn't to say that improved transparency and better PR on the part of the US would do nothing to improve perceptions, but it is a very difficult task that is constrained at multiple levels. Even if the Pakistani government were to become directly involved in the drone program (very unlikely given the political consequences), the survey results suggest that it would gather the support of about 17% of the subset of Pakistanis who are currently informed about the drone program.
The most important takeaway is that we simply don't know enough from the survey data. It's just too blunt. However, it does point to issue framing and elite rhetoric as important elements of opinion-formation on drones, suggesting interesting avenues for survey experiment work in the future.
h/t to Phil Arena for tweeting the link to the Atlantic article.
Thursday, October 4, 2012
Who Said What in Last Night's Debate?
Last night's Presidential debate in Denver seems to have ended with a media-declared Romney victory. Buzzfeed's Ben Smith called it 42 minutes in, Chris Matthews was livid and Big Bird is fearing for his job.
I listened to the debate primarily audio-only and watched my Twitter feed more than the two candidates themselves, and I got the sense that Obama performed a bit better than the immediate internet consensus, though I generally agree that Romney came off as more confident and on-message. It would be a mistake to read too much into the content of the debate (as Intrade has), and I agree with John Sides that Romney's win won't do much to move the polls.
What did interest me post-debate was what divisions between Obama and Romney could be seen in the types of words they used in their speeches. Governor Romney seemed particularly focused on the economy (as the fundamentals would suggest) and the President seemed generally aloof and unfocused on any one particular issue or line of attack, apart from a somewhat extended discussion of Medicare late in the debate. More generally, what did the debate reveal about partisan divisions in rhetoric?
I made a plot of all of the important words used in the debate (taken from ABC News' transcript) and computed a value for each one based on how likely a given word was to appear in Governor Romney's speech vs. how likely it was to appear in President Obama's speech. For more details on the method I used, see the bottom of the post. The words on the left are words that tended to be used more by the President and the words on the right are more common in Romney's speeches. I've also re-sized each word based on how often it appeared in both candidates' speech - larger words are generally more frequent.
It's a bit hard to make out a lot of the words since there are a lot of irrelevant or only incidentally partisan words clustered around the middle. I re-did the same plot two more times, first only including the words that were used more than five times overall and again with words that were used more than ten times (the most common words in the debate).
What can we take away from this?
Intuitively, the results seem to reflect clear divisions in rhetoric between the two parties, though the distinction is surprisingly muted compared with the much more heated rhetoric of the campaign. Governor Romney tended to focus more on economic issues (as expected), while President Obama focused on issues that he generally "owns" (health care and Medicaid in particular). Some partisan divides in rhetoric are evident, for example on tax policy: words like "wall street", "loophole", "profit" and "corporation" are more frequent in President Obama's speech.
Moreover, the data seem to confirm the general takeaway that Governor Romney was smoother and more focused in his message than President Obama. Of the most frequent words, most appear to cluster either around the center or the right. Governor Romney's rhetoric was also markedly more generic than President Obama's, which reflects his newfound shift towards the center.
Substantively, the candidates spent the debate discussing the same issues and largely on the same terms. Most words, and particularly the most common ones, cluster around the center, meaning that both candidates are roughly equally likely to use them in their speeches. Apart from tax policy, neither candidate is looking to reframe an issue in a particularly unique way. Rather, each discusses the same issue (like the deficit), using very similar frames. This is just as expected - the candidates are not looking to differentiate themselves ideologically.
But despite the overarching similarities in the two candidates' rhetoric, it's interesting to note the subtle partisanship in some of the candidates' word choices. As expected, both candidates spent a lot of time talking about the "middle class," but the phrase "middle class" is almost exclusively an "Obama"-word. Governor Romney prefers to use "middle income" rather than "middle class," perhaps to avoid the more "leftist" connotations of the term "class."
Likewise, we see a rather marked difference when the two candidates discuss education - President Obama tends to focus on college while Governor Romney talks about the K-12 system.
It's also very clear that President Obama is a huge fan of using the word "folks" rather than "people."
Ultimately, though, what's really interesting is not the words that were used, but the words that were not used. The word "women" was entirely absent from the debate (giving a new meaning to the "just for men" meme that started trending on Twitter). "Immigration" was also nowhere to be found. Trade received a passing mention from President Obama, and the closest this debate got to "wonky" was a rather vague discussion of health care reform where neither candidate seemed entirely comfortable. For those wanting an actual debate over issues (i.e. people who have already decided who they are voting for), this, like every debate, was lacking.
Given both the role of the debates in the election process and the overall incentives facing both Obama and Romney at this point in the campaign, this is all entirely expected - campaigns are meant precisely for people who don't pay attention to campaigns.
Thoughts?
---------------------
Note on methods
I used a relatively straightforward technique to generate partisan scores for each word in the debate. After splitting the debate transcript into three separate documents for Obama, Romney and Lehrer, I removed all punctuation and capitalization from the texts along with any uninformative "stop" words (the, a, an, as, etc...) from each. I then applied a "stemming" algorithm to consolidate similar words into a single root ("regulate", "regulating", "regulation" all reduce to "regul").
I counted the number of times each word occurred in each speech, adding 0.5 to all of the zeroes (to prevent division by zero in the next step). That is, if a word appeared 5 times in one candidate's speeches and zero times in another, I treated that 0 as 0.5 words.
I then normalized the data by dividing each word count by the total number of words used by the candidate. This gives the relative frequency of each word/word stem. I converted the frequencies to odds (frequency/(1-frequency)) and for each word in the data, divided Romney's odds by Obama's odds. This generated an odds ratio, with ratios greater than 1 representing words that were more likely to be used by Romney and ratios less than 1 representing more "Obama"-oriented words. Finally, I logged the odds ratio to get a linearized variable that I plot on the x-axis. Words taking positive values (on the right) are more likely to appear in Governor Romney's speech and words taking negative values (on the left) are more likely to appear in President Obama's speech. Words clustering around 0 are equally likely to appear in both candidates' rhetoric.
Each word is re-sized by the log of its overall count in the dataset, and colored red-to-blue based on the x-axis variable.
A better (though more complex) way to visualize this sort of data was also developed by Monroe, Colaresi and Quinn (2009).
I listened to the debate mostly audio-only and watched my Twitter feed more than the two candidates themselves. I got the sense that Obama performed a bit better than the immediate internet consensus suggested, though I generally agree that Romney came off as more confident and on-message. It would be a mistake to read too much into the content of the debate (as Intrade has), and I agree with John Sides that Romney's win won't do much to move the polls.
What did interest me after the debate was which divisions between Obama and Romney could be seen in the words they used. Governor Romney seemed particularly focused on the economy (as the fundamentals would suggest), while the President seemed generally aloof and unfocused on any one issue or line of attack, apart from a somewhat extended discussion of Medicare late in the debate. More generally, what did the debate reveal about partisan divisions in rhetoric?
I made a plot of all of the important words used in the debate (taken from ABC News' transcript) and computed a value for each one based on how likely a given word was to appear in Governor Romney's speech vs. how likely it was to appear in President Obama's speech. For more details on the method I used, see the bottom of the post. The words on the left are words that tended to be used more by the President and the words on the right are more common in Romney's speeches. I've also re-sized each word based on how often it appeared in both candidates' speech - larger words are generally more frequent.
It's a bit hard to make out a lot of the words since there are a lot of irrelevant or only incidentally partisan words clustered around the middle. I re-did the same plot two more times, first only including the words that were used more than five times overall and again with words that were used more than ten times (the most common words in the debate).
What can we take away from this?
Intuitively, the results seem to reflect clear divisions in rhetoric between the two parties, though the distinction is surprisingly less sharp than the far more heated rhetoric of the campaign would suggest. Governor Romney tended to focus more on economic issues (as expected), while President Obama focused on issues that he generally "owns" (health care and Medicaid in particular). Some partisan divides in rhetoric are evident, for example on tax policy: words like "wall street," "loophole," "profit," and "corporation" are more frequent in President Obama's speech.
Moreover, the data seem to confirm the general takeaway that Governor Romney was smoother and more focused in his message than President Obama. Of the most frequent words, most appear to cluster either around the center or the right. Governor Romney's rhetoric was also markedly more generic than President Obama's, which reflects his newfound shift towards the center.
Substantively, the candidates spent the debate discussing the same issues and largely on the same terms. Most words, and particularly the most common ones, cluster around the center, meaning that both candidates are roughly equally likely to use them in their speeches. Apart from tax policy, neither candidate is looking to reframe an issue in a particularly unique way. Rather, each discusses the same issue (like the deficit), using very similar frames. This is just as expected - the candidates are not looking to differentiate themselves ideologically.
But despite the overarching similarities in the two candidates' rhetoric, it's interesting to note the subtle partisanship in some of the candidates' word choices. As expected, both candidates spent a lot of time talking about the "middle class," but the phrase "middle class" is almost exclusively an "Obama"-word. Governor Romney prefers to use "middle income" rather than "middle class," perhaps to avoid the more "leftist" connotations of the term "class."
Likewise, we see a rather marked difference when the two candidates discuss education - President Obama tends to focus on college while Governor Romney talks about the K-12 system.
It's also very clear that President Obama is a huge fan of using the word "folks" rather than "people."
Ultimately, though, what's really interesting is not the words that were used, but the words that were not used. The word "women" was entirely absent from the debate (giving a new meaning to the "just for men" meme that started trending on Twitter). "Immigration" was also nowhere to be found. Trade received a passing mention from President Obama, and the closest this debate got to "wonky" was a rather vague discussion of health care reform where neither candidate seemed entirely comfortable. For those wanting an actual debate over issues (i.e. people who have already decided who they are voting for), this, like every debate, was lacking.
Given both the role of the debates in the election process and the overall incentives facing both Obama and Romney at this point in the campaign, this is all entirely expected - campaigns are meant precisely for people who don't pay attention to campaigns.
Thoughts?
---------------------
Note on methods
I used a relatively straightforward technique to generate partisan scores for each word in the debate. After splitting the debate transcript into three separate documents for Obama, Romney and Lehrer, I removed all punctuation and capitalization from the texts, along with any uninformative "stop" words (the, a, an, as, etc.). I then applied a "stemming" algorithm to consolidate similar words into a single root ("regulate", "regulating", "regulation" all reduce to "regul").
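The preprocessing steps can be sketched in Python as below. The post doesn't say which stop-word list or stemmer was used, so both are assumptions here: `STOP_WORDS` is a short illustrative list, and `toy_stem` is a hypothetical suffix stripper standing in for a real stemmer such as Porter's, which handles far more cases.

```python
import re

# Illustrative stop-word list only; the list actually used is not specified.
STOP_WORDS = {"the", "a", "an", "as", "and", "of", "to", "in", "is", "that", "it"}

# Hypothetical toy stemmer: strips one suffix, trying longer suffixes first
# so that "regulation" loses "ation" rather than something shorter.
SUFFIXES = ("ation", "ating", "ate", "ing", "ed", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, drop stop words, then stem."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [toy_stem(token) for token in text.split() if token not in STOP_WORDS]
```

For example, `preprocess("Regulate, regulating, and regulation!")` returns `['regul', 'regul', 'regul']`, matching the example above.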
I counted the number of times each word occurred in each speech, adding 0.5 to all of the zeroes (to prevent division by zero in the next step). That is, if a word appeared 5 times in one candidate's speeches and zero times in another, I treated that 0 as 0.5 words.
I then normalized the data by dividing each word count by the total number of words used by the candidate. This gives the relative frequency of each word/word stem. I converted the frequencies to odds (frequency/(1-frequency)) and for each word in the data, divided Romney's odds by Obama's odds. This generated an odds ratio, with ratios greater than 1 representing words that were more likely to be used by Romney and ratios less than 1 representing more "Obama"-oriented words. Finally, I logged the odds ratio to get a linearized variable that I plot on the x-axis. Words taking positive values (on the right) are more likely to appear in Governor Romney's speech and words taking negative values (on the left) are more likely to appear in President Obama's speech. Words clustering around 0 are equally likely to appear in both candidates' rhetoric.
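The counting, zero-filling and log-odds steps can be sketched as follows. This illustrates the calculation described above rather than reproducing the code actually used for the plots; the function name `log_odds_ratios` and the choice to divide by each candidate's raw token count (before any 0.5 fill) are assumptions.

```python
import math
from collections import Counter


def log_odds_ratios(romney_tokens, obama_tokens, zero_fill=0.5):
    """Return {word: log(odds_Romney / odds_Obama)} for every word either used.

    Positive scores lean Romney, negative lean Obama, and scores near 0 mark
    words both candidates used at similar rates.
    """
    romney, obama = Counter(romney_tokens), Counter(obama_tokens)
    n_romney, n_obama = len(romney_tokens), len(obama_tokens)
    scores = {}
    for word in set(romney) | set(obama):
        # A zero count becomes 0.5 so neither frequency (nor odds) is ever 0.
        count_r = romney[word] or zero_fill
        count_o = obama[word] or zero_fill
        # Relative frequency: count divided by the candidate's total words.
        freq_r, freq_o = count_r / n_romney, count_o / n_obama
        # Odds = frequency / (1 - frequency).
        odds_r, odds_o = freq_r / (1 - freq_r), freq_o / (1 - freq_o)
        scores[word] = math.log(odds_r / odds_o)
    return scores


# Hypothetical toy tokens, assumed to be already preprocessed.
scores = log_odds_ratios(["tax", "tax", "job", "deficit"],
                         ["tax", "folk", "folk", "deficit"])
```

With these toy inputs, "deficit" (used once by each) scores exactly 0, "tax" scores log 3 (about 1.10) toward Romney, and "folk" comes out negative. The sketch assumes each candidate uses more than one distinct word; otherwise a frequency of 1 would leave the odds undefined.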
Each word is re-sized by the log of its overall count in the dataset, and colored red-to-blue based on the x-axis variable.
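One way to turn the sizing and coloring rules into numbers is sketched below. The base size, scale factor, clipping bound and the exact blue-to-red ramp are hypothetical choices made for illustration; the post does not specify them.

```python
import math


def font_size(total_count, base=8.0, scale=6.0):
    """Font size grows with the log of a word's overall count (both speakers).

    `base` and `scale` are arbitrary illustrative constants.
    """
    return base + scale * math.log(total_count)


def score_to_rgb(score, max_abs):
    """Map a log-odds score onto a blue (Obama, < 0) to red (Romney, > 0) ramp.

    Scores are clipped to [-max_abs, max_abs]; a score of 0 lands midway,
    on a purple with roughly equal red and blue.
    """
    clipped = max(-max_abs, min(max_abs, score))
    t = (clipped + max_abs) / (2 * max_abs)  # rescale to the interval [0, 1]
    red = round(255 * t)
    return (red, 0, 255 - red)
```

Clipping keeps a handful of extreme, rarely used words from washing out the colors of everything near the center.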
A better (though more complex) way to visualize this sort of data was also developed by Monroe, Colaresi and Quinn (2009).
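For comparison, here is a rough sketch of the Monroe, Colaresi and Quinn approach: instead of replacing zeros with 0.5, it smooths every count with a Dirichlet prior and divides the log-odds difference by its approximate standard deviation, giving a z-score that discounts rarely used words. Using the pooled vocabulary of both speakers as the prior, and the default `prior_total` of 500 pseudo-counts, are illustrative assumptions rather than the paper's exact setup.

```python
import math
from collections import Counter


def weighted_log_odds(tokens_i, tokens_j, prior_total=500.0):
    """Sketch of log-odds with a Dirichlet prior; returns {word: z-score}.

    Positive z favors speaker i. The prior spreads `prior_total` pseudo-counts
    over words in proportion to their pooled frequency (an assumption here).
    """
    y_i, y_j = Counter(tokens_i), Counter(tokens_j)
    n_i, n_j = len(tokens_i), len(tokens_j)
    pooled = y_i + y_j
    n_pooled = n_i + n_j
    alpha = {w: prior_total * c / n_pooled for w, c in pooled.items()}
    alpha_0 = sum(alpha.values())  # equals prior_total up to float rounding
    z_scores = {}
    for w, a_w in alpha.items():
        # Prior-smoothed log-odds of word w for each speaker.
        l_i = math.log((y_i[w] + a_w) / (n_i + alpha_0 - y_i[w] - a_w))
        l_j = math.log((y_j[w] + a_w) / (n_j + alpha_0 - y_j[w] - a_w))
        delta = l_i - l_j
        # Approximate variance of delta, used to standardize it.
        variance = 1 / (y_i[w] + a_w) + 1 / (y_j[w] + a_w)
        z_scores[w] = delta / math.sqrt(variance)
    return z_scores
```

Passing Romney's tokens as `tokens_i` keeps the same left/right orientation as the plots above.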
Tuesday, June 26, 2012
The Fundamental Uncertainty of Science
While I have not had much time for the mini data-gathering/research projects that I usually try to post on this blog, I found the recent flurry over Professor Jacqueline Stevens' New York Times editorial "Political Scientists are Lousy Forecasters" (and the follow-up on her blog) worth commenting on a bit more.
The political science blogosphere has since responded in full-force (and snark). I agree entirely with the already stated criticisms and will try not to repeat them too much here. The editorial is at best a highly flawed and under-researched critique of quantitative political science and at worst a rather cynical endorsement of de-funding all NSF political science programs on the grounds that the NSF tends to fund studies using a methodological paradigm that Professor Stevens does not favor. I'll err on the side of the former.
But one quote from the piece did irk me quite a bit:
...the government — disproportionately — supports research that is amenable to statistical analyses and models even though everyone knows the clean equations mask messy realities that contrived data sets and assumptions don’t, and can’t, capture. (emphasis mine)
This statement is contradictory on its face. The entire point of statistical analysis is that we are uncertain about the world. That's why statisticians use confidence levels and significance tests. The existence of randomness does not make all attempts at analyzing data meaningless; it just means that there is always some inconclusiveness to the findings that scientists make. We speak of degrees of certainty. Those who use statistical methods to analyze data are pretty clear that none of their conclusions are capital-T truths, and the best political science tends to refrain from absolute statements. Indeed, this is one reason why a gap tends to exist between the political science and policymaking communities. Those who enact policy want exact and determinate guidance, while political scientists are cautious about making such absolute and declarative statements. It is depressing to see these sorts of caricatures of quantitative methods being used to denounce the entire field. Simply put, just because physics is very quantitative and appears to describe very clean and determinate relationships does not mean that all uses of math in social science result in simple, exact and absolutely certain conclusions.
But putting aside that highly inaccurate picture, Professor Stevens' definition of what constitutes scientific knowledge is remarkably limiting. Prof. Stevens is staking out a very extreme position by implying that the existence of randomness ("messy realities," as she calls it) makes all attempts at quantification meaningless. She argues for a very radical version of Popper's philosophy of science, positing that any theory should be considered falsified if it is contradicted by a single counter-example. It's unfortunate that Prof. Stevens glosses over the extensive philosophical debate of the eight or so decades since Popper, but this is inevitable given the space of a typical NYT column. Nevertheless, it is very disappointing that the op-ed gives the impression that Popperian falsificationism is the gold standard of scientific method and the philosophy of science, when in fact the scientific community has moved far beyond such a strict standard for what constitutes knowledge. While I won't go into a full dissection of Popper, Kuhn, Lakatos, Bayesian probability theory, and so on, suffice it to say that Stevens' reading of Popper would discount not only political science, but most modern sciences. Accounting for and dealing with randomness is at the heart of what so many scientists in all disciplines do.
By rejecting the idea that probabilistic hypotheses could be considered "scientific," Professor Stevens is perpetuating another caricature: one of science as a bastion of certitude. It's a depiction that resonates well with the popular image of science, but it is far from the truth. I'm reminded of a quote by Irish comedian Dara Ó Briain:
"Science knows it doesn't know everything, otherwise, it would stop."
All science is fundamentally about uncertainty and ignorance. Knowledge is always partial and incomplete. There was an interesting interview on this topic with neuroscientist Stuart Firestein on NPR's Science Friday a few weeks back, where he offered this valuable quote:
...the answers that count - not that answers and facts aren't important in science, of course - but the ones that we want, the ones that we care about the most, are the ones that create newer and better questions because it's really the questions that it's about.
Ultimately, I would argue that probabilistic hypotheses in the social sciences still have scientific value. Events tend to have multiple causes, and endogeneity is an ever-present problem. This does not automatically make systematic, scientific, and quantitative inquiry into social phenomena a futile endeavor. Making perfect prediction the standard for what counts as "science" would dramatically constrain the sphere of scientific research. (See Jay Ulfelder's post for more on predictions.) Climate scientists constantly debate the internal mechanics of their models of global warming; some predict faster rates of warming, some slower. Does this mean that the underlying relationships described by those models (such as between CO2 concentration and temperature) should be ignored because the research is too "unsettled"? While deniers of climate change would argue yes, the answer here is a definite no.
Or take an example from the recent blog debates about the value of election forecasting models. Just because Douglas Hibbs' "Bread and Peace" model (among other presidential election models) does not perfectly predict President Obama's vote share in November does not mean that we can learn nothing from it. One of the most valuable contributions of this literature is showing that systemic factors like the economy matter significantly more to the final outcome than the day-to-day "horserace" followed by political pundits.
What should be said, then, about Prof. Stevens' concluding suggestion that NSF funds be allocated by lottery rather than by a rigorous screening process? Such an argument could only be justified if there were no objective means to distinguish what is and is not "scientific" research. If the criterion for what passes as real political science is simply the consensus of one group of elites, then from the standpoint of "knowledge," there is no difference between peer review and random allocation. This is, in fact, the argument that Thomas Kuhn, Popper's philosophical adversary, made about all science. But while Kuhn's criticism of a truly "objective" science was a useful corrective to 20th-century scientific hubris, it too goes too far in this case, justifying an anything-goes attitude towards scientific knowledge that is all too dangerous. Penn State literature professor Michael Bérubé wrote a rather interesting article on this exact topic as applied to science at large, noting the worrying congruence between the highly subjectivist approach to "science studies" adopted by some in the academic left and the anti-science rhetoric of the far right.
But now the climate-change deniers and the young-Earth creationists are coming after the natural scientists, just as I predicted–and they’re using some of the very arguments developed by an academic left that thought it was speaking only to people of like mind. Some standard left arguments, combined with the left-populist distrust of “experts” and “professionals” and assorted high-and-mighty muckety-mucks who think they’re the boss of us, were fashioned by the right into a powerful device for delegitimating scientific research. For example, when Andrew Ross asked in Strange Weather, “How can metaphysical life theories and explanations taken seriously by millions be ignored or excluded by a small group of powerful people called ‘scientists’?,” everyone was supposed to understand that he was referring to alternative medicine, and that his critique of “scientists” was meant to bring power to the people. The countercultural account of “metaphysical life theories” that gives people a sense of dignity in the face of scientific authority sounds good–until one substitutes “astrology” or “homeopathy” or “creationism” (all of which are certainly taken seriously by millions) in its place.
The right’s attacks on climate science, mobilizing a public distrust of scientific expertise, eventually led science-studies theorist Bruno Latour to write in Critical Inquiry:
Entire Ph.D. programs are still running to make sure that good American kids are learning the hard way that facts are made up, that there is no such thing as natural, unmediated, unbiased access to truth…while dangerous extremists are using the very same argument of social construction to destroy hard-won evidence that could save our lives. Was I wrong to participate in the invention of this field known as science studies? Is it enough to say that we did not really mean what we meant? Why does it burn my tongue to say that global warming is a fact whether you like it or not?
Why can’t I simply say that the argument is closed for good? Why, indeed? Why not say, definitively, that anthropogenic climate change is real, that vaccines do not cause autism, that the Earth revolves around the Sun, and that Adam and Eve did not ride dinosaurs to church?
In the end, Bérubé calls for some sort of commensurability between the humanities and the sciences, and I think this kind of coming together is actually becoming the norm in political science academia, particularly as political theorists and quantitative political scientists still tend to fall under the same departmental umbrella:
So these days, when I talk to my scientist friends, I offer them a deal. I say: I’ll admit that you were right about the potential for science studies to go horribly wrong and give fuel to deeply ignorant and/or reactionary people. And in return, you’ll admit that I was right about the culture wars, and right that the natural sciences would not be held harmless from the right-wing noise machine. And if you’ll go further, and acknowledge that some circumspect, well-informed critiques of actually existing science have merit (such as the criticism that the postwar medicalization of pregnancy and childbirth had some ill effects), I’ll go further too, and acknowledge that many humanists’ critiques of science and reason are neither circumspect nor well-informed. Then perhaps we can get down to the business of how to develop safe, sustainable energy and other social practices that will keep the planet habitable.
Tuesday, June 5, 2012
Is There a Role for International Institutions in Regulating "Cyberweapons"?
David Sanger's extensive New York Times piece about the United States and Israel's covert cyberwarfare operations on Iran's nuclear facilities is the first article I've seen that explicitly confirms the two countries' involvement in Stuxnet's development. But this revelation isn't particularly surprising. Given the virus' complexity and purpose, the list of possible developers was rather short. Rather, what I found most interesting was this section towards the end:
But the good luck did not last. In the summer of 2010, shortly after a new variant of the worm had been sent into Natanz, it became clear that the worm, which was never supposed to leave the Natanz machines, had broken free, like a zoo animal that found the keys to the cage. It fell to Mr. Panetta and two other crucial players in Olympic Games — General Cartwright, the vice chairman of the Joint Chiefs of Staff, and Michael J. Morell, the deputy director of the C.I.A. — to break the news to Mr. Obama and Mr. Biden.
An error in the code, they said, had led it to spread to an engineer’s computer when it was hooked up to the centrifuges. When the engineer left Natanz and connected the computer to the Internet, the American- and Israeli-made bug failed to recognize that its environment had changed. It began replicating itself all around the world. Suddenly, the code was exposed, though its intent would not be clear, at least to ordinary computer users.
...
The question facing Mr. Obama was whether the rest of Olympic Games was in jeopardy, now that a variant of the bug was replicating itself “in the wild,” where computer security experts can dissect it and figure out its purpose.
“I don’t think we have enough information,” Mr. Obama told the group that day, according to the officials. But in the meantime, he ordered that the cyberattacks continue. They were his best hope of disrupting the Iranian nuclear program unless economic sanctions began to bite harder and reduced Iran’s oil revenues.
Within a week, another version of the bug brought down just under 1,000 centrifuges. Olympic Games was still on.
The excerpt highlights one of the unique and troubling aspects of "cyberweapons": their use against adversaries permits their proliferation. Despite all of the effort to keep Stuxnet both hidden and narrowly tailored, the virus escaped into the wild, and its code is open to analysis by pretty much anyone. While competent coding can make it difficult to reverse engineer the virus and re-deploy it against other targets without a significant investment of time and resources, it's still a distinct possibility. Cyberweapons create externalities: side effects that don't directly affect the militaries using them, but can have spill-over consequences for other sectors of society. For example, a SCADA worm like Stuxnet, which targets industrial control systems, could theoretically be re-targeted at civilian infrastructure such as power or manufacturing plants.
Most governments using cyberwarfare will likely want to limit these externalities, since they create an indirect threat (such as non-state actor attacks on critical infrastructure). This is evidenced by the fact that the U.S. and Israel tried to design Stuxnet and its ilk not only to be difficult to detect, but also to have very tailored aims. The virus was designed to act only on the specific centrifuge control systems used by Iran, thereby somewhat limiting the initial damage of a leak (imagine what would have happened had Stuxnet deployed its "payload" on every computer system it landed on). Nevertheless, these externalities will exist so long as governments with the capacity to do so continue to use cyber espionage and attacks. The logic of collective action suggests that governments are also unlikely to refrain unilaterally from using these technologies, since a blanket ban would be both impossible and entirely unverifiable due to the dual-use nature of the weapons.
This got me thinking a bit about what sorts of institutions could help mitigate some of the consequences of "leaks." Proposals for an international cyberweapons convention have been thrown around, but most have been very vague and poorly defined. Kaspersky Lab founder Evgeny Kaspersky recently suggested a treaty along the lines of the Biological Weapons Convention or the Nuclear Non-proliferation Treaty (the Russian government has also floated similar proposals). However, an outright ban on "cyberweapons" would be highly unlikely and generally impractical. As I mentioned, verifying compliance would be substantially more difficult than it has been for either the BWC or the NPT. Given that both have been violated by a number of states party to them via clandestine programs, a "cyberweapons" ban would be toothless, even if it only banned particular types of attacks (such as those on SCADA systems). Moreover, states find cyber-capabilities significantly more versatile and useful than either biological or nuclear weapons. The category of "cyberweapon" is broad enough to span everything from highly developed viral sabotage (Stuxnet) to simple distributed denial-of-service (DDoS) attacks, and these sorts of technologies are useful not only to militaries, but also to intelligence services. Finally, the dual-use nature of information technology and its globalization make locking in a "cyberwarfare oligopoly" à la the nuclear monopoly of the NPT nearly impossible. The "haves" cannot credibly promise disarmament to the "have-nots," and the "have-nots" face significantly lower barriers to developing basic cyber-espionage or warfare capabilities.