Thursday, March 22, 2012

Electoral Fraud and the Russian Presidential Election - Part 2

In the previous post, I examined some of the more basic graphical indicators of electoral fraud in the Russian presidential election.

How else can election data be analyzed for evidence of fraud? One of the most common approaches is to study the distribution of a particular digit in the results. The Guardian posted a brief article that evaluated the first digits of electoral returns using Benford's law, which posits that numbers arising from certain natural processes will have leading digits that are distributed logarithmically (1 is more common as a leading digit than 9). While the election results for Putin don't appear to conform to Benford's law, the first-digit test is unlikely to be a relevant metric of voter fraud: precinct or region-level returns do not span enough orders of magnitude for the law to apply well. However, Walter Mebane has done extensive work applying Benford's law to the distribution of the second digit in electoral data, and this method may be more fruitful in detecting fraud in Russia.
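To make the first-digit benchmark concrete, here is a minimal Python sketch of the expected Benford shares, log10(1 + 1/d) for leading digit d, alongside a helper that tabulates observed leading digits. This is not the Guardian's or Mebane's code; the function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit_shares(counts):
    """Observed share of each leading digit among positive integer counts."""
    digits = [int(str(c)[0]) for c in counts if c > 0]
    if not digits:
        raise ValueError("need at least one positive count")
    tally = Counter(digits)
    return {d: tally.get(d, 0) / len(digits) for d in range(1, 10)}
```

Under the law, about 30% of leading digits should be 1 and under 5% should be 9, which is the logarithmic pattern the observed shares would be compared against.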

Alternatively, one can examine the last digits of the election returns. Bernd Beber and Alexandra Scacco used this approach to reveal likely electoral malfeasance in Nigeria and in the 2009 Iranian presidential election. They posit that, in a "clean" election, the final digit of the raw vote or turnout counts at the precinct level should be uniformly distributed. Since a single vote is inconsequential in deciding an electoral outcome, the last digit is essentially an error term (the full, more complicated, proof is in the first link above). However, if electoral results are tampered with and the result sheets are filled in arbitrarily, the distribution of last digits may deviate from uniformity, because humans tend to be terrible at generating truly random sequences of numbers. Beber and Scacco cite a number of studies that suggest cognitive biases toward smaller over larger numbers, avoidance of repetitive sequences (like 333), and a preference for adjacent numbers. Comparing the results of the Swedish parliamentary elections to electoral data from Nigeria's Plateau state, they find strong uniformity in the former and significant deviations in the latter.

I applied Beber and Scacco's method to election returns from Russia, looking specifically at the last-digit distribution in reported numbers of registered voters at the precinct and district levels. If election officials are not fabricating candidates' vote totals outright, but are instead inflating votes via ballot-stuffing, then registered voter counts would still need to be altered slightly to accommodate these “artificial” ballots. To avoid impossible and embarrassing reports of greater than 100% turnout, some fudging of the numbers might be needed.

A quick aside on terminology and method. The Russian election commission reports results aggregated at three levels: the republic/province level (equivalent to states), the “sub-republic” level (essentially, city/county subdivisions in each province), and the precinct level (with data from each local polling center or uchastkovaya izbiratel'naya komissia (UIK)). I use the “sub-republic” data for the Russia-wide test and precinct-level data in testing individual provinces. I also exclude any turnout figure that has fewer than three digits to ensure that the last digit is sufficiently "irrelevant".
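The filtering step can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the R code I actually ran; the function name and the `min_digits` parameter are hypothetical.

```python
def last_digits(counts, min_digits=3):
    """Return the last digit of every count that has at least `min_digits` digits.

    Counts with fewer digits are dropped, since their last digit is not
    plausibly "irrelevant" noise.
    """
    threshold = 10 ** (min_digits - 1)  # 100 when min_digits is 3
    return [c % 10 for c in counts if c >= threshold]

# Example: 99 and 57 are dropped; 100, 1234 and 1000 are kept.
digits = last_digits([99, 100, 1234, 57, 1000])
```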

First, at the national level, there is some evidence that the last digits of registered voter counts do not follow a uniform distribution. The graph below shows that the data contain significantly more 2s than expected (outside the 95% confidence bound). Additionally, a chi-squared test returns a p-value of 0.029, suggesting a statistically significant (at alpha = 0.05) deviation from uniformity.
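For readers who want to try the test themselves, here is a rough Python sketch of a last-digit uniformity check. It is not Beber's R code. It computes the Pearson chi-squared statistic against a uniform expectation and compares it with standard table values for 9 degrees of freedom rather than computing an exact p-value. The per-digit band uses a normal approximation, which is my assumption about how such a bound could be drawn, not a description of how the graphs here were produced. All names are hypothetical.

```python
import math
from collections import Counter

# Standard chi-squared critical values for 9 degrees of freedom (10 digits - 1).
CHI2_CRIT_DF9 = {0.05: 16.919, 0.01: 21.666}

def last_digit_chi2(digits):
    """Pearson chi-squared statistic for digits 0-9 against a uniform distribution."""
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one digit")
    expected = n / 10
    tally = Counter(digits)
    return sum((tally.get(d, 0) - expected) ** 2 / expected for d in range(10))

def digit_share_bounds(n, z=1.96):
    """Approximate 95% band for the share of any single digit if digits are uniform."""
    p = 0.1
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

def rejects_uniformity(digits, alpha=0.05):
    """True if the statistic exceeds the tabulated critical value for `alpha`."""
    return last_digit_chi2(digits) > CHI2_CRIT_DF9[alpha]
```

A statistic above 16.919 corresponds to rejecting uniformity at the 5% level, and one above 21.666 to rejecting it at the 1% level.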



Is there variation across regions? Anecdotal evidence suggests so. The most egregious reports of fraud tend to come from “peripheral” regions, particularly Chechnya and Dagestan, which consistently report absurd levels of support for Putin/Medvedev and United Russia. Reports from Moscow and St. Petersburg (the centers of the protest movement) tend to be more subdued. Indeed, Moscow City was the only region where Putin obtained a mere plurality of the vote rather than a majority.

Conducting the last-digit test on registered-voter data from each polling place in these four regions seems to confirm that fraud levels vary significantly within Russia. The graphs below suggest that neither Moscow nor St. Petersburg shows any significant deviation from uniformity. Chi-squared tests for both regions are also not statistically significant.


Chechnya, where Putin received 99% of the total vote, and Dagestan, where Putin's numbers were slightly lower (93%), tell a different story. Both show dramatic deviation from uniformity with a tendency to emphasize lower numbers, particularly zero and five. Chi-squared tests for both are also significant at the 1% level.



Obviously this is a very cursory analysis, but it does suggest that the last-digit method is a pretty good tool for finding hints of fraud in raw election returns. Any thoughts?

Thanks to Bernd Beber for making available the R code for running the last-digit tests and generating the graphs.

Saturday, March 10, 2012

Electoral Fraud and the Russian Presidential Election - Part 1

To no one's surprise, Russian President-elect (and current Prime Minister, and former President) Vladimir Putin won last week's election with a sizable (reported) 63.6% of the vote.

As with pretty much any Russian election over the past decade, evidence of electoral fraud has begun to surface. Reports of "carousel voting" (paid voters being shuttled by bus to vote at multiple polling stations), ballot stuffing, and an impossible figure of "107% turnout" in a Chechen precinct all suggest some degree of manipulation. Was this fraud systematic or idiosyncratic? In the wake of the 2011 Duma elections, many Russian bloggers used statistical analysis techniques to uncover strange patterns in the reported results for United Russia. Scott Gehlbach posted an English summary of some of these findings. Do the same observations hold for the presidential election?

I gathered the precinct-level data reported by the Russian Election Commission and looked at the three main "problematic" distributions. The first is vote shares across precincts:



The distribution is certainly less skewed than it was for United Russia, which can be attributed to the fact that Putin's genuine popularity, greater than his party's, reduced the need for much falsification. Nevertheless, the distribution again widens suspiciously at the right end, and the existence of a significant number of precincts where essentially everyone voted for Putin is likewise odd, particularly given the anecdotes from regions like Chechnya. The non-normality of this distribution is not necessarily conclusive evidence of fraud, but it does illustrate a heavily skewed and non-competitive electoral system. Odder still are the spikes in the precinct counts at what appear to be round numbers and simple fractions in the 60 to 80 percent range. Gehlbach notes a similar phenomenon in the results for United Russia, though it is certainly less pronounced here.

What of the distribution of turnout*? Gehlbach argues that this should be roughly normal "to the extent that voters are making idiosyncratic decisions about whether to vote rather than do something else." Yet again, one sees an upward-sloping tail on the right end and a huge spike at 100.



Grouping the turnout data into smaller intervals, one sees some spikiness at crucial benchmarks, though it is less pronounced compared to the results from December:


Finally, turnout and percentage vote for Putin are highly correlated:



As with United Russia's voting percentage, Putin's results are strongly associated with turnout. As a number of people have pointed out, this does not necessarily indicate fraud. For example, a strong get-out-the-vote (GOTV) campaign may mean that a candidate tends to get more voters as turnout increases. Yet as Gehlbach noted: "the magnitude of the relationship in Russia is such that United Russia is scooping up essentially all of the marginal votes over a certain level." In the case of Putin's results, the relationship is not as strong (again, owing to the fact that Putin's actual level of popularity is still relatively high). Nevertheless, the correlation at the upper levels of turnout is such that it's difficult to conclude that manipulation was insignificant.
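One way to make the "marginal votes" idea concrete is to regress each precinct's candidate votes per registered voter on its turnout. A slope near 1 would mean that nearly every additional ballot goes to the candidate. The Python sketch below is only an illustration under that assumption. It is not how the graph above was produced, and the function name is hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (simple regression with an intercept)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Toy data (made up): turnout and candidate votes per registered voter.
turnout = [0.5, 0.6, 0.7, 0.8]
candidate_share_of_registered = [0.3, 0.4, 0.5, 0.6]
slope = ols_slope(turnout, candidate_share_of_registered)
```

In this made-up example every extra point of turnout adds a full point to the candidate's share of registered voters, the pattern Gehlbach describes for United Russia. A slope well below 1 would be more consistent with ordinary mobilization.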

Again, the patterns in the electoral results are suggestive of some fraud, but the degree appears to be lower than in December. This may be partly due to the decreased necessity of boosting Putin's results. Indeed, the curious decision to install web-cams at all polling places may be evidence that the Kremlin, knowing that the incumbent would win handily, wanted to keep overt reports of fraud to a minimum. As Josh Tucker commented, if the election was meant as a signal to the public that Putin remains popular, compromising footage of ballot stuffing and ham-handed manipulation would weaken the message. While there is a disincentive to commit visible fraud, there is still a logic behind committing less clearly observable fraud (Andrew Little wrote a good post recently on this point). Taken together, the statistical evidence and the anecdotal reports from observers strongly suggest that systematic cheating, while much less blatant than in the Duma elections, likely occurred.

Part 2 of this post will apply some more advanced statistical techniques to examine the variation in fraud levels across the different regions.


*I compute turnout as (Number of Valid Votes + Number of Invalid Votes)/(Number of Registered Voters). The Russian electoral commission site does not give a clear percentage figure of turnout.
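In code, the turnout calculation from the footnote looks roughly like the Python sketch below. The function and argument names are hypothetical and do not correspond to column names in the commission's data.

```python
def turnout(valid_votes, invalid_votes, registered_voters):
    """Turnout = (valid votes + invalid votes) / registered voters."""
    if registered_voters <= 0:
        raise ValueError("registered_voters must be positive")
    return (valid_votes + invalid_votes) / registered_voters

# Made-up precinct: 630 valid and 10 invalid ballots among 1000 registered voters.
example = turnout(630, 10, 1000)
```

A value above 1 from this formula would flag a precinct reporting more ballots than registered voters.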