Thursday, October 4, 2012

Who Said What in Last Night's Debate?

Last night's presidential debate in Denver seems to have ended with a media-declared Romney victory. BuzzFeed's Ben Smith called it 42 minutes in, Chris Matthews was livid, and Big Bird is fearing for his job.

I mostly listened to the debate audio-only and watched my Twitter feed more than the two candidates themselves. I got the sense that Obama performed a bit better than the immediate internet consensus, though I generally agree that Romney came off as more confident and on-message. It would be a mistake to read too much into the content of the debate (as Intrade has), and I agree with John Sides that Romney's win won't do much to move the polls.

What did interest me post-debate was what divisions between Obama and Romney could be seen in the types of words they used in their speeches. Governor Romney seemed particularly focused on the economy (as the fundamentals would suggest) and the President seemed generally aloof and unfocused on any one particular issue or line of attack, apart from a somewhat extended discussion of Medicare late in the debate. More generally, what did the debate reveal about partisan divisions in rhetoric?

I made a plot of all of the important words used in the debate (taken from ABC News' transcript) and computed a value for each one based on how likely a given word was to appear in Governor Romney's speech vs. how likely it was to appear in President Obama's speech. For more details on the method I used, see the bottom of the post. The words on the left are words that tended to be used more by the President and the words on the right are more common in Romney's speeches. I've also re-sized each word based on how often it appeared in both candidates' speech - larger words are generally more frequent.

Many of the words are hard to make out because so many irrelevant or only incidentally partisan words cluster around the middle. I re-did the same plot two more times, first only including the words that were used more than five times overall and again with words that were used more than ten times (the most common words in the debate).

What can we take away from this?

Intuitively, the results seem to reflect clear divisions in rhetoric between the two parties, though the distinction is surprisingly less sharp than the much more heated rhetoric of the campaign would suggest. Governor Romney tended to focus more on economic issues (as expected), while President Obama focused on issues that he generally "owns" (health care and Medicaid in particular). Some partisan divides in rhetoric are evident. On tax policy, for example, words like "wall street", "loophole", "profit" and "corporation" are more frequent in President Obama's speech.

Moreover, the data seem to confirm the general takeaway that Governor Romney was smoother and more focused in his message than President Obama. Of the most frequent words, most appear to cluster either around the center or the right. Governor Romney's rhetoric was also markedly more generic than President Obama's, which reflects his newfound shift towards the center.

Substantively, the candidates spent the debate discussing the same issues and largely on the same terms. Most words, and particularly the most common ones, cluster around the center, meaning that both candidates are roughly equally likely to use them in their speeches. Apart from tax policy, neither candidate is looking to reframe an issue in a particularly unique way. Rather, each discusses the same issue (like the deficit), using very similar frames. This is just as expected - the candidates are not looking to differentiate themselves ideologically.

But despite the overarching similarities in the two candidates' rhetoric, it's interesting to note the subtle partisanship in some of the candidates' word choices. As expected, both candidates spent a lot of time talking about the "middle class," but the phrase "middle class" is almost exclusively an "Obama"-word. Governor Romney prefers to use "middle income" rather than "middle class," perhaps to avoid the more "leftist" connotations of the term "class."

Likewise, we see a rather marked difference when the two candidates discuss education - President Obama tends to focus on college while Governor Romney talks about the K-12 system.

It's also very clear that President Obama is a huge fan of using the word "folks" rather than "people."

Ultimately, though, what's really interesting is not the words that were used, but the words that were not used. The word "women" was entirely absent from the debate (giving a new meaning to the "just for men" meme that started trending on Twitter). "Immigration" was also nowhere to be found. Trade received a passing mention from President Obama, and the closest this debate got to "wonky" was a rather vague discussion of health care reform where neither candidate seemed entirely comfortable. For those wanting an actual debate over issues (i.e., people who have already decided who they are voting for), this, like every debate, was lacking.

Given both the role of the debates in the election process and the overall incentives facing both Obama and Romney at this point in the campaign, this is all entirely expected - campaigns are meant precisely for people who don't pay attention to campaigns.


Note on methods

I used a relatively straightforward technique to generate partisan scores for each word in the debate. After splitting the debate transcript into three separate documents for Obama, Romney and Lehrer, I removed all punctuation and capitalization from the texts, along with any uninformative "stop" words (the, a, an, as, etc.). I then applied a "stemming" algorithm to consolidate similar words into a single root (for example, "regulate", "regulating" and "regulation" all reduce to "regul").
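As a rough illustration of these cleaning steps (not the code used for the plots), here is a minimal Python sketch. The stop-word list and the `crude_stem` suffix stripper are hypothetical stand-ins chosen only to reproduce the "regul" example. A real analysis would use a full stop-word list and a proper stemmer such as Porter's.

```python
import string

# Hypothetical, deliberately tiny stop-word list (assumption for illustration).
STOP_WORDS = {"the", "a", "an", "as", "and", "of", "to", "in", "is", "that"}

# Toy suffix list standing in for a real stemming algorithm (assumption).
SUFFIXES = ("ation", "ating", "ate", "ing", "ed", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, drop punctuation, remove stop words, then stem each token."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [crude_stem(tok) for tok in no_punct.split() if tok not in STOP_WORDS]


print(preprocess("The regulation, regulating and Regulate!"))
```

Each speaker's document would be passed through `preprocess` separately, so that the word counts in the next step stay tied to one candidate.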

I counted the number of times each word occurred in each speech, replacing every zero count with 0.5 (to prevent division by zero in the next step). That is, if a word appeared 5 times in one candidate's speeches and zero times in the other's, I treated that 0 as 0.5 words.
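The counting step might look something like the sketch below. The function name `smoothed_counts` and the `pseudo` parameter are my own labels for illustration. The only thing taken from the description above is that a zero count becomes 0.5 while non-zero counts are left alone.

```python
from collections import Counter


def smoothed_counts(tokens_a, tokens_b, pseudo=0.5):
    """Count words for two speakers over a shared vocabulary.

    Any word a speaker never used gets the pseudo-count instead of 0.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    vocab = set(counts_a) | set(counts_b)
    smooth_a = {w: counts_a[w] if counts_a[w] > 0 else pseudo for w in vocab}
    smooth_b = {w: counts_b[w] if counts_b[w] > 0 else pseudo for w in vocab}
    return smooth_a, smooth_b


# Toy token lists (made up, not taken from the transcript).
romney, obama = smoothed_counts(["tax"] * 5, ["job"])
print(romney, obama)
```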

I then normalized the data by dividing each word count by the total number of words used by the candidate. This gives the relative frequency of each word/word stem. I converted the frequencies to odds (frequency/(1-frequency)) and for each word in the data, divided Romney's odds by Obama's odds. This generated an odds ratio, with ratios greater than 1 representing words that were more likely to be used by Romney and ratios less than 1 representing more "Obama"-oriented words. Finally, I logged the odds ratio to get a linearized variable that I plot on the x-axis. Words taking positive values (on the right) are more likely to appear in Governor Romney's speech and words taking negative values (on the left) are more likely to appear in President Obama's speech. Words clustering around 0 are equally likely to appear in both candidates' rhetoric.
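The arithmetic in this step is compact enough to sketch as a single function. This is an illustration under assumptions rather than the original code: the function and argument names are hypothetical, and the totals are assumed to be each candidate's overall word count, passed in by the caller.

```python
import math


def log_odds_ratio(romney_count, romney_total, obama_count, obama_total):
    """Log of (Romney's odds of using a word / Obama's odds of using it)."""
    # Relative frequency of the word in each candidate's speech.
    romney_freq = romney_count / romney_total
    obama_freq = obama_count / obama_total
    # Convert frequencies to odds: frequency / (1 - frequency).
    romney_odds = romney_freq / (1 - romney_freq)
    obama_odds = obama_freq / (1 - obama_freq)
    # Positive values lean Romney, negative lean Obama, 0 is neutral.
    return math.log(romney_odds / obama_odds)


# Made-up counts: a word Romney used 10 times and Obama never used (0.5 after smoothing).
print(log_odds_ratio(10, 1000, 0.5, 1000))
```

Taking the log makes the scale symmetric: swapping the two candidates' counts flips the sign of the score without changing its size.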

Each word is re-sized by the log of its overall count in the dataset, and colored red-to-blue based on the x-axis variable.
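One way to turn those two rules into plotting attributes is sketched below. The `min_size` and `scale` constants, the linear color blend, and the choice of blue for the Obama end and red for the Romney end are all assumptions made for illustration. The description above only specifies a log-count size and a red-to-blue gradient.

```python
import math


def word_size(total_count, min_size=8.0, scale=6.0):
    """Font size that grows with the log of a word's overall count (>= 1)."""
    return min_size + scale * math.log(total_count)


def score_to_rgb(score, lowest, highest):
    """Blend linearly from blue (lowest score) to red (highest score)."""
    t = (score - lowest) / (highest - lowest)
    t = min(1.0, max(0.0, t))  # clip scores that fall outside the range
    return (t, 0.0, 1.0 - t)   # (red, green, blue), each in [0, 1]


print(word_size(25), score_to_rgb(0.0, -2.0, 2.0))
```

The resulting size and RGB tuple can be handed to whatever plotting tool draws the text labels.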

A better (though more complex) way to visualize this sort of data was developed by Monroe, Colaresi and Quinn (2009).