Lexical Analysis of 2012 Presidential Debates — Obama vs Romney

Martin Krzywinski

Introduction

Analysis of Debates

Obama vs Romney (1st debate) 3 Oct 2012
Obama vs Romney (2nd debate) 16 Oct 2012
Obama vs Romney (3rd debate) 22 Oct 2012
Biden vs Ryan, 11 Oct 2012

Extended Analysis

Obama vs Romney (combined debates)
Obama in 2008 vs 2012 (1st debate)

2012 Debate Resources

What Romney's and Obama's Body Language Says to Voters. Watch them cut, point and tilt-and-nod.

2008 Debate Resources

2008 Debate lexical analysis

He counts your words (even those pronouns), an article in the NYT about Pennebaker's approach to analysis of debates and Al Qaeda communication

Presidential word use in State of the Union addresses by Jonathan Corum.

Naming Names, a NYT article about candidates' reference to each other during debates (uses Circos)

Lexical Analysis of Obama's and McCain's Speeches by Jacques Savoy

This analysis explores word usage and lexical content of the 2012 US Presidential and Vice-Presidential debates. It is based on the same approach I used to analyze the 2008 debates.

The purpose is to explore the structure of speech, as characterized by the use of nouns, verbs, adjectives, adverbs, pronouns and noun phrases. The speech patterns of opposing candidates are compared in an effort to identify priorities, perspectives, characteristic values and personality traits.

I analyze the debate for the following

A formal debate serves as a great text for this kind of analysis. The format is somewhat controlled: each speaker is subjected to the same stimulus (question) and is given the same amount of time to respond. This reduces the variation that would appear in an analysis of interviews and other unscripted speech.

Methods

The transcript for each debate is parsed to identify the speaker, tag stop words with their part of speech (tagging), and identify noun phrases (chunking).
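The first parsing step can be sketched in a few lines of Python. This is a minimal illustration, not the code used for this analysis. It assumes that each speaker turn starts with the speaker's name in capitals followed by a colon (for example `OBAMA:`), and it uses a tiny hypothetical stop list. Part-of-speech tagging and noun phrase chunking need a natural language toolkit and are not shown.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "i", "we", "that", "it"}

# Assumed transcript convention: a turn starts with "NAME:" in capitals.
TURN = re.compile(r"^([A-Z][A-Z .'-]*):\s*(.*)$")

def split_by_speaker(transcript):
    """Return a dict mapping each speaker to the list of words they spoke."""
    words = defaultdict(list)
    speaker = None
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TURN.match(line)
        if match:
            # A new turn begins; the rest of the line is the spoken text.
            speaker, line = match.group(1), match.group(2)
        if speaker is not None:
            # Lines without a "NAME:" prefix continue the current turn.
            words[speaker].extend(re.findall(r"[A-Za-z][A-Za-z'-]*", line.lower()))
    return dict(words)

def flag_stop_words(words):
    """Pair each word with True if it is in the stop list, False otherwise."""
    return [(word, word in STOP_WORDS) for word in words]
```

Keeping the words grouped by speaker makes the later per-candidate counts a matter of simple list and set operations.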

The tagged and chunked transcripts are analyzed to determine

I attempt to quantify the overall complexity of speech by a metric I call the Windbag Index, which is a product of 8 terms each measuring uniqueness in different aspects of speech (more about Windbag Index).

A full description of each of the steps in the analysis is available in the detailed methods section.

The analysis has some limitations.

Results and Commentary

Detailed results and comments are available for each debate.

Analysis of Barack Obama vs Mitt Romney (1st debate)

Analysis of Barack Obama vs Mitt Romney (2nd debate)

Analysis of Barack Obama vs Mitt Romney (3rd debate)

Analysis of Joe Biden vs Paul Ryan

Analysis of Barack Obama vs Mitt Romney (combined debates)

Analysis of Barack Obama (2008 vs 2012)

Each debate analysis report contains a great deal of data. Every debate report is shown in exactly the same format, which should help you with making comparisons. To start, you may find these elements the most interesting

Visualizing the Debates

tables & basic word clouds

^ Word usage tables describe the structural characteristics of speech by frequency of words, sentence size, proportion of unique and exclusive words and breakdown of words by part of speech.
^ Word clouds for each candidate, categorized by parts of speech. Obama promises "folks" "opportunity".
^ Word clouds, categorized by ownership. Romney loves using "middle-income".
^ Word clouds for concepts based on part-of-speech pairs. Obama focuses on "middle-class families" and "small business", compared with Romney's "federal tax".

word clouds with Wordle


^ Exclusive words for Romney, categorized by part of speech.

You can generate Wordles directly from most data tables.

The word clouds shown above and included in each analysis were generated with my own code. Because these images are static, I thought it would be useful to provide a means for you to tweak your own versions.


Candidates' Lexical Profiles

Word Usage Summary

Below are two summary tables from the full analysis of the first debate.

Table 1
sentence size
Number of sentences spoken by each speaker and sentence word count statistics. The number of words in a sentence is shown as the average and the 50%/90% cumulative values for all, stop and non-stop words.

speaker        sentences   all (avg / 50% / 90%)   stop (avg / 50% / 90%)   non-stop (avg / 50% / 90%)
Barack Obama   391         18.6 / 26 / 50          10.6 / 15 / 29           8.4 / 12 / 23
Mitt Romney    579         13.5 / 18 / 37          7.8 / 10 / 22            6.0 / 8 / 18
total          970         17.5 / 23 / 46          10.9 / 13 / 28           9.0 / 11 / 22
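The 50% values in Table 1 are larger than the averages, which suggests they are not plain medians taken over sentences. One possible reading, and it is only an assumption here, is that they are word-weighted: the sentence size at which 50% (or 90%) of all spoken words have been accumulated when sentences are sorted by size. The sketch below computes the average and that kind of cumulative value. The function name and its inputs are hypothetical.

```python
def sentence_size_stats(sizes, fractions=(0.5, 0.9)):
    """Return the mean sentence size and word-weighted cumulative sizes.

    sizes     -- list of word counts, one per sentence
    fractions -- ascending fractions of the total word count to report

    Assumption: the cumulative value for a fraction f is the size of the
    sentence at which the running word total first reaches f of all words,
    with sentences sorted from shortest to longest.
    """
    total_words = sum(sizes)
    mean = total_words / len(sizes)
    pending = list(fractions)
    cumulative = {}
    running = 0
    for size in sorted(sizes):
        running += size
        while pending and running >= pending[0] * total_words:
            cumulative[pending.pop(0)] = size
    return mean, cumulative
```

Under this reading a speaker with a few very long sentences gets a high 50% value even when most of their sentences are short, because the long sentences hold a large share of the words.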


Table 2
part of speech count
Count of words categorized by part of speech (POS).

speaker        n+v+adj+adv     nouns (n)      verbs (v)      adjectives (adj)   adverbs (adv)
Barack Obama   2,949 1,061     1,447 543      837 356        511 252            154 62
               40.5% 36.0%     49.1% 37.5%    28.4% 42.5%    17.3% 49.3%        5.2% 40.3%
Mitt Romney    3,112 1,025     1,580 550      813 330        572 273            147 55
               39.9% 32.9%     50.8% 34.8%    26.1% 40.6%    18.4% 47.7%        4.7% 37.4%
total          6,061 1,600     3,027 845      1,650 555      1,083 412          301 86
               40.2% 26.4%     49.9% 27.9%    27.2% 33.6%    17.9% 38.0%        5.0% 28.6%
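The percentages in Table 2 are consistent with simple ratios of the counts. For each part of speech, the first percentage matches the word count divided by the combined n+v+adj+adv count, and the second matches the second number in the pair divided by the first. The second number is assumed here to be the count of distinct words. This is an inference from the figures, not a description of the analysis code. The sketch below reproduces Barack Obama's row.

```python
# (word count, second number) pairs for Barack Obama, copied from Table 2.
# The second number is assumed to be the count of distinct words.
OBAMA = {
    "n+v+adj+adv": (2949, 1061),
    "nouns": (1447, 543),
    "verbs": (837, 356),
    "adjectives": (511, 252),
    "adverbs": (154, 62),
}

def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

def pos_percentages(counts, combined="n+v+adj+adv"):
    """For each part of speech return (share of combined count, distinct share)."""
    combined_total = counts[combined][0]
    result = {}
    for pos, (total, distinct) in counts.items():
        if pos == combined:
            continue
        result[pos] = (percent(total, combined_total), percent(distinct, total))
    return result
```

For example, `pos_percentages(OBAMA)["nouns"]` gives `(49.1, 37.5)`, the same pair shown for Obama's nouns in the table.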


Windbag Index

^ The Windbag Index is a compound measure that characterizes the complexity of speech. A low index is indicative of succinct speech with a low degree of repetition and a large number of independent concepts (details).

Word Clouds

Word clouds below are colored by part of speech: noun, verb, adjective, adverb.

^ Words exclusive to Barack Obama (not spoken by Romney) in the first debate, colored by part of speech. Note the repeated use of "folks" and "opportunity".
^ Words exclusive to Mitt Romney (not spoken by Obama) in the first debate, colored by part of speech: "always", "lose" and "hurt". Ouch.
^ All nouns in debates, colored by contributing speaker (green = Obama, blue = Romney, grey = spoken by both).
^ All verbs in debates, colored by contributing speaker (green = Obama, blue = Romney, grey = spoken by both).
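Exclusive and shared words, as used in the captions above, come down to set operations on each speaker's vocabulary. The sketch below is an illustration only. The function name and the result keys are hypothetical, and it assumes the words have already been lowercased and grouped by speaker.

```python
def classify_ownership(words_a, words_b):
    """Split two speakers' vocabularies into exclusive and shared words.

    Returns a dict with sorted lists of the words spoken only by the
    first speaker, only by the second speaker, and by both.
    """
    vocab_a, vocab_b = set(words_a), set(words_b)
    return {
        "only_a": sorted(vocab_a - vocab_b),  # exclusive to the first speaker
        "only_b": sorted(vocab_b - vocab_a),  # exclusive to the second speaker
        "both": sorted(vocab_a & vocab_b),    # spoken by both
    }
```

The three groups map directly onto the green, blue and grey coloring of the ownership word clouds.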

Discussion

...to be added

Downloads

The content of the word list archive and the data structure syntax are described in the methods section.

Barack Obama vs Mitt Romney (1st debate): transcript, word lists, tag clouds, data structure

Barack Obama vs Mitt Romney (2nd debate): transcript, word lists, tag clouds, data structure

Barack Obama vs Mitt Romney (3rd debate): transcript, word lists, tag clouds, data structure

Joe Biden vs Paul Ryan: transcript, word lists, tag clouds, data structure

Barack Obama vs Mitt Romney (combined debates): transcript, word lists, tag clouds, data structure

Barack Obama (2008 vs 2012): transcript, word lists, tag clouds, data structure

updates

13 Oct 2012, 4:09pm. Looking into expanding the analysis to include details about pronoun use.

11 Oct 2012, 9:23pm. Vice-presidential debate analysis complete. Commentary will be completed tomorrow.

11 Oct 2012, 6:23pm. Analysis of currently airing vice-presidential debate coming later tonight.

10 Oct 2012, 4:04pm. Added a comparison of Obama's 2008 vs 2012 performance.

9 Oct 2012, 5:34pm. Analysis pipeline has been redesigned. First debate results are complete.

5 Oct 2012, 10:39am. Working to add file and wordle creation popups to values in analysis tables.

4 Oct 2012, 5:05pm. I have rewritten the word tag code to snuggle the words more like Wordle. I may have yelled a little at times.

3 Oct 2012, 11:52pm. The initial analysis for the first debate is complete. Obama maintains hope and promises "folks" lots of "opportunity", reminding the electorate of "consequence" and being personable with "grandmother". Romney tries to connect with the "middle-income" segment, fearmongers with "kill", "hurt" and "lose", and demonstrates geopolitical awareness with "China" and "Spain".

3 Oct 2012, 8:20pm. I have downloaded the first debate transcript from NPR and am analyzing it now.