Lexical Analysis of 2012 Presidential Debates — Obama vs Romney
This analysis explores word usage and lexical content of the 2012
US Presidential and Vice-Presidential debates. It is based on the same
approach I used to analyze the 2008 debates.
The purpose is to explore the structure of speech, as characterized
by the use of nouns, verbs, adjectives and adverbs, pronouns and noun
phrases. The speech patterns of opposing candidates are compared in an
effort to identify priorities, perspectives, characteristic values and personality traits.
I analyze the debate for the following:
- word frequency and distribution for different parts of speech
- words exclusive to a candidate, and those shared by both candidates
- complexity of noun phrases, which relate to independent concepts
- a general measure of complexity and repetition in speech, nicknamed the Windbag Index.
A formal debate serves as a great text for this kind of
analysis. The format is somewhat controlled: each speaker is subjected
to the same stimulus (question) and is given the same amount of time
to respond. This reduces the variation that would appear in an analysis of
interviews and other unscripted speech.
The transcript for each debate is parsed to identify the speaker, tag words with their part of speech (tagging), and identify noun phrases (chunking).
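As a rough illustration of the first parsing step, here is a minimal sketch of splitting a transcript into per-speaker utterances. It assumes each turn begins with the speaker's name in capitals followed by a colon, which may not match the actual transcript layout. The function name `split_by_speaker` and the sample text are made up for this example. Tagging and chunking need a trained part-of-speech tagger and are not shown.

```python
import re
from collections import defaultdict

# Assumption: a turn starts with "SPEAKER:" where SPEAKER is in capitals.
TURN = re.compile(r"^([A-Z][A-Z .'-]*):\s*(.*)$")

def split_by_speaker(transcript):
    """Return {speaker: [utterance, ...]} from a plain-text transcript."""
    turns = defaultdict(list)
    speaker = None
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TURN.match(line)
        if match:
            # A new turn: remember who is speaking and store any text on this line.
            speaker, text = match.group(1), match.group(2)
            if text:
                turns[speaker].append(text)
        elif speaker is not None:
            # Continuation line: extend the current speaker's latest utterance.
            if turns[speaker]:
                turns[speaker][-1] += " " + line
            else:
                turns[speaker].append(line)
    return dict(turns)

# Made-up placeholder text, not a quote from any debate.
sample = """OBAMA: Made-up opening line.
It continues on a second line.
ROMNEY: Made-up reply.
OBAMA: Made-up rebuttal."""

turns = split_by_speaker(sample)
```

Each speaker's utterances can then be passed on to whatever tagger and chunker the pipeline uses.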
The tagged and chunked transcripts are analyzed to determine:
- word frequency distribution for each candidate
- sentence size and proportion of unique words
- words exclusive to a candidate and those shared by both candidates
- frequency of concepts, as defined by part of speech pairings (e.g. noun/verb)
- complexity of noun phrases
- word clouds for a variety of word lists extracted from the transcripts (e.g. all nouns unique to Obama)
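Three of the measures listed above (word frequency, proportion of unique words, and exclusive versus shared words) can be sketched with the Python standard library. This is only an illustration and not the code used for the analysis. The tokenizer is a crude regular expression, the helper names `words`, `profile` and `exclusive_and_shared` are hypothetical, and the two sample strings are invented.

```python
import re
from collections import Counter

def words(text):
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())

def profile(text):
    """Return (word frequencies, fraction of tokens that are distinct words)."""
    tokens = words(text)
    freq = Counter(tokens)
    unique_fraction = len(freq) / len(tokens) if tokens else 0.0
    return freq, unique_fraction

def exclusive_and_shared(freq_a, freq_b):
    """Return (words only A used, words only B used, words both used)."""
    a, b = set(freq_a), set(freq_b)
    return a - b, b - a, a & b

# Invented sample text, not taken from the transcripts.
speaker_a = "Folks need opportunity and folks need jobs"
speaker_b = "Middle-income families need jobs"

freq_a, unique_a = profile(speaker_a)
freq_b, unique_b = profile(speaker_b)
only_a, only_b, shared = exclusive_and_shared(freq_a, freq_b)
```

Restricting the input to words carrying a given part-of-speech tag would give the per-category versions of the same lists, for example all nouns exclusive to one candidate.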
I attempt to quantify the overall complexity of speech by a metric
I call the Windbag Index, which is a product of 8 terms
each measuring uniqueness in different aspects of speech (more about Windbag Index).
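To show how a product of per-aspect terms behaves, here is a toy sketch. It is not the Windbag Index itself: the actual terms are defined in the methods section and are not reproduced here. The `repetition_term` function (the fraction of items that repeat an earlier item) is a hypothetical stand-in, and the word lists are invented.

```python
from math import prod

def repetition_term(items):
    """Hypothetical term: fraction of items that repeat an earlier item."""
    items = list(items)
    if not items:
        return 0.0
    return 1.0 - len(set(items)) / len(items)

def product_index(aspects):
    """Multiply one term per aspect of speech (e.g. nouns, verbs)."""
    return prod(repetition_term(items) for items in aspects.values())

# Invented word lists; a real run would use the tagged transcript.
aspects = {
    "nouns": ["jobs", "jobs", "taxes", "jobs"],
    "verbs": ["create", "cut", "create", "cut"],
}

index = product_index(aspects)  # 0.5 * 0.5
```

One property of any product of this kind is that a single very small term pulls the whole index down, so a speaker who is highly varied in one aspect scores low even if repetitive in the others.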
A full description of each of the steps in the analysis is
available in the detailed methods section.
The analysis has some limitations.
Results and Commentary
Detailed results and comments are available for each debate.
Analysis of Barack Obama vs Mitt Romney (1st debate)
Analysis of Barack Obama vs Mitt Romney (2nd debate)
Analysis of Barack Obama vs Mitt Romney (3rd debate)
Analysis of Joe Biden vs Paul Ryan
Analysis of Barack Obama vs Mitt Romney (combined debates)
Analysis of Barack Obama (2008 vs 2012)
Each debate analysis report contains a great deal of data. Every report uses exactly the same format, which should make comparisons easier. To start, you may find the following elements the most interesting.
Visualizing the Debates
tables & basic word clouds
^ Word usage tables describe the structural characteristics of speech by frequency of words, sentence size, proportion of unique and exclusive words and breakdown of words by part-of-speech • see example
^ Word clouds for each candidate, categorized by parts of speech. Obama promises "folks" "opportunity" • see example
^ Word clouds, categorized by ownership. Romney loves using "middle-income" • see example
^ Word clouds for concepts based on part-of-speech pairs. Obama focuses on "middle-class families" and "small business", compared to Romney's "federal tax". • see example
word clouds with Wordle
^ Exclusive words for Romney, categorized by part-of-speech • goto table
You can generate Wordles directly from most data tables.
The word clouds shown above and included in each analysis were generated with my own code. Because these images are static, I thought it would be useful to provide a means for you to tweak your own versions.
Candidates' Lexical Profiles
Word Usage Summary
Below are two summary tables from the full analysis of the first debate.
^ The Windbag Index is a compound measure that characterizes the complexity of speech. A low index indicates succinct speech with a low degree of repetition and a large number of independent concepts (details).
Word clouds below are colored by part of speech:
^ Words exclusive to Barack Obama (not spoken by Romney) in the first debate, colored by part of speech. Note the repeated use of "folks" and "opportunity".
^ Words exclusive to Mitt Romney (not spoken by Obama) in the first debate, colored by part of speech: "always", "lose" and "hurt". Ouch.
^ All nouns in debates, colored by contributing speaker (green = Obama, blue = Romney, grey = spoken by both).
^ All verbs in debates, colored by contributing speaker (green = Obama, blue = Romney, grey = spoken by both).
...to be added
The content of the word list archive and the data structure syntax are described in the methods section.
Barack Obama vs Mitt Romney (1st debate): transcript, word lists, tag clouds, data structure
Barack Obama vs Mitt Romney (2nd debate): transcript, word lists, tag clouds, data structure
Barack Obama vs Mitt Romney (3rd debate): transcript, word lists, tag clouds, data structure
Joe Biden vs Paul Ryan: transcript, word lists, tag clouds, data structure
Barack Obama vs Mitt Romney (combined debates): transcript, word lists, tag clouds, data structure
Barack Obama (2008 vs 2012): transcript, word lists, tag clouds, data structure
13 Oct 2012, 4:09pm. Exploring expanding the analysis to include details about pronoun use.
11 Oct 2012, 9:23pm. Vice-presidential debate analysis complete. Commentary will be completed tomorrow.
11 Oct 2012, 6:23pm. Analysis of currently airing vice-presidential debate coming later tonight.
10 Oct 2012, 4:04pm. Added a comparison of Obama's 2008 vs 2012 performance.
9 Oct 2012, 5:34pm. Analysis pipeline has been redesigned. First debate results are complete.
5 Oct 2012, 10:39am. Working to add file and wordle creation popups to values in analysis tables.
4 Oct 2012, 5:05pm. I have rewritten the word tag code to snuggle the words more like Wordle. I may have yelled a little at times.
3 Oct 2012, 11:52pm. The initial analysis for the first debate is complete. Obama maintains hope and promises "folks" lots of "opportunity", reminding the electorate of "consequence" and being personable with "grandmother". Romney tries to connect to the "middle-income" segment, fearmongers with "kill", "hurt" and "lose", and demonstrates geopolitical awareness with "China" and "Spain".
3 Oct 2012, 8:20pm. I have downloaded the first debate transcript from NPR and am analyzing it now.