Word Analysis of 2016 Presidential Debates — Clinton vs Trump, by Martin Krzywinski




2016 Debate Analysis

Clinton vs Trump (1st debate) 26 Sep 2016

Clinton vs Trump (2nd debate) 9 Oct 2016

Clinton vs Trump (3rd debate) 19 Oct 2016

Clinton vs Trump (combined debate)

Kaine vs Pence 4 Oct 2016

Randomly Generated Trump Transcripts

If you want more, get more. The debate continues endlessly with Tripsum: Trump Lorem Ipsum—randomly generated text based on actual transcripts.

2012 Debate Analysis & Resources

2012 Obama vs Romney Debate lexical analysis

What Romney's and Obama's Body Language Says to Voters. Watch them cut, point and tilt-and-nod.

2008 Debate Analysis & Resources

2008 Obama vs McCain Debate lexical analysis

He counts your words (even those pronouns), an article in the NYT about Pennebaker's approach to analysis of debates and Al Qaeda communication

Lexical Analysis of Obama's and McCain's Speeches by Jacques Savoy

Other Political Debate Analyses

Presidential word use in State of the Union addresses by Jonathan Corum.

Naming Names, a NYT article about candidates' reference to each other during debates (uses Circos).

This analysis explores word usage in the 2016 US Presidential debates between Hillary Clinton and Donald Trump and in the Vice-Presidential debate between Tim Kaine and Mike Pence. I use transcripts from the Washington Post and the same analysis methods as in the 2008 debate analysis and the 2012 debate analysis.

All data and word lists (tagged and chunked) are available for download in plain-text format. This should make it easy to run your own analysis.

I examine word usage based on parts of speech (nouns, verbs, adjectives, adverbs and pronouns) as well as the use of concepts (noun phrases). The speech patterns of opposing candidates are compared in an effort to identify priorities, perspectives, characteristic values and personality traits. Specifically, I analyze

Formal debates such as these are useful input for this kind of analysis. The format is controlled, if somewhat unruly this year. Each speaker responds to the same questions and is given, in principle, the same amount of time to respond. This reduces the variation that would appear in an analysis of interviews and other unscripted speech.


The Washington Post transcript of each debate was parsed to identify the speaker, flag stop words, tag words with their part of speech (tagging), and identify noun phrases (chunking).
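The tagging and chunking steps can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not the pipeline used for this analysis: tokens are assumed to arrive already tagged with Penn Treebank tags (`NN*` for nouns, `JJ*` for adjectives), `STOP_WORDS` is a tiny hypothetical list, and a noun phrase is taken to be a run of adjectives followed by one or more nouns. The concept clouds described later are built from part-of-speech pairs, so this run-based chunker is a simplified stand-in.

```python
# Minimal sketch of stop-word flagging and noun-phrase chunking.
# Assumptions: input is already POS-tagged with Penn Treebank tags,
# STOP_WORDS is a tiny hypothetical list, and a noun phrase is a run of
# adjectives (JJ*) followed by one or more nouns (NN*).

STOP_WORDS = {"we", "in", "the", "a", "an", "of", "and", "to", "is"}


def flag_stop_words(tagged):
    """Return (word, tag, is_stop) triples."""
    return [(word, tag, word.lower() in STOP_WORDS) for word, tag in tagged]


def chunk_noun_phrases(tagged):
    """Collect adjective* noun+ runs as noun phrases."""
    phrases, adjectives, nouns = [], [], []
    # A sentinel tag flushes any phrase still open at the end.
    for word, tag in list(tagged) + [("", "END")]:
        if tag.startswith("NN"):
            nouns.append(word)
        elif tag.startswith("JJ") and not nouns:
            adjectives.append(word)
        else:
            if nouns:
                phrases.append(" ".join(adjectives + nouns))
            adjectives, nouns = [], []
            # An adjective right after a noun starts the next candidate phrase.
            if tag.startswith("JJ"):
                adjectives.append(word)
    return phrases


sentence = [("We", "PRP"), ("need", "VBP"), ("new", "JJ"), ("jobs", "NNS"),
            ("in", "IN"), ("the", "DT"), ("inner", "JJ"), ("cities", "NNS")]
print(chunk_noun_phrases(sentence))  # ['new jobs', 'inner cities']
print([w for w, _, stop in flag_stop_words(sentence) if stop])  # ['We', 'in', 'the']
```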

The tagged and chunked transcripts are analyzed to determine

I attempt to quantify the overall complexity of speech with a metric I call the Windbag Index, which is the product of 8 terms, each measuring uniqueness in a different aspect of speech (more about Windbag Index).

A full description of each of the steps in the analysis is available in the detailed methods section.

The analysis has some limitations.

Results and Commentary

Each debate analysis report contains a lot of data. Every report uses exactly the same format, which should make comparisons easier. To start, you may find these elements the most interesting

From each table, you can download the word list used to generate it. This makes it easy to, for example, grab all the adjectives used by Clinton or all the verbs that Trump used that Clinton did not use.
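Once a tagged word list is in hand, pulling out one part of speech, or the words one candidate used that the other did not, is a set operation. The sketch below assumes a hypothetical in-memory layout of `(word, tag)` pairs using Penn Treebank tags (`JJ*` adjectives, `VB*` verbs); the actual file format is described in the methods section, and the tokens shown are toy values, not the real word lists.

```python
# Sketch: select words by part of speech and compare two speakers.
# Assumption: a hypothetical (word, Penn Treebank tag) layout; the toy
# tokens are illustrative, not the downloadable word lists.

def words_with_pos(tagged, prefix):
    """Lowercased set of words whose tag starts with prefix ('JJ', 'VB', ...)."""
    return {word.lower() for word, tag in tagged if tag.startswith(prefix)}


clinton_tokens = [("Families", "NNS"), ("clear", "JJ"), ("new", "JJ"),
                  ("build", "VB"), ("hope", "VBP")]
trump_tokens = [("tremendous", "JJ"), ("new", "JJ"), ("build", "VB"),
                ("endorsed", "VBD")]

clinton_adjectives = words_with_pos(clinton_tokens, "JJ")
# Verbs Trump used that Clinton did not: a plain set difference.
trump_only_verbs = words_with_pos(trump_tokens, "VB") - words_with_pos(clinton_tokens, "VB")

print(sorted(clinton_adjectives))  # ['clear', 'new']
print(sorted(trump_only_verbs))    # ['endorsed']
```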

Analysis of Hillary Clinton vs Donald Trump (1st debate)

Analysis of Hillary Clinton vs Donald Trump (2nd debate)

Analysis of Hillary Clinton vs Donald Trump (3rd debate)

Analysis of Hillary Clinton vs Donald Trump (combined debates)

Analysis of Tim Kaine vs Mike Pence

Visualizing the Debates

Each debate is visualized using tables and word clouds. The word clouds show the words and their frequencies, and the tables provide detailed statistics. You can download each word list directly from the tables.

tables & basic word clouds

^ Word usage tables describe the structural characteristics of speech by frequency of words, sentence size, proportion of unique and exclusive words and breakdown of words by part-of-speech • see example
^ Word clouds for each candidate, categorized by parts of speech. Clinton calls for "families" and "hope" • see example
^ Word clouds, categorized by ownership. Trump (red) exclusively uses "tremendous", "cities" and "inner" • see example
^ Word clouds for concepts based on part-of-speech pairs. Clinton (blue) focuses on "new jobs" to Trump's (red) "inner cities" and "bad people" • see example

word clouds with Wordle

^ Exclusive words for Clinton, categorized by part of speech • go to table.

You can generate Wordles directly from most data tables.

The word clouds shown above and included in each analysis were generated with my own code. Because these images are static, I thought it would be useful to provide a means for you to tweak your own versions.

Unfortunately, you cannot use Chrome to create Wordles because this browser no longer supports Java NPAPI, which the Wordle app requires. Use Firefox, Safari or Internet Explorer.

Candidates' Word Usage Profiles

Word Usage Summary

Below are two summary tables from the full analysis of the combined debate transcripts, in which all three debates were grouped together.

Table 1
sentence size
Number of sentences spoken by each speaker and sentence word count statistics. The number of words in a sentence is shown as the average and the 50% and 90% cumulative values, for all, stop and non-stop words.

speaker           sentences   all words           stop words          non-stop words
                              avg   50%   90%     avg   50%   90%     avg   50%   90%
Hillary Clinton   1,206       15.7  21    43      9.0   12    26      7.0   9     19
Donald Trump      1,970       10.9  15    36      6.6   9     22      4.6   6     16
                              14.8  19    40      9.5   11    25      7.5   9     19


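Table 2
part of speech count
Count of words categorized by part of speech (POS). For each speaker, the first line gives the word count and its share of the parent category (all words for n+v+adj+adv, otherwise n+v+adj+adv); the "unique" line gives the number of different words and their fraction of that count.

speaker            n+v+adj+adv       nouns (n)         verbs (v)         adjectives (adj)  adverbs (adv)
Hillary Clinton    7,636 (40.5%)     3,581 (46.9%)     2,322 (30.4%)     1,306 (17.1%)     427 (5.6%)
  unique           2,173 (28.5%)     1,117 (31.2%)     799 (34.4%)       543 (41.6%)       97 (22.7%)
Donald Trump       8,158 (37.9%)     3,639 (44.6%)     2,375 (29.1%)     1,621 (19.9%)     523 (6.4%)
  unique           1,752 (21.5%)     953 (26.2%)       588 (24.8%)       550 (33.9%)       83 (15.9%)
both candidates    15,794 (39.1%)    7,220 (45.7%)     4,697 (29.7%)     2,927 (18.5%)     950 (6.0%)
  unique           3,008 (19.0%)     1,635 (22.6%)     1,102 (23.5%)     886 (30.3%)       127 (13.4%)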


Windbag Index

The Windbag Index is a compound measure that characterizes the complexity of speech. A low index indicates succinct speech with a low degree of repetition and a large number of independent concepts. A large number (ooh, look at Trump) corresponds to a stream of repeating words.

^ Windbag Index for all candidates in the 2016 and 2012 debates (details).
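The eight terms of the Windbag Index are defined in the methods section and are not reproduced here. As a hedged illustration of how a multiplicative index behaves, the sketch below assumes each term is a repetition ratio (total count divided by unique count) for one aspect of speech, so more repetition pushes the product up, consistent with the description of low and high values above. The choice of terms and the toy inputs are hypothetical.

```python
import math

# Sketch of a multiplicative "windbag"-style score. Assumption: each term is
# a repetition ratio (total / unique) for one aspect of speech. The real
# Windbag Index uses 8 terms defined in the methods section; these are stand-ins.


def repetition_ratio(items):
    """Total count divided by unique count; 1.0 means nothing repeats."""
    return len(items) / len(set(items))


def windbag_index(terms):
    """Combine per-aspect terms by multiplication (math.prod, Python 3.8+)."""
    return math.prod(terms)


# Two toy aspects per speaker: single words, and noun phrases.
succinct = [["new", "jobs", "clean", "energy"], ["new jobs", "clean energy"]]
repetitive = [["great", "great", "deal", "deal"], ["great deal", "great deal"]]

print(windbag_index(repetition_ratio(a) for a in succinct))    # 1.0
print(windbag_index(repetition_ratio(a) for a in repetitive))  # 4.0
```

Because the terms multiply, a speaker who repeats themselves a little in every aspect can end up with a much larger index than one who repeats a lot in a single aspect.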

Word Clouds

Word clouds below are colored by part of speech: noun, verb, adjective, adverb.

^ Words exclusive to Hillary Clinton (not spoken by Donald Trump) in the first debate, colored by part of speech. Note the repeated use of "families" and "hope". Remember: these are words that Trump did not use. Trump never said "hope".
^ Words exclusive to Donald Trump (not spoken by Hillary Clinton) in the first debate, colored by part of speech: "tremendous", "totally" and "endorsed".

Word clouds below are colored by speaker: Clinton (blue), Trump (red), both (grey).

^ All nouns in debates, colored by contributing speaker (Clinton: blue, Trump: red, spoken by both: grey).
^ All verbs in debates, colored by contributing speaker (Clinton: blue, Trump: red, spoken by both: grey).

Interruptions and verbal dynamics

To capture the verbal dynamics of the debates, I created a plot that shows how long each candidate's response to the other candidate was.

Each point on the plot at coordinates (x,y) represents a verbal exchange in which one candidate spoke and the other candidate responded. The coordinates x and y are the square roots of the number of words in the first statement and in the response, respectively; taking the square root compresses the dynamic range.

Here, only exchanges between the candidates are shown. Not shown are instances when a candidate responds to a moderator or is interrupted by a moderator.

The color of each data point indicates which candidate responded. The red points show Trump's responses to Clinton, and the blue points show Clinton's responses to Trump. For example, if Clinton said 25 words and Trump responded with 4 words, the exchange would be a red point at coordinate (5,2), since the square roots of 25 and 4 are 5 and 2.
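The mapping from an exchange to a plotted point can be written directly. This sketch assumes only what the text states: coordinates are square roots of the word counts, and the color follows the responding candidate (blue for Clinton, red for Trump), as in the worked example. The function name `exchange_point` is hypothetical.

```python
import math

# Colors follow the responding candidate, as in the worked example.
COLORS = {"Clinton": "blue", "Trump": "red"}


def exchange_point(first_words, responder, response_words):
    """Map one exchange to ((x, y), color).

    x and y are square roots of the word counts of the first statement and
    the response, which compresses the dynamic range.
    """
    return (math.sqrt(first_words), math.sqrt(response_words)), COLORS[responder]


# Clinton says 25 words, Trump responds with 4.
print(exchange_point(25, "Trump", 4))  # ((5.0, 2.0), 'red')
```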

^Length of Clinton's and Trump's responses to each other. If we consider a response of fewer than 5 words as an interruption, Trump interrupts Clinton 29 times and Clinton interrupts Trump 8 times.
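Counting interruptions under the fewer-than-5-words rule amounts to scanning consecutive turns. The sketch assumes a hypothetical transcript representation as `(speaker, word_count)` pairs in speaking order; any speaker other than the two candidates is treated as a moderator, and exchanges involving a moderator are skipped, as in the plot. The turn data is a toy example.

```python
# Sketch: count short responses (interruptions) between the two candidates.
# Assumption: a hypothetical transcript as (speaker, word_count) turns;
# any other speaker is treated as a moderator and breaks the exchange.

CANDIDATES = ("Clinton", "Trump")


def count_interruptions(turns, threshold=5):
    """Per responder, count responses of fewer than `threshold` words."""
    counts = {name: 0 for name in CANDIDATES}
    for (previous, _), (current, words) in zip(turns, turns[1:]):
        candidate_exchange = (previous in CANDIDATES and current in CANDIDATES
                              and previous != current)
        if candidate_exchange and words < threshold:
            counts[current] += 1
    return counts


turns = [("Moderator", 40), ("Clinton", 120), ("Trump", 3), ("Clinton", 2),
         ("Trump", 60), ("Moderator", 10), ("Clinton", 4)]
print(count_interruptions(turns))  # {'Clinton': 1, 'Trump': 1}
```

The count is attributed to the responder, so "Trump interrupts Clinton" corresponds to short Trump turns that directly follow a Clinton turn.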


The word analysis quantifies how much Trump's way of speaking differs from Clinton's. I comment on individual tables and word clouds in the analysis of each debate, as well as of the combined debate. If you want the details, a good place to start is the analysis of the combined debate, in which the transcripts of all three debates were concatenated and analyzed as one. Below, I draw attention to some of the highlights.

Vocabulary size

Trump and Clinton speak very differently. This is obvious. But their differences are greater than what I've seen in other debates.

For example, in the 2012 Obama/Romney debates, both candidates used a similar number of different words. Obama used 2,372 different words out of a total of 22,029 words, and Romney used 2,349 different words out of a total of 24,024. In other words, unique words made up about 10% of each candidate's total (10.8% for Obama and 9.8% for Romney).

This year, the difference is much larger. Clinton used 2,403 different words out of a total of 18,874, and Trump used 1,977 out of 21,507. These correspond to 12.7% for Clinton and 9.2% for Trump, a much wider gap than in 2012.
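The percentages above are unique-word counts divided by total word counts (a type/token ratio). This short check recomputes them from the figures quoted in the text; the function name `unique_fraction` is just for illustration.

```python
def unique_fraction(unique_words, total_words):
    """Percentage of a speaker's words that are distinct (type/token ratio)."""
    return 100 * unique_words / total_words


# (unique, total) word counts quoted in the text.
speakers = {
    "Obama 2012": (2372, 22029),
    "Romney 2012": (2349, 24024),
    "Clinton 2016": (2403, 18874),
    "Trump 2016": (1977, 21507),
}
for name, (unique, total) in speakers.items():
    print(f"{name}: {unique_fraction(unique, total):.1f}%")
# Obama 2012: 10.8%, Romney 2012: 9.8%, Clinton 2016: 12.7%, Trump 2016: 9.2%
```

One caveat: type/token ratios tend to fall as a text gets longer, so comparisons between speakers with different word totals need care. Here the direction is unambiguous, since Trump spoke more words than Clinton and still used fewer different words in absolute terms.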

Trump's functional vocabulary is considerably smaller than Clinton's. This may be one of the reasons why Trump appeals to his base supporters, who may find Clinton's language too complex.

Owning the issues—words exclusive to a candidate

It is extremely interesting to look at words that one candidate used that were not used by the other. For example, Clinton never says "tremendous" but Trump uses the word 29 times. Similarly, Trump never says "families", which Clinton uses 19 times.

Clinton distinguished herself with her words more than Trump did. She used 1,290 different words that Trump didn't say, while Trump used only 864 words that Clinton didn't say. These words matter because they help candidates distinguish themselves. What were they?

Clinton's most frequently used exclusive words were information (11), stand (12), try (13), hope (14), clear (19) and families (19). Trump's most frequently used exclusive words were endorsed (14), excuse (14), cities (18), inner (18), tremendous (29) and hillary (51).
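Finding and ranking exclusive words is a set difference followed by a frequency sort. A minimal sketch, assuming each candidate's words are available as plain token lists and that matching is case-insensitive (so "Hope" and "hope" count as the same word); the toy tokens are illustrative, not the transcripts.

```python
from collections import Counter


def exclusive_counts(mine, theirs):
    """Counts of words in `mine` that never occur in `theirs` (case-insensitive)."""
    mine_counts = Counter(word.lower() for word in mine)
    their_vocabulary = {word.lower() for word in theirs}
    return Counter({word: n for word, n in mine_counts.items()
                    if word not in their_vocabulary})


# Toy tokens, not the debate transcripts.
speaker_a = "families hope families jobs Hope families".split()
speaker_b = "jobs tremendous jobs".split()

exclusive = exclusive_counts(speaker_a, speaker_b)
print(len(exclusive))           # 2 distinct exclusive words
print(exclusive.most_common())  # [('families', 3), ('hope', 2)]
```

`len(exclusive)` corresponds to figures like the 1,290 and 864 exclusive words above, and `most_common()` gives the ranked lists.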

What is fascinating here is that "donald" doesn't appear in Clinton's exclusive word list but "hillary" appears in Trump's list. Why? Because Trump said "Donald" 3 times across the debates, while Clinton never said "Hillary". The three sentences in which he referred to himself were:

"But you will learn more about Donald Trump by going down to the federal elections, where I filed a 104-page essentially financial statement of sorts, the forms that they have."

"She complains that Donald Trump took advantage of the tax code."

"But you wouldn't change it, because all of these people gave you the money so you can take negative ads on Donald Trump."

If you consider the most frequently used exclusive words by parts of speech, it gets even more interesting. Clinton's most frequently used exclusive noun, verb, adjective and adverb were families (14), try (13), clear (17) and forth (5).

Trump's most frequently used exclusive noun, verb, adjective and adverb were hillary (51), endorsed (14), tremendous (28) and totally (10). The exclusive verb he used most often was the self-aggrandizing "endorsed".

Sentence structure

Trump delivered 1,970 sentences across all the debates and Clinton only 1,206. How is this possible? His sentences were much shorter.

Trump's sentences averaged only 10.9 words, 30.6% shorter than Clinton's (10.9 vs 15.7). Counting only the non-stop words in a sentence, his averaged 4.6, 34.3% fewer than Clinton's (4.6 vs 7.0). Trump's median sentence had only 6 non-stop words; Clinton's had 9.
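The percentage differences above are relative to Clinton's values, and averages and medians come from per-sentence word counts. A minimal sketch of both calculations using Python's `statistics` module; the per-sentence counts in the last line are toy values, and the function names are hypothetical.

```python
import statistics


def relative_difference(value, reference):
    """Percent difference of value from reference (negative = smaller)."""
    return 100 * (value - reference) / reference


def sentence_stats(sizes):
    """Mean and median of per-sentence word counts."""
    return statistics.mean(sizes), statistics.median(sizes)


print(f"{relative_difference(10.9, 15.7):.1f}%")  # -30.6%
print(f"{relative_difference(4.6, 7.0):.1f}%")    # -34.3%

# Toy non-stop word counts for five sentences.
print(sentence_stats([3, 6, 9, 4, 6]))  # (5.6, 6)
```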

It's also interesting to look at each candidate's longest sentence. For Clinton, it was this one, delivered in the second debate (town hall), with 39 non-stop words.

That's why the slogan of my campaign is "Stronger Together," because I think if we work together, if we overcome the divisiveness that sometimes sets Americans against one another, and instead we make some big goals -- and I've set forth some big goals, getting the economy to work for everyone, not just those at the top, making sure that we have the best education system from preschool through college and making it affordable, and so much else.

Trump's longest sentence was also delivered in the second debate. It had 45 non-stop words:

I watch the deals being made, when I watch what's happening with some horrible things like Obamacare, where your health insurance and health care is going up by numbers that are astronomical, 68 percent, 59 percent, 71 percent, when I look at the Iran deal and how bad a deal it is for us, it's a one-sided transaction where we're giving back $150 billion to a terrorist state, really, the number one terror state, we've made them a strong country from really a very weak country just three years ago.


The contents of the word list archive and the syntax of the data structures are described in the methods section.

Hillary Clinton vs Donald Trump (1st debate) transcript word lists and tag clouds data structure

Hillary Clinton vs Donald Trump (2nd debate) transcript word lists and tag clouds data structure

Hillary Clinton vs Donald Trump (3rd debate) transcript word lists and tag clouds data structure

Hillary Clinton vs Donald Trump (combined debates) transcript word lists and tag clouds data structure

Tim Kaine vs Mike Pence transcript word lists and tag clouds data structure


20 Oct 2016. All debates analyzed, including the combined Clinton/Trump debate.

5 Oct 2016. 11:19am CEST Kaine vs Pence debate analysis posted. Changed speaker colors to blue (Democrat) and red (Republican).

26 Sep 2016. 21:03 PST The Washington Post's transcript is complete. Analysis is running.

26 Sep 2016. 20:42 PST Waiting for the Washington Post's transcript to be complete.

24 Sep 2016. Working on preparing the system for Monday's debate.