
Word Analysis of 2016 U.S. Presidential Debates

Hillary Clinton vs Donald Trump (combined debates)



Word Statistics

Debate Word Count

Summary Word Count

The summary word count reports the total number of words and the number of unique, non-stop words used by each candidate. Word number is expressed as both absolute and relative values.

Table 1a
all words
Number of all words and unique words used by each speaker.
set               word count (a)   fraction (b)   unique words (c)   unique fraction (d)
Hillary Clinton   18,874           46.7%          2,403              12.7%
Donald Trump      21,507           53.3%          1,977              9.2%
total             40,381           100.0%         3,267              8.1%

Table 1b
exclusive and shared words
Words exclusive to speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
set               word count (a)   fraction (b)   unique words (c)   unique fraction (d)
Hillary Clinton   2,227            11.8%          1,290              57.9%
Donald Trump      1,859            8.6%           864                46.5%
both candidates   36,295           89.9%          1,113              3.1%

Table 1
legend
a c
b d

a :: word count

b :: word count, as fraction in total in debate

c :: unique words in (a)

d :: unique words in (a), as fraction in (a)

bar :: proportion of (a-c):c

Table 1
commentary

The combined debate represents all the words delivered by both candidates across all three debates. The total volume of the words across all debates is roughly 3 times that of the average of the debates, as expected.

The fraction of unique words is about half of each debate. For example, of Clinton's 18,874 words across all three debates, 12.7% were unique. This is –40.4% (12.7 vs 21.3) lower than in the first debate. For Trump, 9.2% words were unique, which is –37.8% (9.2 vs 14.8) lower than in his first debate.

The number of words exclusive to a candidate across the three debates roughly doubled when compared to the first debate. Clinton delivered 2,227 exclusive words (words not spoken by Trump) across the three debates and Trump delivered 1,859 exclusive words. These represent a small fraction of the total: the words used by both candidates totalled 36,295, or 89.9% of all words delivered.

The number of exclusive words is mostly in keeping with the combined debate of Obama vs Romney. There the candidates shared 42,331 words and the fraction of exclusive words was 8.1% for Obama and 8.0% for Romney. What is interesting this year is that Clinton's fraction of exclusive words was 11.8%, which is +45.7% (11.8 vs 8.1) higher than for Obama. This speaks to the larger rift between the candidates' approaches this year.

Stop Word Contribution

In the table below, the candidates' delivery is partitioned into stop and non-stop words. Stop words (full list) are frequently-used bridging words (e.g. pronouns and conjunctions) whose meaning depends entirely on context. The fraction of words that are stop words is one measure of the complexity of speech.
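A minimal sketch of this partition, assuming a simple regular-expression tokenizer and a tiny stand-in stop list (the `STOP_WORDS` set and the `word_stats` function are hypothetical names for illustration; the analysis itself uses its own full stop list and parser):

```python
import re

# Hypothetical, abbreviated stop list; the real list is much longer.
STOP_WORDS = {"i", "the", "and", "to", "a", "of", "we", "is", "that", "it"}

def word_stats(text):
    """Return (total, unique) word counts for all, stop and non-stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    groups = {
        "all": words,
        "stop": [w for w in words if w in STOP_WORDS],
        "non-stop": [w for w in words if w not in STOP_WORDS],
    }
    # Total is the number of tokens; unique is the number of distinct tokens.
    return {name: (len(ws), len(set(ws))) for name, ws in groups.items()}

stats = word_stats("We will invest in jobs and we will invest in families")
for name, (total, unique) in stats.items():
    print(f"{name}: {total} words, {unique} unique ({unique / total:.1%})")
```

The unique fraction printed for each group corresponds to value (d) in the table; the fraction of a speaker's words that are stop or non-stop corresponds to value (b).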

Table 2a
non-stop words
Counts of stop and non-stop words.
speaker           category   words (a)   fraction (b)   unique words (c)   unique fraction (d)
Hillary Clinton   all        18,874      100.0%         2,403              12.7%
Hillary Clinton   stop       10,692      56.6%          151                1.4%
Hillary Clinton   non-stop   8,182       43.4%          2,252              27.5%
Donald Trump      all        21,507      100.0%         1,977              9.2%
Donald Trump      stop       12,571      58.5%          157                1.2%
Donald Trump      non-stop   8,936       41.5%          1,820              20.4%
total             all        40,381      100.0%         3,267              8.1%
total             stop       23,263      57.6%          163                0.7%
total             non-stop   17,118      42.4%          3,104              18.1%

Table 2b
exclusive and shared non-stop words
Non-stop words exclusive to speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
set               word count (a)   fraction (b)   unique words (c)   unique fraction (d)
Hillary Clinton   2,212            27.0%          1,284              58.0%
Donald Trump      1,829            20.5%          852                46.6%
both candidates   13,077           76.4%          968                7.4%

Table 2
legend
a c
b d

a :: total number of words, for a given category (all, stop, non-stop)

b :: (a) relative to words in the debate if category=all, otherwise relative to words by the candidate

c :: number of unique words with set (a)

d :: (c) relative to (a)

bar :: proportion of (a-c):c

Table 2
commentary

Clinton delivered relatively more non-stop words than Trump: +4.6% (43.4 vs 41.5). These proportions and their difference are very similar to those of the first debate.

Word frequency

The word frequency table summarizes the frequency with which words were used. I show the average word frequency and the weighted cumulative frequencies at the 50th and 90th percentiles. The average word frequency indicates how many times, on average, a word is used. For a given fraction of the entire delivery, the weighted cumulative frequency indicates the largest word frequency within this fraction (details about weighted cumulative distribution).
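To make the weighted cumulative frequency concrete, here is a small sketch under one plausible reading of the definition: words are sorted from least to most frequent, their counts are accumulated, and the value reported at a given fraction is the frequency of the word at which the running total first reaches that fraction of all words. The function name and this exact procedure are assumptions for illustration, not necessarily the method used to build the tables.

```python
from collections import Counter

def frequency_summary(words, fractions=(0.5, 0.9)):
    """Average word frequency plus weighted cumulative frequencies.

    Assumed procedure: sort word counts in ascending order, accumulate
    them, and report the count at which the running total first reaches
    each requested fraction of all words delivered.
    """
    counts = sorted(Counter(words).values())
    total = sum(counts)
    average = total / len(counts)
    cumulative = []
    for f in fractions:
        running = 0
        for c in counts:
            running += c
            if running >= f * total:
                cumulative.append(c)
                break
    return average, cumulative

# Toy delivery: one word used 5 times, one used twice, three used once.
words = ["jobs"] * 5 + ["people"] * 2 + ["tax", "trade", "hope"]
avg, (f50, f90) = frequency_summary(words)
print(avg, f50, f90)
```

In this toy case half of the delivery is made up of words used at most twice, while reaching 90% of the delivery requires including the word used five times.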

Table 3a
word use frequency
Average and 50%/90% percentile word frequencies.
speaker           category   average (a)   50% (b)   90% (c)
Hillary Clinton   all        7.9           55        656
Hillary Clinton   stop       70.8          219       722
Hillary Clinton   non-stop   3.6           8         54
Donald Trump      all        10.9          74        521
Donald Trump      stop       80.1          222       696
Donald Trump      non-stop   4.9           12        66
total             all        12.4          121       1,335
total             stop       142.7         333       1,352
total             non-stop   5.5           16        114

Table 3b
exclusive and shared non-stop word use frequency
Average and 50%/90% cumulative percentile word frequencies. Non-stop words exclusive to speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
set               average (a)   50% (b)   90% (c)
Hillary Clinton   1.72          2         7
Donald Trump      2.15          3         12
total             5.51          16        114

Table 3
legend
a b c

a :: average word frequency

b :: largest word frequency in 50% of content

c :: largest word frequency in 90% of content

bar :: proportion of a:b:c

Table 3
commentary

This word frequency table highlights Trump's tendency to repeat words.

On average, Trump repeated his words 10.9 times, which is +38.0% (10.9 vs 7.9) higher than Clinton. The frequency of non-stop words is what is interesting here, though. For this subset, Trump repeated his non-stop words 4.9 times on average, which is +36.1% (4.9 vs 3.6) higher than Clinton.

The top 5 repeated non-stop words by Clinton were: well (89), will (89), Donald (93), think (107) and people (114). For Trump, these were know (95), look (100), will (100), country (116) and people (133). Trump loves using "look", as part of the phrase "look here".

Trump also tended to repeat the words that were exclusive to him more: an average of 2.15 times, which is +25.0% (2.15 vs 1.72) higher than Clinton.

Clinton's most frequently used exclusive words were information (11), stand (12), try (13), hope (14), clear (19) and families (19). Trump's most frequently used exclusive words were endorsed (14), excuse (14), cities (18), inner (18), tremendous (29) and hillary (51).

What is fascinating here is that "donald" doesn't appear in Clinton's exclusive word list but "Hillary" appears in Trump's list. Why? That's because Trump said "Donald" 3 times across the debates but Clinton never said "Hillary". The three sentences in which he referred to himself were:

"But you will learn more about Donald Trump by going down to the federal elections, where I filed a 104-page essentially financial statement of sorts, the forms that they have."

"She complains that Donald Trump took advantage of the tax code."

"But you wouldn't change it, because all of these people gave you the money so you can take negative ads on Donald Trump."

Sentence Size

Table 4
sentence size
Number of sentences spoken by each speaker and sentence word count statistics. Number of words in a sentence is shown by average and 50%/90% cumulative values for all, stop and non-stop words.
speaker           sentences   category   average (a)   50% (b)   90% (c)
Hillary Clinton   1,206       all        15.7          21        43
Hillary Clinton   1,206       stop       9.0           12        26
Hillary Clinton   1,206       non-stop   7.0           9         19
Donald Trump      1,970       all        10.9          15        36
Donald Trump      1,970       stop       6.6           9         22
Donald Trump      1,970       non-stop   4.6           6         16
total             3,176       all        14.8          19        40
total             3,176       stop       9.5           11        25
total             3,176       non-stop   7.5           9         19

Table 4
legend
a b c

a :: average sentence size

b :: largest sentence size for 50% of content

c :: largest sentence size for 90% of content

bar :: proportion of a:b:c

Table 4
commentary

Trump delivered 1,970 sentences across all the debates, which is +63.3% (1,970 vs 1,206) more than Clinton. Given that he only delivered +14.0% (21,507 vs 18,874) more words than Clinton, this suggests that his sentences were much shorter.

Indeed, Trump's sentences had an average of only 10.9 words, which is –30.6% (10.9 vs 15.7) shorter than Clinton. And if you consider only the non-stop words in a sentence, his had only 4.6, which is –34.3% (4.6 vs 7) lower than Clinton.

Trump's median sentence only had 6 non-stop words—Clinton's had 9.

Clinton's longest sentence was delivered in the second debate (town hall) and had 39 non-stop words:

That's why the slogan of my campaign is "Stronger Together," because I think if we work together, if we overcome the divisiveness that sometimes sets Americans against one another, and instead we make some big goals -- and I've set forth some big goals, getting the economy to work for everyone, not just those at the top, making sure that we have the best education system from preschool through college and making it affordable, and so much else.

Trump's longest sentence was also delivered in the second debate and had 45 non-stop words:

I watch the deals being made, when I watch what's happening with some horrible things like Obamacare, where your health insurance and health care is going up by numbers that are astronomical, 68 percent, 59 percent, 71 percent, when I look at the Iran deal and how bad a deal it is for us, it's a one-sided transaction where we're giving back $150 billion to a terrorist state, really, the number one terror state, we've made them a strong country from really a very weak country just three years ago.

Still undecided?

All further word use statistics represent content that has been filtered for stop words, unless explicitly indicated.

Part of Speech Analysis

In this section, word frequency is broken down by part of speech (POS). The four POS groups examined are nouns, verbs, adjectives and adverbs. Conjunctions and prepositions are not considered. The first category (n+v+adj+adv) is composed of all four POS groups.

Part of Speech Count

Table 5
part of speech count
Count of words categorized by part of speech (POS).
speaker           part of speech   words (a)   fraction (b)   unique words (c)   unique fraction (d)
Hillary Clinton   n+v+adj+adv      7,636       40.5%          2,173              28.5%
Hillary Clinton   nouns            3,581       46.9%          1,117              31.2%
Hillary Clinton   verbs            2,322       30.4%          799                34.4%
Hillary Clinton   adjectives       1,306       17.1%          543                41.6%
Hillary Clinton   adverbs          427         5.6%           97                 22.7%
Donald Trump      n+v+adj+adv      8,158       37.9%          1,752              21.5%
Donald Trump      nouns            3,639       44.6%          953                26.2%
Donald Trump      verbs            2,375       29.1%          588                24.8%
Donald Trump      adjectives       1,621       19.9%          550                33.9%
Donald Trump      adverbs          523         6.4%           83                 15.9%
total             n+v+adj+adv      15,794      39.1%          3,008              19.0%
total             nouns            7,220       45.7%          1,635              22.6%
total             verbs            4,697       29.7%          1,102              23.5%
total             adjectives       2,927       18.5%          886                30.3%
total             adverbs          950         6.0%           127                13.4%

Table 5
legend
a c
b d

a :: total number of words for a given POS (all, noun, verb, adjective, adverb, pronoun)

b :: (a) relative to all words by candidate

c :: unique words in (a)

d :: (c) relative to (a)

bar :: proportion of (a-c):c

Table 5
commentary

Both candidates used roughly the same fraction of nouns and verbs in their deliveries. About 45% of words were nouns and another 30% were verbs. Clinton delivered 799 unique verbs, which is +35.9% (799 vs 588) more than Trump. Given that verbs are action words, this is interesting to contrast against Trump's statements that Clinton is "all words and no action". Her words speak much more to "action" than Trump's.

Trump used proportionately more adjectives and adverbs than Clinton. For example, 19.9% and 6.4% of Trump's words were adjectives and adverbs, respectively, which is +16.4% (19.9 vs 17.1) and +14.3% (6.4 vs 5.6) higher than Clinton.

The total number of unique adjectives was very similar (543 for Clinton and 550 for Trump). Clinton edged Trump in unique adverbs. She used 97 and Trump used 83.

Part of Speech Frequency

Table 5
part of speech frequency
Frequency of words categorized by part of speech (POS).
speaker           part of speech   average (a)   50% (b)   90% (c)
Hillary Clinton   n+v+adj+adv      3.51          7         57
Hillary Clinton   nouns            3.21          6         39
Hillary Clinton   verbs            2.91          5         76
Hillary Clinton   adjectives       2.40          4         20
Hillary Clinton   adverbs          4.40          14        57
Donald Trump      n+v+adj+adv      4.66          11        74
Donald Trump      nouns            3.82          8         40
Donald Trump      verbs            4.04          10        73
Donald Trump      adjectives       2.95          5         37
Donald Trump      adverbs          6.30          26        88
total             n+v+adj+adv      5.25          14        105
total             nouns            4.42          11        65
total             verbs            4.26          13        147
total             adjectives       3.30          7         43
total             adverbs          7.48          34        145

Table 5
legend
a b c

a :: average word frequency

b :: largest word frequency in 50% of content

c :: largest word frequency in 90% of content

bar :: proportion of a:b:c

Table 5
commentary

This table tells you how each part of speech contributes to the overall repetition in the candidates' delivery.

We already know that Trump repeats himself more than Clinton. But for which parts of speech is this more pronounced (pun intended)?

Trump repeats himself +19.0% (3.82 vs 3.21) more than Clinton for nouns, +38.8% (4.04 vs 2.91) for verbs, +22.4% (2.95 vs 2.41) for adjectives and +43.2% (6.3 vs 4.4) for adverbs.

Clinton's most frequently used noun, verb, adjective and adverb were people (105), think (107), good (27) and just (44).

Trump's most frequently used noun, verb, adjective and adverb were people (113), will (98), great (51) and just (88).

A huge difference in style and substance can be seen by looking at the parts of speech that are most commonly used among the words exclusive to a candidate.

Clinton's most frequently used exclusive (those not said by Trump) noun, verb, adjective and adverb were families (14), try (13), clear (17) and forth (5).

Trump's most frequently used exclusive (those not said by Clinton) noun, verb, adjective and adverb were hillary (51), endorsed (14), tremendous (28) and totally (10).

Again, still "totally" undecided?

Part of Speech Pairing

Through word pairing, I extract concepts from the text. The number of unique word pairs is a function of sentence length and is one of the measures of complexity.
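As a rough sketch of how such pairs might be counted, the example below assumes that every unordered pair of words within a sentence counts as one pair and that part-of-speech tags are already available. The `pos_pairs` function and the hand-tagged input are hypothetical; the tagging step and the exact pairing rule used for the tables may differ.

```python
from collections import Counter
from itertools import combinations

def pos_pairs(tagged_sentences):
    """Count word pairs within each sentence, keyed by (pos, word) pairs.

    tagged_sentences: list of sentences, each a list of (word, pos) tuples.
    Assumption: every unordered pair of words in a sentence counts once.
    """
    pairs = Counter()
    for sentence in tagged_sentences:
        for (w1, p1), (w2, p2) in combinations(sentence, 2):
            # Sort so that verb/noun and noun/verb land in the same bucket.
            key = tuple(sorted([(p1, w1), (p2, w2)]))
            pairs[key] += 1
    return pairs

sentences = [
    [("create", "verb"), ("new", "adjective"), ("jobs", "noun")],
    [("create", "verb"), ("jobs", "noun")],
]
pairs = pos_pairs(sentences)
total_pairs = sum(pairs.values())   # all pairs, repeats included
unique_pairs = len(pairs)           # distinct pairs
print(total_pairs, unique_pairs)
```

Under this assumption a sentence of n words yields n(n-1)/2 pairs, which is why longer sentences produce many more pairs.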

Table 6a
part of speech pairing — Hillary Clinton
Word pairs (total and unique) categorized by part of speech (POS)
pairing               total pairs (a)   unique pairs (c)   unique fraction (d)
noun/noun             8,504             7,154              84.1%
verb/noun             10,589            9,165              86.6%
verb/verb             2,973             2,557              86.0%
adjective/noun        5,270             4,631              87.9%
adjective/verb        3,223             2,861              88.8%
adjective/adjective   863               777                90.0%
adverb/noun           1,946             1,711              87.9%
adverb/verb           1,306             1,143              87.5%
adverb/adjective      641               590                92.0%
adverb/adverb         117               101                86.3%

Table 6b
part of speech pairing — Donald Trump
Word pairs (total and unique) categorized by part of speech (POS)
pairing               total pairs (a)   unique pairs (c)   unique fraction (d)
noun/noun             6,635             5,042              76.0%
verb/noun             7,411             5,738              77.4%
verb/verb             2,001             1,506              75.3%
adjective/noun        4,396             3,610              82.1%
adjective/verb        2,516             2,087              82.9%
adjective/adjective   762               644                84.5%
adverb/noun           1,678             1,312              78.2%
adverb/verb           1,039             819                78.8%
adverb/adjective      586               500                85.3%
adverb/adverb         130               98                 75.4%

Table 6c
unique part of speech pairing — candidate comparison
Unique word pairs categorized by part of speech (POS)
pairing               Hillary Clinton (a)   Donald Trump (c)   Trump relative to Clinton (d)
noun/noun             7,154                 5,042              70.5%
verb/noun             9,165                 5,738              62.6%
verb/verb             2,557                 1,506              58.9%
adjective/noun        4,631                 3,610              78.0%
adjective/verb        2,861                 2,087              72.9%
adjective/adjective   777                   644                82.9%
adverb/noun           1,711                 1,312              76.7%
adverb/verb           1,143                 819                71.7%
adverb/adjective      590                   500                84.7%
adverb/adverb         101                   98                 97.0%

Table 6 a,b
legend
a c
  d

a :: total number of pairs, for a given category (e.g. verb/noun)

c :: number of unique pairs within set (a)

d :: (c) relative to (a)

bar :: proportion of (a-c):c

Table 6c
legend
a c
  d

a :: unique pairs for Hillary Clinton

c :: unique pairs for Donald Trump

d :: (c) relative to (a) (i.e. Donald Trump relative to Hillary Clinton)

bars :: (a) and (c)

Table 6
commentary

Because Clinton's sentences were longer, her numbers for word pairings are larger.

She delivered 9,165 unique verb/noun combinations, which is +59.7% (9,165 vs 5,738) more than Trump. But the pairing for which she had the largest difference from Trump was the verb/verb pairing. She delivered 2,557 unique verb/verb pairs, which was +69.8% (2,557 vs 1,506) more than Trump.

Exclusive and Shared Usage

This section enumerates words that were exclusive to a candidate (e.g. used by one candidate but not the other). This content provides insight into what the candidates' priorities are and reveals differences in perspective on similar topics.

For a given part of speech, the table breaks down the number of words that were spoken by only one of the candidates or both candidates (intersection). The last row includes words spoken by either candidate (union).
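The exclusive, shared and combined sets map directly onto set difference, intersection and union. A minimal sketch, with a hypothetical helper name and toy word lists:

```python
def exclusive_and_shared(words_a, words_b):
    """Split two speakers' vocabularies into exclusive and shared sets."""
    a, b = set(words_a), set(words_b)
    return {
        "only_a": a - b,   # spoken by A but not B
        "only_b": b - a,   # spoken by B but not A
        "both": a & b,     # intersection: spoken by both
        "either": a | b,   # union: spoken by at least one
    }

sets = exclusive_and_shared(
    ["families", "hope", "people", "jobs"],
    ["tremendous", "people", "jobs", "cities"],
)
for name, members in sets.items():
    print(name, sorted(members))
```

The unique word counts in the table correspond to the sizes of these sets; the total word counts additionally sum how many times each member was spoken.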

Table 7
exclusive word usage
Total and unique words used exclusively by a candidate, or by both.
set               part of speech   words (a)   (b)      (c)      unique words (d)   (e)     (f)
Hillary Clinton   n+v+adj+adv      2,119       100.0%   13.4%    1,256              59.3%   41.8%
Hillary Clinton   nouns            1,001       47.2%    13.9%    611                61.0%   37.4%
Hillary Clinton   verbs            648         30.6%    13.8%    437                67.4%   39.7%
Hillary Clinton   adjectives       415         19.6%    14.2%    284                68.4%   32.1%
Hillary Clinton   adverbs          55          2.6%     5.8%     36                 65.5%   28.3%
Donald Trump      n+v+adj+adv      1,771       100.0%   11.2%    835                47.1%   27.8%
Donald Trump      nouns            835         47.1%    11.6%    435                52.1%   26.6%
Donald Trump      verbs            456         25.7%    9.7%     245                53.7%   22.2%
Donald Trump      adjectives       433         24.4%    14.8%    236                54.5%   26.6%
Donald Trump      adverbs          47          2.7%     4.9%     24                 51.1%   18.9%
both candidates   n+v+adj+adv      11,904      100.0%   75.4%    917                7.7%    30.5%
both candidates   nouns            5,086       42.7%    70.4%    435                8.6%    26.6%
both candidates   verbs            3,401       28.6%    72.4%    285                8.4%    25.9%
both candidates   adjectives       1,800       15.1%    61.5%    207                11.5%   23.4%
both candidates   adverbs          817         6.9%     86.0%    53                 6.5%    41.7%
total             n+v+adj+adv      15,794      100.0%   100.0%   3,008              19.0%   100.0%
total             nouns            7,220       45.7%    100.0%   1,635              22.6%   100.0%
total             verbs            4,697       29.7%    100.0%   1,102              23.5%   100.0%
total             adjectives       2,927       18.5%    100.0%   886                30.3%   100.0%
total             adverbs          950         6.0%     100.0%   127                13.4%   100.0%

Table 7
legend
a d
b e
c f

a :: total number of words in set (e.g. Clinton \ Trump, Clinton ∩ Trump, Clinton ∪ Trump), for a given part of speech

b :: (a) relative to all exclusive words in n+v+adj+adv

c :: (a) relative to all words in n+v+adj+adv

d :: unique words in (a)

e :: (d) relative to (a)

f :: (d) relative to all unique words in n+v+adj+adv

bar1 :: normalized ratio of (a-d):d

bar2 :: absolute ratio of (a-d):d for all POS groups (first column) or POS group (other columns)

Table 7
commentary

This is a fun table. It breaks down the part of speech categorization for the words exclusive to a candidate. In other words, nouns, verbs, adjectives and adverbs spoken by Clinton and not Trump, and so on.

Clinton used 611 nouns that Trump didn't use, 437 verbs, 284 adjectives and 36 adverbs. In contrast, Trump's numbers for his exclusive parts of speech were uniformly lower across all categories: –28.8% (435 vs 611) lower for nouns, –43.9% (245 vs 437) lower for verbs, –16.9% (236 vs 284) lower for adjectives and –33.3% (24 vs 36) lower for adverbs.

The greatest difference here was for verbs. This is ironic, since it is Trump's assertion that Clinton lacks initiative and action.

Noun Phrase Usage

Noun phrases were extracted from the text and analyzed for frequency, word count, unique word count and richness. Single-word phrases were not counted.

Top-level noun phrases are those without a parent noun phrase (a parent phrase is one that is a similar, longer phrase). Derived noun phrases are those with a parent (more details about noun phrase analysis).

The top-level noun phrases can be interpreted as independent concepts. Derived noun phrases can be interpreted as variants on concepts embodied by the top-level phrases.
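The distinction can be sketched as follows, assuming that a phrase counts as a parent when it is longer and contains the other phrase's words as a contiguous run. That containment rule, the `split_top_level` name and the sample phrases (including the made-up "good new jobs") are illustrative assumptions; the actual similarity criterion is described in the noun phrase analysis details.

```python
def split_top_level(phrases):
    """Separate noun phrases into top-level and derived lists.

    Assumption: phrase P is a parent of phrase Q when P is longer and
    Q's words appear as a contiguous run inside P.
    """
    tokenized = [tuple(p.split()) for p in set(phrases)]

    def contains(longer, shorter):
        n = len(shorter)
        return any(longer[i:i + n] == shorter
                   for i in range(len(longer) - n + 1))

    top_level, derived = [], []
    for q in tokenized:
        has_parent = any(len(p) > len(q) and contains(p, q)
                         for p in tokenized)
        (derived if has_parent else top_level).append(" ".join(q))
    return sorted(top_level), sorted(derived)

top, derived = split_top_level(
    ["new jobs", "good new jobs", "inner cities", "tax code"]
)
print(top)
print(derived)
```

Here "new jobs" is treated as a variant of the longer "good new jobs", while the other phrases stand as independent concepts.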

Noun Phrase Count and Length

This table reports the absolute number of noun phrases, which is related to the number of nouns, and their length.

Table 8a
noun phrase count
Counts of noun phrases in words and per noun.
speaker           phrases     count (a)   fraction (b)   per noun (c)   unique (d)   unique fraction (e)   unique per unique noun (f)
Hillary Clinton   all         1,230       100.0%         0.34           502          40.8%                 0.45
Hillary Clinton   top-level   991         80.6%          0.28           478          48.2%                 0.43
Donald Trump      all         1,292       100.0%         0.36           440          34.1%                 0.46
Donald Trump      top-level   973         75.3%          0.27           418          43.0%                 0.44

Table 8b
noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.
speaker           phrases     average (a)   50% (b)   90% (c)
Hillary Clinton   all         2.33          2         3
Hillary Clinton   top-level   2.39          2         4
Donald Trump      all         2.27          2         3
Donald Trump      top-level   2.35          2         4

Table 8a
legend
a d
b e
c f

a :: number of noun phrases

b :: (a) relative to number of all noun phrases

c :: number of noun phrases per noun

d :: number of unique phrases

e :: (d) relative to (a)

f :: number of unique noun phrases per unique noun

bar :: normalized ratio of (a-d):d

Table 8b
legend
a b c

a :: average noun phrase size, in words

b :: largest noun phrase size in 50% of content

c :: largest noun phrase size in 90% of content

bar :: proportion of a:b:c


Table 8
commentary

Noun phrases can be used to identify concepts longer than a word. In total, both candidates delivered roughly the same number of noun phrases, which were of similar length.

Exclusive and Shared Noun Phrase Count and Length

Table 9a
exclusive and shared noun phrase count
Counts of exclusive and shared noun phrases in words and per noun.
speaker           phrases     count (a)   fraction (b)   unique (c)   unique fraction (d)
Hillary Clinton   all         1,044       41.4%          482          46.2%
Hillary Clinton   top-level   946         90.6%          471          49.8%
Donald Trump      all         1,105       43.8%          423          38.3%
Donald Trump      top-level   929         84.1%          412          44.3%
both candidates   all         373         14.8%          57           15.3%
both candidates   top-level   89          23.9%          25           28.1%

Table 9b
exclusive and shared noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.
speaker           phrases     average (a)   50% (b)   90% (c)
Hillary Clinton   all         2.38          2         4
Hillary Clinton   top-level   2.40          2         4
Donald Trump      all         2.31          2         3
Donald Trump      top-level   2.35          2         4
both candidates   all         2.04          2         2
both candidates   top-level   2.17          2         3

Table 9a
legend
a c
b d

a :: number of noun phrases

b :: (a) relative to number of all noun phrases

c :: number of unique phrases

d :: (c) relative to (a)

bar :: normalized ratio of (a-c):c

Table 9b
legend
a b c

a :: average noun phrase size, in words

b :: largest noun phrase size in 50% of content

c :: largest noun phrase size in 90% of content

bar :: proportion of a:b:c


Table 9
commentary

Clinton's longest three noun phrases were

military civilian intelligence professionals
mexican immigrants rapists criminals drug dealers
dishwashers painters architects glass installers marble installers drapery installers

For Trump, these were

great general four-star general today
104-page essentially financial statement
new roads new tunnels new bridges new airports new schools new hospitals

Windbag Index

The Windbag Index is a compound measure that characterizes the complexity of speech. A low index is indicative of succinct speech with a low degree of repetition and a large number of independent concepts.

Table 10
windbag index
Windbag Index for each speaker. The higher the value, the more repetitive the speech.
speaker           index value   relative to other speaker
Hillary Clinton   2,127         -79.7%
Donald Trump      10,461        +391.8%

index terms, with difference relative to the other speaker's term
term   Hillary Clinton   Donald Trump
t1     0.434 (+4.3%)     0.415 (-4.2%)
t2     0.275 (+35.1%)    0.204 (-26.0%)
t3     0.312 (+19.1%)    0.262 (-16.0%)
t4     0.344 (+39.0%)    0.248 (-28.1%)
t5     0.416 (+22.5%)    0.339 (-18.4%)
t6     0.227 (+43.1%)    0.159 (-30.1%)
t7     0.408 (+19.8%)    0.341 (-16.6%)
t8     0.952 (+0.2%)     0.950 (-0.2%)

Table 10
legend
The Windbag Index is 1/(t1*t2*...*t8) where t1,t2,...,t8 are

t1 :: fraction of words that are non-stop

t2 :: fraction of non-stop words that are unique

t3 :: fraction of nouns that are unique

t4 :: fraction of verbs that are unique

t5 :: fraction of adjectives that are unique

t6 :: fraction of adverbs that are unique

t7 :: fraction of noun phrases that are unique

t8 :: fraction of unique noun phrases that are top-level


Large individual terms t1...t8 contribute to a smaller index.

The percentage values below the index and each term are relative differences to the other speaker's corresponding term (i.e. 100*(a-b)/b where a is the value for one speaker and b for the other).
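As a worked check of the formula, the sketch below rebuilds the eight terms from the counts reported in Tables 2a, 5 and 8a and computes the index for each speaker. The function names are mine; the last term is computed from unique phrase counts (unique top-level over unique phrases), which is my reading of how the reported 0.952 and 0.950 arise.

```python
from math import prod  # requires Python 3.8+

def windbag_index(terms):
    """Reciprocal of the product of the index terms."""
    return 1 / prod(terms)

def relative_difference(a, b):
    """Percent difference of a relative to b, i.e. 100*(a-b)/b."""
    return 100 * (a - b) / b

# Terms t1..t8 rebuilt from counts reported in Tables 2a, 5 and 8a.
clinton = [8182 / 18874, 2252 / 8182, 1117 / 3581, 799 / 2322,
           543 / 1306, 97 / 427, 502 / 1230, 478 / 502]
trump = [8936 / 21507, 1820 / 8936, 953 / 3639, 588 / 2375,
         550 / 1621, 83 / 523, 440 / 1292, 418 / 440]

index_clinton = windbag_index(clinton)
index_trump = windbag_index(trump)
difference = relative_difference(index_trump, index_clinton)
print(round(index_clinton, 1), round(index_trump, 1), round(difference, 1))
```

Because the index is a reciprocal of a product, modest differences in each term compound: Trump's terms are between 0.2% and about 30% lower than Clinton's, yet his index comes out several times larger.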
Table 10
commentary

Trump's index is insanely high, +391.8% (10,461 vs 2,127) higher than Clinton.

For perspective, the Windbag Index for the combined debates of Obama vs Romney shows Obama with 3,844 and Romney with 5,170.

It's important to realize that the index is going to be a function of the total number of words delivered. However, comparisons between numbers can be made if the length and format of the delivery offered to each candidate are the same.

Word Clouds

In the word clouds below, the size of the word is proportional to the number of times it was used by a candidate (method details).

Not all words from a group used to draw the cloud fit in the image — less frequently used words for large word groups may fall outside the image.

All Words for Each Candidate

Each candidate's debate portion was extracted and frequencies were compiled for each part of speech (noun, verb, adjective, adverb), with words colored by their part of speech category.

The distribution of sizes within a tag cloud follows the frequency distribution of words. However, word size cannot be compared between clouds, since the minimum and maximum size of the words is fixed.
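To see why sizes are not comparable between clouds, consider a sketch in which counts are mapped linearly onto a fixed size range. The linear mapping, the `font_sizes` name and the size limits are assumptions for illustration; the actual scaling is described in the method details.

```python
def font_sizes(counts, min_size=10, max_size=60):
    """Map word counts linearly onto a fixed [min_size, max_size] range."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {
        word: min_size + (count - lo) * (max_size - min_size) / span
        for word, count in counts.items()
    }

# A few counts quoted in the commentary above.
clinton_sizes = font_sizes({"people": 114, "think": 107, "hope": 14})
trump_sizes = font_sizes({"people": 133, "country": 116, "excuse": 14})
print(clinton_sizes["people"], trump_sizes["people"])
```

Both candidates' most frequent word is drawn at the maximum size even though the underlying counts differ, so a word's size is meaningful only relative to other words in the same cloud.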

Debate Word Cloud for Hillary Clinton - all words

Debate tag cloud for Hillary Clinton

Debate Word Cloud for Donald Trump - all words

Debate tag cloud for Donald Trump
commentary

Clinton's word cloud has a larger proportion of larger text because she repeats herself less, as we've seen from the tables above.

It's interesting to see Clinton's use of "good" compared to Trump's use of "great". Both words sit at the center of their respective clouds and, although both are positive, they have quite a different feel to them.

Trump's use of "great" is quite vernacular. It's an emotional word that is much stronger and more persuasive than "good". Do you want a "good" meal or a "great" meal? The word is part of Trump's "Make America Great Again" slogan and is more of a sales pitch than a statement of quality.

Exclusive Words for Each Candidate

The clouds below show words used exclusively by a candidate. For example, if candidate A used the word "invest" (any number of times), but candidate B did not, then the word will appear in the exclusive word tag cloud for candidate A.

Words exclusive to Hillary Clinton

Debate tag cloud for Hillary Clinton

Words exclusive to Donald Trump

Debate tag cloud for Donald Trump
commentary

The exclusive word clouds are really fun. Trump's use of "tremendous", "endorsed", "cities" and "totally" occupies center stage. Clinton's exclusive words are more what you expect from a traditional political discourse in which issues are discussed dispassionately: "families", "clear", "forth" and "hope".

Note that Trump's use of "endorsed" is always for the purpose of elevating his authority. He's using the word to tell us how great he is and, for this reason, that we should agree with him because, after all, all these other people have agreed with him. His mechanism of persuasion is building the momentum of a mob.

Part of Speech Word Clouds

In these clouds, words from each major part of speech were colored based on whether they were exclusive to a candidate or shared by the candidates.

The size of the word is relative to the frequency for the candidate — word sizes between candidates should not be used to indicate difference in absolute frequency.

Cloud of noun words, by speaker

commentary

If we relate the use of words back to the candidates' slogans, Clinton's use of "families" is in keeping with her "Stronger Together" slogan.

Trump attempts to terrify the electorate with "Chicago" and "cities", both mentioned in the context of violence and guns. These words suggest that violence is one of the reasons why he doesn't think America is "great" right now.

Cloud of verb words, by speaker

commentary

The verb cloud is a little scary because Trump's primary exclusive verb is the self-aggrandizing "endorsed". Clinton wants us to "try", "hope" and "invest".

Cloud of adjective words, by speaker

commentary

Tremendous. Just tremendous.

But, I think I'll go with "clear".

Cloud of adverb words, by speaker

commentary

America will "essentially" be "totally" great again "soon".

"horribly" said.

Cloud of all words, by speaker

commentary

When we combine the exclusive words for each candidate across all parts of speech, Clinton's (blue) words dominate, except for Trump's "tremendous".

Word Pair Clouds for Each Candidate

word pairs for Hillary Clinton

adjective/adjective by Hillary Clinton
adjective/adverb by Hillary Clinton
adjective/noun by Hillary Clinton
adjective/verb by Hillary Clinton
adverb/adverb by Hillary Clinton
adverb/noun by Hillary Clinton
adverb/verb by Hillary Clinton
noun/noun by Hillary Clinton
noun/verb by Hillary Clinton
verb/verb by Hillary Clinton

word pairs for Donald Trump

adjective/adjective by Donald Trump
adjective/adverb by Donald Trump
adjective/noun by Donald Trump
adjective/verb by Donald Trump
adverb/adverb by Donald Trump
adverb/noun by Donald Trump
adverb/verb by Donald Trump
noun/noun by Donald Trump
noun/verb by Donald Trump
verb/verb by Donald Trump
commentary

Trump insists that we "take look" and "look just". His adjective/noun combinations like "inner cities", "bad people" and "law order" pander to fears.

And so it's interesting that Clinton, whose job policy is criticized by Trump, should be the one who has "new jobs" as the top adjective/noun pair.

Downloads

Debate transcript

Parsed word lists and word clouds (word lists, part of speech lists, noun phrases, sentences) (word clouds)

Raw data structure

Please see the methods section for details about these files.