Word Analysis of 2016 Presidential Debates: Clinton vs Trump, by Martin Krzywinski


Word Analysis of Hillary Clinton vs Donald Trump (2nd debate)

Introduction

Word Statistics

Debate Word Count

Summary Word Count

The summary word count reports the total number of words and the number of unique words used by each candidate. Word counts are expressed as both absolute and relative values.

Table 1a: all words
Number of all words and unique words used by each speaker.

speaker           words (a)   share (b)   unique (c)   unique/words (d)
Hillary Clinton   6,004       45.7%       1,225        20.4%
Donald Trump      7,139       54.3%       1,122        15.7%
total             13,143      100.0%      1,786        13.6%

Table 1b: exclusive and shared words
Words exclusive to a speaker (used by speaker A but not speaker B) and words shared by both speakers (used by speaker A and speaker B).

set               words (a)   share (b)   unique (c)   unique/words (d)
Hillary Clinton   981         16.3%       664          67.7%
Donald Trump      1,045       14.6%       561          53.7%
both candidates   11,117      84.6%       561          5.0%

Table 1 legend

a :: word count

b :: word count, as a fraction of the debate total

c :: unique words in (a)

d :: unique words in (a), as a fraction of (a)

bar :: proportion of (a-c):c
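
The quantities a to d in Table 1 can be reproduced from a tokenized transcript. The following is a minimal sketch, not the author's pipeline: the tokenizer, the `summarize` helper and the two toy speeches are assumptions made for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def summarize(words, debate_total):
    """Return the Table 1 quantities for one speaker.

    a: word count, b: a as a fraction of the debate total,
    c: unique words in a, d: c as a fraction of a.
    """
    a = len(words)
    c = len(Counter(words))
    return {"a": a, "b": a / debate_total, "c": c, "d": c / a}


# Toy input (hypothetical, not taken from the debate transcript).
speeches = {
    "speaker A": "We will invest in jobs and we will invest in people",
    "speaker B": "Jobs jobs jobs we need jobs",
}

tokens = {name: tokenize(text) for name, text in speeches.items()}
debate_total = sum(len(words) for words in tokens.values())
summary = {name: summarize(words, debate_total) for name, words in tokens.items()}

for name, row in summary.items():
    print(f"{name}: a={row['a']} b={row['b']:.1%} c={row['c']} d={row['d']:.1%}")
```

The bar described in the legend would then be drawn from the pair (a-c, c).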

Stop Word Contribution

In the table below, the candidates' delivery is partitioned into stop and non-stop words. Stop words (full list) are frequently used bridging words (e.g. pronouns and conjunctions) whose meaning depends entirely on context. The fraction of words that are stop words is one measure of the complexity of speech.

Table 2a: non-stop words
Counts of stop and non-stop words.

speaker           category   words (a)   share (b)   unique (c)   unique/words (d)
Hillary Clinton   all        6,004       100.0%      1,225        20.4%
                  stop       3,424       57.0%       135          3.9%
                  non-stop   2,580       43.0%       1,090        42.2%
Donald Trump      all        7,139       100.0%      1,122        15.7%
                  stop       4,127       57.8%       146          3.5%
                  non-stop   3,012       42.2%       976          32.4%
total             all        13,143      100.0%      1,786        13.6%
                  stop       7,551       57.5%       153          2.0%
                  non-stop   5,592       42.5%       1,633        29.2%

Table 2b: exclusive and shared non-stop words
Non-stop words exclusive to a speaker (used by speaker A but not speaker B) and non-stop words shared by both speakers (used by speaker A and speaker B).

set               words (a)   share (b)   unique (c)   unique/words (d)
Hillary Clinton   973         37.7%       657          67.5%
Donald Trump      1,000       33.2%       543          54.3%
both candidates   3,619       64.7%       433          12.0%

Table 2 legend

a :: total number of words, for a given category (all, stop, non-stop)

b :: (a) relative to words in the debate if category=all, otherwise relative to words by the candidate

c :: number of unique words within set (a)

d :: (c) relative to (a)

bar :: proportion of (a-c):c
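
The stop/non-stop partition reported in Table 2 can be sketched as below. This is an illustration under stated assumptions: `STOP_WORDS` is a small hypothetical subset, not the full stop list the analysis links to, and the sample sentence is invented.

```python
# Hypothetical, deliberately tiny stop list (the real list is much longer).
STOP_WORDS = {"a", "and", "i", "in", "is", "it", "of", "that", "the", "to", "we", "you"}


def partition(words, stop_words=STOP_WORDS):
    """Split a word list into (stop, non_stop) lists, preserving order."""
    stop = [w for w in words if w in stop_words]
    non_stop = [w for w in words if w not in stop_words]
    return stop, non_stop


def table_row(words, base):
    """Return (a, b, c, d): count, count relative to base, unique count, unique/count."""
    a = len(words)
    c = len(set(words))
    return a, a / base, c, (c / a if a else 0.0)


# Toy input (hypothetical).
words = "we need to invest in the future of the country and we need jobs".split()
stop, non_stop = partition(words)

rows = {
    "all": table_row(words, len(words)),
    "stop": table_row(stop, len(words)),
    "non-stop": table_row(non_stop, len(words)),
}
for category, (a, b, c, d) in rows.items():
    print(f"{category:8s} a={a} b={b:.1%} c={c} d={d:.1%}")
```

As in the legend, the stop and non-stop rows are expressed relative to the speaker's own word count, so their shares sum to 100%.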

Word Frequency

The word frequency table summarizes how often words were used. I show the average word frequency and the weighted cumulative frequencies at the 50th and 90th percentiles. The average word frequency indicates how many times, on average, a word is used. For a given fraction of the entire delivery, the weighted cumulative frequency indicates the largest word frequency within this fraction (details about weighted cumulative distribution).

Table 3a: word use frequency
Average and 50%/90% percentile word frequencies.

speaker           category   average (a)   50% (b)   90% (c)
Hillary Clinton   all        4.9           21        223
                  stop       25.4          67        238
                  non-stop   2.4           4         22
Donald Trump      all        6.4           24        165
                  stop       28.3          77        232
                  non-stop   3.1           5         24
total             all        7.4           40        420
                  stop       49.4          139       473
                  non-stop   3.4           7         40

Table 3b: exclusive and shared non-stop word use frequency
Average and 50%/90% cumulative percentile word frequencies. Non-stop words exclusive to speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).

set               average (a)   50% (b)   90% (c)
Hillary Clinton   1.48          2         5
Donald Trump      1.84          2         8
total             3.42          7         40

Table 3 legend

a :: average word frequency

b :: largest word frequency in 50% of content

c :: largest word frequency in 90% of content

bar :: proportion of a:b:c
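
One way to read the weighted cumulative frequency is sketched below. The interpretation is an assumption on my part (the linked method details are not reproduced here): unique words are sorted by frequency, each word is weighted by the share of the delivery it accounts for, and the reported value is the frequency at which the running share first reaches the requested fraction. The function name and the toy word list are hypothetical.

```python
from collections import Counter


def weighted_cumulative_frequency(words, fraction):
    """Largest word frequency found within the given fraction of the delivery.

    Words are visited from least to most frequent; each contributes its own
    frequency to the running total of spoken words.
    """
    freqs = sorted(Counter(words).values())
    total = sum(freqs)
    running = 0
    for f in freqs:
        running += f
        if running >= fraction * total:
            return f
    return freqs[-1] if freqs else 0


# Toy delivery (hypothetical): 10 words, 4 unique.
words = ["the"] * 5 + ["jobs"] * 3 + ["tax"] + ["wall"]

average = len(words) / len(set(words))               # column a
median_like = weighted_cumulative_frequency(words, 0.5)  # column b
p90 = weighted_cumulative_frequency(words, 0.9)          # column c
print(average, median_like, p90)
```

Under this reading, a large gap between columns a and c means that a few heavily repeated words make up much of the delivery.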

Sentence Size

Table 4: sentence size
Number of sentences spoken by each speaker and sentence word count statistics. Number of words in a sentence is shown by average and 50%/90% cumulative values for all, stop and non-stop words.

speaker           sentences   category   average (a)   50% (b)   90% (c)
Hillary Clinton   348         all        17.3          24        46
                              stop       10.0          13        27
                              non-stop   7.5           11        22
Donald Trump      651         all        11.0          15        38
                              stop       6.6           9         24
                              non-stop   4.7           7         17
total             999         all        15.2          19        44
                              stop       9.8           12        28
                              non-stop   7.7           9         20

Table 4 legend

a :: average sentence size

b :: largest sentence size for 50% of content

c :: largest sentence size for 90% of content

bar :: proportion of a:b:c

All further word use statistics represent content that has been filtered for stop words, unless explicitly indicated.

Part of Speech Analysis

In this section, word counts and frequencies are broken down by part of speech (POS). The four POS groups examined are nouns, verbs, adjectives and adverbs. Conjunctions and prepositions are not considered. The first category (n+v+adj+adv) is composed of all four POS groups.

Part of Speech Count

Table 5: part of speech count
Count of words categorized by part of speech (POS).

speaker           part of speech     words (a)   share (b)   unique (c)   unique/words (d)
Hillary Clinton   n+v+adj+adv        2,407       40.1%       1,039        43.2%
                  nouns (n)          1,184       49.2%       541          45.7%
                  verbs (v)          701         29.1%       347          49.5%
                  adjectives (adj)   382         15.9%       224          58.6%
                  adverbs (adv)      140         5.8%        48           34.3%
Donald Trump      n+v+adj+adv        2,729       38.2%       928          34.0%
                  nouns (n)          1,281       46.9%       510          39.8%
                  verbs (v)          741         27.2%       267          36.0%
                  adjectives (adj)   529         19.4%       251          47.4%
                  adverbs (adv)      178         6.5%        55           30.9%
total             n+v+adj+adv        5,136       39.1%       1,565        30.5%
                  nouns (n)          2,465       48.0%       850          34.5%
                  verbs (v)          1,442       28.1%       512          35.5%
                  adjectives (adj)   911         17.7%       399          43.8%
                  adverbs (adv)      318         6.2%        78           24.5%

Table 5 legend

a :: total number of words for a given POS (all, noun, verb, adjective, adverb)

b :: (a) relative to all words by the candidate for n+v+adj+adv, otherwise relative to the candidate's n+v+adj+adv count

c :: unique words in (a)

d :: (c) relative to (a)

bar :: proportion of (a-c):c

Part of Speech Frequency

Table 5: part of speech frequency
Frequency of words categorized by part of speech (POS).

speaker           part of speech     average (a)   50% (b)   90% (c)
Hillary Clinton   n+v+adj+adv        2.32          3         24
                  nouns (n)          2.19          3         18
                  verbs (v)          2.02          3         24
                  adjectives (adj)   1.71          2         8
                  adverbs (adv)      2.92          4         22
Donald Trump      n+v+adj+adv        2.94          5         23
                  nouns (n)          2.51          4         17
                  verbs (v)          2.77          5         24
                  adjectives (adj)   2.11          3         12
                  adverbs (adv)      3.24          7         22
total             n+v+adj+adv        3.28          6         40
                  nouns (n)          2.90          5         26
                  verbs (v)          2.82          6         52
                  adjectives (adj)   2.28          3         16
                  adverbs (adv)      4.08          11        44

Table 5 legend

a :: average word frequency

b :: largest word frequency in 50% of content

c :: largest word frequency in 90% of content

bar :: proportion of a:b:c

Part of Speech Pairing

Through word pairing, I extract concepts from the text. The number of unique word pairs is a function of sentence length and is one of the measures of complexity.

Table 6a: part of speech pairing, Hillary Clinton
Word pairs (total and unique) categorized by part of speech (POS).

pairing               total (a)   unique (c)   unique/total (d)
noun/noun             3,185       2,781        87.3%
noun/verb             3,586       3,253        90.7%
verb/verb             959         864          90.1%
adjective/noun        1,759       1,622        92.2%
adjective/verb        977         897          91.8%
adjective/adjective   238         230          96.6%
adverb/noun           771         681          88.3%
adverb/verb           499         450          90.2%
adjective/adverb      227         216          95.2%
adverb/adverb         47          37           78.7%

Table 6b: part of speech pairing, Donald Trump
Word pairs (total and unique) categorized by part of speech (POS).

pairing               total (a)   unique (c)   unique/total (d)
noun/noun             2,653       2,072        78.1%
noun/verb             2,547       2,024        79.5%
verb/verb             596         492          82.6%
adjective/noun        1,497       1,273        85.0%
adjective/verb        764         661          86.5%
adjective/adjective   227         207          91.2%
adverb/noun           657         552          84.0%
adverb/verb           349         308          88.3%
adjective/adverb      190         179          94.2%
adverb/adverb         61          52           85.2%

Table 6c: unique part of speech pairing, candidate comparison
Unique word pairs categorized by part of speech (POS).

pairing               Hillary Clinton (a)   Donald Trump (c)   Trump/Clinton (d)
noun/noun             2,781                 2,072              74.5%
noun/verb             3,253                 2,024              62.2%
verb/verb             864                   492                56.9%
adjective/noun        1,622                 1,273              78.5%
adjective/verb        897                   661                73.7%
adjective/adjective   230                   207                90.0%
adverb/noun           681                   552                81.1%
adverb/verb           450                   308                68.4%
adjective/adverb      216                   179                82.9%
adverb/adverb         37                    52                 140.5%

Table 6 a,b legend

a :: total number of pairs, for a given category (e.g. verb/noun)

c :: number of unique pairs within set (a)

d :: (c) relative to (a)

bar :: proportion of (a-c):c

Table 6c legend

a :: unique pairs for Hillary Clinton

c :: unique pairs for Donald Trump

d :: (c) relative to (a) (i.e. Donald Trump relative to Hillary Clinton)

bars :: (a) and (c)
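
The pairing counts in Table 6 can be illustrated with a short sketch. The exact pairing rule is not spelled out in this section, so the code assumes that every unordered pair of tagged words within a sentence forms one pair; under that assumption a sentence with n tagged words yields n(n-1)/2 pairs, which is why the pair count depends on sentence length. The POS tags are supplied by hand here, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pos_pairs(tagged_sentences):
    """Count unordered (word, POS) pairs that co-occur within a sentence."""
    pairs = Counter()
    for sentence in tagged_sentences:
        for (w1, p1), (w2, p2) in combinations(sentence, 2):
            # Sort by (POS, word) so that the pair is order-independent.
            key = tuple(sorted([(p1, w1), (p2, w2)]))
            pairs[key] += 1
    return pairs


def summarize_pairs(pairs):
    """Return {category: (total pairs a, unique pairs c)} keyed like 'noun/verb'."""
    summary = {}
    for (first, second), n in pairs.items():
        category = f"{first[0]}/{second[0]}"
        total, unique = summary.get(category, (0, 0))
        summary[category] = (total + n, unique + 1)
    return summary


# Toy, hand-tagged sentences (hypothetical).
sentences = [
    [("invest", "verb"), ("jobs", "noun"), ("good", "adjective")],
    [("create", "verb"), ("jobs", "noun")],
    [("invest", "verb"), ("jobs", "noun")],
]

summary = summarize_pairs(pos_pairs(sentences))
for category, (a, c) in sorted(summary.items()):
    print(f"{category}: a={a} c={c} d={c / a:.1%}")
```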

Exclusive and Shared Usage

This section enumerates words that were exclusive to a candidate (i.e. used by one candidate but not the other). This content provides insight into the candidates' priorities and reveals differences in perspective on similar topics.

For a given part of speech, the table breaks down the number of words that were spoken by only one of the candidates or both candidates (intersection). The last row includes words spoken by either candidate (union).

Table 7: exclusive word usage
Total and unique words used exclusively by a candidate, or by both.

set               part of speech     a       b        c        d       e       f
Hillary Clinton   n+v+adj+adv        937     100.0%   18.2%    637     68.0%   40.7%
                  nouns (n)          443     47.3%    18.0%    314     70.9%   36.9%
                  verbs (v)          316     33.7%    21.9%    221     69.9%   43.2%
                  adjectives (adj)   153     16.3%    16.8%    119     77.8%   29.8%
                  adverbs (adv)      25      2.7%     7.9%     19      76.0%   24.4%
Donald Trump      n+v+adj+adv        962     100.0%   18.7%    526     54.7%   33.6%
                  nouns (n)          435     45.2%    17.6%    271     62.3%   31.9%
                  verbs (v)          239     24.8%    16.6%    143     59.8%   27.9%
                  adjectives (adj)   244     25.4%    26.8%    142     58.2%   35.6%
                  adverbs (adv)      44      4.6%     13.8%    27      61.4%   34.6%
both candidates   n+v+adj+adv        3,237   100.0%   63.0%    402     12.4%   25.7%
                  nouns (n)          1,490   46.0%    60.4%    201     13.5%   23.6%
                  verbs (v)          834     25.8%    57.8%    102     12.2%   19.9%
                  adjectives (adj)   422     13.0%    46.3%    76      18.0%   19.0%
                  adverbs (adv)      235     7.3%     73.9%    25      10.6%   32.1%
total             n+v+adj+adv        5,136   100.0%   100.0%   1,565   30.5%   100.0%
                  nouns (n)          2,465   48.0%    100.0%   850     34.5%   100.0%
                  verbs (v)          1,442   28.1%    100.0%   512     35.5%   100.0%
                  adjectives (adj)   911     17.7%    100.0%   399     43.8%   100.0%
                  adverbs (adv)      318     6.2%     100.0%   78      24.5%   100.0%

Table 7 legend

a :: total number of words in set (e.g. Clinton \ Trump, Clinton ∩ Trump, Clinton ∪ Trump), for a given part of speech

b :: (a) relative to all words of the same set in n+v+adj+adv

c :: (a) relative to all words (total row) for the same part of speech

d :: unique words in (a)

e :: (d) relative to (a)

f :: (d) relative to all unique words (total row) for the same part of speech

bar1 :: normalized ratio of (a-d):d

bar2 :: absolute ratio of (a-d):d for all POS groups (first column) or POS group (other columns)
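
The exclusive, shared and total sets in Table 7 correspond to set difference, intersection and union over each candidate's vocabulary. The sketch below shows the idea for a single word group; the function name and the toy word lists are assumptions, and the count for the shared set is taken to include occurrences by both speakers.

```python
from collections import Counter


def exclusive_and_shared(words_a, words_b):
    """Return total (a) and unique (d) word counts for A\\B, B\\A, A∩B and A∪B."""
    count_a, count_b = Counter(words_a), Counter(words_b)
    vocab_a, vocab_b = set(count_a), set(count_b)

    def row(vocab):
        # Counter returns 0 for words a speaker never used.
        total = sum(count_a[w] + count_b[w] for w in vocab)
        return {"a": total, "d": len(vocab)}

    return {
        "A only": row(vocab_a - vocab_b),
        "B only": row(vocab_b - vocab_a),
        "both": row(vocab_a & vocab_b),
        "total": row(vocab_a | vocab_b),
    }


# Toy word lists (hypothetical).
words_a = ["invest", "jobs", "jobs", "economy"]
words_b = ["jobs", "wall", "wall", "trade"]

sets = exclusive_and_shared(words_a, words_b)
for name, r in sets.items():
    print(f"{name}: a={r['a']} d={r['d']} e={r['d'] / r['a']:.1%}")
```

With this definition the three disjoint sets add up to the total row, as they do in the table (937 + 962 + 3,237 = 5,136).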

Noun Phrase Usage

Noun phrases were extracted from the text and analyzed for frequency, word count, unique word count and richness. Single-word phrases were not counted.

Top-level noun phrases are those without a parent noun phrase (a parent phrase is a similar, longer phrase). Derived noun phrases are those with a parent (more details about noun phrase analysis).

The top-level noun phrases can be interpreted as independent concepts. Derived noun phrases can be interpreted as variants on concepts embodied by the top-level phrases.
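
The distinction between top-level and derived phrases can be sketched as follows. The section only says that a parent is a "similar, longer phrase", so this example assumes a specific rule that may differ from the actual method: a phrase is derived when its words appear as a contiguous run inside some longer phrase. The helper names and sample phrases are hypothetical.

```python
def is_subphrase(short, long):
    """True if the words of `short` occur contiguously inside the longer `long`."""
    s, l = short.split(), long.split()
    if len(s) >= len(l):
        return False
    return any(l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1))


def split_top_level(phrases):
    """Partition unique phrases into (top_level, derived), each sorted."""
    unique = set(phrases)
    top_level, derived = [], []
    for phrase in sorted(unique):
        if any(is_subphrase(phrase, other) for other in unique):
            derived.append(phrase)  # has a parent phrase
        else:
            top_level.append(phrase)  # no parent: independent concept
    return top_level, derived


# Toy noun phrases (hypothetical).
phrases = [
    "tax cut",
    "middle class tax cut",
    "health care",
    "affordable health care",
    "clean energy",
]
top_level, derived = split_top_level(phrases)
print("top-level:", top_level)
print("derived:  ", derived)
```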

Noun Phrase Count and Length

This table reports the absolute number of noun phrases, which is related to the number of nouns, and their length.

Table 8a: noun phrase count
Counts of noun phrases in words and per noun.

speaker           scope       a     b        c      d     e       f
Hillary Clinton   all         398   100.0%   0.00   232   58.3%   0.00
                  top-level   348   87.4%    0.00   225   64.7%   0.00
Donald Trump      all         447   100.0%   0.00   219   49.0%   0.00
                  top-level   376   84.1%    0.00   207   55.1%   0.00

Table 8b: noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.

speaker           scope       average (a)   50% (b)   90% (c)
Hillary Clinton   all         2.33          2         3
                  top-level   2.37          2         3
Donald Trump      all         2.30          2         3
                  top-level   2.35          2         4

Table 8a legend

a :: number of noun phrases

b :: (a) relative to number of all noun phrases

c :: number of noun phrases per noun

d :: number of unique phrases

e :: (d) relative to (a)

f :: number of unique noun phrases per unique noun

bar :: normalized ratio of (a-d):d

Table 8b legend

a :: average noun phrase size, in words

b :: largest noun phrase size in 50% of content

c :: largest noun phrase size in 90% of content

bar :: proportion of a:b:c

Exclusive and Shared Noun Phrase Count and Length

Table 9a: exclusive and shared noun phrase count
Counts of exclusive and shared noun phrases in words and per noun.

set               scope       phrases (a)   share (b)   unique (c)   unique/phrases (d)
Hillary Clinton   all         362           42.8%       225          62.2%
                  top-level   341           94.2%       223          65.4%
Donald Trump      all         410           48.5%       211          51.5%
                  top-level   369           90.0%       204          55.3%
both candidates   all         73            8.6%        18           24.7%
                  top-level   14            19.2%       5            35.7%

Table 9b: exclusive and shared noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.

set               scope       average (a)   50% (b)   90% (c)
Hillary Clinton   all         2.36          2         3
                  top-level   2.38          2         3
Donald Trump      all         2.33          2         4
                  top-level   2.36          2         4
both candidates   all         2.00          2         2
                  top-level   2.00          2         2

Table 9a legend

a :: number of noun phrases

b :: (a) relative to number of all noun phrases

c :: number of unique phrases

d :: (c) relative to (a)

bar :: normalized ratio of (a-c):c

Table 9b legend

a :: average noun phrase size, in words

b :: largest noun phrase size in 50% of content

c :: largest noun phrase size in 90% of content

bar :: proportion of a:b:c

Windbag Index

The Windbag Index is a compound measure that characterizes the complexity of speech. A low index indicates succinct speech with a low degree of repetition and a large number of independent concepts.

Table 10: windbag index
Windbag Index for each speaker. The higher the value, the more repetitive the speech.

speaker           index value   relative difference
Hillary Clinton   0             +0.0%
Donald Trump      0             +0.0%

Index terms (value, relative difference, and counts as part / whole):

term   Hillary Clinton                  Donald Trump
t1     0.000   +0.0%   2,580 / 6,004    0.000   +0.0%   3,012 / 7,139
t2     0.000   +0.0%   1,090 / 2,580    0.000   +0.0%   976 / 3,012
t3     0.000   +0.0%   541 / 1,184      0.000   +0.0%   510 / 1,281
t4     0.000   +0.0%   347 / 701        0.000   +0.0%   267 / 741
t5     0.000   +0.0%   224 / 382        0.000   +0.0%   251 / 529
t6     0.000   +0.0%   48 / 140         0.000   +0.0%   55 / 178
t7     0.000   +0.0%   232 / 398        0.000   +0.0%   219 / 447
t8     0.000   +0.0%   225 / 232        0.000   +0.0%   207 / 219

Table 10 legend
The Windbag Index is 1/(t1*t2*...*t8) where t1,t2,...,t8 are

t1 :: fraction of words that are non-stop

t2 :: fraction of non-stop words that are unique

t3 :: fraction of nouns that are unique

t4 :: fraction of verbs that are unique

t5 :: fraction of adjectives that are unique

t6 :: fraction of adverbs that are unique

t7 :: fraction of noun phrases that are unique

t8 :: fraction of noun phrases that are top-level

Large individual terms t1...t8 contribute to a smaller index.

The percentage values below the index and each term are relative differences to the other speaker's corresponding term (i.e. 100*(a-b)/b where a is the value for one speaker and b for the other).
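
The index and term values in Table 10 are shown as zero on this page, so the following sketch evaluates the legend's formula directly from the count pairs listed with each term. It assumes that each pair is the numerator and denominator of the corresponding term t1 to t8, in that order; that reading matches the counts in Tables 2a, 5 and 8a but is not confirmed by the source. The names `TERM_COUNTS`, `windbag_index` and `relative_difference` are hypothetical.

```python
from math import prod

# (part, whole) count pairs for t1..t8, copied from Table 10.
TERM_COUNTS = {
    "Hillary Clinton": [
        (2580, 6004), (1090, 2580), (541, 1184), (347, 701),
        (224, 382), (48, 140), (232, 398), (225, 232),
    ],
    "Donald Trump": [
        (3012, 7139), (976, 3012), (510, 1281), (267, 741),
        (251, 529), (55, 178), (219, 447), (207, 219),
    ],
}


def windbag_index(pairs):
    """Windbag Index as defined in the legend: 1 / (t1 * t2 * ... * t8)."""
    terms = [part / whole for part, whole in pairs]
    return 1 / prod(terms)


def relative_difference(a, b):
    """Relative difference in percent, 100 * (a - b) / b, as in the legend."""
    return 100 * (a - b) / b


index = {name: windbag_index(pairs) for name, pairs in TERM_COUNTS.items()}
clinton, trump = index["Hillary Clinton"], index["Donald Trump"]
print(f"Hillary Clinton: {clinton:.0f} ({relative_difference(clinton, trump):+.1f}%)")
print(f"Donald Trump:    {trump:.0f} ({relative_difference(trump, clinton):+.1f}%)")
```

Because the index is a reciprocal of a product of fractions, each additional repeated word or derived phrase lowers one term and raises the index.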

Word Clouds

In the word clouds below, the size of the word is proportional to the number of times it was used by a candidate (method details).

Not all words from a group used to draw the cloud fit in the image — less frequently used words for large word groups may fall outside the image.

All Words for Each Candidate

Each candidate's debate portion was extracted and frequencies were compiled for each part of speech (noun, verb, adjective, adverb), with words colored by their part of speech category.

The distribution of sizes within a tag cloud follows the frequency distribution of words. However, word size cannot be compared between clouds, since the minimum and maximum size of the words is fixed.
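
The reason sizes cannot be compared between clouds can be seen in a small sketch. It assumes a linear mapping from word count to font size between a fixed minimum and maximum; the actual scaling used to draw the clouds is described in the linked method details and may differ. The function name, the size limits and the counts are hypothetical.

```python
def font_sizes(counts, min_size=10.0, max_size=60.0):
    """Map word counts linearly onto the fixed range [min_size, max_size]."""
    lo, hi = min(counts.values()), max(counts.values())
    if hi == lo:
        return {word: max_size for word in counts}
    return {
        word: min_size + (max_size - min_size) * (n - lo) / (hi - lo)
        for word, n in counts.items()
    }


# Toy counts for two clouds (hypothetical).
cloud_a = font_sizes({"jobs": 24, "tax": 12, "plan": 2})
cloud_b = font_sizes({"wall": 8, "deal": 5, "win": 2})

# The most frequent word in each cloud is drawn at the same size,
# even though "jobs" was used three times as often as "wall".
print(cloud_a["jobs"], cloud_b["wall"])
```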

Debate Word Cloud for Hillary Clinton - all words

Debate tag cloud for Hillary Clinton

Debate Word Cloud for Donald Trump - all words

Debate tag cloud for Donald Trump

Exclusive Words for Each Candidate

The clouds below show words used exclusively by a candidate. For example, if candidate A used the word "invest" (any number of times), but candidate B did not, then the word will appear in the exclusive word tag cloud for candidate A.

Words exclusive to Hillary Clinton

Debate tag cloud for Hillary Clinton

Words exclusive to Donald Trump

Debate tag cloud for Donald Trump

Part of Speech Word Clouds

In these clouds, words from each major part of speech were colored based on whether they were exclusive to a candidate or shared by the candidates.

The size of the word is relative to the frequency for the candidate — word sizes between candidates should not be used to indicate difference in absolute frequency.

Cloud of noun words, by speaker

Cloud of verb words, by speaker

Cloud of adjective words, by speaker

Cloud of adverb words, by speaker

Cloud of all words, by speaker

Word Pair Clouds for Each Candidate

word pairs for Hillary Clinton

^ adjective/adjective by Hillary Clinton
^ adjective/adverb by Hillary Clinton
^ adjective/noun by Hillary Clinton
^ adjective/verb by Hillary Clinton
^ adverb/adverb by Hillary Clinton
^ adverb/noun by Hillary Clinton
^ adverb/verb by Hillary Clinton
^ noun/noun by Hillary Clinton
^ noun/verb by Hillary Clinton
^ verb/verb by Hillary Clinton

word pairs for Donald Trump

^ adjective/adjective by Donald Trump
^ adjective/adverb by Donald Trump
^ adjective/noun by Donald Trump
^ adjective/verb by Donald Trump
^ adverb/adverb by Donald Trump
^ adverb/noun by Donald Trump
^ adverb/verb by Donald Trump
^ noun/noun by Donald Trump
^ noun/verb by Donald Trump
^ verb/verb by Donald Trump

Downloads

Debate transcript

Parsed word lists and word clouds (word lists, part of speech lists, noun phrases, sentences) (word clouds)

Raw data structure

Please see the methods section for details about these files.