
Word Analysis of 2008 U.S. Presidential Debates

Barack Obama vs. John McCain (1st debate)

26 September 2008



Word Statistics

Debate Word Count

Summary Word Count

The summary word count reports the total number of words and the number of unique, non-stop words used by each candidate. Word counts are expressed as both absolute and relative values.

Table 1. Number of all words and unique words used by each speaker.
speaker | words (a) | share of debate (b) | unique words (c) | unique fraction (d)
Barack Obama | 7,529 | 51.7% | 1,376 | 16.5%
John McCain | 7,043 | 48.3% | 1,380 | 17.6%
all | 14,572 | 100.0% | 2,115 | 13.5%
Table 1 Analysis

The candidates were given equal time and used approximately the same number of words, so their overall cadence of speech is similar.

Although I am not surprised that the total word counts are similar (Obama delivered 7,529 words, 7% more than McCain's 7,043), the fact that the number of unique words was nearly identical for both candidates (1,376 vs 1,380) was a shock. Though both Obama and McCain can be considered articulate, Obama presents as verbally sharper than McCain and his delivery has a greater nimbleness to it, which is reflected in his slightly higher volume of word delivery. During his unscripted deliveries, Obama's manner hints at a significant command of the English language and suggests that his verbal abilities are not stretched. For this reason, I was expecting his unique word count to be higher.

The fact that the unique word counts are nearly identical suggests a high degree of rehearsal and preparation. It may well be that both candidates spent a significant amount of time being coached to effect the delivery that would reach the largest number of people. It may also be that through the process of political selection, both candidates epitomize an archetype of spoken word delivery.

It also came as a surprise that the total number of unique words used by both candidates was only 2,115. Initially, I felt this to be low — surely the matters of state require more than two thousand words. For both candidates, I suspect a significant amount of coaching towards conformity to the average American's comprehension.

Table 1 Legend
a c
b d
a :: total number of words
b :: proportion of words in the debate
c :: unique words in (a)
d :: (c) relative to (a)
bar :: proportion of (a-c):c

Stop Word Contribution

In the table below, the candidates' delivery is partitioned into stop and non-stop words. Stop words are frequently-used bridging words (e.g. pronouns and conjunctions) and do not carry inherent meaning. The fraction of words that are stop words is one measure of the complexity of speech.
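The partition into stop and non-stop words can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the stop list `STOP_WORDS` and the sample sentence are hypothetical, since the actual stop list is not given here.

```python
# Hypothetical stop list for illustration only; the list used in the analysis is not given.
STOP_WORDS = {"the", "a", "and", "i", "we", "to", "of", "that", "is", "in"}

def word_summary(words, stop_words=STOP_WORDS):
    """Return total and unique counts for all, stop and non-stop words."""
    words = [w.lower() for w in words]
    groups = {
        "all": words,
        "stop": [w for w in words if w in stop_words],
        "non-stop": [w for w in words if w not in stop_words],
    }
    summary = {}
    for name, group in groups.items():
        total = len(group)
        unique = len(set(group))
        summary[name] = {
            "total": total,
            "unique": unique,
            # unique words relative to total words in the group
            "unique_fraction": unique / total if total else 0.0,
        }
    return summary

# Made-up sample text, not taken from the debate transcript.
sample = "we need to invest in energy and we need to invest in the economy".split()
stats = word_summary(sample)
print(stats["stop"], stats["non-stop"])
```

The stop-word fraction discussed in the text corresponds to `stats["stop"]["total"] / stats["all"]["total"]`.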

Table 2. Expanded analysis of total, stop and non-stop word count.
speaker | word category | words (a) | share (b) | unique words (c) | unique fraction (d)
Barack Obama | all | 7,529 | 51.7% | 1,376 | 18.3%
Barack Obama | stop | 4,263 | 56.6% | 133 | 3.1%
Barack Obama | non-stop | 3,266 | 43.4% | 1,243 | 38.1%
John McCain | all | 7,043 | 48.3% | 1,380 | 19.6%
John McCain | stop | 3,922 | 55.7% | 137 | 3.5%
John McCain | non-stop | 3,121 | 44.3% | 1,243 | 39.8%
all | all | 14,572 | 100.0% | 2,115 | 14.5%
all | stop | 8,185 | 56.2% | 146 | 1.8%
all | non-stop | 6,387 | 43.8% | 1,969 | 30.8%
Table 2 Analysis

Obama's absolute stop word count is higher than McCain's (4,263 vs 3,922), but Obama's total word count is also higher. Relative to the total number of words, Obama's and McCain's stop word fractions are similar, at 56.6% and 55.7%, respectively.

Stop word counts do not reveal a significant difference between the two candidates.

Table 2 Legend
a c
b d
a :: total number of words, for a given category (all, stop, non-stop)
b :: (a) relative to words in the debate if category=all, otherwise relative to words by the candidate
c :: number of unique words within set (a)
d :: (c) relative to (a)
bar :: proportion of (a-c):c

All further analysis uses debate content that has been filtered for stop words.

Word frequency

The word frequency table summarizes how often words were used. Specifically, it reports the average word frequency and the weighted cumulative frequencies at the 50th and 90th percentiles. The average word frequency indicates how many times, on average, a word is used. For a given fraction of the entire delivery, the weighted cumulative frequency indicates the largest word frequency within this fraction (details about weighted cumulative distribution).
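One way to read these definitions is sketched below. It assumes that the weighted cumulative frequency at fraction p is the smallest frequency f such that words used f times or fewer account for at least p of all words delivered; this is an interpretation of the description, not the confirmed method. The average is the total word count divided by the unique word count, which agrees with the tables (for Obama, 3,266 non-stop words over 1,243 unique ones is about 2.63). The function name and sample data are made up.

```python
from collections import Counter

def frequency_summary(words, fractions=(0.5, 0.9)):
    """Average word frequency and weighted cumulative frequency cutoffs."""
    counts = Counter(words)
    if not counts:
        raise ValueError("no words given")
    freqs = sorted(counts.values())      # per-word frequencies, ascending
    total = sum(freqs)                   # total number of words delivered
    average = total / len(freqs)         # total words / unique words
    cutoffs = {}
    for fraction in fractions:
        running = 0
        for f in freqs:
            running += f                 # each word is weighted by its frequency
            if running >= fraction * total:
                cutoffs[fraction] = f    # largest frequency within this fraction
                break
    return average, cutoffs

# Made-up sample: four words used once, three used twice, one used ten times.
sample = list("abcd") + list("eeffgg") + ["h"] * 10
average, cutoffs = frequency_summary(sample)
print(average, cutoffs)
```

In this sample, half of the delivery consists of words used at most twice, while reaching 90% requires including the word used ten times.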

Table 3. Average, 50%, and 90% weighted cumulative word frequencies (content filtered for stop words).
speaker | average (a) | 50% (b) | 90% (c)
Barack Obama | 2.63 | 4.00 | 24.00
John McCain | 2.51 | 4.00 | 21.00
all | 3.24 | 7.00 | 39.00
Table 3 Analysis

Obama's and McCain's average word frequencies are similar, at 2.63 and 2.51, respectively. 50% of Obama's speech is composed of words he used 4 times or fewer, identical to McCain. 90% of Obama's (McCain's) speech was composed of words used 24 (21) times or fewer, and the difference is not significant.

Table 3 Legend
a b c
a :: average word frequency
b :: largest word frequency in 50% of content
c :: largest word frequency in 90% of content
bar :: proportion of a:b:c

Sentence Size

Table 4. Number of words in a sentence, as measured by average number of words, 50% and 90% weighted cumulative values for three word groups (all words, stop words and non-stop words).
speaker | word type | average (a) | 50% (b) | 90% (c)
Barack Obama | all | 17.4 | 25.0 | 61.0
Barack Obama | stop | 10.0 | 15.0 | 33.0
Barack Obama | non-stop | 7.7 | 11.0 | 26.0
John McCain | all | 15.9 | 20.0 | 42.0
John McCain | stop | 9.0 | 11.0 | 25.0
John McCain | non-stop | 7.1 | 10.0 | 21.0
all | all | 16.6 | 23.0 | 53.0
all | stop | 9.5 | 13.0 | 28.0
all | non-stop | 7.4 | 10.0 | 24.0
Table 4 Analysis

Obama's sentences are 9% longer (17.4 vs 15.9) when all words are considered and 8% longer (7.7 vs 7.1) when stop words are removed. This sentence size difference is commensurate with the difference in total word delivery, suggesting that the total number of sentences by the candidates was similar. In fact, Obama delivered 449 sentences and McCain 460 (not shown).

Table 4 Legend
a b c
a :: average sentence size
b :: largest sentence size for 50% of content
c :: largest sentence size for 90% of content
bar :: proportion of a:b:c

Part of Speech Analysis

In this section, words are broken down by part of speech (POS). The four POS groups examined are nouns, verbs, adjectives and adverbs. Conjunctions and prepositions are not considered. The first category (n+v+adj+adv) is composed of all four POS groups.

Part of Speech Count

Table 5. Count of words (total and unique) categorized by part of speech (POS).
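speaker | part of speech | words (a) | share of candidate's words (b) | unique words (c) | unique fraction (d)
Barack Obama | n+v+adj+adv | 3,118 | 100.0% | 1,211 | 38.8%
Barack Obama | nouns (n) | 1,648 | 52.9% | 645 | 39.1%
Barack Obama | verbs (v) | 785 | 25.2% | 361 | 46.0%
Barack Obama | adjectives (adj) | 472 | 15.1% | 213 | 45.1%
Barack Obama | adverbs (adv) | 213 | 6.8% | 73 | 34.3%
John McCain | n+v+adj+adv | 3,004 | 100.0% | 1,210 | 40.3%
John McCain | nouns (n) | 1,615 | 53.8% | 663 | 41.1%
John McCain | verbs (v) | 797 | 26.5% | 363 | 45.5%
John McCain | adjectives (adj) | 426 | 14.2% | 192 | 45.1%
John McCain | adverbs (adv) | 166 | 5.5% | 65 | 39.2%
all | n+v+adj+adv | 6,122 | 100.0% | 1,926 | 31.5%
all | nouns (n) | 3,263 | 53.3% | 1,034 | 31.7%
all | verbs (v) | 1,582 | 25.8% | 601 | 38.0%
all | adjectives (adj) | 898 | 14.7% | 319 | 35.5%
all | adverbs (adv) | 379 | 6.2% | 108 | 28.5%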
Table 5 Analysis

The composition of the candidates' speech by part of speech is remarkably similar. The relative breakdown of nouns, verbs, adjectives and adverbs for Obama is 53:25:15:7 and 54:26:14:5 for McCain. I am more than mildly surprised at such an incredible uniformity in the speech of the candidates. The ratio of noun:verb:adjective:adverb reduces to about 8:4:2:1.

Within each POS category, the number of unique words is nearly identical for both candidates, with Obama (McCain) having 39% (41%), 46% (45%), 45% (45%) and 34% (39%) of their nouns, verbs, adjectives and adverbs unique. The largest difference is in the use of adverbs, with McCain having 39.2% of all his adverbs unique, whereas Obama's adverbs have a unique component of 34.3%.

Note that Obama uses adverbs more than McCain (6.8% vs 5.5%) — his speech included 213 adverbs (73 unique) whereas McCain used 166 adverbs (65 unique).

Table 5 Legend
a c
b d
a :: total number of words for a given POS (all, noun, verb, adjective, adverb)
b :: (a) relative to all words by candidate
c :: unique words in (a)
d :: (c) relative to (a)
bar :: proportion of (a-c):c

Part of Speech Frequency

Table 5. Frequency of words by part of speech (POS).
speaker | part of speech | average (a) | 50% (b) | 90% (c)
Barack Obama | n+v+adj+adv | 2.58 | 4.0 | 24
Barack Obama | nouns (n) | 2.56 | 4.0 | 18
Barack Obama | verbs (v) | 2.10 | 3.0 | 16
Barack Obama | adjectives (adj) | 2.22 | 3.0 | 16
Barack Obama | adverbs (adv) | 2.92 | 5.0 | 36
John McCain | n+v+adj+adv | 2.48 | 4.0 | 21
John McCain | nouns (n) | 2.44 | 4.0 | 22
John McCain | verbs (v) | 2.06 | 3.0 | 25
John McCain | adjectives (adj) | 2.22 | 3.0 | 11
John McCain | adverbs (adv) | 2.55 | 4.0 | 13
all | n+v+adj+adv | 3.18 | 6.0 | 40
all | nouns (n) | 3.16 | 6.0 | 31
all | verbs (v) | 2.49 | 4.0 | 39
all | adjectives (adj) | 2.81 | 5.0 | 20
all | adverbs (adv) | 3.51 | 6.0 | 49
Table 5 Analysis

This table hints at a significant difference in verb and adverb use.

As indicated in the previous table, McCain used fewer total adverbs than Obama (166 vs 213), but his unique adverb fraction was higher (39.2% vs 34.3%). It looks like Obama really likes adverbs, and really likes repeating them too. Obama's average adverb frequency was 2.92, compared to McCain's 2.55. Moreover, 90% of Obama's adverbs were used with a frequency of 36 times or less, whereas 90% of McCain's adverbs were used with a frequency of 13 or less.

Obama, however, is significantly less repetitive with verbs, with 90% of his verbs used 16 times or less, compared to 90% of McCain's verbs which were used 25 times or less. Thus, although the candidates' total and unique verb count was similar (see previous table), Obama's distribution in verb frequency was skewed towards less repetition.

Table 5 Legend
a b c
a :: average word frequency
b :: largest word frequency in 50% of content
c :: largest word frequency in 90% of content
bar :: proportion of a:b:c

Part of Speech Pairing

Through word pairing, I attempt to capture the contextual use of parts of speech within a sentence and extract concepts from the text. Specifically, unique pairs of words indicate complexity and inter-relatedness between concepts in a sentence.
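A sketch of how such pairs might be counted is shown below. It assumes that every unordered pair of tagged words within a sentence counts as one pairing, and that words and categories are stored in alphabetical order (the pair labels in the tag clouds further down, such as "care health", suggest this ordering). Both assumptions are mine, and the sample sentences and tags are made up.

```python
from collections import Counter
from itertools import combinations

def pos_pair_counts(sentences):
    """Count total and unique word pairs per POS pairing.

    `sentences` is a list of sentences, each a list of (word, pos) tuples.
    Returns {category: (total_pairs, unique_pairs)}.
    """
    totals = Counter()
    unique = {}
    for sentence in sentences:
        # Assumption: every unordered pair of words in a sentence is one pairing.
        for (w1, p1), (w2, p2) in combinations(sentence, 2):
            category = "/".join(sorted((p1, p2)))   # e.g. "noun/verb"
            pair = tuple(sorted((w1, w2)))          # e.g. ("energy", "wind")
            totals[category] += 1
            unique.setdefault(category, set()).add(pair)
    return {c: (totals[c], len(unique[c])) for c in totals}

# Made-up tagged sentences for illustration.
sentences = [
    [("energy", "noun"), ("wind", "noun"), ("invest", "verb")],
    [("energy", "noun"), ("wind", "noun")],
]
pairs = pos_pair_counts(sentences)
print(pairs)
```

Because the number of pairs grows roughly with the square of sentence length, longer sentences produce disproportionately more pairings under this counting scheme.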

Table 6a (Barack Obama). Word pairs (total and unique) categorized by part of speech (POS) for Barack Obama.
pairing | total pairs (a) | share of all pairs (b) | unique pairs (c) | unique fraction (d)
noun/noun | 5,040 | 25.1% | 4,107 | 81.5%
verb/noun | 4,999 | 24.9% | 4,232 | 84.7%
verb/verb | 1,012 | 5.0% | 901 | 89.0%
adjective/noun | 2,960 | 14.7% | 2,496 | 84.3%
adjective/verb | 1,310 | 6.5% | 1,159 | 88.5%
adjective/adjective | 338 | 1.7% | 303 | 89.6%
adverb/noun | 1,183 | 5.9% | 973 | 82.2%
adverb/verb | 588 | 2.9% | 514 | 87.4%
adverb/adjective | 336 | 1.7% | 280 | 83.3%
adverb/adverb | 84 | 0.4% | 66 | 78.6%
Table 6b (John McCain). Word pairs (total and unique) categorized by part of speech (POS) for John McCain.
pairing | total pairs (a) | share of all pairs (b) | unique pairs (c) | unique fraction (d)
noun/noun | 4,212 | 26.2% | 3,355 | 79.7%
verb/noun | 4,024 | 25.0% | 3,304 | 82.1%
verb/verb | 832 | 5.2% | 753 | 90.5%
adjective/noun | 2,322 | 14.4% | 1,895 | 81.6%
adjective/verb | 944 | 5.9% | 816 | 86.4%
adjective/adjective | 266 | 1.7% | 234 | 88.0%
adverb/noun | 857 | 5.3% | 719 | 83.9%
adverb/verb | 417 | 2.6% | 363 | 87.1%
adverb/adjective | 249 | 1.5% | 225 | 90.4%
adverb/adverb | 49 | 0.3% | 45 | 91.8%
Table 6c (Barack Obama vs John McCain). Word Pairs (total and unique) categorized by part of speech (POS) for both candidates.
pairing | Obama total pairs (a) | Obama unique fraction (b) | McCain total pairs (c) | McCain relative to Obama (d) | McCain unique fraction (e)
noun/noun | 5,040 | 81.5% | 4,212 | 83.6% | 79.7%
verb/noun | 4,999 | 84.7% | 4,024 | 80.5% | 82.1%
verb/verb | 1,012 | 89.0% | 832 | 82.2% | 90.5%
adjective/noun | 2,960 | 84.3% | 2,322 | 78.4% | 81.6%
adjective/verb | 1,310 | 88.5% | 944 | 72.1% | 86.4%
adjective/adjective | 338 | 89.6% | 266 | 78.7% | 88.0%
adverb/noun | 1,183 | 82.2% | 857 | 72.4% | 83.9%
adverb/verb | 588 | 87.4% | 417 | 70.9% | 87.1%
adverb/adjective | 336 | 83.3% | 249 | 74.1% | 90.4%
adverb/adverb | 84 | 78.6% | 49 | 58.3% | 91.8%
Table 6 Analysis

Obama had consistently more total and unique pairings than McCain. This is largely due to the fact that Obama had longer sentences (7.7 non-stop words) than McCain (7.1 non-stop words).

The largest pairing difference was seen in the adverb/adverb category, where Obama had nearly twice as many pairings as McCain (84 vs 49). In other POS pairing categories, McCain's numbers were consistently 70-85% of Obama's.

When unique pairings are compared, the candidates fare more equally, both having 80-90% of pairings unique. The only exception was adverb/adverb pairs, with Obama's unique component being 78.6% compared to McCain's 91.8% (Obama repeats adverbs).

Table 6a,b Legend
a c
b d
a :: total number of pairs, for a given category (e.g. verb/noun)
b :: (a) relative to all pairs
c :: number of unique pairs within set (a)
d :: (c) relative to (a)
bar :: proportion of (a-c):c
Table 6c Legend
a c
  d
b e
a :: total number of pairs for Barack Obama
b :: relative unique pairs for Barack Obama
c :: total pairs for John McCain
d :: (c) relative to (a) (i.e. John McCain relative to Barack Obama)
e :: relative unique pairs for John McCain
bars :: values of (a), (b), (c) and (e)

Word usage

This section enumerates words that were unique to a candidate (i.e. used by one candidate but not the other). For a given part of speech, the table breaks down the number of words that were spoken by only one of the candidates or by both candidates (intersection). The last row includes all words (union).
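The exclusive, shared and union groups map directly onto set operations. The sketch below is an illustration with made-up word lists; the function and key names are hypothetical.

```python
from collections import Counter

def usage_breakdown(words_a, words_b):
    """Total and unique word counts for words exclusive to A, exclusive to B, shared, and all."""
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    vocab_a, vocab_b = set(counts_a), set(counts_b)

    def row(vocab):
        # Total counts occurrences by either speaker; unique counts distinct words.
        total = sum(counts_a[w] + counts_b[w] for w in vocab)
        return {"total": total, "unique": len(vocab)}

    return {
        "A only": row(vocab_a - vocab_b),   # used by A but never by B
        "B only": row(vocab_b - vocab_a),   # used by B but never by A
        "both": row(vocab_a & vocab_b),     # intersection
        "all": row(vocab_a | vocab_b),      # union
    }

# Made-up word lists for illustration.
breakdown = usage_breakdown(
    ["invest", "invest", "energy", "tax"],
    ["tax", "tax", "control", "energy"],
)
print(breakdown)
```

The three disjoint groups always sum to the union, which is also true of Table 7: 1,124 + 1,065 + 3,933 = 6,122 total words and 716 + 715 + 495 = 1,926 unique words.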

Table 7. Total and unique words used exclusively by a candidate or by both candidates.
speaker | part of speech | words (a) | share of group (b) | share of all words (c) | unique words (d) | unique fraction (e) | share of all unique words (f)
Barack Obama | n+v+adj+adv | 1,124 | 100.0% | 18.4% | 716 | 63.7% | 37.2%
Barack Obama | nouns (n) | 588 | 52.3% | 18.0% | 371 | 63.1% | 35.9%
Barack Obama | verbs (v) | 348 | 31.0% | 22.0% | 238 | 68.4% | 39.6%
Barack Obama | adjectives (adj) | 184 | 16.4% | 20.5% | 127 | 69.0% | 39.8%
Barack Obama | adverbs (adv) | 77 | 6.9% | 20.3% | 43 | 55.8% | 39.8%
John McCain | n+v+adj+adv | 1,065 | 100.0% | 17.4% | 715 | 67.1% | 37.1%
John McCain | nouns (n) | 590 | 55.4% | 18.1% | 389 | 65.9% | 37.6%
John McCain | verbs (v) | 325 | 30.5% | 20.5% | 240 | 73.8% | 39.9%
John McCain | adjectives (adj) | 153 | 14.4% | 17.0% | 106 | 69.3% | 33.2%
John McCain | adverbs (adv) | 55 | 5.2% | 14.5% | 35 | 63.6% | 32.4%
both | n+v+adj+adv | 3,933 | 100.0% | 64.2% | 495 | 12.6% | 25.7%
both | nouns (n) | 2,085 | 53.0% | 63.9% | 274 | 13.1% | 26.5%
both | verbs (v) | 909 | 23.1% | 57.5% | 123 | 13.5% | 20.5%
both | adjectives (adj) | 561 | 14.3% | 62.5% | 86 | 15.3% | 27.0%
both | adverbs (adv) | 247 | 6.3% | 65.2% | 30 | 12.1% | 27.8%
all | n+v+adj+adv | 6,122 | 100.0% | 100.0% | 1,926 | 31.5% | 100.0%
all | nouns (n) | 3,263 | 53.3% | 100.0% | 1,034 | 31.7% | 100.0%
all | verbs (v) | 1,582 | 25.8% | 100.0% | 601 | 38.0% | 100.0%
all | adjectives (adj) | 898 | 14.7% | 100.0% | 319 | 35.5% | 100.0%
all | adverbs (adv) | 379 | 6.2% | 100.0% | 108 | 28.5% | 100.0%
Table 7 Analysis

Previous tables indicated that speech delivery for the candidates is incredibly uniform. This table shows each candidate's contribution to unique words in the debate.

There were a total of 1,034 unique nouns used in the debate, with 371 (35.9%) used only by Obama, 389 (37.6%) by McCain and 274 (26.5%) by both. In fact, for all parts of speech the candidates had more words that they used exclusively than those they shared with each other. Even for adverbs, which is the least populated group of words, the candidates shared only 30 adverbs, and had 43 (Obama) and 35 (McCain) that they used exclusively.

It is not surprising that the proportion of unique words was larger for the set of exclusive words than for the set of shared words. Typically, the unique proportion within exclusive words was around 60-70%, but much lower, at 12-15%, for shared words. This indicates that the words spoken by both candidates were repeated much more frequently.

Table 7c Legend
a d
b e
c f
a :: total number of words unique to a candidate, for a given POS group
b :: (a) relative to all unique words to the candidate
c :: (a) relative to all words
d :: unique words in (a)
e :: (d) relative to (a)
f :: (d) relative to all unique words
bar1 :: normalized ratio of (a-d):d
bar2 :: absolute ratio of (a-d):d for all POS groups (first column) or POS group (other columns)

Noun Phrase Usage

Noun phrases were extracted from the text and analyzed for frequency, word count, unique word count and richness.

Top-level noun phrases are those without a parent noun phrase (a parent phrase is a similar, longer phrase). Derived noun phrases are those with a parent (more details about noun phrase analysis).

The top-level noun phrases can be interpreted as independent concepts. Derived noun phrases can be interpreted as variants on concepts embodied by the top-level phrases.
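The distinction can be sketched in code under one simplifying assumption: a phrase counts as derived if its words appear contiguously inside some longer phrase. The exact meaning of "similar" is described on the noun phrase analysis page and may differ, so this is an illustration only; the function names and sample phrases are made up.

```python
def contains_phrase(longer, shorter):
    """True if the words of `shorter` appear contiguously inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

def split_noun_phrases(phrases):
    """Split unique phrases into top-level (no parent) and derived (has a parent)."""
    unique = sorted({tuple(p.lower().split()) for p in phrases})
    top_level, derived = [], []
    for phrase in unique:
        # Assumption: a parent is any strictly longer phrase containing this one.
        has_parent = any(
            len(other) > len(phrase) and contains_phrase(other, phrase)
            for other in unique
        )
        (derived if has_parent else top_level).append(" ".join(phrase))
    return top_level, derived

# Made-up phrases for illustration.
top_level, derived = split_noun_phrases(
    ["nuclear weapons", "weapons", "nuclear weapons program", "health care"]
)
print(top_level, derived)
```

Here "nuclear weapons program" and "health care" are independent concepts, while "nuclear weapons" and "weapons" are variants of the first.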

Noun Phrase Count

This table reports the absolute number of noun phrases, which is related to the number of total words (specifically, nouns) delivered. The next table presents the number of phrases relative to the number of nouns.

Table 8. Number of noun phrases.
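speaker | noun phrase group | phrases (a) | share of all phrases (b) | unique phrases (c) | unique fraction (d)
Barack Obama | all | 885 | 100.0% | 746 | 84.3%
Barack Obama | top-level | 413 | 46.7% | 394 | 95.4%
Barack Obama | derived | 472 | 53.3% | 352 | 74.6%
John McCain | all | 851 | 100.0% | 709 | 83.3%
John McCain | top-level | 384 | 45.1% | 371 | 96.6%
John McCain | derived | 467 | 54.9% | 338 | 72.4%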
Table 8 Analysis

Obama has 4.0% more noun phrases than McCain (885 vs 851). The fraction of unique noun phrases differs less: noun phrase uniqueness is 84.3% for Obama and 83.3% for McCain. Relative to the number of noun phrases, the number of top-level phrases is similar between them, as is the top-level uniqueness ratio.

Table 8c Legend
a c
b d
a :: number of noun phrases
b :: (a) relative to number of all noun phrases
c :: number of unique phrases
d :: (c) relative to (a)
bar :: normalized ratio of (a-c):c

Noun Phrase Richness

The previous table presented the total number of noun phrases, which can be equated to individual concepts. In this table, this value is shown relative to the number of nouns used. The interpretation of this ratio is that of richness: how many noun phrases were constructed per noun.
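As a worked check, the ratios for the "all" noun phrase group can be reproduced by dividing the noun phrase counts in Table 8 by the noun counts in Table 5. The function name is hypothetical; the numbers are taken from those two tables.

```python
def richness(noun_phrases, nouns):
    """Number of noun phrases constructed per noun."""
    return noun_phrases / nouns

# Counts from Table 8 (noun phrases) and Table 5 (nouns).
obama_all = richness(885, 1648)      # all noun phrases / all nouns
obama_unique = richness(746, 645)    # unique noun phrases / unique nouns
mccain_all = richness(851, 1615)
mccain_unique = richness(709, 663)

print(round(obama_all, 2), round(obama_unique, 2))
print(round(mccain_all, 2), round(mccain_unique, 2))
```

The unique ratio exceeds 1 for both candidates, meaning each constructed more distinct noun phrases than he used distinct nouns.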

Table 9. Number of noun phrases relative to the number of nouns.
speaker | noun phrase group | phrases per noun (a) | unique phrases per unique noun (b)
Barack Obama | all | 0.54 | 1.16
Barack Obama | top-level | 0.25 | 0.61
Barack Obama | derived | 0.29 | 0.55
John McCain | all | 0.53 | 1.07
John McCain | top-level | 0.24 | 0.56
John McCain | derived | 0.29 | 0.51
Table 9 Analysis

The ratios here are very similar. Extremely similar, in fact, with the exception of the ratio of unique noun phrases to unique nouns, which is 1.16 for Obama and 1.07 for McCain. The interpretation is that Obama constructed a greater diversity of distinct concepts with his nouns.

Table 9c Legend
a b
a :: ratio of the number of noun phrases to number of nouns
b :: ratio of the number of unique noun phrases to number of unique nouns
bar :: ratio of a:b

Noun Phrase Frequency and Size

Table 10. Noun phrase frequency, word count and unique word count.
speaker | measure | average (a) | 50% (b) | 90% (c)
Barack Obama | frequency | 1.19 | 1.00 | 3.00
Barack Obama | word count | 2.73 | 3.00 | 7.00
Barack Obama | unique word count | 2.69 | 3.00 | 7.00
John McCain | frequency | 1.20 | 1.00 | 3.00
John McCain | word count | 2.76 | 3.00 | 7.00
John McCain | unique word count | 2.70 | 3.00 | 7.00
Table 10 Analysis

Values are nearly identical for both candidates. Both repeat noun phrases an average of 1.19-1.20 times, and have 2.73-2.76 words per noun phrase.

Table 10c Legend
a b c
a :: average
b :: 50% weighted cumulative value
c :: 90% weighted cumulative value
bar1 :: normalized ratio of a:b:c

Windbag Index

The Windbag Index is a compound measure that characterizes the complexity of speech. A low index is indicative of succinct speech with a low degree of repetition and a large number of independent concepts.

Table 11. Windbag Index for each speaker. The higher the value, the greater the degree of repetition in the speech.
speaker | index value | relative to other speaker
Barack Obama | 422 | +14.7%
John McCain | 368 | -12.8%

index term | Barack Obama | relative to McCain | John McCain | relative to Obama
t1 | 0.434 | -2.1% | 0.443 | +2.2%
t2 | 0.381 | -4.4% | 0.398 | +4.6%
t3 | 0.391 | -4.7% | 0.411 | +4.9%
t4 | 0.460 | +1.0% | 0.455 | -1.0%
t5 | 0.451 | +0.1% | 0.451 | -0.1%
t6 | 0.343 | -12.5% | 0.392 | +14.3%
t7 | 0.843 | +1.2% | 0.833 | -1.2%
t8 | 0.528 | +0.9% | 0.523 | -0.9%
t9 | 1.157 | +8.2% | 1.069 | -7.5%
Table 11 Analysis

Obama's Windbag Index is 14.7% higher than McCain's, at 422 vs 368.

The index is a compound score, with contributions from nine terms. Individually, Obama does better at the verb, adjective and noun phrase components. McCain, on the other hand, has superior contributions from word counts, nouns and adverbs.

Table 11c Legend
The Windbag Index is 1/(t1*t2*...*t9) where t1,t2,...,t9 are the individual terms. These terms are

t1 :: fraction of words which are non-stop
t2 :: fraction of non-stop words which are unique
t3 :: fraction of nouns which are unique
t4 :: fraction of verbs which are unique
t5 :: fraction of adjectives which are unique
t6 :: fraction of adverbs which are unique
t7 :: fraction of noun phrases which are unique
t8 :: fraction of noun phrases which have no parent
t9 :: ratio of unique noun phrases to unique nouns

Note that large individual terms t1...t9 contribute to a smaller index.

The percentage values below the index and each term are relative differences to the other speaker's corresponding term (i.e. 100*(x-x0)/x0 where x is the value for the present speaker and x0 for the other speaker).
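The formula can be checked against the counts reported in Tables 2, 5 and 8. The sketch below is not the original implementation; the dictionary keys are hypothetical names. One detail is inferred rather than stated: the tabulated t8 is reproduced when the top-level fraction is computed over unique noun phrases (394/746 for Obama).

```python
from math import prod

def windbag_terms(c):
    """Nine index terms t1..t9 from a dict of counts (key names are hypothetical)."""
    return [
        c["non_stop"] / c["words"],                    # t1
        c["unique_non_stop"] / c["non_stop"],          # t2
        c["unique_nouns"] / c["nouns"],                # t3
        c["unique_verbs"] / c["verbs"],                # t4
        c["unique_adjectives"] / c["adjectives"],      # t5
        c["unique_adverbs"] / c["adverbs"],            # t6
        c["unique_phrases"] / c["phrases"],            # t7
        c["unique_top_level"] / c["unique_phrases"],   # t8 (inferred: over unique phrases)
        c["unique_phrases"] / c["unique_nouns"],       # t9
    ]

def windbag_index(terms):
    """Index is the reciprocal of the product of the terms."""
    return 1.0 / prod(terms)

def relative_difference(x, x0):
    """Relative difference of x to x0, in percent."""
    return 100.0 * (x - x0) / x0

# Counts taken from Tables 2, 5 and 8.
obama = dict(words=7529, non_stop=3266, unique_non_stop=1243,
             nouns=1648, unique_nouns=645, verbs=785, unique_verbs=361,
             adjectives=472, unique_adjectives=213, adverbs=213, unique_adverbs=73,
             phrases=885, unique_phrases=746, unique_top_level=394)
mccain = dict(words=7043, non_stop=3121, unique_non_stop=1243,
              nouns=1615, unique_nouns=663, verbs=797, unique_verbs=363,
              adjectives=426, unique_adjectives=192, adverbs=166, unique_adverbs=65,
              phrases=851, unique_phrases=709, unique_top_level=371)

obama_index = windbag_index(windbag_terms(obama))
mccain_index = windbag_index(windbag_terms(mccain))
print(round(obama_index), round(mccain_index))
print(round(relative_difference(obama_index, mccain_index), 1))
```

Because the index is a product of reciprocals, a small shortfall in several terms compounds: Obama trails in four of the nine terms, yet his lower adverb uniqueness (t6) alone accounts for most of the gap.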

Tag Clouds

In the tag clouds below, the size of the word is proportional to the number of times it was used by a candidate (tag cloud details).

Not all words from a group used to draw the cloud fit in the image. Specifically, less frequently used words for large word groups fall outside the image.

Debate Tag Clouds for Each Candidate — All Words

Each candidate's debate portion was extracted and frequencies were compiled for each part of speech (noun, verb, adjective, adverb), with words colored by their part of speech category. The words in these tag clouds include words unique to one candidate as well as words used by both candidates. For other tag clouds below, only words unique to a candidate are used.

Keep in mind that the word sizes between tag clouds cannot be directly compared, since the minimum and maximum size of the words in each tag cloud is the same. However, the distribution of sizes within a tag cloud reflects the frequency distribution of words (tag cloud details).

Debate Tag Cloud for Barack Obama — all words

Debate tag cloud for Barack Obama

Debate Tag Cloud for John McCain — all words

Debate tag cloud for John McCain
Debate Tag Cloud Analysis

The tag clouds for all words used by each candidate powerfully show the difference in word frequency distribution between Obama and McCain. In a few tables, I indicated the average and 50%/90% weighted cumulative values for frequencies, but did not explicitly show a distribution. These tag clouds show it.

McCain's cloud has significantly more large words than Obama's, indicating that McCain repeated a larger subset of words throughout the debate. For example, McCain's use of the word "nuclear" was nearly as frequent as his use of the word "Obama". On the other hand, Obama's use of "nuclear" was less frequent than his use of the word "McCain".

It is also interesting to see that Obama very frequently used "John", calling his opponent by his first name, whereas McCain never used Obama's first name, Barack.

Debate Tag Clouds for Each Candidate — Unique Words

The tag clouds below show only words used exclusively by a candidate. For example, if candidate A used the word "invest" (any number of times), but candidate B did not, then the word will appear in the unique word tag cloud for candidate A.

Debate Tag Cloud for Barack Obama — words unique to Barack Obama

Debate tag cloud for Barack Obama

Debate Tag Cloud for John McCain — words unique to John McCain

Debate tag cloud for John McCain
Unique Word Tag Cloud Analysis

The tag cloud composed of words used exclusively by McCain indicates a high degree of relative repetition of a small subset of the words. The center of McCain's tag cloud is bloated with large text, indicating high relative usage of words like "afraid", "serious", "fragile", and "badly". Remember, these are words unique to McCain, which Obama did not use.

Obama's tag cloud shows relatively less repetition among the words used only by Obama. In general, the words used by Obama that were not used by McCain are more uniformly distributed in frequency. It is surprising that words like "recognize", "strategic", "solve", "invest", and "agree" are unique to Obama (they were not used by McCain).

Part of Speech Tag Clouds

In these tag clouds, words by both candidates were categorized on the basis of exclusivity to a candidate. Words unique to each candidate are drawn with a different color. Words used by both candidates are shown in grey.

The size of the word is relative to the frequency for the candidate — word sizes between candidates should not be used to indicate difference in absolute frequency.

Words were further categorized by part of speech (noun, verb, adjective, adverb) and individual tag clouds were prepared for each category.

The last tag cloud in this section uses all (noun + verb + adjective + adverb) parts of speech.

Tag Cloud of noun words, by speaker

Noun Tag Cloud Analysis

Not surprisingly, the candidates' most frequent noun was Obama (for McCain) and John (for Obama). As I mentioned previously, it is curious to find that McCain never referred to Obama by his first name, Barack.

The cloud of green words around the central core of the tag cloud indicates that nouns unique to Obama appeared at a higher relative frequency than those unique to McCain.

Some interesting nouns for Obama are "alternative", "fundamentals", "medicare", and "diplomacy". On the other hand, words like "restraint", "failure", "corruption" and "maverick" are unique to McCain.

Tag Cloud of verb words, by speaker

Verb Tag Cloud Analysis

The top verb unique to McCain was "control", closely followed by "fought" and "succeed", followed by verbs like "defeat", "win", and "legitimize". For Obama, the top unique verb was "getting", followed by "invest", "funded", "recognize", "agree" and "rebuild", but those were of relatively lower frequencies than for words at the same rank in McCain's list. McCain repeats strong verbs.

If the verbs are an indication of action planned for and supported by the candidates, then McCain is someone who wishes to "legitimize" and "succeed [at] control", whereas Obama is more conciliatory and positive with "invest", "focused", "solve" and "recognize".

Tag Cloud of adjective words, by speaker

Adjective Tag Cloud Analysis

McCain's exclusive use of "afraid", "serious" and "fragile" is interesting and hints at fear mongering.

Tag Cloud of adverb words, by speaker

Adverb Tag Cloud Analysis

Adverbs are the least frequent of the four parts of speech, so the tag cloud here is less complex. Both candidates use strong and certain action modifiers like "completely" (McCain) and "absolutely" (Obama). As for other parts of speech, McCain had a high relative frequency of terms unique to him, which is evident from the larger number of large blue words in this tag cloud.

It is interesting to see Obama exclusively use words like "responsibly" and "structurally".

Tag Cloud of all words, by speaker

All Tag Cloud Analysis

When all parts of speech are combined into one tag cloud, Obama's unique words swamp out those of McCain, suggesting that when parts of speech are combined, Obama repeated terms exclusive to him more frequently.

Word Pair Vignette Tag Clouds for Each Candidate

Tag Cloud of word pairs by Barack Obama

adjective/adjective by Barack Obama
adjective/adverb by Barack Obama
adjective/noun by Barack Obama
adjective/verb by Barack Obama
adverb/adverb by Barack Obama
adverb/noun by Barack Obama
adverb/verb by Barack Obama
noun/noun by Barack Obama
noun/verb by Barack Obama
verb/verb by Barack Obama
Word Pair Tag Cloud Analysis for Barack Obama.

The major contributors to Obama's word pair tag clouds are open-ended word pairs such as "last several" (adjective/adjective), "correct just" (adjective/adverb), "mccain senator" (noun/noun). A couple of concepts such as "al qaeda" (qaeda was tagged as a verb by the Brill tagger), "north korea" were prominent, but these are proper nouns and reflect the topic under discussion.

Obama touched on many concepts, as indicated by the relatively flat distribution of sizes in the noun/noun tag cloud. Some of these were "care health", "biodiesel energy", "oil world", "energy wind". Some curious ones were "john spending", "crisis day", "afghanistan iraq", "deal russia". These should be contrasted to noun/noun pairs for McCain (below), which focused on threats and the military.

Tag Cloud of word pairs by John McCain

adjective/adjective by John McCain
adjective/adverb by John McCain
adjective/noun by John McCain
adjective/verb by John McCain
adverb/adverb by John McCain
adverb/noun by John McCain
adverb/verb by John McCain
noun/noun by John McCain
noun/verb by John McCain
verb/verb by John McCain
Word Pair Tag Cloud Analysis for John McCain.

McCain's pair tag clouds have a significantly different morphology than those of Obama. Primarily, due to McCain's repetitive use of certain words, the tag clouds are overwhelmed with these frequent (therefore large) pairs.

His adjective/noun tag cloud has an apocalyptic theme: "nuclear threat", "important thing", "long way", and (this is fascinating) "old russian" and "next states".

The noun/noun tag cloud size distribution is relatively flat, like that of Obama, and indicates topics such as "threat weapons", "business tax", "ahmadinejad extermination" and "aggression georgia". The majority of McCain's noun/noun concepts were threat- and military-related (contrast this to Obama, who was focused more on energy and economy). Environment? What environment?

Downloads

debate transcript (courtesy of CNN).

parsed word lists (analyzed transcript, including words by speaker, by POS, and all POS pairings).

tag cloud images

data structure

Please see the methods section for details about these files.