Lexical Analysis of 2008 US Presidential and Vice-Presidential Debates home | Martin Krzywinski : projects contact

Lexical Analysis of
Barack Obama vs John McCain (combined debates)

There be dragons here. The results here are based on a combined transcript from all debates between these candidates. When interpreting metric values from this analysis (e.g. fraction of unique words), keep in mind that you are looking at results based on speech of multiple debates, not one. In the limit of an infinite number of debates, most metrics will converge to a value that is characteristic of the speaker (e.g. total vocabulary size).

Word Statistics

Debate Word Count

Summary Word Count

The summary word count reports the total number of words and the number of unique, non-stop words used by each candidate. Word number is expressed as both absolute and relative values.

Table 1. Number of all words and unique words used by each speaker.
speaker         words (a)   share of debate (b)   unique words (c)   unique/words (d)
Barack Obama    21,818      52.1%                 2,523              10.9%
John McCain     20,020      47.9%                 2,446              11.4%
all             41,838      100.0%                3,656              8.3%
Table 1 Analysis

Across all the debates, the candidates delivered just short of 42,000 words. With each debate approximately 1.5 hours in length, the number of unique words delivered by both candidates (3,656) corresponds to a delivery rate of one unique word every 4.4 seconds (3 debates x 1.5 hours x 3600 s = 16,200 seconds). The average speech rate was 2.6 words per second.

Obama delivered +9.0% more words than McCain and had a larger overall vocabulary, by +3.1%.
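The arithmetic behind these figures can be checked with a short sketch. It uses only the counts from Table 1 and the approximate debate length quoted above; the variable and function names are mine.

```python
# Counts from Table 1; debate length is the approximate 1.5 hours quoted in the text.
debates = 3
hours_per_debate = 1.5
total_seconds = debates * hours_per_debate * 3600

words = {"Obama": 21818, "McCain": 20020}
unique_words = {"Obama": 2523, "McCain": 2446}
all_words, all_unique = 41838, 3656

seconds_per_unique_word = total_seconds / all_unique
words_per_second = all_words / total_seconds


def relative_increase(x, x0):
    """Percent by which x exceeds x0."""
    return 100 * (x - x0) / x0


more_words = relative_increase(words["Obama"], words["McCain"])
more_vocabulary = relative_increase(unique_words["Obama"], unique_words["McCain"])

print(f"one unique word every {seconds_per_unique_word:.1f} s")
print(f"{words_per_second:.1f} words per second")
print(f"Obama vs McCain: {more_words:+.1f}% words, {more_vocabulary:+.1f}% vocabulary")
```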

Table 1 Legend
a c
b d
a :: total number of words
b :: proportion of words in the debate
c :: unique words in (a)
d :: (c) relative to (a)
bar :: proportion of (a-c):c

Stop Word Contribution

In the table below, the candidates' delivery is partitioned into stop and non-stop words. Stop words are frequently-used bridging words (e.g. pronouns and conjunctions) and do not carry inherent meaning. The fraction of words that are stop words is one measure of the complexity of speech.

Table 2. Expanded analysis of total, stop and non-stop word count.
speaker        category   words (a)   share (b)   unique words (c)   unique/words (d)
Barack Obama   all        21,818      52.1%       2,523              11.6%
Barack Obama   stop       12,235      56.1%       154                1.3%
Barack Obama   non-stop   9,583       43.9%       2,369              24.7%
John McCain    all        20,020      47.9%       2,446              12.2%
John McCain    stop       11,063      55.3%       156                1.4%
John McCain    non-stop   8,957       44.7%       2,290              25.6%
all            all        41,838      100.0%      3,656              8.7%
all            stop       23,298      55.7%       166                0.7%
all            non-stop   18,540      44.3%       3,490              18.8%
Table 2 Analysis

Overall, Obama's stop word fraction was slightly higher than McCain's. However, Obama delivered more words throughout the debates and displayed greater range of vocabulary, with 2,369 unique non-stop words, +3.4% more than McCain.

Table 2 Legend
a c
b d
a :: total number of words, for a given category (all, stop, non-stop)
b :: (a) relative to words in the debate if category=all, otherwise relative to words by the candidate
c :: number of unique words within set (a)
d :: (c) relative to (a)
bar :: proportion of (a-c):c

All further analysis uses debate content that has been filtered for stop words.
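A minimal sketch of the stop/non-stop partition follows. The stop list and the tokenizer here are hypothetical stand-ins, since the list actually used for this analysis is not reproduced on this page; the sample sentence is made up for illustration.

```python
import re

# Hypothetical, deliberately tiny stop list (an assumption, not the list used here).
STOP_WORDS = {"the", "a", "and", "of", "to", "we", "i", "that", "is", "in"}


def summarize(group):
    """Total count, unique count and unique fraction for a list of words."""
    total = len(group)
    unique = len(set(group))
    return {
        "total": total,
        "unique": unique,
        "unique_fraction": unique / total if total else 0.0,
    }


def word_stats(text):
    """Split text into words and summarize all, stop and non-stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    stop = [w for w in words if w in STOP_WORDS]
    non_stop = [w for w in words if w not in STOP_WORDS]
    return {
        "all": summarize(words),
        "stop": summarize(stop),
        "non-stop": summarize(non_stop),
    }


stats = word_stats("We need energy and we need jobs, and that is the truth.")
for category, values in stats.items():
    print(category, values)
```

The three summaries correspond to the all, stop and non-stop rows of Table 2 for a single speaker.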

Word frequency

The word frequency table summarizes the frequency with which words were used. Specifically, it reports the average word frequency and the weighted cumulative frequencies at the 50th and 90th percentiles. The average word frequency indicates how many times, on average, a word is used. For a given fraction of the entire delivery, the weighted cumulative frequency indicates the largest word frequency within this fraction (details about weighted cumulative distribution).
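One way to read the definition above is sketched below: words are ordered from least to most frequent, their counts are accumulated, and the value reported is the frequency at which the running total first covers the requested fraction of all words delivered. This is my interpretation for illustration, not necessarily the implementation behind the tables, and the sample words are made up.

```python
from collections import Counter


def frequency_summary(words, fractions=(0.5, 0.9)):
    """Average word frequency and weighted cumulative frequencies."""
    counts = Counter(words)
    if not counts:
        return {}
    freqs = sorted(counts.values())      # least frequent words first
    total = sum(freqs)                   # total number of words delivered
    result = {"average": total / len(freqs)}
    for fraction in fractions:
        running = 0
        for f in freqs:
            running += f                 # each word is weighted by its own frequency
            if running >= fraction * total:
                result[fraction] = f     # largest frequency within this fraction
                break
    return result


sample = ["tax"] * 5 + ["health"] * 3 + ["energy", "care"]
summary = frequency_summary(sample)
print(summary)
```

In this toy sample, half of the delivery is covered by words used 3 times or fewer, and 90% only once the word used 5 times is included.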

Table 3. Average, 50%, and 90% weighted cumulative word frequencies (content filtered for stop words).
speaker        average (a)   50% (b)   90% (c)
Barack Obama   4.04          10.00     63.00
John McCain    3.91          8.00      53.00
all            5.31          16.00     113.00
Table 3 Analysis

Absolute values of word frequency statistics for a combined debate transcript are not useful because they are directly proportional to the length of the concatenated transcript. In the limit of a large number of debates, total vocabulary size approaches a limit, and as word count goes up so does word frequency.

However, a comparison between the candidates can still be made. Obama's word frequency is slightly higher than McCain's, but not by much (+3.3%).

Table 3 Legend
a b c
a :: average word frequency
b :: largest word frequency in 50% of content
c :: largest word frequency in 90% of content
bar :: proportion of a:b:c

Sentence Size

Table 4. Number of words in a sentence, as measured by average number of words, 50% and 90% weighted cumulative values for three word groups (all words, stop words and non-stop words).
speaker        word type   average (a)   50% (b)   90% (c)
Barack Obama   all         18.3          26.0      56.0
Barack Obama   stop        10.4          15.0      32.0
Barack Obama   non-stop    8.2           11.0      26.0
John McCain    all         15.5          20.0      46.0
John McCain    stop        8.7           12.0      26.0
John McCain    non-stop    7.0           10.0      22.0
all            all         16.8          23.0      51.0
all            stop        9.5           13.0      29.0
all            non-stop    7.6           11.0      24.0
Table 4 Analysis

Obama consistently delivers longer sentences, averaging 8.2 non-stop words per sentence compared to McCain's 7.0. Obama's sentence size distribution also has a greater component of long sentences: 90% of his speech is in sentences of ≤26 non-stop words, whereas McCain fits 90% of his speech in sentences of ≤22 non-stop words.

Table 4 Legend
a b c
a :: average sentence size
b :: largest sentence size for 50% of content
c :: largest sentence size for 90% of content
bar :: proportion of a:b:c

Part of Speech Analysis

In this section, word frequency is broken down by part of speech (POS). The four POS groups examined are nouns, verbs, adjectives and adverbs. Conjunctions and prepositions are not considered. The first category (n+v+adj+adv) is composed of all four POS groups.

Part of Speech Count

Table 5. Count of words (total and unique) categorized by part of speech (POS).
speaker        part of speech     words (a)   share (b)   unique words (c)   unique/words (d)
Barack Obama   n+v+adj+adv        9,189       100.0%      2,324              25.3%
Barack Obama   nouns (n)          4,809       52.3%       1,235              25.7%
Barack Obama   verbs (v)          2,348       25.6%       719                30.6%
Barack Obama   adjectives (adj)   1,432       15.6%       406                28.4%
Barack Obama   adverbs (adv)      600         6.5%        137                22.8%
John McCain    n+v+adj+adv        8,633       100.0%      2,238              25.9%
John McCain    nouns (n)          4,687       54.3%       1,253              26.7%
John McCain    verbs (v)          2,270       26.3%       686                30.2%
John McCain    adjectives (adj)   1,195       13.8%       371                31.0%
John McCain    adverbs (adv)      481         5.6%        115                23.9%
all            n+v+adj+adv        17,822      100.0%      3,431              19.3%
all            nouns (n)          9,496       53.3%       1,859              19.6%
all            verbs (v)          4,618       25.9%       1,096              23.7%
all            adjectives (adj)   2,627       14.7%       586                22.3%
all            adverbs (adv)      1,081       6.1%        187                17.3%
Table 5 Analysis

This is a great table for the combined debate analysis because it shows the part of speech breakdown across three independent samples of speech and is therefore a more robust measure of the candidates' natural style than a sampling from a single event.

McCain uses more nouns than Obama, with 54.3% of his parts of speech being nouns (remember, in this analysis I only consider nouns, verbs, adjectives and adverbs and all to the exclusion of other parts of speech), whereas Obama's fraction is 52.3%. McCain's +3.8% increase suggests speech with a greater emphasis on concrete concepts.

Verb usage is also greater by McCain, at 26.3% vs Obama's 25.6%. The difference is +2.7%, smaller than for nouns.

Once we get into adjectives and adverbs, however, it's a different story. Obama's use of adjectives and adverbs is significantly higher than McCain's. Obama's adjective fraction is +13.0% larger than McCain's and his adverb fraction is +16.1% larger than McCain's. This suggests that Obama's speech is more nuanced and that he captures and delivers more texture in his nouns and verbs than McCain.
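The relative differences quoted in this analysis can be reproduced from the rounded part of speech fractions in Table 5, as in this short sketch (names are mine). Working from the rounded percentages rather than the raw counts is what matches the quoted values.

```python
# Share (%) of n+v+adj+adv words in each part of speech, from Table 5.
obama = {"noun": 52.3, "verb": 25.6, "adjective": 15.6, "adverb": 6.5}
mccain = {"noun": 54.3, "verb": 26.3, "adjective": 13.8, "adverb": 5.6}


def relative_difference(x, x0):
    """Percent difference of x relative to x0."""
    return 100 * (x - x0) / x0


# McCain relative to Obama for nouns and verbs, Obama relative to McCain otherwise.
diffs = {
    "noun": relative_difference(mccain["noun"], obama["noun"]),
    "verb": relative_difference(mccain["verb"], obama["verb"]),
    "adjective": relative_difference(obama["adjective"], mccain["adjective"]),
    "adverb": relative_difference(obama["adverb"], mccain["adverb"]),
}

for pos, value in diffs.items():
    print(f"{pos}: {value:+.1f}%")
```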

Table 5 Legend
a c
b d
a :: total number of words for a given POS (all, noun, verb, adjective, adverb)
b :: (a) relative to all words by candidate
c :: unique words in (a)
d :: (c) relative to (a)
bar :: proportion of (a-c):c

Part of Speech Frequency

Table 5. Frequency of words by part of speech (POS).
speaker        part of speech     average (a)   50% (b)   90% (c)
Barack Obama   n+v+adj+adv        3.95          10.0      63
Barack Obama   nouns (n)          3.89          9.0       52
Barack Obama   verbs (v)          3.08          6.0       57
Barack Obama   adjectives (adj)   3.53          7.0       50
Barack Obama   adverbs (adv)      4.38          11.0      92
John McCain    n+v+adj+adv        3.86          8.0       53
John McCain    nouns (n)          3.74          7.0       42
John McCain    verbs (v)          3.02          5.0       48
John McCain    adjectives (adj)   3.22          6.0       21
John McCain    adverbs (adv)      4.18          11.0      42
all            n+v+adj+adv        5.19          15.0      113
all            nouns (n)          5.11          15.0      80
all            verbs (v)          3.90          10.0      107
all            adjectives (adj)   4.48          11.0      58
all            adverbs (adv)      5.78          19.0      130
Table 5 Analysis

Obama's overall part of speech frequency is slightly higher than McCain's, but not by much (+2.3%). He consistently has slightly greater repetition of nouns and verbs, at +4.0% and +2.0% more than McCain, respectively.

Obama's adjective and adverb use frequency is much higher than McCain's, however, at +9.6% and +4.8%, respectively. This increase reflects the greater proportion of adjectives and adverbs in Obama's speech.

Table 5 Legend
a b c
a :: average word frequency
b :: largest word frequency in 50% of content
c :: largest word frequency in 90% of content
bar :: proportion of a:b:c

Part of Speech Pairing

Through word pairing, I attempt to capture the contextual use of parts of speech within a sentence and extract concepts from the text. Specifically, unique pairs of words indicate complexity and inter-relatedness between concepts in a sentence.
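A minimal sketch of this kind of pair counting is shown below. It assumes that a pair is any two words that occur in the same sentence, that word order within a pair is ignored, and that the input is already tagged by part of speech; these assumptions, the function name and the toy sentences are mine, and the pairing rules used for the tables may differ.

```python
from collections import Counter
from itertools import combinations


def pos_pairs(tagged_sentences):
    """Return {pairing: (total pairs, unique pairs)} for same-sentence word pairs."""
    totals = Counter()
    unique = {}
    for sentence in tagged_sentences:
        for (w1, p1), (w2, p2) in combinations(sentence, 2):
            # Order by (POS, word) so that each unordered pair has a single key.
            first, second = sorted([(p1, w1), (p2, w2)])
            pairing = f"{first[0]}/{second[0]}"
            totals[pairing] += 1
            unique.setdefault(pairing, set()).add((first[1], second[1]))
    return {pairing: (totals[pairing], len(unique[pairing])) for pairing in totals}


# Toy, hand-tagged sentences (illustrative only, not from the transcript).
sentences = [
    [("health", "noun"), ("care", "noun"), ("costs", "noun"), ("rise", "verb")],
    [("health", "noun"), ("care", "noun"), ("matters", "verb")],
]
result = pos_pairs(sentences)
print(result)
```

Here the pair "care health" occurs in both sentences, so it adds two to the noun/noun total but only one to the unique count, which is the distinction drawn between total and unique pairs in the tables.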

Table 6a (Barack Obama). Word pairs (total and unique) categorized by part of speech (POS) for Barack Obama.
pairing               pairs (a)   share (b)   unique pairs (c)   unique/pairs (d)
noun/noun             14,989      25.0%       10,955             73.1%
verb/noun             14,266      23.8%       11,157             78.2%
verb/verb             2,880       4.8%        2,383              82.7%
adjective/noun        9,281       15.5%       7,026              75.7%
adjective/verb        3,997       6.7%        3,243              81.1%
adjective/adjective   1,193       2.0%        969                81.2%
adverb/noun           3,430       5.7%        2,645              77.1%
adverb/verb           1,736       2.9%        1,390              80.1%
adverb/adjective      1,033       1.7%        820                79.4%
adverb/adverb         245         0.4%        169                69.0%
Table 6b (John McCain). Word pairs (total and unique) categorized by part of speech (POS) for John McCain.
pairing               pairs (a)   share (b)   unique pairs (c)   unique/pairs (d)
noun/noun             12,749      27.4%       9,276              72.8%
verb/noun             11,354      24.4%       8,877              78.2%
verb/verb             2,215       4.8%        1,904              86.0%
adjective/noun        6,652       14.3%       5,088              76.5%
adjective/verb        2,602       5.6%        2,193              84.3%
adjective/adjective   729         1.6%        614                84.2%
adverb/noun           2,445       5.3%        1,955              80.0%
adverb/verb           1,159       2.5%        953                82.2%
adverb/adjective      683         1.5%        559                81.8%
adverb/adverb         121         0.3%        96                 79.3%
Table 6c (Barack Obama vs John McCain). Word Pairs (total and unique) categorized by part of speech (POS) for both candidates.
pairing               Obama pairs (a)   Obama unique (b)   McCain pairs (c)   McCain/Obama (d)   McCain unique (e)
noun/noun             14,989            73.1%              12,749             85.1%              72.8%
verb/noun             14,266            78.2%              11,354             79.6%              78.2%
verb/verb             2,880             82.7%              2,215              76.9%              86.0%
adjective/noun        9,281             75.7%              6,652              71.7%              76.5%
adjective/verb        3,997             81.1%              2,602              65.1%              84.3%
adjective/adjective   1,193             81.2%              729                61.1%              84.2%
adverb/noun           3,430             77.1%              2,445              71.3%              80.0%
adverb/verb           1,736             80.1%              1,159              66.8%              82.2%
adverb/adjective      1,033             79.4%              683                66.1%              81.8%
adverb/adverb         245               69.0%              121                49.4%              79.3%
Table 6 Analysis

Obama delivers more pairs than McCain in every pairing. The largest difference is in adverb/adverb pairings, with Obama having twice as many as McCain.

When compared to Obama, McCain has significantly fewer pairings that include adjectives and adverbs. While for combinations of nouns and verbs McCain is at 76-85% of Obama, when adjectives and adverbs are brought into the mix McCain is at 50-72%.

These numbers starkly illustrate Obama's greater penchant for precision and modification.

Table 6a,b Legend
a c
b d
a :: total number of pairs, for a given category (e.g. verb/noun)
b :: (a) relative to all pairs
c :: number of unique pairs within set (a)
d :: (c) relative to (a)
bar :: proportion of (a-c):c
Table 6c Legend
a c
  d
b e
a :: total number of pairs for Barack Obama
b :: relative unique pairs for Barack Obama
c :: total pairs for John McCain
d :: (c) relative to (a) (i.e. John McCain relative to Barack Obama)
e :: relative unique pairs for John McCain
bars :: values of (a), (b), (c) and (e)

Word usage

This section enumerates words that were unique to a candidate (i.e. used by one candidate but not the other). For a given part of speech, the table breaks down the number of words that were spoken by only one of the candidates or by both candidates (intersection). The last row includes all words (union).
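The exclusive, shared and combined groups described above map directly onto set operations. The sketch below uses made-up word lists and my own function name; as noted later in the analysis, no stemming is applied, so "agree" and "agreed" count as different words.

```python
def exclusivity(words_a, words_b):
    """Partition two word lists into exclusive, shared and combined vocabularies."""
    a, b = set(words_a), set(words_b)
    return {
        "only_a": a - b,   # spoken by candidate A but not B
        "only_b": b - a,   # spoken by candidate B but not A
        "both": a & b,     # intersection
        "all": a | b,      # union
    }


# Illustrative word lists only, not taken from the transcript.
candidate_a = ["agree", "invest", "tax", "energy", "agree"]
candidate_b = ["agreed", "tax", "nuclear", "energy"]
groups = exclusivity(candidate_a, candidate_b)

for name, members in groups.items():
    print(name, sorted(members))
```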

Table 7. Total and unique words used exclusively by a candidate or by both candidates.
speaker             part of speech     words (a)   share of group (b)   share of all words (c)   unique words (d)   unique/words (e)   share of all unique (f)
Barack Obama only   n+v+adj+adv        1,890       100.0%               10.6%                    1,193              63.1%              34.8%
Barack Obama only   nouns (n)          923         48.8%                9.7%                     576                62.4%              31.0%
Barack Obama only   verbs (v)          550         29.1%                11.9%                    375                68.2%              34.2%
Barack Obama only   adjectives (adj)   326         17.2%                12.4%                    208                63.8%              35.5%
Barack Obama only   adverbs (adv)      91          4.8%                 8.4%                     64                 70.3%              34.2%
John McCain only    n+v+adj+adv        1,889       100.0%               10.6%                    1,107              58.6%              32.3%
John McCain only    nouns (n)          1,079       57.1%                11.4%                    600                55.6%              32.3%
John McCain only    verbs (v)          463         24.5%                10.0%                    328                70.8%              29.9%
John McCain only    adjectives (adj)   271         14.3%                10.3%                    170                62.7%              29.0%
John McCain only    adverbs (adv)      76          4.0%                 7.0%                     45                 59.2%              24.1%
both                n+v+adj+adv        14,043      100.0%               78.8%                    1,131              8.1%               33.0%
both                nouns (n)          7,390       52.6%                77.8%                    629                8.5%               33.8%
both                verbs (v)          3,473       24.7%                75.2%                    309                8.9%               28.2%
both                adjectives (adj)   1,992       14.2%                75.8%                    191                9.6%               32.6%
both                adverbs (adv)      890         6.3%                 82.3%                    65                 7.3%               34.8%
all                 n+v+adj+adv        17,822      100.0%               100.0%                   3,431              19.3%              100.0%
all                 nouns (n)          9,496       53.3%                100.0%                   1,859              19.6%              100.0%
all                 verbs (v)          4,618       25.9%                100.0%                   1,096              23.7%              100.0%
all                 adjectives (adj)   2,627       14.7%                100.0%                   586                22.3%              100.0%
all                 adverbs (adv)      1,081       6.1%                 100.0%                   187                17.3%              100.0%
Table 7 Analysis

This is another table that benefits from a combined debate treatment. Here we can see the number of words, by part of speech, spoken exclusively by one candidate, or by both. Presumably, as the number of debates increases, the number of words spoken by one candidate but not the other steadily decreases, until it reaches some core value that represents words truly unique to that candidate (e.g. the other candidate does not know the word, or consciously avoids using it).

The key values to draw your attention to are the number of exclusive unique words (first two rows, second column for each part of speech). This number corresponds to the exclusive contribution by each candidate to the vocabulary of the speech.

For example, of the 1,859 unique nouns used in the debate, 629 (33.8%) were spoken by both candidates, 600 (32.3%) by McCain only and 576 (31.0%) by Obama only. McCain thus contributed more nouns to the debate, although he repeated his exclusive nouns more than Obama did (unique fraction of 55.6% vs 62.4%).

When it comes to verbs, however, Obama's contribution is higher, at 34.2% of all debate verbs vs 29.9% for McCain. Note that verbs were the parts of speech that had the lowest shared fraction - only 28.2% of verbs in the debate were spoken by both candidates.

Obama also contributed a greater variety of adjectives and adverbs to the debate. In particular, Obama's contribution to adverbs was 34.2% compared to 24.1% for McCain. In other words, for every 3 adverbs used by Obama not spoken by McCain, McCain had only 2 not spoken by Obama.

The profile presented in this table closely matches the result of previous work by Pennebaker, in which McCain is concluded to be a categorical thinker (heavy noun use), while Obama is fluid and contextual (verb and modifier use).

Table 7c Legend
a d
b e
c f
a :: total number of words unique to a candidate, for a given POS group
b :: (a) relative to all unique words to the candidate
c :: (a) relative to all words
d :: unique words in (a)
e :: (d) relative to (a)
f :: (d) relative to all unique words
bar1 :: normalized ratio of (a-d):d
bar2 :: absolute ratio of (a-d):d for all POS groups (first column) or POS group (other columns)

Noun Phrase Usage

Noun phrases were extracted from the text and analyzed for frequency, word count, unique word count and richness.

Top-level noun phrases are those without a parent noun phrase (a parent phrase is one that is a similar, longer phrase). Derived noun phrases are those with a parent (more details about noun phrase analysis).

The top-level noun phrases can be interpreted as independent concepts. Derived noun phrases can be interpreted as variants on concepts embodied by the top-level phrases.
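The split into top-level and derived phrases can be sketched as follows. The analysis describes a parent only as "a similar, longer phrase"; this sketch assumes that means a longer phrase containing the shorter one as a contiguous run of words, which may be narrower than the rule actually used. Function names and phrases are illustrative.

```python
def is_contained(short, long):
    """True if the words of `short` appear as a contiguous run inside `long`."""
    s, l = short.split(), long.split()
    if len(s) >= len(l):
        return False
    return any(l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1))


def classify_phrases(phrases):
    """Split unique phrases into top-level (no parent) and derived (has a parent)."""
    unique = set(phrases)
    top_level, derived = set(), set()
    for phrase in unique:
        if any(is_contained(phrase, other) for other in unique):
            derived.add(phrase)      # some longer phrase contains this one
        else:
            top_level.add(phrase)    # no parent found
    return top_level, derived


phrases = [
    "health care", "health care system",
    "tax cut", "middle class tax cut",
    "energy",
]
top_level, derived = classify_phrases(phrases)
print("top-level:", sorted(top_level))
print("derived:", sorted(derived))
```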

Noun Phrase Count

This table reports the absolute number of noun phrases, which is related to the number of total words (specifically, nouns) delivered. The next table presents the number of phrases relative to the number of nouns.

Table 8. Number of noun phrases.
speaker        phrase type   phrases (a)   share (b)   unique phrases (c)   unique/phrases (d)
Barack Obama   all           2,494         100.0%      1,934                77.5%
Barack Obama   top-level     807           32.4%       770                  95.4%
Barack Obama   derived       1,687         67.6%       1,164                69.0%
John McCain    all           2,386         100.0%      1,863                78.1%
John McCain    top-level     761           31.9%       723                  95.0%
John McCain    derived       1,625         68.1%       1,140                70.2%
Table 8 Analysis

Obama delivered +3.8% more unique noun phrases than McCain (1,934 vs 1,863). He had +6.5% more unique top-level noun phrases and +2.1% more unique derived noun phrases. The increase of top-level noun phrases is greater than the increase of derived noun phrases, suggesting greater variation in concept usage.

Table 8c Legend
a c
b d
a :: number of noun phrases
b :: (a) relative to number of all noun phrases
c :: number of unique phrases
d :: (c) relative to (a)
bar :: normalized ratio of (a-c):c

Noun Phrase Richness

The previous table presented the total number of noun phrases, which can be equated to individual concepts. In this table, this value is shown relative to the number of nouns used. The interpretation of this ratio is that of richness. In other words, how many noun phrases were constructed, per noun.

Table 9. Number of noun phrases relative to the number of nouns.
speaker        phrase type   phrases/nouns (a)   unique phrases/unique nouns (b)
Barack Obama   all           0.52                1.57
Barack Obama   top-level     0.17                0.62
Barack Obama   derived       0.35                0.94
John McCain    all           0.51                1.49
John McCain    top-level     0.16                0.58
John McCain    derived       0.35                0.91
Table 9 Analysis

The number of noun phrases relative to the number of nouns is nearly the same for both candidates.

Table 9c Legend
a b
a :: ratio of the number of noun phrases to number of nouns
b :: ratio of the number of unique noun phrases to number of unique nouns
bar :: ratio of a:b

Noun Phrase Frequency and Size

Table 10. Noun phrase frequency, word count and unique word count.
speaker        measure             average (a)   50% (b)   90% (c)
Barack Obama   avg frequency       1.29          1.00      6.00
Barack Obama   word count          2.95          4.00      8.00
Barack Obama   unique word count   2.90          3.00      7.00
John McCain    avg frequency       1.28          1.00      6.00
John McCain    word count          2.95          4.00      7.00
John McCain    unique word count   2.89          4.00      7.00
Table 10 Analysis

Noun phrase frequency and size are nearly the same for both candidates.

Table 10c Legend
a b c
a :: average
b :: 50% weighted cumulative value
c :: 90% weighted cumulative value
bar1 :: normalized ratio of a:b:c

Windbag Index

The Windbag Index is a compound measure that characterizes the complexity of speech. A low index is indicative of succinct speech with low degree of repetition and large number of independent concepts.

Table 11. Windbag Index for each speaker. The higher the value, the greater the degree of repetition in the speech.
speaker        index value   relative to other speaker
Barack Obama   3,741         +15.6%
John McCain    3,235         -13.5%

index terms (value, with difference relative to the other speaker's term)
term   Barack Obama     John McCain
t1     0.439 (-1.8%)    0.447 (+1.9%)
t2     0.247 (-3.3%)    0.256 (+3.4%)
t3     0.257 (-3.9%)    0.267 (+4.1%)
t4     0.306 (+1.3%)    0.302 (-1.3%)
t5     0.284 (-8.7%)    0.310 (+9.5%)
t6     0.228 (-4.5%)    0.239 (+4.7%)
t7     0.775 (-0.7%)    0.781 (+0.7%)
t8     0.398 (+2.6%)    0.388 (-2.5%)
t9     1.566 (+5.3%)    1.487 (-5.1%)
Table 11 Analysis

This index is not particularly well suited for a combined analysis, because it is expected that the candidates repeat themselves across three debates. The same points will be brought up, the same questions asked, and so on. Naturally, the more words are said the more words are repeated, since the pool of unique words is fixed.

The Windbag Index is +15.6% greater for Obama. Although he does better for verbs, and 2/3 of the noun phrase metrics, his uniqueness scores in other categories are lower.

Table 11c Legend
The Windbag Index is 1/(t1*t2*...*t9) where t1,t2,...,t9 are the individual terms. These terms are

t1 :: fraction of words which are non-stop
t2 :: fraction of non-stop words which are unique
t3 :: fraction of nouns which are unique
t4 :: fraction of verbs which are unique
t5 :: fraction of adjectives which are unique
t6 :: fraction of adverbs which are unique
t7 :: fraction of noun phrases which are unique
t8 :: fraction of noun phrases which have no parent
t9 :: ratio of unique noun phrases to unique nouns

Note that large individual terms t1...t9 contribute to a smaller index.

The percentage values below the index and each term are relative differences to the other speaker's corresponding term (i.e. 100*(x-x0)/x0 where x is the value for the present speaker and x0 for the other speaker).
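As a check on the formula, the index can be recomputed from the counts in Tables 2, 5 and 8. The sketch below is mine, not the code used for the analysis. One point is an assumption: the published t8 values (0.398 and 0.388) are reproduced by the ratio of unique top-level phrases to unique phrases, so that ratio is used for t8 here.

```python
from math import prod


def windbag_index(c):
    """Return (index, terms) from a dictionary of counts for one speaker."""
    terms = [
        c["non_stop"] / c["words"],            # t1: non-stop fraction of words
        c["unique_non_stop"] / c["non_stop"],  # t2: unique fraction of non-stop words
        c["unique_nouns"] / c["nouns"],        # t3
        c["unique_verbs"] / c["verbs"],        # t4
        c["unique_adj"] / c["adj"],            # t5
        c["unique_adv"] / c["adv"],            # t6
        c["unique_np"] / c["np"],              # t7: unique fraction of noun phrases
        c["unique_top_np"] / c["unique_np"],   # t8: assumed unique top-level / unique
        c["unique_np"] / c["unique_nouns"],    # t9
    ]
    return 1 / prod(terms), terms


# Counts taken from Tables 2, 5 and 8.
obama = dict(words=21818, non_stop=9583, unique_non_stop=2369,
             nouns=4809, unique_nouns=1235, verbs=2348, unique_verbs=719,
             adj=1432, unique_adj=406, adv=600, unique_adv=137,
             np=2494, unique_np=1934, unique_top_np=770)
mccain = dict(words=20020, non_stop=8957, unique_non_stop=2290,
              nouns=4687, unique_nouns=1253, verbs=2270, unique_verbs=686,
              adj=1195, unique_adj=371, adv=481, unique_adv=115,
              np=2386, unique_np=1863, unique_top_np=723)

obama_index, obama_terms = windbag_index(obama)
mccain_index, mccain_terms = windbag_index(mccain)
difference = 100 * (obama_index - mccain_index) / mccain_index

print(f"Obama {obama_index:.1f}, McCain {mccain_index:.1f}, Obama vs McCain {difference:+.1f}%")
```

Because every term is a fraction of unique or non-repeated content, a speaker who repeats more has smaller terms and therefore a larger index.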

Tag Clouds

In the tag clouds below, the size of the word is proportional to the number of times it was used by a candidate (tag cloud details).

Not all words from a group used to draw the cloud fit in the image. Specifically, less frequently used words for large word groups fall outside the image.

Debate Tag Clouds for Each Candidate - All Words

Each candidate's debate portion was extracted and frequencies were compiled for each part of speech (noun, verb, adjective, adverb), with words colored by their part of speech category. The words in these tag clouds include words unique to one candidate as well as words used by both candidates. For other tag clouds below, only words unique to a candidate are used.

Keep in mind that the word sizes between tag clouds cannot be directly compared, since the minimum and maximum size of the words in each tag cloud is the same. However, the distribution of sizes within a tag cloud reflects the frequency distribution of words (tag cloud details).
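The reason sizes cannot be compared between clouds can be seen in a small sketch of the scaling. It assumes a linear mapping of frequency onto a fixed range of font sizes; the size limits, the function name and the counts are hypothetical, and the scaling used to draw the actual clouds may differ.

```python
def font_sizes(counts, min_size=10, max_size=48):
    """Map word counts linearly onto [min_size, max_size] within one cloud."""
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    sizes = {}
    for word, count in counts.items():
        # The least used word gets min_size, the most used word gets max_size.
        scale = (count - lowest) / span if span else 1.0
        sizes[word] = min_size + scale * (max_size - min_size)
    return sizes


# Hypothetical counts: the first cloud's words are used ten times as often.
cloud_a = font_sizes({"tax": 90, "energy": 50, "notion": 10})
cloud_b = font_sizes({"nuclear": 9, "spending": 5, "badly": 1})

print(cloud_a)
print(cloud_b)
```

Both clouds end up with the same three sizes even though the underlying counts differ by a factor of ten, so only the distribution of sizes within a cloud is meaningful.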

Debate Tag Cloud for Barack Obama - all words

Debate tag cloud for Barack Obama

Debate Tag Cloud for John McCain - all words

Debate tag cloud for John McCain
Debate Tag Cloud Analysis

Across all the debates, Obama maintains "important" as his most important (ha ha) word. Note "energy", "health", "economic", "care", "tax" and "people" are central concepts.

In stark contrast, McCain treats "nuclear" as an important topic, giving it about the same prominence as "Obama".

Debate Tag Clouds for Each Candidate - Unique Words

The tag clouds below show only words used exclusively by a candidate. For example, if candidate A used the word "invest" (any number of times), but the other candidate B did not, then the word will appear in the unique word tag cloud for candidate A.

Debate Tag Cloud for Barack Obama - words unique to Barack Obama

Debate tag cloud for Barack Obama

Debate Tag Cloud for John McCain - words unique to John McCain

Debate tag cloud for John McCain
Unique Word Tag Cloud Analysis

The unique word clouds are particularly informative in a combined debate analysis. The more words said, the fewer words are attributed to only one candidate and these gain importance with increased number of debate samples. Remember, these are words spoken by one candidate, but not the other, across all debates.

Obama's unique words have a large noun component, with words such as "notion", "fundamentals", "consequence", and "wages". His most prominent unique word was the verb "agree", which McCain did not use (note: there is no stemming done in the analysis - McCain did use "agreed"). Obama's use of "potentially" suggests openness to complications and the unforeseen.

McCain's unique words on the other hand focus nearly exclusively on verbs. He uses strong action words such as "opposes" and "legitimize" which suggest a confrontational and unilateral view. His top unique adverb was "badly", which suggests an attack stance (presumably the word is used in context of his opponent).

Part of Speech Tag Clouds

In these tag clouds, words by both candidates were categorized on the basis of exclusivity to a candidate. Words unique to each candidate are drawn with a different color. Words used by both candidates are shown in grey.

The size of the word is relative to the frequency for the candidate - word sizes between candidates should not be used to indicate difference in absolute frequency.

Words were further categorized by part of speech (noun, verb, adjective, adverb) and individual tag clouds were prepared for each category.

The last tag cloud in this section uses all (noun + verb + adjective + adverb) parts of speech.

Tag Cloud of noun words, by speaker

Noun Tag Cloud Analysis

Do you see many blue words? Those are nouns exclusive to McCain, and there is hardly a blue word in sight. It is shocking how overwhelmingly Obama's delivery drowns out McCain's contribution in the realm of nouns across all the debates.

The third debate saw a cloud like this, but McCain at least managed to get a few words into the cloud.

Tag Cloud of verb words, by speaker

Verb Tag Cloud Analysis

For verbs, McCain's contribution was overwhelming - a situation opposite to that of nouns. Take a look, however, at what Obama brings to the cloud: words like "agree", "invest", "recognize", "focused" and "thinking". Obama's contribution is that of conciliation and careful consideration.

Tag Cloud of adjective words, by speaker

Adjective Tag Cloud Analysis

The split in adjective contribution is more even between the debaters. McCain's curious repetition of "angry", "excess" and "afraid" contrasts with Obama's central use of "enormous" as well as "strategic", "easy" and "local".

Tag Cloud of adverb words, by speaker

Adverb Tag Cloud Analysis

McCain, though delivering fewer adverbs than Obama, repeats them quite a bit. Here, his relative usage contribution outweighs Obama's. Contrast McCain's "badly" to Obama's "potentially". McCain comes across as a hard-liner whereas Obama comes across as moderate.

Tag Cloud of all words, by speaker

All Tag Cloud Analysis

When all parts of speech are compared, Obama is easily the greater verbal force. McCain's contribution is absolutely swamped out by Obama's unique words.

Word Pair Vignette Tag Clouds for Each Candidate

Tag Cloud of word pairs by Barack Obama

adjective/adjective by Barack Obama

adjective/adverb by Barack Obama

adjective/noun by Barack Obama

adjective/verb by Barack Obama

adverb/adverb by Barack Obama

adverb/noun by Barack Obama

adverb/verb by Barack Obama

noun/noun by Barack Obama

noun/verb by Barack Obama

verb/verb by Barack Obama

Word Pair Tag Cloud Analysis for Barack Obama.

An interesting adjective/adverb pairing frequent for Obama is "military never", as well as "correct quickly". Across all debates, the top pairings suggest focus on "care health" (large noun/noun component), and "think understand" (large verb/verb component).

Tag Cloud of word pairs by John McCain

adjective/adjective by John McCain

adjective/adverb by John McCain

adjective/noun by John McCain

adjective/verb by John McCain

adverb/adverb by John McCain

adverb/noun by John McCain

adverb/verb by John McCain

noun/noun by John McCain

noun/verb by John McCain

verb/verb by John McCain

Word Pair Tag Cloud Analysis for John McCain.

McCain's repetition of "nuclear power" and "national security" drowns out any mention of economy or domestic policy. His largest verb/verb pairing is "america united" (compare this to "think understand" for Obama), and a large component to adverb/verb is "completely control". McCain's stance is one of nationalism and certainty.

Downloads

debate transcript (courtesy of CNN).

parsed word lists (analyzed transcript, including words by speaker, by POS, and all POS pairings).

tag cloud images

data structure

Please see the methods section for details about these files.