Lexical Analysis of 2008 US Presidential and Vice-Presidential Debates home | Martin Krzywinski : projects contact


Lexical Analysis of
Barack Obama vs John McCain (2nd debate)

Word Statistics

Debate Word Count

Summary Word Count

The summary word count reports the total number of words and the number of unique, non-stop words used by each candidate. Word number is expressed as both absolute and relative values.

Table 1. Number of all words and unique words used by each speaker.
speaker        words (a)   share of debate (b)   unique (c)   unique/words (d)   bar (a-c : c)
Barack Obama   7,031       52.1%                 1,305        16.6%              5,862 : 1,169
John McCain    6,454       47.9%                 1,262        17.4%              5,329 : 1,125
all            13,485      100.0%                1,929        13.2%              11,710 : 1,775
Table 1 Analysis

Proportions of total and unique words are similar to those of the first debate.

Overall, the 2nd debate saw fewer words (13,485 vs 14,572, a decrease of 7.5%). As before, Obama contributed more words to the debate, with 7,031 words vs 6,454 for McCain (8.9% more; compare this to the 7% difference seen in the first debate).

The difference in the fraction of unique words for both candidates was greater than seen in the first debate. Here, Obama edged McCain in the number of unique words by 3.4% (1,305 vs 1,262).

Table 1 Legend
a c
b d
a :: total number of words
b :: proportion of words in the debate
c :: unique words in (a)
d :: (c) relative to (a)
bar :: proportion of (a-c):c

Stop Word Contribution

In the table below, the candidates' delivery is partitioned into stop and non-stop words. Stop words are frequently-used bridging words (e.g. pronouns and conjunctions) and do not carry inherent meaning. The fraction of words that are stop words is one measure of the complexity of speech.
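The partition described above can be sketched in a few lines of Python. This is only an illustration: the stop list below is a tiny hypothetical one (the list used for the analysis is not given on this page), and the tokenizer is a simple regular expression rather than whatever was used to parse the transcript.

```python
import re

# Hypothetical, deliberately tiny stop list (an assumption for this sketch).
STOP_WORDS = {"i", "we", "you", "and", "the", "a", "to", "of", "that", "is", "in", "it"}

def tokenize(text):
    """Lower-case the text and split it into words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def partition(words, stop_words=STOP_WORDS):
    """Split a word list into (stop, non_stop) lists, preserving order."""
    stop = [w for w in words if w in stop_words]
    non_stop = [w for w in words if w not in stop_words]
    return stop, non_stop

def summarize(words):
    """Return (total words, unique words, unique/total) for a word list."""
    total = len(words)
    unique = len(set(words))
    return total, unique, (unique / total if total else 0.0)

words = tokenize("We need to fix the economy, and we need to fix it now.")
stop, non_stop = partition(words)
print(summarize(words))     # all words
print(summarize(stop))      # stop words
print(summarize(non_stop))  # non-stop words
```

Each call to summarize yields the total count, the unique count and their ratio, i.e. the (a), (c) and (d) values of one table cell.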

Table 2. Expanded analysis of total, stop and non-stop word count.
speaker        category   words (a)   share (b)   unique (c)   unique/words (d)
Barack Obama   all        7,031       52.1%       1,305        18.6%
               stop       3,992       56.8%       136          3.4%
               non-stop   3,039       43.2%       1,169        38.5%
John McCain    all        6,454       47.9%       1,262        19.6%
               stop       3,598       55.7%       137          3.8%
               non-stop   2,856       44.3%       1,125        39.4%
all            all        13,485      100.0%      1,929        14.3%
               stop       7,590       56.3%       154          2.0%
               non-stop   5,895       43.7%       1,775        30.1%
Table 2 Analysis

Stop word contribution for the 2nd debate is virtually identical to what was seen for the first debate. Actually, this is a little creepy.

In the first debate stop words accounted for 56.6% and 55.7% of words by Obama and McCain. In this debate, the numbers are 56.8% and 55.7%.

Table 2 Legend
a c
b d
a :: total number of words, for a given category (all, stop, non-stop)
b :: (a) relative to words in the debate if category=all, otherwise relative to words by the candidate
c :: number of unique words within set (a)
d :: (c) relative to (a)
bar :: proportion of (a-c):c

All further analysis uses debate content that has been filtered for stop words.

Word frequency

The word frequency table summarizes the frequency with which words were used. Specifically, it reports the average word frequency and the weighted cumulative frequencies at the 50th and 90th percentiles. The average word frequency indicates how many times, on average, a word is used. For a given fraction of the entire delivery, the weighted cumulative frequency indicates the largest word frequency within this fraction (details about weighted cumulative distribution).
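As a rough illustration of these two measures, the Python sketch below assumes one plausible reading of the weighted cumulative frequency: words are sorted by frequency, each word is weighted by the number of times it was used, and the value reported for a fraction is the frequency of the word at which the running total first reaches that fraction of the delivery. The function names and the example words are hypothetical; the linked details page is the authority on the exact definition.

```python
from collections import Counter

def average_frequency(words):
    """Average number of times each distinct word is used."""
    counts = Counter(words)
    return len(words) / len(counts) if counts else 0.0

def weighted_cumulative_frequency(words, fraction):
    """Largest word frequency found within the given fraction of the delivery.

    Assumption for this sketch: words are visited from least to most
    frequent, and each word contributes its own frequency to the total.
    """
    counts = sorted(Counter(words).values())
    total = sum(counts)
    running = 0
    for freq in counts:
        running += freq
        if running >= fraction * total:
            return freq
    return 0  # only reached for an empty word list

# Toy delivery of 10 words: frequencies are 5, 3, 1 and 1.
words = ["tax"] * 5 + ["health"] * 3 + ["energy"] + ["iraq"]
print(average_frequency(words))                   # 2.5
print(weighted_cumulative_frequency(words, 0.5))  # 3
print(weighted_cumulative_frequency(words, 0.9))  # 5
```

Under this reading, a 90% value that is much larger than the average means that a handful of heavily repeated words accounts for a large share of the delivery.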

Table 3. Average, 50%, and 90% weighted cumulative word frequencies (content filtered for stop words).
speaker        average (a)   50% (b)   90% (c)
Barack Obama   2.60          4.00      25.00
John McCain    2.54          4.00      20.00
all            3.32          7.00      41.00
Table 3 Analysis

Word frequency profile is also virtually identical to the first debate. Obama continues with a slightly higher average word frequency.

Table 3 Legend
a b c
a :: average word frequency
b :: largest word frequency in 50% of content
c :: largest word frequency in 90% of content
bar :: proportion of a:b:c

Sentence Size

Table 4. Number of words in a sentence, as measured by average number of words, 50% and 90% weighted cumulative values for three word groups (all words, stop words and non-stop words).
speaker        word type   average (a)   50% (b)   90% (c)
Barack Obama   all         17.3          24.0      52.0
               stop        10.2          14.0      31.0
               non-stop    7.5           11.0      23.0
John McCain    all         14.5          21.0      47.0
               stop        8.5           12.0      26.0
               non-stop    6.5           9.0       21.0
all            all         15.9          23.0      48.0
               stop        9.3           13.0      28.0
               non-stop    7.0           10.0      22.0
Table 4 Analysis

In the second debate, McCain's sentences were significantly shorter than in the first debate (6.5 non-stop words vs 7.1), whereas Obama's sentence length remained relatively constant (7.5 vs 7.7). The first debate was a podium format and the second a town-hall format in which some questions were asked by the audience. This more informal approach, where the candidates engaged directly with audience members as well as interrupting each other, likely contributed to the shorter sentences. Interestingly, McCain was more susceptible to this than Obama, whose sentence length was virtually identical to that in the more formal debate.

Table 4 Legend
a b c
a :: average sentence size
b :: largest sentence size for 50% of content
c :: largest sentence size for 90% of content
bar :: proportion of a:b:c

Part of Speech Analysis

In this section, word frequency is broken down by part of speech (POS). The four POS groups examined are nouns, verbs, adjectives and adverbs. Conjunctions and prepositions are not considered. The first category (n+v+adj+adv) is composed of all four POS groups.

Part of Speech Count

Table 5. Count of words (total and unique) categorized by part of speech (POS).
speaker        part of speech     words (a)   share (b)   unique (c)   unique/words (d)
Barack Obama   n+v+adj+adv        2,914       100.0%      1,139        39.1%
               nouns (n)          1,512       51.9%       598          39.6%
               verbs (v)          761         26.1%       344          45.2%
               adjectives (adj)   459         15.8%       194          42.3%
               adverbs (adv)      182         6.2%        72           39.6%
John McCain    n+v+adj+adv        2,758       100.0%      1,101        39.9%
               nouns (n)          1,401       50.8%       593          42.3%
               verbs (v)          788         28.6%       325          41.2%
               adjectives (adj)   391         14.2%       195          49.9%
               adverbs (adv)      178         6.5%        68           38.2%
all            n+v+adj+adv        5,672       100.0%      1,740        30.7%
               nouns (n)          2,913       51.4%       919          31.5%
               verbs (v)          1,549       27.3%       529          34.2%
               adjectives (adj)   850         15.0%       309          36.4%
               adverbs (adv)      360         6.3%        106          29.4%
Table 5 Analysis

The difference in adverb use seen in the first debate disappeared. Instead, use of adjectives showed a greater split, with their use increasing for Obama (15.8% vs 15.1%) and remaining constant for McCain (14.2%). The fraction of unique adjectives dropped for Obama by 2.5% (42.3% for this debate and 45.1% for the first debate), whereas McCain's fraction of unique adjectives increased radically (49.9% for this debate and 45.1% for the first debate).

McCain also showed lower noun use from the first debate (50.8% for this debate vs 53.8% for the first debate), instead increasing his verb use (28.6% for this debate vs 26.5% for the first debate).

Other than adjectives, Obama's part of speech breakdown was relatively unchanged.

Table 5 Legend
a c
b d
a :: total number of words for a given POS (all, noun, verb, adjective, adverb)
b :: (a) relative to all words by candidate
c :: unique words in (a)
d :: (c) relative to (a)
bar :: proportion of (a-c):c

Part of Speech Frequency

Table 5. Frequency of words by part of speech (POS).
speaker        part of speech     average (a)   50% (b)   90% (c)
Barack Obama   n+v+adj+adv        2.56          4.0       23
               nouns (n)          2.53          4.0       18
               verbs (v)          2.08          3.0       26
               adjectives (adj)   2.37          4.0       17
               adverbs (adv)      2.53          4.0       32
John McCain    n+v+adj+adv        2.50          4.0       22
               nouns (n)          2.36          3.0       15
               verbs (v)          2.17          3.0       20
               adjectives (adj)   2.00          3.0       8
               adverbs (adv)      2.62          4.0       14
all            n+v+adj+adv        3.26          7.0       41
               nouns (n)          3.17          6.0       25
               verbs (v)          2.68          4.0       51
               adjectives (adj)   2.75          5.0       18
               adverbs (adv)      3.40          7.0       41
Table 5 Analysis

The second debate saw quite a few differences in the frequency of parts of speech.

Although noun frequency remained relatively the same, McCain was seen repeating more verbs than Obama, a reversal of the first debate. Here, McCain's average verb frequency was 2.17 (2.06 in the first debate) and Obama's was 2.08 (2.10 in the first debate).

The use of both adjectives and adverbs saw changes. McCain decreased the repetition of adjectives from 2.22 in the first debate to 2.00 in this debate and slightly upped his adverb frequency from 2.55 to 2.62. Obama, on the other hand, reduced adverb repetition significantly from 2.92 to 2.53.

Table 5 Legend
a b c
a :: average word frequency
b :: largest word frequency in 50% of content
c :: largest word frequency in 90% of content
bar :: proportion of a:b:c

Part of Speech Pairing

Through word pairing, I attempt to capture the contextual use of parts of speech within a sentence and extract concepts from the text. Specifically, unique pairs of words indicate complexity and inter-relatedness between concepts in a sentence.
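The pairing idea can be sketched as follows. The Python below is an assumption-laden illustration, not the method used for the tables: it assumes that every combination of two tagged words within a sentence counts as one pair, that order within a pair does not matter, and that the input has already been split into sentences and tagged by part of speech. The function names and the toy sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pos_pairs(sentences):
    """Count word pairs per part-of-speech pairing.

    sentences: list of sentences, each a list of (word, pos) tuples.
    Returns {(pos1, pos2): Counter({(word1, word2): count})}.
    """
    table = {}
    for sentence in sentences:
        for first, second in combinations(sentence, 2):
            # Sort by (pos, word) so that noun/verb and verb/noun share one cell
            # and "health care" and "care health" are the same pair.
            (p1, w1), (p2, w2) = sorted([(first[1], first[0]), (second[1], second[0])])
            table.setdefault((p1, p2), Counter())[(w1, w2)] += 1
    return table

def pair_summary(table):
    """Return {pairing: (total pairs, unique pairs)}."""
    return {key: (sum(c.values()), len(c)) for key, c in table.items()}

sentences = [
    [("health", "noun"), ("care", "noun"), ("need", "verb")],
    [("care", "noun"), ("health", "noun")],
]
summary = pair_summary(pos_pairs(sentences))
print(summary[("noun", "noun")])  # (2, 1): "care health" was used twice
print(summary[("noun", "verb")])  # (2, 2): two different noun/verb pairs
```

The ratio of unique pairs to total pairs in each cell corresponds to value (d) in the legend for Tables 6a and 6b.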

Table 6a (Barack Obama). Word pairs (total and unique) categorized by part of speech (POS) for Barack Obama.
parts of speech pairings - Barack Obama
pairing               pairs (a)   share (b)   unique (c)   unique/pairs (d)
noun/noun             4,209       24.6%       3,451        82.0%
verb/noun             3,970       23.2%       3,397        85.6%
verb/verb             793         4.6%        725          91.4%
adjective/noun        2,692       15.7%       2,246        83.4%
adjective/verb        1,135       6.6%        967          85.2%
adjective/adjective   358         2.1%        311          86.9%
adverb/noun           969         5.7%        821          84.7%
adverb/verb           479         2.8%        416          86.8%
adverb/adjective      297         1.7%        262          88.2%
adverb/adverb         60          0.4%        53           88.3%
Table 6b (John McCain). Word pairs (total and unique) categorized by part of speech (POS) for John McCain.
parts of speech pairings - John McCain
pairing               pairs (a)   share (b)   unique (c)   unique/pairs (d)
noun/noun             3,432       24.6%       2,828        82.4%
verb/noun             3,359       24.1%       2,826        84.1%
verb/verb             722         5.2%        643          89.1%
adjective/noun        1,974       14.2%       1,633        82.7%
adjective/verb        861         6.2%        758          88.0%
adjective/adjective   234         1.7%        210          89.7%
adverb/noun           778         5.6%        660          84.8%
adverb/verb           432         3.1%        358          82.9%
adverb/adjective      233         1.7%        191          82.0%
adverb/adverb         40          0.3%        34           85.0%
Table 6c (Barack Obama vs John McCain). Word Pairs (total and unique) categorized by part of speech (POS) for both candidates.
parts of speech pairings
pairing               Obama pairs (a)   Obama unique (b)   McCain pairs (c)   McCain/Obama (d)   McCain unique (e)
noun/noun             4,209             82.0%              3,432              81.5%              82.4%
verb/noun             3,970             85.6%              3,359              84.6%              84.1%
verb/verb             793               91.4%              722                91.0%              89.1%
adjective/noun        2,692             83.4%              1,974              73.3%              82.7%
adjective/verb        1,135             85.2%              861                75.9%              88.0%
adjective/adjective   358               86.9%              234                65.4%              89.7%
adverb/noun           969               84.7%              778                80.3%              84.8%
adverb/verb           479               86.8%              432                90.2%              82.9%
adverb/adjective      297               88.2%              233                78.5%              82.0%
adverb/adverb         60                88.3%              40                 66.7%              85.0%
Table 6 Analysis

Compared to the first debate, the number of unique pairs relative to all pairs for a given part of speech group remained unchanged, with the exception of adverb/adjective pairs. In the first debate Obama's unique component was 83.3% and McCain's 90.4%. In this debate these values are 88.2% and 82.0% respectively. Obama repeated fewer adverb/adjective pairs than before, whereas McCain increased his repetition.

Given that Obama's sentences were longer for this debate than McCain's (and by a larger margin than for the first debate), it is a surprise that the total number of word pairs was not significantly lower for McCain in this debate. In fact, most of the McCain/Obama pairing ratio values increased (e.g. verb/noun increased from 80.5% to 84.6%, verb/verb from 82.2% to 91.0%, adverb/noun from 72.4% to 80.3%, adverb/verb from 70.9% to 90.2%). The largest drop was in the adjective/adjective pairings, from 78.7% in the first debate to 65.4% in this debate.

Table 6a,b Legend
a c
b d
a :: total number of pairs, for a given category (e.g. verb/noun)
b :: (a) relative to all pairs
c :: number of unique pairs within set (a)
d :: (c) relative to (a)
bar :: proportion of (a-c):c
Table 6c Legend
a c
  d
b e
a :: total number of pairs for Barack Obama
b :: relative unique pairs for Barack Obama
c :: total pairs for John McCain
d :: (c) relative to (a) (i.e. John McCain relative to Barack Obama)
e :: relative unique pairs for John McCain
bars :: values of (a), (b), (c) and (e)

Word usage

This section enumerates words that were unique to a candidate (i.e. used by one candidate but not the other). For a given part of speech, the table breaks down the number of words that were spoken by only one of the candidates or both candidates (intersection). The last row includes all words (union).
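In set terms, the breakdown amounts to two differences, an intersection and a union. A minimal Python sketch, with hypothetical names and toy word lists:

```python
def exclusivity(words_a, words_b):
    """Split the vocabulary of two speakers into exclusive and shared words."""
    a, b = set(words_a), set(words_b)
    return {
        "only_a": a - b,  # used by speaker A but not by B
        "only_b": b - a,  # used by speaker B but not by A
        "both": a & b,    # intersection
        "all": a | b,     # union
    }

groups = exclusivity(
    ["health", "invest", "energy", "tax"],
    ["tax", "fought", "energy", "nuclear"],
)
print(sorted(groups["only_a"]))  # ['health', 'invest']
print(sorted(groups["only_b"]))  # ['fought', 'nuclear']
print(sorted(groups["both"]))    # ['energy', 'tax']
print(len(groups["all"]))        # 6
```

The three disjoint groups always add up to the union. The unique word counts in Table 7 follow the same identity: 639 + 601 + 500 = 1,740.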

Table 7. Total and unique words used exclusively by a candidate or by both candidates.
words used by   part of speech     words (a)   share of group (b)   share of all (c)   unique (d)   unique/words (e)   share of all unique (f)
Barack Obama    n+v+adj+adv        943         100.0%               16.6%              639          67.8%              36.7%
                nouns (n)          493         52.3%                16.9%              326          66.1%              35.5%
                verbs (v)          290         30.8%                18.7%              204          70.3%              38.6%
                adjectives (adj)   163         17.3%                19.2%              114          69.9%              36.9%
                adverbs (adv)      50          5.3%                 13.9%              38           76.0%              35.8%
John McCain     n+v+adj+adv        907         100.0%               16.0%              601          66.3%              34.5%
                nouns (n)          512         56.4%                17.6%              321          62.7%              34.9%
                verbs (v)          245         27.0%                15.8%              185          75.5%              35.0%
                adjectives (adj)   152         16.8%                17.9%              115          75.7%              37.2%
                adverbs (adv)      55          6.1%                 15.3%              34           61.8%              32.1%
both            n+v+adj+adv        3,822       100.0%               67.4%              500          13.1%              28.7%
                nouns (n)          1,908       49.9%                65.5%              272          14.3%              29.6%
                verbs (v)          1,014       26.5%                65.5%              140          13.8%              26.5%
                adjectives (adj)   535         14.0%                62.9%              80           15.0%              25.9%
                adverbs (adv)      255         6.7%                 70.8%              34           13.3%              32.1%
all             n+v+adj+adv        5,672       100.0%               100.0%             1,740        30.7%              100.0%
                nouns (n)          2,913       51.4%                100.0%             919          31.5%              100.0%
                verbs (v)          1,549       27.3%                100.0%             529          34.2%              100.0%
                adjectives (adj)   850         15.0%                100.0%             309          36.4%              100.0%
                adverbs (adv)      360         6.3%                 100.0%             106          29.4%              100.0%
Table 7 Analysis

Obama lowered his adverb use from 6.9% in the first debate to 5.3%. McCain, on the other hand, found a new penchant for them: adverbs now made up 6.1% of his speech, up from 5.2%. Obama's decreased adverb use was accompanied by a higher unique adverb ratio - he not only used fewer adverbs, he also repeated them less frequently.

McCain also increased his adjective use from 14.4% to 16.8% and decreased repetition of them.

McCain used fewer verbs and his unique contribution to the debate also fell. In the first debate, he contributed 39.9% unique verbs to the debate. The second time around, he only contributed 35.0%. This difference was offset with Obama's decreased unique adverb contribution, from 39.8% to 35.8%.

Table 7 Legend
a d
b e
c f
a :: total number of words unique to a candidate, for a given POS group
b :: (a) relative to all unique words to the candidate
c :: (a) relative to all words
d :: unique words in (a)
e :: (d) relative to (a)
f :: (d) relative to all unique words
bar1 :: normalized ratio of (a-d):d
bar2 :: absolute ratio of (a-d):d for all POS groups (first column) or POS group (other columns)

Noun Phrase Usage

Noun phrases were extracted from the text and analyzed for frequency, word count, unique word count and richness.

Top-level noun phrases are those without a parent noun phrase (a parent phrase is a similar, longer phrase). Derived noun phrases are those with a parent (more details about noun phrase analysis).

The top-level noun phrases can be interpreted as independent concepts. Derived noun phrases can be interpreted as variants on concepts embodied by the top-level phrases.

Noun Phrase Count

This table reports the absolute number of noun phrases, which is related to the number of total words (specifically, nouns) delivered. The next table presents the number of phrases relative to the number of nouns.

Table 8. Number of noun phrases.
speaker        noun phrases   phrases (a)   share (b)   unique (c)   unique/phrases (d)
Barack Obama   all            822           100.0%      677          82.4%
               top-level      376           45.7%       360          95.7%
               derived        446           54.3%       317          71.1%
John McCain    all            801           100.0%      646          80.6%
               top-level      388           48.4%       360          92.8%
               derived        413           51.6%       286          69.2%
Table 8 Analysis

Obama delivered slightly more noun phrases than McCain in this debate (+2.6% more), a difference smaller than for the first debate, where he delivered 4% more phrases. This time, McCain had relatively more top-level phrases than Obama, at 48.4%, increasing his relative top-level noun phrase count by more than 3% from the previous debate. McCain also delivered a greater fraction of unique top-level phrases, at 96.6%, relative to the first debate (92.8%).

Obama's noun phrase profile is relatively unchanged from the first debate. He did, however, deliver a slightly greater fraction of unique phrases during this debate, at 84.3% (vs 82.4% for the first debate).

Table 8 Legend

a c
b d
a :: number of noun phrases
b :: (a) relative to number of all noun phrases
c :: number of unique phrases
d :: (c) relative to (a)
bar :: normalized ratio of (a-c):c

Noun Phrase Richness

The previous table presented the total number of noun phrases, which can be equated to individual concepts. In this table, this value is shown relative to the number of nouns used. The interpretation of this ratio is that of richness. In other words, how many noun phrases were constructed, per noun.
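As a worked example of the ratio, the short Python sketch below recomputes the richness of all noun phrases from the phrase counts in Table 8 and the noun counts in Table 5. The dictionary layout and the function name are only illustrative.

```python
# Counts taken from Table 8 (noun phrases) and Table 5 (nouns) of this page.
counts = {
    "Barack Obama": {"phrases": 822, "unique_phrases": 677, "nouns": 1512, "unique_nouns": 598},
    "John McCain": {"phrases": 801, "unique_phrases": 646, "nouns": 1401, "unique_nouns": 593},
}

def richness(c):
    """Return (phrases per noun, unique phrases per unique noun)."""
    return c["phrases"] / c["nouns"], c["unique_phrases"] / c["unique_nouns"]

for speaker, c in counts.items():
    per_noun, per_unique_noun = richness(c)
    print(f"{speaker}: {per_noun:.2f} {per_unique_noun:.2f}")
# Barack Obama: 0.54 1.13
# John McCain: 0.57 1.09
```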

Table 9. Number of noun phrases relative to the number of nouns.
speaker        noun phrases   phrases/nouns (a)   unique phrases/unique nouns (b)
Barack Obama   all            0.54                1.13
               top-level      0.25                0.60
               derived        0.29                0.53
John McCain    all            0.57                1.09
               top-level      0.28                0.61
               derived        0.29                0.48
Table 9 Analysis

The difference in the richness of all phrases was smaller between the candidates for this debate. In the first debate, the difference was +8.4% in Obama's favour and during this debate it dropped to +3.7%, still in Obama's favour.

McCain's uniqueness for top-level noun phrases dropped from 0.61 in the first debate to 0.56, whereas Obama's increased slightly.

Table 9 Legend
a b
a :: ratio of the number of noun phrases to number of nouns
b :: ratio of the number of unique noun phrases to number of unique nouns
bar :: ratio of a:b

Noun Phrase Frequency and Size

Table 10. Noun phrase frequency, word count and unique word count.
speaker        measure             average (a)   50% (b)   90% (c)
Barack Obama   avg frequency       1.21          1.00      3.00
               word count          2.78          3.00      7.00
               unique word count   2.73          3.00      7.00
John McCain    avg frequency       1.24          1.00      4.00
               word count          2.66          3.00      7.00
               unique word count   2.61          3.00      7.00
Table 10 Analysis

Noun phrase frequency dropped slightly for both candidates for this debate, but the drop is minor. Word counts remained the same.

Table 10 Legend
a b c
a :: average
b :: 50% weighted cumulative value
c :: 90% weighted cumulative value
bar1 :: normalized ratio of a:b:c

Windbag Index

The Windbag Index is a compound measure that characterizes the complexity of speech. A low index is indicative of succinct speech with a low degree of repetition and a large number of independent concepts.

Table 11. Windbag Index for each speaker. The higher the value, the greater the degree of repetition in the speech.
speaker        index value   vs other speaker
Barack Obama   405           +15.2%
John McCain    352           -13.2%

index terms    t1      t2      t3      t4      t5       t6      t7      t8      t9
Barack Obama   0.432   0.385   0.396   0.452   0.423    0.396   0.824   0.532   1.132
               -2.3%   -2.3%   -6.6%   +9.6%   -15.3%   +3.6%   +2.1%   -4.6%   +3.9%
John McCain    0.443   0.394   0.423   0.412   0.499    0.382   0.806   0.557   1.089
               +2.4%   +2.4%   +7.0%   -8.8%   +18.0%   -3.4%   -2.1%   +4.8%   -3.8%
Table 11 Analysis

Obama's Windbag Index is still higher than McCain's. The difference is the same as seen in the first debate. The index dropped slightly for both candidates, compared to the first debate, suggesting that the town hall format contributes to less repetitive and long-winded speech.

When individual index terms are compared, Obama maintains a stronger showing in the noun phrase component - two of his three noun phrase terms (t7 and t9) are higher than McCain's (i.e. contributing to a lower index), although McCain leads on t8. Obama continues to do better with verbs, but does significantly better with adverbs than in the first debate.

Table 11 Legend
The Windbag Index is 1/(t1*t2*...*t9) where t1,t2,...,t9 are the individual terms. These terms are

t1 :: fraction of words which are non-stop
t2 :: fraction of non-stop words which are unique
t3 :: fraction of nouns which are unique
t4 :: fraction of verbs which are unique
t5 :: fraction of adjectives which are unique
t6 :: fraction of adverbs which are unique
t7 :: fraction of noun phrases which are unique
t8 :: fraction of noun phrases which have no parent
t9 :: ratio of unique noun phrases to unique nouns

Note that large individual terms t1...t9 contribute to a smaller index.

The percentage values below the index and each term are relative differences to the other speaker's corresponding term (i.e. 100*(x-x0)/x0 where x is the value for the present speaker and x0 for the other speaker).
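The index and the relative differences can be reproduced with a few lines of Python. In this sketch each term is written as a ratio of counts taken from the tables on this page; that mapping of counts to terms is inferred from the definitions of t1 to t9 above (the ratios reproduce the term values in Table 11), and the function names are hypothetical.

```python
from math import prod

def windbag_index(terms):
    """Windbag Index: reciprocal of the product of the individual terms."""
    return 1.0 / prod(terms)

def relative_difference(x, x0):
    """Relative difference of x to the other speaker's value x0, in percent."""
    return 100.0 * (x - x0) / x0

# t1..t9, written as ratios of counts from Tables 2, 5 and 8.
obama = [3039 / 7031, 1169 / 3039, 598 / 1512, 344 / 761, 194 / 459,
         72 / 182, 677 / 822, 360 / 677, 677 / 598]
mccain = [2856 / 6454, 1125 / 2856, 593 / 1401, 325 / 788, 195 / 391,
          68 / 178, 646 / 801, 360 / 646, 646 / 593]

w_obama = windbag_index(obama)
w_mccain = windbag_index(mccain)
print(round(w_obama, 1), round(w_mccain, 1))
print(round(relative_difference(w_obama, w_mccain), 1))  # Obama relative to McCain
print(round(relative_difference(w_mccain, w_obama), 1))  # McCain relative to Obama
```

Because the index is a reciprocal of a product, raising any single term by a given percentage lowers the index by the same factor, whichever term it is.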

Tag Clouds

In the tag clouds below, the size of the word is proportional to the number of times it was used by a candidate (tag cloud details).

Not all words from a group used to draw the cloud fit in the image. Specifically, less frequently used words for large word groups fall outside the image.

Debate Tag Clouds for Each Candidate - All Words

Each candidate's debate portion was extracted and frequencies were compiled for each part of speech (noun, verb, adjective, adverb), with words colored by their part of speech category. The words in these tag clouds include words unique to one candidate as well as words used by both candidates. For other tag clouds below, only words unique to a candidate are used.

Keep in mind that the word sizes between tag clouds cannot be directly compared, since the minimum and maximum size of the words in each tag cloud is the same. However, the distribution of sizes within a tag cloud reflects the frequency distribution of words (tag cloud details).
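The reason sizes cannot be compared between clouds is easy to see in a small sketch. The Python below assumes a simple linear mapping from frequency to font size between a fixed minimum and maximum; the mapping, the size limits and the function name are assumptions made for illustration, and the tag cloud details page describes the method actually used.

```python
def font_sizes(frequencies, min_size=10.0, max_size=60.0):
    """Map word frequencies linearly onto the range [min_size, max_size].

    min_size and max_size are hypothetical values; every cloud uses the same
    range regardless of how often its words were actually used.
    """
    if not frequencies:
        return {}
    lo, hi = min(frequencies.values()), max(frequencies.values())
    span = hi - lo
    if span == 0:
        return {word: max_size for word in frequencies}
    scale = (max_size - min_size) / span
    return {word: min_size + (freq - lo) * scale for word, freq in frequencies.items()}

cloud_a = font_sizes({"health": 30, "think": 20, "just": 10})
cloud_b = font_sizes({"nuclear": 6, "friends": 4, "fought": 2})
print(cloud_a)  # {'health': 60.0, 'think': 35.0, 'just': 10.0}
print(cloud_b)  # {'nuclear': 60.0, 'friends': 35.0, 'fought': 10.0}
```

Both clouds look identical in size even though the words in the first were used five times as often, which is why only the distribution of sizes within a cloud is meaningful.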

Debate Tag Cloud for Barack Obama - all words

Debate tag cloud for Barack Obama

Debate Tag Cloud for John McCain - all words

Debate tag cloud for John McCain
Debate Tag Cloud Analysis

The main words in Obama's tag cloud remain the same: "just", "think", "important". It is interesting to see the word "nuclear" disappear from the center of the cloud. The main new word is "health", a testament to a major topic of the debate.

McCain, on the other hand, continues to use "nuclear" and in fact the relative frequency of this word is larger than before. "friends" now appears large, as do instances of "america", "american" and "americans". Presumably McCain is attempting to connect to the swing voters.

Debate Tag Clouds for Each Candidate - Unique Words

The tag clouds below show only words used exclusively by a candidate. For example, if candidate A used the word "invest" (any number of times), but the other candidate B did not, then the word will appear in the unique word tag cloud for candidate A.

Debate Tag Cloud for Barack Obama - words unique to Barack Obama

Debate tag cloud for Barack Obama

Debate Tag Cloud for John McCain - words unique to John McCain

Debate tag cloud for John McCain
Unique Word Tag Cloud Analysis

For Obama, "actually" takes center stage, displacing "absolutely" from the first debate. Obama's cloud now has a more prominent central component, suggesting that he repeated certain words with relatively higher frequency than in the first debate. Words with relatively increased frequency are "interested", "harder" and "enormous". Note that these tag clouds show words exclusive to a candidate. McCain did not use "interested" or "harder".

McCain continues to press with "fought" and "petraeus", words consistently avoided by Obama. McCain's use of "nervous" is interesting - certainly not a word that soothes.

Part of Speech Tag Clouds

In these tag clouds, words by both candidates were categorized on the basis of exclusivity to a candidate. Words unique to each candidate are drawn with a different color. Words used by both candidates are shown in grey.

The size of the word is relative to the frequency for the candidate - word sizes between candidates should not be used to indicate difference in absolute frequency.

Words were further categorized by part of speech (noun, verb, adjective, adverb) and individual tag clouds were prepared for each category.

The last tag cloud in this section uses all (noun + verb + adjective + adverb) parts of speech.

Tag Cloud of noun words, by speaker

Noun Tag Cloud Analysis

The noun tag cloud shows fewer words exclusive to Obama with high relative frequencies. The sea of green in the cloud from the first debate is replaced by a more even mixture of green/blue words. In fact, words unique to each candidate were used with relatively lower frequency in this debate, when compared to the most frequently used words.

Tag Cloud of verb words, by speaker

Verb Tag Cloud Analysis

Obama says "making" and "help", whereas McCain presses for "control" and "fought". Obama's use of verbs was more varied in this debate, as indicated by fewer large words in this cloud.

Tag Cloud of adjective words, by speaker

Adjective Tag Cloud Analysis

The adjective tag cloud has significantly different morphology in this debate when compared to the cloud from the first debate. Plainly, there are more large words in this debate's cloud, indicating more words with relatively high frequencies.

McCain repeated adjectives like "refundable" and "excess", with Obama using "enormous", "federal", "responsible" and "interested" relatively more frequently than before.

Note the much less prominent "nuclear" in this debate - presumably people in the town forum were more interested in notions better described by "responsible", "federal", and "middle-income".

Tag Cloud of adverb words, by speaker

Adverb Tag Cloud Analysis

Adverbs are weakly used by both candidates. Most of them are relatively generic. However, McCain is seen using "militarily" in relatively high abundance.

Tag Cloud of all words, by speaker

All Tag Cloud Analysis

Once all parts of speech are mixed, both candidates showed less repetition of exclusive words, as indicated by the larger proportion of grey words.

Word Pair Vignette Tag Clouds for Each Candidate

Tag Cloud of word pairs by Barack Obama

adjective/adjective by Barack Obama

adjective/adverb by Barack Obama

adjective/noun by Barack Obama

adjective/verb by Barack Obama

adverb/adverb by Barack Obama

adverb/noun by Barack Obama

adverb/verb by Barack Obama

noun/noun by Barack Obama

noun/verb by Barack Obama

verb/verb by Barack Obama

Word Pair Tag Cloud Analysis for Barack Obama.

There is a huge difference in Obama's noun/noun cloud, with extremely prominent (i.e. repeated) "care health", "health insurance", "children health" and "energy need". Obama's noun/noun cloud from the first debate was much flatter, showing less repetition.

Tag Cloud of word pairs by John McCain

adjective/adjective by John McCain

adjective/adverb by John McCain

adjective/noun by John McCain

adjective/verb by John McCain

adverb/adverb by John McCain

adverb/noun by John McCain

adverb/verb by John McCain

noun/noun by John McCain

noun/verb by John McCain

verb/verb by John McCain

Word Pair Tag Cloud Analysis for John McCain.

McCain continues with military concepts in his adjective/noun cloud, with "nuclear power" and "national security" being very prominent. It is clear from McCain's noun/noun cloud that health care was a huge topic, because McCain shifted his speech from a stance that focused on military and threats to "care health", "american people" and "homes value".

Downloads

debate transcript (courtesy of CNN).

parsed word lists (analyzed transcript, including words by speaker, by POS, and all POS pairings).

tag cloud images

data structure

Please see the methods section for details about these files.