The last lines of the plays of William Shakespeare by Martin Krzywinski
THE FINAL WORDS | The last lines of all Shakespeare plays.

the type of man with the best words

This section celebrates the words of William Shakespeare.

If you love letters in just the right combination, these pages and the art are for you. If you like to delve into the words yourself, use my plain-text annotated version of all his plays.

The posters are available for purchase.

1 · Shakespeare in plain-text

Here I've made all 37 of Shakespeare's plays available in a single plain-text file. Each spoken line and each annotation (e.g. start of scene, character exit) is provided on its own indexed line.

I am grateful to Liam Larsen's Kaggle project, which was the only easily parsable plain-text version of Shakespeare that I've been able to find. Liam's file didn't include Henry IV Part 2, which I've added to my file as parsed from the Shakespeare pages at MIT.

My format is different from Liam's. I provide more information about what each line represents and annotate some lines with flags to indicate the start or end of a segment, such as a scene, an act, or a character's appearance.

If you spot any errors or inconsistencies in the file, please let me know.

2 · File Format

Here's a snippet of the first and last records from A Comedy of Errors. The field delimiter is a pipe "|".

A_Comedy_of_Errors | play_start | 1966
A_Comedy_of_Errors | act_start | 274 | 1
A_Comedy_of_Errors | scene_start | 1026 | 1 | 1 | A hall in DUKE SOLINUS'S palace.
A_Comedy_of_Errors | enter | 1 | 1 | DUKE SOLINUS, AEGEON, Gaoler, Officers, and other Attendants
A_Comedy_of_Errors | line | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | AEGEON | +a,+ca,+cp,+cs,+p,+s | Proceed, Solinus, to procure my fall
A_Comedy_of_Errors | line | 1 | 1 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | AEGEON |  | And by the doom of death end woes and all.
A_Comedy_of_Errors | line | 1 | 1 | 3 | 3 | 3 | 3 | 1 | 1 | 1 | DUKE_SOLINUS | +ca,+cp,+cs | Merchant of Syracuse, plead no more;
A_Comedy_of_Errors | line | 1 | 1 | 4 | 4 | 4 | 4 | 1 | 2 | 2 | DUKE_SOLINUS |  | I am not partial to infringe our laws:
...
A_Comedy_of_Errors | line | 5 | 1 | 453 | 1963 | 453 | 1023 | 99 | 1 | 314 | DROMIO_OF_SYRACUSE | -ca,-cp,-cs | We'll draw cuts for the senior: till then lead thou first.
A_Comedy_of_Errors | line | 5 | 1 | 454 | 1964 | 454 | 1024 | 63 | 1 | 185 | DROMIO_OF_EPHESUS |  | Nay, then, thus:
A_Comedy_of_Errors | line | 5 | 1 | 455 | 1965 | 455 | 1025 | 63 | 2 | 186 | DROMIO_OF_EPHESUS |  | We came into the world like brother and brother;
A_Comedy_of_Errors | line | 5 | 1 | 456 | 1966 | 456 | 1026 | 63 | 3 | 187 | DROMIO_OF_EPHESUS | -a,-ca,-cp,-cs,-p,-s | And now let's go hand in hand, not one before another.
A_Comedy_of_Errors | exeunt | 5 | 1 | all
...

Every line has the format

play_name | record_type | ...

where record_type is one of

  play_start - start of the play
   act_start - start of an act
 scene_start - start of a scene
    prologue - start of prologue

       enter - a character enters 
        exit - character or characters exit
      exeunt - character or characters exit

        line - spoken line

        misc - action, emote, death, alarm, or other non-spoken event

The exit and exeunt labels are interchangeable. Although strictly exit is singular and exeunt is plural, there are exit lines in which multiple characters leave. A misc record may also correspond to an entrance, re-entrance or exit.

The number of fields in a record depends on its record_type.

#  * indicates the field may be blank (e.g. speaker)

play_start  | spoken_lines_in_play
act_start   | spoken_lines_in_act   | act_number
scene_start | spoken_lines_in_scene | act_number | scene_number | scene_description

prologue    | act_number | 0

enter  | act | scene | speaker* | description
exit   | act | scene | speaker* | description
exeunt | act | scene | speaker* | description

line   | act | scene | line_in_play | line_in_act | line_in_scene | 
         speaker_appearance | line_in_speaker_appearance | speaker_line | 
         flag* | line_text

misc   | act | scene | speaker* | description

All counts start at 1, except the prologue scene number which is 0.

Only spoken lines count towards the line count.
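
The layout above can be read with a few lines of Python. This is a minimal sketch, not part of the file or its tooling: the function name parse_record and the dictionary keys are hypothetical. The sample line records shown earlier carry nine numeric counters followed by speaker, flag and text, which is one more column than the field list names, so the sketch keeps the counters as an ordered list instead of naming each one. Positions are taken from the sample records.

```python
def parse_record(raw):
    """Split one pipe-delimited record into stripped fields.

    Assumption: 'line' records have nine numeric counters followed by
    speaker, flag and text, as in the sample records on this page.
    """
    head = raw.rstrip("\n").split("|", 2)
    if len(head) < 2:
        raise ValueError("not a record: %r" % raw)
    record = {"play": head[0].strip(), "type": head[1].strip()}
    rest = head[2] if len(head) > 2 else ""
    if record["type"] == "line":
        # Limit the split so a '|' inside the spoken text stays in the text.
        parts = rest.split("|", 11)
        record["counters"] = [int(p) for p in parts[:9]]
        record["speaker"] = parts[9].strip()
        record["flags"] = parts[10].strip()
        record["text"] = parts[11].strip()
    else:
        # Other record types: keep the remaining fields in order.
        record["fields"] = [p.strip() for p in rest.split("|")] if rest else []
    return record


if __name__ == "__main__":
    print(parse_record("A_Comedy_of_Errors | exeunt | 5 | 1 | all"))
```

Keeping the counters positional avoids guessing at column names. The last three entries of the list are the speaker counters described below.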

Every speaker has three line counters. speaker_appearance gives the index of the speaker's appearance, where an appearance is a contiguous set of lines spoken by that speaker. line_in_speaker_appearance counts the lines within the current appearance. speaker_line counts the total lines spoken by the speaker so far in the play. For example, at the start of the Comedy of Errors:

# Aegeon's first appearance of 2 lines (running total for Aegeon: 2 lines)
... 1 | 1 | 1 | AEGEON | +a,+ca,+cp,+cs,+p,+s | Proceed, Solinus, to procure my fall
... 1 | 2 | 2 | AEGEON |  | And by the doom of death end woes and all.
# Duke Solinus's first appearance of 23 lines (running total for Duke Solinus: 23 lines)
... 1 | 1 | 1 | DUKE_SOLINUS | +ca,+cp,+cs | Merchant of Syracuse, plead no more;
... 1 | 2 | 2 | DUKE_SOLINUS |  | I am not partial to infringe our laws:
... 1 | 3 | 3 | DUKE_SOLINUS |  | The enmity and discord which of late
    ...
... 1 | 21 | 21 | DUKE_SOLINUS |  | Thy substance, valued at the highest rate,
... 1 | 22 | 22 | DUKE_SOLINUS |  | Cannot amount unto a hundred marks;
... 1 | 23 | 23 | DUKE_SOLINUS |  | Therefore by law thou art condemned to die.
# Aegeon's second appearance of 2 lines (running total for Aegeon: 4 lines)
... 2 | 1 | 3 | AEGEON |  | Yet this my comfort: when your words are done,
... 2 | 2 | 4 | AEGEON |  | My woes end likewise with the evening sun.
# Duke Solinus's second appearance of 3 lines (running total for Duke Solinus: 26 lines)
... 2 | 1 | 24 | DUKE_SOLINUS |  | Well, Syracusian, say in brief the cause
... 2 | 2 | 25 | DUKE_SOLINUS |  | Why thou departed'st from thy native home
... 2 | 3 | 26 | DUKE_SOLINUS |  | And for what cause thou camest to Ephesus.
# Aegeon's third appearance of 65 lines (running total for Aegeon: 69 lines)
... 3 | 1 | 5 | AEGEON |  | A heavier task could not have been imposed
... 3 | 2 | 6 | AEGEON |  | Than I to speak my griefs unspeakable:
... 3 | 3 | 7 | AEGEON |  | Yet, that the world may witness that my end
    ...
... 3 | 63 | 67 | AEGEON |  | Of Corinth that, of Epidaurus this:
... 3 | 64 | 68 | AEGEON |  | But ere they came,--O, let me say no more!
... 3 | 65 | 69 | AEGEON |  | Gather the sequel by that went before.
# Duke Solinus's third appearance of 2 lines (running total for Duke Solinus: 28 lines)
... 3 | 1 | 27 | DUKE_SOLINUS |  | Nay, forward, old man; do not break off so;
... 3 | 2 | 28 | DUKE_SOLINUS |  | For we may pity, though not pardon thee.
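
The three counters in the example above can be reproduced from nothing more than the order of speakers. The following Python sketch shows one way to do it. The function name speaker_counters is hypothetical, and the sketch assumes that an appearance ends whenever the next spoken line belongs to a different speaker and that the counters restart for each play. How the file treats non-spoken records between lines is not stated here, so that part is an assumption.

```python
def speaker_counters(speakers):
    """Return (speaker_appearance, line_in_speaker_appearance, speaker_line)
    for each spoken line, given the speaker of every line in play order."""
    appearances = {}   # speaker -> number of appearances so far
    totals = {}        # speaker -> running total of lines spoken
    previous = None
    run_length = 0
    counters = []
    for speaker in speakers:
        if speaker != previous:
            # A change of speaker starts a new appearance.
            appearances[speaker] = appearances.get(speaker, 0) + 1
            run_length = 0
        run_length += 1
        totals[speaker] = totals.get(speaker, 0) + 1
        counters.append((appearances[speaker], run_length, totals[speaker]))
        previous = speaker
    return counters


if __name__ == "__main__":
    # Speaker order from the opening of the Comedy of Errors example.
    order = (["AEGEON"] * 2 + ["DUKE_SOLINUS"] * 23 + ["AEGEON"] * 2 +
             ["DUKE_SOLINUS"] * 3 + ["AEGEON"] * 65 + ["DUKE_SOLINUS"] * 2)
    for speaker, triple in zip(order[-2:], speaker_counters(order)[-2:]):
        print(speaker, triple)
```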

The flag field is zero or more of

# first line
+p   in play
+a   in act
+s   in scene

# last line
-p   in play
-a   in act
-s   in scene

# first line of speaker in
+cp  play
+ca  act
+cs  scene

# last line of speaker in
-cp  play
-ca  act
-cs  scene
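
A short Python sketch of how the flag field might be decoded. The names parse_flags and describe_flag are hypothetical helpers, and the wording of the descriptions simply follows the table above.

```python
SCOPES = {"p": "play", "a": "act", "s": "scene"}


def parse_flags(field):
    """Turn a flag field such as '+ca,+cp,+cs' into a set of flags.

    A blank field yields an empty set.
    """
    return {flag.strip() for flag in field.split(",") if flag.strip()}


def describe_flag(flag):
    """Spell out one flag using the table above."""
    sign, code = flag[0], flag[1:]
    position = "first" if sign == "+" else "last"
    if code.startswith("c"):
        # 'c' marks the speaker-specific variants (+cp, -cs, ...).
        return "%s line of speaker in %s" % (position, SCOPES[code[1:]])
    return "%s line in %s" % (position, SCOPES[code])


if __name__ == "__main__":
    for flag in sorted(parse_flags("-a,-ca,-cp,-cs,-p,-s")):
        print(flag, "=", describe_flag(flag))
```

Testing membership in the parsed set (for example "-cp" in parse_flags(field)) is stricter than a substring search over the whole record, which would also match the same characters in the spoken text.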

3 · Examples

3.1 · Last words of death

Searching for "-cp" and "death" gives you every line that is a character's last in a play and that contains "death".

> grep "\-cp" shakespeare.all.plays.plain.text.txt | grep death 
A_Winters_Tale | line | 5 | 1 | 242 | 2968 | 242 | 242 | 4 | 6 | 24 | Lord | -ca,-cp,-cs | With divers deaths in death.
Antony_and_Cleopatra | line | 4 | 14 | 114 | 2877 | 532 | 114 | 27 | 2 | 47 | EROS | -ca,-cp,-cs | Of Antony's death.
As_you_like_it | line | 5 | 4 | 17 | 2477 | 234 | 17 | 24 | 1 | 75 | SILVIUS | +cs,-ca,-cp,-cs | Though to have her and death were both one thing.
Coriolanus | line | 5 | 4 | 40 | 3542 | 470 | 40 | 12 | 5 | 38 | Messenger | -ca,-cp,-cs | They'll give him death by inches.
Henry_IV,_Part_1 | line | 5 | 3 | 14 | 2776 | 258 | 14 | 11 | 3 | 41 | SIR_WALTER_BLUNT | -ca,-cp,-cs | Lord Stafford's death.
Henry_VI_Part_1 | line | 1 | 3 | 85 | 418 | 418 | 85 | 1 | 6 | 6 | Officer | -ca,-cp,-cs | henceforward, upon pain of death.
Henry_VI_Part_3 | line | 2 | 2 | 65 | 859 | 274 | 65 | 1 | 3 | 3 | PRINCE | -ca,-cp,-cs | And in that quarrel use it to the death.
King_Lear | line | 4 | 6 | 276 | 2874 | 616 | 276 | 38 | 5 | 76 | OSWALD | -ca,-cp,-cs | Upon the British party: O, untimely death!
Merchant_of_Venice | line | 5 | 1 | 311 | 2650 | 311 | 311 | 36 | 4 | 84 | NERISSA | -ca,-cp,-cs | After his death, of all he dies possess'd of.
Richard_II | line | 4 | 1 | 19 | 1914 | 19 | 19 | 6 | 12 | 22 | BAGOT | -ca,-cp,-cs | In this your cousin's death.
Richard_III | line | 4 | 4 | 200 | 2840 | 500 | 200 | 44 | 13 | 142 | DUCHESS_OF_YORK | -ca,-cp,-cs | Shame serves thy life and doth thy death attend.
Timon_of_Athens | line | 2 | 2 | 94 | 709 | 132 | 94 | 4 | 2 | 7 | Page | -ca,-cp,-cs | dog's death. Answer not; I am gone.
Titus_Andronicus | line | 3 | 1 | 242 | 1281 | 242 | 242 | 1 | 7 | 7 | Messenger | -ca,-cp,-cs | More than remembrance of my father's death.
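
The same search can be sketched in Python. The function name last_lines_with is hypothetical, and field positions are taken from the sample records (speaker in the 12th pipe-delimited field, flag in the 13th, text in the 14th). Unlike the grep pipeline, which matches anywhere on the line, this sketch looks for the flag only in the flag field and for the word only in the spoken text.

```python
def last_lines_with(records, word):
    """Return (play, speaker, text) for each record that is a speaker's
    last line in the play (-cp flag) and whose text contains `word`."""
    hits = []
    for raw in records:
        # Limit the split so the text field stays in one piece.
        fields = [f.strip() for f in raw.split("|", 13)]
        if len(fields) < 14 or fields[1] != "line":
            continue
        flags = fields[12].split(",")
        if "-cp" in flags and word in fields[13]:
            hits.append((fields[0], fields[11], fields[13]))
    return hits


if __name__ == "__main__":
    # File name as used in the commands on this page.
    with open("shakespeare.all.plays.plain.text.txt") as handle:
        for play, speaker, text in last_lines_with(handle, "death"):
            print(play, speaker, text, sep=" | ")
```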

3.2 · Most spoken lines

Searching for "-cp" and sorting by the speaker's line count ranks characters by the number of lines they speak in a play. Here are the top 10:

> grep "\-cp" shakespeare.all.plays.plain.text.txt | sort -nr -k21,21 | head -10
Hamlet | line | 5 | 2 | 374 | 3963 | 681 | 374 | 358 | 7 | 1498 | HAMLET | -ca,-cp,-cs | Which have solicited. The rest is silence.
Othello | line | 5 | 2 | 350 | 3483 | 494 | 350 | 272 | 2 | 1099 | IAGO | -ca,-cp,-cs | From this time forth I never will speak word.
Henry_V | line | 5 | 2 | 372 | 3216 | 503 | 373 | 147 | 6 | 1029 | KING_HENRY_V | -ca,-cp,-cs | EPILOGUE
Othello | line | 5 | 2 | 411 | 3544 | 555 | 411 | 274 | 2 | 887 | OTHELLO | -ca,-cp,-cs | Killing myself, to die upon a kiss.
Measure_for_measure | line | 5 | 1 | 578 | 2838 | 578 | 578 | 194 | 16 | 857 | DUKE_VINCENTIO | -a,-ca,-cp,-cs,-p,-s | What's yet behind, that's meet you all should know.
Antony_and_Cleopatra | line | 4 | 15 | 70 | 3003 | 658 | 70 | 202 | 9 | 849 | MARK_ANTONY | -ca,-cp,-cs | I can no more.
Timon_of_Athens | line | 5 | 1 | 246 | 2361 | 247 | 247 | 207 | 10 | 824 | TIMON | -ca,-cp,-cs | Sun, hide thy beams! Timon hath done his reign.
Richard_II | line | 5 | 5 | 113 | 2742 | 507 | 113 | 98 | 8 | 758 | KING_RICHARD_II | -ca,-cp,-cs | Whilst my gross flesh sinks downward, here to die.
King_Lear | line | 5 | 3 | 367 | 3480 | 458 | 367 | 187 | 7 | 752 | KING_LEAR | -ca,-cp,-cs | Look there, look there!
Julius_Caesar | line | 5 | 5 | 57 | 2566 | 349 | 57 | 194 | 3 | 728 | BRUTUS | -ca,-cp,-cs | I kill'd not thee with half so good a will.

Hamlet has 1,498 lines, about 36% more than the next character, Iago, who has 1,099.
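
A Python sketch of the same ranking, for readers who prefer not to count whitespace-separated columns. The function name rank_speakers is hypothetical. It assumes, as in the sample records, that speaker_line is the 11th pipe-delimited field and the speaker the 12th.

```python
def rank_speakers(records, top=10):
    """Rank speakers by total lines, using each speaker's last line (-cp)."""
    ranked = []
    for raw in records:
        fields = [f.strip() for f in raw.split("|", 13)]
        if len(fields) < 14 or fields[1] != "line":
            continue
        if "-cp" in fields[12].split(","):
            # On a speaker's last line, speaker_line is the play total.
            ranked.append((int(fields[10]), fields[0], fields[11]))
    ranked.sort(key=lambda entry: entry[0], reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    # File name as used in the commands on this page.
    with open("shakespeare.all.plays.plain.text.txt") as handle:
        for total, play, speaker in rank_speakers(handle):
            print(total, play, speaker)
```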

3.3 · Longest delivery

Who has the longest delivery? To find out just sort on the line_in_speaker_appearance field.

> grep -w line shakespeare.all.plays.plain.text.txt | sort -nr -k19,19 | head -1
Henry_IV,_Part_2 | line | 1 | 2 | 229 | 496 | 455 | 229 | 10 | 139 | 202 | FALSTAFF |  | so both the degrees prevent my curses. Boy!

It's Sir John Falstaff in Henry IV Part 2, who delivers 139 consecutive lines in his 10th delivery.

After that, it's King Henry V, also in Henry IV Part 2, who delivers 83 consecutive lines in his 2nd delivery.

> grep -w line shakespeare.all.plays.plain.text.txt | sort -nr -k19,19 | grep -v FALSTAFF | head -1
Henry_IV,_Part_2 | line | 5 | 2 | 146 | 2941 | 227 | 146 | 2 | 83 | 101 | KING_HENRY_V | -ca,-cp,-cs,-s | God shorten Harry's happy life one day!
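
Filtering out FALSTAFF removes every one of his deliveries, not just the longest. A possible alternative, sketched in Python under the same column assumptions as the sample records, is to group lines by play, speaker and appearance and keep the largest line_in_speaker_appearance for each group. The function name longest_deliveries is hypothetical.

```python
def longest_deliveries(records, top=2):
    """Return the `top` longest deliveries as ((play, speaker, appearance), length)."""
    longest = {}
    for raw in records:
        fields = [f.strip() for f in raw.split("|", 13)]
        if len(fields) < 14 or fields[1] != "line":
            continue
        # Fields 9 and 10 (1-based): speaker_appearance, line_in_speaker_appearance.
        key = (fields[0], fields[11], int(fields[8]))
        longest[key] = max(longest.get(key, 0), int(fields[9]))
    ranked = sorted(longest.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    # File name as used in the commands on this page.
    with open("shakespeare.all.plays.plain.text.txt") as handle:
        for (play, speaker, appearance), length in longest_deliveries(handle):
            print(length, play, speaker, "delivery", appearance)
```

Each delivery appears once in the result, so the second entry is the second-longest delivery overall, whoever speaks it.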

3.4 · Most turns to speak

Hamlet has 358 turns to speak, the most of any character. To find out, sort on the speaker_appearance field.

> grep -w line shakespeare.all.plays.plain.text.txt | sort -nr -k17,17 | head -1
Hamlet | line | 5 | 2 | 374 | 3963 | 681 | 374 | 358 | 7 | 1498 | HAMLET | -ca,-cp,-cs | Which have solicited. The rest is silence.

After Hamlet, it's Othello, who has 274 turns to speak.

> grep -w line shakespeare.all.plays.plain.text.txt | sort -nr -k17,17 | grep -v HAMLET | head -1
Othello | line | 5 | 2 | 411 | 3544 | 555 | 411 | 274 | 2 | 887 | OTHELLO | -ca,-cp,-cs | Killing myself, to die upon a kiss.

3.5 · Most tragic

Let's count the number of lines in which each character says "death".

# number of lines in which "death" is spoken, by character
> grep -w line shakespeare.all.plays.plain.text.txt | grep -i death | cut -d "|" -f 1,12 | suc | sort -nr | head -15
     21 Romeo_and_Juliet | ROMEO 
     18 Measure_for_measure | DUKE_VINCENTIO 
     16 Julius_Caesar | BRUTUS 
     15 Henry_VI_Part_1 | TALBOT 
     14 Romeo_and_Juliet | FRIAR_LAURENCE 
     14 Richard_III | GLOUCESTER 
     13 Hamlet | KING_CLAUDIUS 
     12 Antony_and_Cleopatra | MARK_ANTONY 
     10 Richard_II | KING_RICHARD_II 
     10 Henry_VI_Part_2 | KING_HENRY_VI 
     10 Hamlet | HAMLET 
      9 Romeo_and_Juliet | JULIET 
      9 Measure_for_measure | ISABELLA 
      8 Richard_III | QUEEN_MARGARET 
      8 Richard_III | DUCHESS_OF_YORK

Romeo has 21 lines in which he says "death" (a line in which the word appears more than once is counted only once). After that, it's Duke Vincentio with 18 lines and Brutus with 16 lines.

If we instead count the lines that contain "death" in each play, then Romeo and Juliet wins with 73 lines, followed closely by Richard III with 72.

# number of lines in which "death" appears, by play
> grep -w line shakespeare.all.plays.plain.text.txt | grep -i death | cut -d "|" -f 1 | suc | sort -nr
     73 Romeo_and_Juliet 
     72 Richard_III 
     63 Henry_VI_Part_2 
     45 Henry_VI_Part_1 
     43 Richard_II 
     42 Measure_for_measure 
     42 Henry_VI_Part_3 
     39 Hamlet 
     35 Antony_and_Cleopatra 
     34 King_John 
     31 Julius_Caesar 
     28 Titus_Andronicus 
     27 Cymbeline 
     24 Henry_IV,_Part_2 
     23 A_Winters_Tale 
     22 King_Lear 
     22 Coriolanus 
     21 Macbeth 
     21 Henry_IV,_Part_1 
     18 Pericles 
     17 Much_Ado_about_nothing 
     17 Alls_well_that_ends_well 
     16 Troilus_and_Cressida 
     15 Othello 
     15 Henry_V 
     14 A_Midsummer_nights_dream 
     12 Merchant_of_Venice 
     10 Twelfth_Night 
     10 Henry_VIII 
      9 A_Comedy_of_Errors 
      8 Timon_of_Athens 
      8 Loves_Labours_Lost 
      7 Two_Gentlemen_of_Verona 
      7 As_you_like_it 
      6 The_Tempest 
      6 Taming_of_the_Shrew 
      6 Merry_Wives_of_Windsor 
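
Both tallies above can be sketched in Python with collections.Counter. The suc command in the pipelines is not defined on this page. The sketch assumes it behaves like sort | uniq -c, that is, it counts identical lines. The function name count_lines_with is hypothetical, and the match is a case-insensitive substring test on the spoken text only, so words such as "deaths" also count, as they do with grep -i.

```python
from collections import Counter


def count_lines_with(records, word, by_speaker=True):
    """Count spoken lines whose text contains `word`, ignoring case.

    Keys are (play, speaker) when by_speaker is true, else the play name.
    A line containing the word more than once still counts once.
    """
    counts = Counter()
    needle = word.lower()
    for raw in records:
        fields = [f.strip() for f in raw.split("|", 13)]
        if len(fields) < 14 or fields[1] != "line":
            continue
        if needle in fields[13].lower():
            key = (fields[0], fields[11]) if by_speaker else fields[0]
            counts[key] += 1
    return counts.most_common()


if __name__ == "__main__":
    # File name as used in the commands on this page.
    with open("shakespeare.all.plays.plain.text.txt") as handle:
        records = list(handle)
    for key, count in count_lines_with(records, "death")[:15]:
        print(count, key)
```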
© 1999–2023 Martin Krzywinski | Canada's Michael Smith Genome Sciences Centre | BC Cancer Research Center | BC Cancer | PHSA