Martin Krzywinski / Genome Sciences Center / mkweb.bcgsc.ca
Trance opera.Spente le Stelle

numbers: fun


Circos at British Library Beautiful Science exhibit—Feb 20–May 26


visualization + design

Typography geek? If you like the geometry and mathematics of these posters, you may enjoy something more lettered. Visions of type: Type Peep Show: The Private Curves of Letters posters.

watch video

Watch the video at Numberphile about my art.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Numberphile video — Pi is Beautiful. (watch)

download

numbers.tgz
1,000,000 digits of π, φ, e and ASN.

buy artwork

All the artwork can be purchased from Fine Art America. Most of the pieces were created by me, and some by Cristian Ilies Vasile.

buy Martin Krzywinski's work

buy Cristian Vasile's work


Round art of π, φ and e

Numerology is bogus, but art based on numbers has a beautiful random quality.

For other examples of numerical art, see my inessiness project. Nixie clock lovers should investigate the accidental similarity number (ASN), which I render in an ASN Nixie poster.

Circos art of π — digit transition paths

It's fitting to use Circos to visualize the digits of π. After all, what is more round than Circos?

A path connecting segments traces out the digits of π. Here the transitions for the first 6 digits are shown. Concept by Cristian Ilies Vasile. Created with Circos.

Cristian Ilies Vasile had the idea of representing the digits of π as a path traced by links between successive digits. Each digit is assigned a segment around the circle and a link between segment i and j corresponds to the appearance of ij in π. For example, the "14" in "3.14..." is drawn as a link between segment 1 and segment 4.

The position of the link on a digit's segment is associated with the position of the digit in π. For example, the "14" link associated with the 2nd digit (1) and the 3rd digit (4) is drawn from position 2 on the 1 segment to position 3 on the 4 segment.
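The link construction described above can be sketched in a few lines of Python. This is only an illustration of the rule in the text, not the Circos configuration used for the images; the function name `transition_links` and the tuple layout are assumptions made for this example.

```python
def transition_links(digits):
    """Return one link per adjacent digit pair.

    Each link is (from_segment, from_position, to_segment, to_position),
    with positions counted from 1, as in the "14" example in the text.
    """
    links = []
    for idx in range(len(digits) - 1):
        i, j = int(digits[idx]), int(digits[idx + 1])
        links.append((i, idx + 1, j, idx + 2))
    return links


PI_DIGITS = "314159"  # first 6 digits of pi, decimal point dropped

for link in transition_links(PI_DIGITS):
    print("segment %d @ %d -> segment %d @ %d" % link)
```

The second link printed is the "14" transition: from position 2 on segment 1 to position 3 on segment 4.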

As more digits are added to the path, the image becomes a weaving mandala.

Digit transition paths for 10, 100 and 1,000 digits of π. Concept by Cristian Ilies Vasile. Created with Circos.
Transition paths for the first 10,000 digits of π. Concept by Cristian Ilies Vasile. Created with Circos.
Flow of Pi — Animation of digit transition paths by Ekrem Guner.

circos art of π, φ and e — transition paths and bubbles

I added to Cristian's representation by showing the number of transitions between digits in a series of concentric circles placed outside the links. This summary representation counts the number of transition links within a region and addresses the question of what kind of digits appear immediately before or after a given digit in π. The approach is diagrammed below.

The number of transitions to and from a given digit within a window of 10 digits is shown by circles. For a given digit segment (here, 9) each circle indicates the presence of a specific digit appearing before (inner track) or after (outer track) the digit. Solid circles are used for the digit that appears most often; if all digits appear equally often, the choice is arbitrary. In some images the order of digits in the inner track is outward.
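The counting behind the circles can be sketched as follows. This is a minimal reading of the description, assuming that a "window of 10 digits" means 10 consecutive positions in the number; the helper name `window_transitions` is made up for this example.

```python
from collections import Counter


def window_transitions(digits, target, window=10):
    """Count which digits appear just before and just after `target`.

    Returns {bin_index: (before_counts, after_counts)}, where bin_index
    is the 0-based index of the window containing the target digit.
    """
    bins = {}
    for pos, d in enumerate(digits):
        if d != target:
            continue
        before, after = bins.setdefault(pos // window, (Counter(), Counter()))
        if pos > 0:
            before[digits[pos - 1]] += 1   # digit preceding the target
        if pos + 1 < len(digits):
            after[digits[pos + 1]] += 1    # digit following the target
    return bins


# First 10 digits of pi: the two 1s are preceded by 3 and 4, followed by 4 and 5.
print(window_transitions("3141592653", "1"))
```

The most frequent digit in each counter is the one that would be drawn as a solid circle.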

The original images were generated using the 10-color Brewer paired qualitative palette, which was later modified as shown below.

For added visual impact, I inverted the color palette and added hue shift and vibrance effects.

The bubbles that count the number of links quickly draw attention to regions where specific digit pairs are frequent. In the image for π below, which shows transitions for the first 1,000 digits, the large bubble on the 9 segment is due to the "999999" sequence at decimal place 762.

Progression and transition for the first 1,000 digits of π. Created with Circos.

This sequence of six 9s occurs significantly earlier than expected by chance. Because the digits of π are, as far as we know, uniformly random, we can calculate how often we should expect a run of 6 identical digits.

For a given digit, the chance that the next 5 digits are the same is 0.00001 (0.1 that the next digit is the same × 0.1 that the second-next digit is the same × ...). Therefore the chance that at a given position the next 5 digits are not all the same is 1 - 0.00001 = 0.99999. From this, the chance that k consecutive digits don't initiate a run of 6 identical digits is 0.99999^k.

If I ask for the k at which this value is 0.5, I need to solve 0.99999^k = 0.5, which gives k = ln(0.5)/ln(0.99999) ≈ 69,314. Thus, chances are 50-50 that in a 69,000-digit random sequence we'll see a run of 6 identical digits. This calculation is an approximation.
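The arithmetic can be checked directly. The sketch below only reproduces the approximation in the text (it treats every starting position as independent); the variable names are chosen for this example.

```python
import math

p_run = 0.1 ** 5          # chance the next 5 digits repeat the current digit
p_no_run = 1 - p_run      # chance a given position does not start a 6-digit run

# Solve p_no_run ** k = 0.5 for k.
k_half = math.log(0.5) / math.log(p_no_run)

print(round(k_half))      # about 69,314 digits
```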

It's fun to look for words in π. For example, love appears at the 13,099,586th digit.

Progression and transition for the first 1,000 digits of π, φ and e. Created with Circos.
Progression and transition for the first 1,000 digits of φ. Created with Circos.
Progression and transition for the first 1,000 digits of e. Created with Circos.

The transition probabilities for each 10 digit bin for the first 2,000 digits of π, φ and e are shown in the image below.

Progression and transition for the first 2,000 digits of e. Created with Circos.

A tangent into randomness

The digits of π are, as far as we know, randomly distributed. Art based on its digits therefore has a quality that is influenced by this random distribution. To provide a reference of what such a random pattern looks like, below are 16 random numbers represented in the same way. They're all different, yet strangely the same.

Digit transition paths of sixteen 1,000 digit random numbers.
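A set of reference sequences like these can be produced with any uniform random digit generator. The sketch below uses Python's standard `random` module with a fixed seed so the output is repeatable; the function name and the seed are assumptions for this example, and this is not necessarily how the numbers in the image were generated.

```python
import random


def random_digit_strings(count=16, length=1000, seed=0):
    """Return `count` strings of `length` uniformly random decimal digits."""
    rng = random.Random(seed)  # fixed seed: same sequences on every run
    return [
        "".join(rng.choice("0123456789") for _ in range(length))
        for _ in range(count)
    ]


numbers = random_digit_strings()
print(len(numbers), len(numbers[0]))
print(numbers[0][:20])
```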

Circos art of π — heaps of bubbles

Below are more images by Cristian Ilies Vasile, where dots are used to represent the adjacency between digits. As in the image above, each digit 0-9 is represented by a colored segment. For each digit pair ij, a dot is placed on the segment of i, at the position of i in the number, and colored by j.

In a digit bubble heap, a digit is represented by a bubble and placed on the segment of its previous neighbour at the index position of the neighbour.

For example, for π the dot coordinates for the first 7 digits are (segment:position:label) 3:0:1 → 1:1:4 → 4:2:1 → 1:3:5 → 5:4:9 ...

segment  position  colored_by
3        0         1
1        1         4
4        2         1
1        3         5
5        4         9
9        5         2
2        6         6

Because there is a large number of digits, the dots stack up near their position to avoid overlapping. The layout of the dots is automated by Circos' text track layout.
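The dot coordinates tabulated above follow directly from the digit string. Here is a small sketch, with positions counted from 0 as in the table; `bubble_heap` is a name invented for this example, and the stacking of dots is left to the plotting tool.

```python
def bubble_heap(digits):
    """Return (segment, position, colored_by) for each adjacent digit pair."""
    return [
        (int(a), pos, int(b))
        for pos, (a, b) in enumerate(zip(digits, digits[1:]))
    ]


for segment, position, colored_by in bubble_heap("31415926"):
    print(segment, position, colored_by)
```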

Progression and transition for the first 10,000 digits of π. Created with Circos.
Progression and transition for the first 10,000 digits of φ. Created with Circos.
Progression and transition for the first 10,000 digits of e. Created with Circos.

When the digits of π, e and φ are aligned, positions at which the three numbers have the same digit yield the accidental similarity number (ASN). Below is a dot plot of the transition of the ASN.

Progression and transition for the first 10,000 digits of the accidental similarity number. Created with Circos.

spiral art of π

The Archimedean spiral embodies π.

By mapping the digits onto a red-yellow-blue Brewer palette (0 to 9) and placing them as circles on an Archimedean spiral, a dense and pleasant layout can be obtained.

Why the Archimedean spiral? This spiral is defined as r = a + bθ and has the interesting property that a ray from the origin intersects successive turns of the spiral at a constant spacing of 2πb. Thus, each turn can accommodate inscribed circles of radius πb.
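One way to place circle centres along such a spiral is to step along it in arc-length increments equal to the circle diameter, 2πb. The sketch below assumes a = 0, starts one full turn out from the origin, and uses a simple first-order step in θ; it illustrates the geometry and is not the calculation used for the posters. The function name and parameters are assumptions for this example.

```python
import math


def spiral_points(n, b=1.0, theta0=2 * math.pi):
    """Return n (x, y, theta) points on r = b * theta, about 2*pi*b apart."""
    spacing = 2 * math.pi * b            # circle diameter = turn separation
    theta = theta0
    points = []
    for _ in range(n):
        r = b * theta
        points.append((r * math.cos(theta), r * math.sin(theta), theta))
        # Arc length element is ds = b * sqrt(1 + theta^2) * dtheta,
        # so advance theta by roughly spacing / (ds/dtheta).
        theta += spacing / (b * math.sqrt(1 + theta * theta))
    return points


for x, y, theta in spiral_points(5):
    print(round(x, 2), round(y, 2), round(theta, 3))
```

Each digit would then be drawn as a circle of radius πb at one of these points, colored by its value. The spacing is only approximate near the centre, where the spiral curves tightly.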

Why the Brewer palette? These color schemes have some very useful perceptual properties and are commonly used to encode quantitative and categorical data.

The digits of π assembled along an Archimedean spiral.
Calculating (x,y) coordinates for each digit along the Archimedean spiral.
Distribution of the first 13,689 digits of π.
Distribution of the first 3,422, 13,689 and 123,201 digits of π.
Distribution of the first 3,422 digits of π.
Distribution of the first 123,201 digits of π.


news + thoughts

Happy Pi Approximation Day— π, roughly speaking 10,000 times

Wed 23-07-2014

Celebrate Pi Approximation Day (July 22nd) with the art of arm waving. This year I take the first 10,000 most accurate approximations (m/n, m=1..10,000) and look at their accuracy.

Accuracy of the first 10,000 m/n approximations of Pi.
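For readers who want to explore these approximations, here is a rough sketch. It assumes that "m/n, m=1..10,000" means: for each numerator m, pick the denominator n that brings m/n closest to π. That reading, the function name and the use of relative error are assumptions for this example.

```python
import math


def best_fraction(m):
    """Return (n, relative_error) for the n that makes m/n closest to pi."""
    guess = m / math.pi
    # m/n is monotonic in n, so the best n is the floor or ceiling of m/pi.
    candidates = {max(1, math.floor(guess)), max(1, math.ceil(guess))}
    n = min(candidates, key=lambda c: abs(m / c - math.pi))
    return n, abs(m / n - math.pi) / math.pi


for m in (22, 355):
    n, err = best_fraction(m)
    print("%d/%d relative error %.2e" % (m, n, err))

errors = [best_fraction(m)[1] for m in range(1, 10001)]
print("most accurate numerator:", 1 + errors.index(min(errors)))
```

The two familiar fractions 22/7 and 355/113 fall out of this search for m = 22 and m = 355.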

I turned to the spiral again after applying it to stacked ring plots of frequency distributions in Pi for the 2014 Pi Day.

Frequency distribution of digits of Pi in groups of 4 up to digit 4,988.

Analysis of Variance (ANOVA) and Blocking—Accounting for Variability in Multi-factor Experiments

Mon 07-07-2014

Our 10th Points of Significance column! Continuing with our previous discussion about comparative experiments, we introduce ANOVA and blocking. Although this column appears to introduce two new concepts (ANOVA and blocking), you've seen both before, though under a different guise.

Nature Methods Points of Significance column: Analysis of variance (ANOVA) and blocking.

If you know the t-test you've already applied analysis of variance (ANOVA), though you probably didn't realize it. In ANOVA we ask whether the variation within our samples is compatible with the variation between our samples (sample means). If the samples don't all have the same mean then we expect the latter to be larger. The ANOVA test statistic (F) assigns significance to the ratio of these two quantities. When we only have two samples and apply the t-test, t^2 = F.
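The identity t^2 = F for two samples is easy to verify numerically. The sketch below computes the equal-variance (pooled) two-sample t statistic and the one-way ANOVA F statistic from scratch using only the standard library; the sample values are made up for illustration and the function names are assumptions.

```python
from statistics import mean, variance


def two_sample_t(x, y):
    """Two-sample t statistic with pooled variance (equal-variance t-test)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / (pooled * (1 / nx + 1 / ny)) ** 0.5


def anova_f(*groups):
    """One-way ANOVA F: between-group over within-group mean square."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n
    ms_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups) / (k - 1)
    ms_within = sum((len(g) - 1) * variance(g) for g in groups) / (n - k)
    return ms_between / ms_within


x = [5.1, 4.9, 6.2, 5.8, 5.5]  # made-up sample
y = [6.4, 6.9, 5.9, 7.1, 6.6]  # made-up sample
t = two_sample_t(x, y)
f = anova_f(x, y)
print(t * t, f)  # the two values agree up to rounding
```

The agreement holds for the pooled-variance t-test; the unequal-variance (Welch) version does not satisfy it in general.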

ANOVA naturally incorporates and partitions sources of variation—the effects of variables on the system are determined based on the amount of variation they contribute to the total variation in the data. If this contribution is large, we say that the variation can be "explained" by the variable and infer an effect.

We discuss how data collection can be organized using a randomized complete block design to account for sources of uncertainty in the experiment. This process is called blocking because we are blocking the variation from a known source of uncertainty from interfering with our measurements. You've already seen blocking in the paired t-test example, in which the subject (or experimental unit) was the block.

We've worked hard to bring you 20 pages of statistics primers (though it feels more like 200!). The column is taking a month off in August, as we shrink our error bars.

Krzywinski, M. & Altman, N. (2014) Points of Significance: Analysis of Variance (ANOVA) and Blocking Nature Methods 11:699-700.

Background reading

Krzywinski, M. & Altman, N. (2014) Points of Significance: Designing Comparative Experiments Nature Methods 11:597-598.

Krzywinski, M. & Altman, N. (2014) Points of Significance: Comparing Samples — Part I — t-tests Nature Methods 11:215-216.

Krzywinski, M. & Altman, N. (2013) Points of Significance: Significance, P values and t-tests Nature Methods 10:1041-1042.

...more about the Points of Significance column

Designing Experiments—Coping with Biological and Experimental Variation

Thu 29-05-2014

This month, Points of Significance begins a series of articles about experimental design. We start by returning to the two-sample and paired t-tests for a discussion of biological and experimental variability.

Nature Methods Points of Significance column: Designing Comparative Experiments.

We introduce the concept of blocking using the paired t-test as an example and show how biological and experimental variability can be related using the correlation coefficient, ρ, and how its value impacts the relative performance of the paired and two-sample t-tests.

We also emphasize that when reporting data analyzed with the paired t-test, differences in sample means (and their associated 95% CI error bars) should be shown—not the original samples—because the correlation in the samples (and its benefits) cannot be gleaned directly from the sample data.

Krzywinski, M. & Altman, N. (2014) Points of Significance: Designing Comparative Experiments Nature Methods 11:597-598.

Background reading

Krzywinski, M. & Altman, N. (2014) Points of Significance: Comparing Samples — Part I — t-tests Nature Methods 11:215-216.

Krzywinski, M. & Altman, N. (2013) Points of Significance: Significance, P values and t-tests Nature Methods 10:1041-1042.

Have skew, will test

Wed 28-05-2014

Our May Points of Significance Nature Methods column jumps straight into dealing with skewed data with Non Parametric Tests.

Nature Methods Points of Significance column: Non Parametric Testing.

We introduce non-parametric tests and simulate data scenarios to compare their performance to the t-test. You might be surprised—the t-test is extraordinarily robust to distribution shape, as we've discussed before. When data is highly skewed, non-parametric tests perform better and with higher power. However, if sample sizes are small they are limited to a small number of possible P values, of which none may be less than 0.05!

Krzywinski, M. & Altman, N. (2014) Points of Significance: Non Parametric Testing Nature Methods 11:467-468.

Background reading

Krzywinski, M. & Altman, N. (2014) Points of Significance: Comparing Samples — Part I — t-tests Nature Methods 11:215-216.

Krzywinski, M. & Altman, N. (2013) Points of Significance: Significance, P values and t-tests Nature Methods 10:1041-1042.

Mind your p's and q's

Sat 29-03-2014

In the April Points of Significance Nature Methods column, we continue our discussion of comparing samples and consider what happens when we run a large number of tests.

Nature Methods Points of Significance column: Comparing Samples — Part II — Multiple Testing.

Observing statistically rare test outcomes is expected if we run enough tests. These are statistically, not biologically, significant. For example, if we run N tests, the smallest P value that we have a 50% chance of observing is 1 - exp(-ln2/N). For N = 10^k this P value is P_k ≈ 10^-k ln2 (e.g. for 10^4 = 10,000 tests, P_4 = 6.9×10^-5).
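The expression follows from asking when the chance that at least one of N tests falls below p reaches one half: 1 - (1 - p)^N = 0.5. Here is a quick numerical check, assuming independent tests with no real effect (so that P values are uniform); the function name is invented for this example.

```python
import math


def median_smallest_p(n_tests):
    """P value that the smallest of n_tests null P values undercuts half the time."""
    # 1 - (1 - p)^N = 0.5  =>  p = 1 - exp(-ln2 / N)
    return -math.expm1(-math.log(2) / n_tests)


for k in range(1, 6):
    n = 10 ** k
    print(n, "%.2e" % median_smallest_p(n), "%.2e" % (math.log(2) / n))
```

The second and third columns show how closely the exact value tracks the ln2/N approximation as N grows.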

We discuss common correction schemes such as Bonferroni, Holm, Benjamini & Hochberg and Storey's q and show how they impact the false positive rate (FPR), false discovery rate (FDR) and power of a batch of tests.

Krzywinski, M. & Altman, N. (2014) Points of Significance: Comparing Samples — Part II — Multiple Testing Nature Methods 11:355-356.

Krzywinski, M. & Altman, N. (2014) Points of Significance: Comparing Samples — Part I — t-tests Nature Methods 11:215-216.

Krzywinski, M. & Altman, N. (2013) Points of Significance: Significance, P values and t-tests Nature Methods 10:1041-1042.

Happy Pi Day— go to planet π

Fri 21-03-2014

Celebrate Pi Day (March 14th) with the art of folding numbers. This year I take the number up to the Feynman Point and apply a protein folding algorithm to render it as a path.

Digits of Pi form landmass and shoreline.

For those of you who liked the minimalist and colorful digit grid, I've expanded on the concept to show stacked ring plots of frequency distributions.

Frequency distribution of digits of Pi in groups of 6 up to the Feynman Point.

And if spirals are your thing...

Frequency distribution of digits of Pi in groups of 4 up to digit 4,988.