Martin Krzywinski / Genome Sciences Center / mkweb.bcgsc.ca

More than Pretty Pictures—Aesthetics of Data Representation, Denmark, April 13–16, 2015


visualization + design

Typography geek? If you like the geometry and mathematics of these posters, you may enjoy something more lettered. Visions of type: Type Peep Show: The Private Curves of Letters posters.

download

numbers.tgz
1,000,000 digits of π , φ , e and ASN.

find your own path

The source code is freely available. Read how you can compute your own π path!

watch video

Watch the video at Numberphile about my art.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Numberphile video — Pi is Beautiful. (watch)

2013 Pi Day art

Explore Pi Day art for 2013.

Pi Day art for 2013. (explore)

buy artwork

All the artwork can be purchased from Fine Art America.

Buy 2014 Pi Day Path posters

Buy 2014 Pi Day Circle posters

Buy 2013 Pi Day posters

Buy Love in Pi posters

The art of Pi (π), Phi (φ) and e

the art

Numbers are a lot of fun. They can start conversations—the interesting number paradox is a party favourite. Of course, in the wrong company they can just as easily end conversations.

The art here represents my attempt at transforming famous numbers in mathematics into pretty visual forms. This work is 99% art and 1% data visualization. Because the digits in the numbers are essentially random (as far as we know), the essence of the art is based on randomness.

In a few cases, the art reveals an interesting and unexpected observation. For example, the sequence 999999 in π at digit 762 appears significantly earlier than expected by chance. And if you calculate π to 13,099,586 digits you will find love, as encoded by 1114214 in the scheme a=0, b=1, c=2...
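The letter scheme above can be sketched in a few lines. This is only an illustration, not the code used for the art: the function names are hypothetical, and the position convention (1-based, counted within whatever digit string you pass in) is an assumption.

```python
def encode_word(word: str) -> str:
    """Concatenate each letter's 0-based alphabet index (a=0, b=1, c=2, ...)."""
    return "".join(str(ord(ch) - ord("a")) for ch in word.lower())


def find_word(digits: str, word: str) -> int:
    """Return the 1-based position of the encoded word in `digits`, or 0 if absent."""
    return digits.find(encode_word(word)) + 1


print(encode_word("love"))  # 1114214
```

To search for real, pass the digits from the download as a single string to find_word.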

Keep in mind that if the digits are truly random and never terminating (π is widely believed, though not proven, to be normal), they contain every finite sequence of digits, and so every observation about numbers, within them. In fact, because the digits go on forever, you'll eventually find π within π.

the numbers

Of these three irrational numbers, π is the most well known. It is the ratio of a circle's circumference to its diameter (C = πd).

The Golden Ratio (φ) is the attractive proportion of values a and b (a > b) that satisfy (a+b)/a = a/b, which solves to a/b = (1+√5)/2.

The numbers π, φ and e nearly form a right-angled triangle.

The last of the three numbers, e, is Euler's number and also known as the base of the natural logarithm. It, too, can be defined geometrically: it is the unique real number, e, for which the function f(x) = e^x has a tangent of slope 1 at x = 0. Like π, e appears throughout mathematics. For example, e is central in the expression for the normal distribution as well as the definition of entropy. And if you've ever heard of someone talking about log plots ... well, there's e again!

π = 3.141592653589793238462643...
φ = 1.618033988749894848204586...
e = 2.718281828459045235360287...

did you see something special?

These three numbers have the curious property that they are almost Pythagorean. In other words, if they are made into the sides of a triangle, the angle opposite the longest side (π) is very nearly a right angle (89.1°).
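The near-right angle is easy to check with the law of cosines. A minimal sketch, assuming the angle of interest is the one opposite the longest side, π:

```python
import math

pi = math.pi
phi = (1 + math.sqrt(5)) / 2
e = math.e

# Law of cosines for the angle opposite the side of length pi.
cos_angle = (phi**2 + e**2 - pi**2) / (2 * phi * e)
angle = math.degrees(math.acos(cos_angle))

print(round(angle, 1))  # 89.1
```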

Did you notice how in the 12th decimal place all three numbers have the same digit, 9? This accidental similarity generates its own number: the Accidental Similarity Number (ASN).

methods

perl, SVG, Illustrator

Happy Pi Day!

Hug π on March 14th and celebrate Pi Day. Those who favour τ will have to postpone celebrations until June 28th (τ = 2π ≈ 6.28). If you're not into details, you may opt to party on July 22nd, which is π approximation day (π ≈ 22/7).

2013 Pi Day posters. Celebrate with this post-modern poster.

2014 Pi Day posters. Celebrate with this modern poster. Pi is folded on a self-avoiding path to maximize the number of neighbouring prime digits.

The 2013 posters were inspired by the beautiful AIDS posters by Elena Miska.

The 4ness of π. Shown here are the first 2,000 4’s in pi. Each digit is formatted based on its 4-ness, which is a measure of how similar its neighbours are to 4.

4ness of Pi (π)

A concept created for this visualization, the iness of a number measures how close each of its digits is to a given number, i.

The iness is calculated for each digit from the average of the relative difference between i and the digit's neighbours.

The 4ness of Pi (π) is a specific case of an iness, for i=4.

Thanks to Lance Bailey for suggesting how to measure iness.

example

In the sequence of Pi (π) 3.1415 the neighbours of the 4 are 3, 1, 1 and 5. The relative distances to 4 are -1, -3, -3 and 1. The average, which is the 4ness, of this digit (which is also a 4, coincidentally) is -1.5. The 4ness of each of the other digits is computed identically.

In the iness posters, the 4ness is mapped onto a color and the standard deviation of the differences onto a size.
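The calculation can be sketched as follows. This is an illustration under stated assumptions rather than the code behind the posters: the neighbourhood is taken to be two digits on each side (as in the 3.1415 example), digits near the ends simply use the neighbours that exist, and the population standard deviation is used. The name iness and the reach parameter are hypothetical.

```python
from statistics import mean, pstdev


def iness(digits: str, i: int, reach: int = 2):
    """For each digit, return (mean, std dev) of the neighbour-minus-i differences.

    Assumes at least two digits, so every position has a neighbour.
    """
    values = [int(d) for d in digits]
    result = []
    for pos in range(len(values)):
        lo = max(0, pos - reach)
        # Neighbours on both sides, excluding the digit itself.
        neighbours = values[lo:pos] + values[pos + 1:pos + 1 + reach]
        diffs = [n - i for n in neighbours]
        result.append((mean(diffs), pstdev(diffs)))
    return result


scores = iness("31415", 4)
print(scores[2][0])  # -1.5, the 4ness of the digit 4 in 3.1415
```

The first value of each pair would drive the color and the second the size.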

The accidental similarity number for π, φ and e created from the first 1,000,000 digits of each number.

accidental similarity number

The accidental similarity number is a kind of overlap between numbers. I came up with this concept after creating typographical art about the 4ness of Pi (π).

example

To construct this number for Pi (π), Phi (φ) and e we first write the numbers on top of each other and then identify positions for which the numbers have the same digit.

3.141 … 3589793 … 7067982 … 7019385 … 
1.618 … 8749894 … 1137484 … 5959395 … 
2.718 … 8459045 … 6427427 … 6279434 … 

These digits are then used to create the accidental similarity number. In this case,

asn(π,φ,e) = 0.979 …
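The construction can be sketched in a few lines. This is a minimal illustration, assuming that only the digits after the decimal point are aligned and that the shared digits are written after "0."; the function name is hypothetical. The digit strings are the 24 decimals quoted earlier on this page, which is only enough to reach the first shared digit.

```python
def accidental_similarity(*decimals: str) -> str:
    """Keep the digit at every position where all inputs agree."""
    shared = [column[0] for column in zip(*decimals) if len(set(column)) == 1]
    return "0." + "".join(shared)


# First 24 decimals of each number (integer parts dropped).
PI = "141592653589793238462643"
PHI = "618033988749894848204586"
E = "718281828459045235360287"

print(accidental_similarity(PI, PHI, E))  # 0.9
```

Feeding in the million-digit files from the download extends the result beyond this first digit.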

Circos art depicting π, φ and e.

Circos numerical art

Numerology is bogus, but art based on numbers is pretty, in a random non-metaphysical way.

These depictions were generated by Cristian Ilies Vasile and myself using my Circos software.

news + thoughts

Nested Designs—Assessing Sources of Noise

Mon 29-09-2014

Sources of noise in experiments can be mitigated and assessed by nested designs. This kind of experimental design naturally models replication, which was the topic of last month's column.

Nature Methods Points of Significance column: Nested designs. (read)

Nested designs are appropriate when we want to use the data derived from experimental subjects to make general statements about populations. In this case, the subjects are random factors in the experiment, in contrast to fixed factors, such as we've seen previously.

In ANOVA analysis, random factors provide information about the amount of noise contributed by each factor. This is different from inferences made about fixed factors, which typically deal with a change in mean. Using the F-test, we can determine whether each layer of replication (e.g. animal, tissue, cell) contributes additional variation to the overall measurement.

Krzywinski, M., Altman, N. & Blainey, P. (2014) Points of Significance: Nested designs Nature Methods 11:977-978.

Background reading

Blainey, P., Krzywinski, M. & Altman, N. (2014) Points of Significance: Replication Nature Methods 11:879-880.

Krzywinski, M. & Altman, N. (2014) Points of Significance: Analysis of variance (ANOVA) and blocking Nature Methods 11:699-700.

Krzywinski, M. & Altman, N. (2014) Points of Significance: Designing Comparative Experiments Nature Methods 11:597-598.

...more about the Points of Significance column

Replication—Quality over Quantity

Tue 02-09-2014

It's fitting that the column published just before Labor Day weekend is all about how to best allocate labor.

Replication is used to decrease the impact of variability from parts of the experiment that contribute noise. For example, we might measure data from more than one mouse to attempt to generalize over all mice.

Nature Methods Points of Significance column: Replication. (read)

It's important to distinguish technical replicates, which attempt to capture the noise in our measuring apparatus, from biological replicates, which capture biological variation. The former give us no information about biological variation and cannot be used to directly make biological inferences. To do so is to commit pseudoreplication. Technical replicates are useful to reduce the noise so that we have a better chance to detect a biologically meaningful signal.

Blainey, P., Krzywinski, M. & Altman, N. (2014) Points of Significance: Replication Nature Methods 11:879-880.

Background reading

Krzywinski, M. & Altman, N. (2014) Points of Significance: Analysis of variance (ANOVA) and blocking Nature Methods 11:699-700.

Krzywinski, M. & Altman, N. (2014) Points of Significance: Designing Comparative Experiments Nature Methods 11:597-598.

...more about the Points of Significance column

Monkeys on a Hilbert Curve—Scientific American Graphic

Tue 19-08-2014

I was commissioned by Scientific American to create an information graphic that showed how our genomes are more similar to those of the chimp and bonobo than to the gorilla.

I had about 5 x 5 inches of print space to work with. For 4 genomes? No problem. Bring out the Hilbert curve!

Our genomes are much more similar to the chimp and bonobo than to the gorilla. And, we're practically still Denisovans. (details)

To accompany the piece, I will be posting to the Scientific American blog about the process of creating the figure. And to emphasize that the genome is not a blueprint!

As part of this project, I created some Hilbert curve art pieces. And while exploring, found thousands of Hilbertonians!

Happy Pi Approximation Day— π, roughly speaking 10,000 times

Wed 13-08-2014

Celebrate Pi Approximation Day (July 22nd) with the art of arm waving. This year I take the first 10,000 most accurate approximations (m/n, m=1..10,000) and look at their accuracy.

Accuracy of the first 10,000 m/n approximations of Pi. (details)
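One way to read "m/n, m=1..10,000" is: for each numerator m, pick the denominator n that brings m/n closest to π, then record the error. The sketch below follows that reading, which is my assumption and may not match how the figure was actually computed. The function names are hypothetical.

```python
import math


def best_denominator(m: int) -> int:
    """Denominator n >= 1 that minimizes |m/n - pi| for a fixed numerator m."""
    x = m / math.pi
    # m/n is monotonic in n, so the best n is one of the integers bracketing m/pi.
    candidates = {max(1, math.floor(x)), max(1, math.ceil(x))}
    return min(candidates, key=lambda n: abs(m / n - math.pi))


def approximation_errors(max_m: int = 10_000):
    """Return (m, n, absolute error) for each numerator m = 1..max_m."""
    rows = []
    for m in range(1, max_m + 1):
        n = best_denominator(m)
        rows.append((m, n, abs(m / n - math.pi)))
    return rows


rows = approximation_errors()
print(best_denominator(22), best_denominator(355))  # 7 113
```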

I turned to the spiral again after applying it to stacked ring plots of frequency distributions in Pi for the 2014 Pi Day.

Frequency distribution of digits of Pi in groups of 4 up to digit 4,988. (details)

Analysis of Variance (ANOVA) and Blocking—Accounting for Variability in Multi-factor Experiments

Mon 07-07-2014

Our 10th Points of Significance column! Continuing with our previous discussion about comparative experiments, we introduce ANOVA and blocking. Although this column appears to introduce two new concepts (ANOVA and blocking), you've seen both before, though under a different guise.

Nature Methods Points of Significance column: Analysis of variance (ANOVA) and blocking. (read)

If you know the t-test you've already applied analysis of variance (ANOVA), though you probably didn't realize it. In ANOVA we ask whether the variation within our samples is compatible with the variation between our samples (sample means). If the samples don't all have the same mean then we expect the latter to be larger. The ANOVA test statistic (F) assigns significance to the ratio of these two quantities. When we only have two samples and apply the t-test, t^2 = F.
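The link between the two tests can be checked numerically. The sketch below computes the equal-variance (pooled) two-sample t statistic and the one-way ANOVA F statistic by hand and confirms that the square of t equals F. The data are made up for illustration, and the identity is shown for the pooled t-test specifically, not for Welch's variant.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def anova_f(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements for two samples.
a = [5.1, 4.8, 5.6, 5.0, 5.3]
b = [5.9, 6.1, 5.7, 6.4, 6.0]

t = pooled_t(a, b)
f = anova_f(a, b)
print(math.isclose(t ** 2, f))  # True
```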

ANOVA naturally incorporates and partitions sources of variation—the effects of variables on the system are determined based on the amount of variation they contribute to the total variation in the data. If this contribution is large, we say that the variation can be "explained" by the variable and infer an effect.

We discuss how data collection can be organized using a randomized complete block design to account for sources of uncertainty in the experiment. This process is called blocking because we are blocking the variation from a known source of uncertainty from interfering with our measurements. You've already seen blocking in the paired t-test example, in which the subject (or experimental unit) was the block.

We've worked hard to bring you 20 pages of statistics primers (though it feels more like 200!). The column is taking a month off in August, as we shrink our error bars.

Krzywinski, M. & Altman, N. (2014) Points of Significance: Analysis of Variance (ANOVA) and Blocking Nature Methods 11:699-700.

Background reading

Krzywinski, M. & Altman, N. (2014) Points of Significance: Designing Comparative Experiments Nature Methods 11:597-598.

Krzywinski, M. & Altman, N. (2014) Points of Significance: Comparing Samples — Part I — t-tests Nature Methods 11:215-216.

Krzywinski, M. & Altman, N. (2013) Points of Significance: Significance, P values and t-tests Nature Methods 10:1041-1042.

...more about the Points of Significance column

Designing Experiments—Coping with Biological and Experimental Variation

Thu 29-05-2014

This month, Points of Significance begins a series of articles about experimental design. We start by returning to the two-sample and paired t-tests for a discussion of biological and experimental variability.

Nature Methods Points of Significance column: Designing Comparative Experiments. (read)

We introduce the concept of blocking using the paired t-test as an example and show how biological and experimental variability can be related using the correlation coefficient, ρ, and how its value impacts the relative performance of the paired and two-sample t-tests.
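A small numerical sketch shows why correlation matters. When the two measurements on each subject are strongly correlated, the subject-to-subject variation cancels in the differences, so the paired statistic is much larger than the two-sample one. The data are made up for illustration and the function names are hypothetical; this is not an example from the column.

```python
import math
from statistics import mean, stdev, variance


def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic (ignores the pairing)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(y) - mean(x)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def paired_t(x, y):
    """Paired t statistic: a one-sample test on the per-subject differences."""
    d = [after - before for before, after in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


# Hypothetical before/after values for five subjects (the blocks).
before = [10.1, 12.3, 9.8, 11.5, 13.0]
after = [10.9, 12.8, 10.5, 12.4, 13.6]

t_unpaired = two_sample_t(before, after)
t_paired = paired_t(before, after)
print(t_paired > t_unpaired)  # True
```

Here the subjects differ from each other far more than the treatment changes any one of them, which is the situation where blocking by subject pays off.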

We also emphasize that when reporting data analyzed with the paired t-test, differences in sample means (and their associated 95% CI error bars) should be shown—not the original samples—because the correlation in the samples (and its benefits) cannot be gleaned directly from the sample data.

Krzywinski, M. & Altman, N. (2014) Points of Significance: Designing Comparative Experiments Nature Methods 11:597-598.

Background reading

Krzywinski, M. & Altman, N. (2014) Points of Significance: Comparing Samples — Part I — t-tests Nature Methods 11:215-216.

Krzywinski, M. & Altman, N. (2013) Points of Significance: Significance, P values and t-tests Nature Methods 10:1041-1042.