





The 2017 Pi Day art imagines the digits of Pi as a star catalogue with constellations of extinct animals and plants. The work is featured in the article Pi in the Sky at the Scientific American SA Visual blog.

`\pi` Approximation Day Art Posters


2017 `\pi` day
2016 `\pi` approximation day
2016 `\pi` day
2015 `\pi` day
2014 `\pi` approx day
2014 `\pi` day
2013 `\pi` day
Circular `\pi` art

The digits of `\pi` never repeat, but its value can be approximated by `22/7 = 3.142857` to within 0.04%. These pages explore rational approximations to `\pi`, both artistically and mathematically. The `22/7` ratio is celebrated each year on July 22nd. If you like hand waving or back-of-envelope mathematics, this day is for you: `\pi` Approximation Day!

Want more math + art? Discover the Accidental Similarity Number. Find humor in my poster of the first 2,000 4s of `\pi`.

The `22/7` approximation of `\pi` is more accurate than using the first three digits, `3.14`. In light of this, it is curious that `\pi` Approximation Day represents `\pi` about 20% more accurately than the official `\pi` Day: `22/7` is accurate to within 0.04%, while `3.14` is accurate only to within 0.05%.
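
Concretely, the relative errors are `|22/7 - \pi|/\pi \approx 0.000402` and `|3.14 - \pi|/\pi \approx 0.000507`, and `(0.000507 - 0.000402)/0.000507 \approx 0.21`, or roughly a 20% reduction in error.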

first 10,000 approximations to `\pi`

For each `m = 1...10000` I found the `n` for which `m/n` is the best approximation of `\pi`. You can download the entire list, which looks like this:

    m     n            m/n relative_error best_seen?
    1     1 1.000000000000 0.681690113816 improved
    2     1 2.000000000000 0.363380227632 improved
    3     1 3.000000000000 0.045070341449 improved
    4     1 4.000000000000 0.273239544735 
    5     2 2.500000000000 0.204225284541 
    7     2 3.500000000000 0.114084601643 
    8     3 2.666666666667 0.151173636843 
    9     4 2.250000000000 0.283802756086 
   10     3 3.333333333333 0.061032953946 
   11     4 2.750000000000 0.124647812995 
   12     5 2.400000000000 0.236056273159 
   13     4 3.250000000000 0.034507130097 improved
   14     5 2.800000000000 0.108732318685 
   16     5 3.200000000000 0.018591635788 improved
   17     5 3.400000000000 0.082253613025 
   18     5 3.600000000000 0.145915590262 
   19     6 3.166666666667 0.007981306249 improved
   20     7 2.857142857143 0.090543182332 
   21     8 2.625000000000 0.164436548768 
   22     7 3.142857142857 0.000402499435 improved
   23     7 3.285714285714 0.045875340318 
   24     7 3.428571428571 0.091348181202 
...
  354   113 3.132743362832 0.002816816734 
  355   113 3.141592920354 0.000000084914 improved
  356   113 3.150442477876 0.002816986561 
...
 9998  3183 3.141061891298 0.000168946885 
 9999  3182 3.142363293526 0.000245302310 
10000  3183 3.141690229343 0.000031059327 

As the value of `m` increases, better approximations become possible. For example, `13/4`, `16/5`, `19/6` and `22/7` are each, in turn, better approximations of `\pi`. A line carries the `improved` flag if its approximation is better than any seen so far.
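
The search is simple enough to sketch in a few lines of Python. This is a minimal reconstruction of the procedure described above, not the script used to generate the downloadable list: the published list appears to keep only fractions in lowest terms, so the sketch does the same, and it assumes the best denominator lies within a small window of `m/\pi` (true for `m \le 10000`).

    # For each numerator m, find the denominator n that brings m/n (in lowest
    # terms) closest to pi, and flag rows that improve on all earlier ones.
    import math

    def best_denominator(m):
        # the best n lies near m/pi; a window of +/-15 is wide enough for
        # m <= 10000 to contain the closest denominators coprime to m
        n0 = round(m / math.pi)
        window = [n for n in range(max(1, n0 - 15), n0 + 16) if math.gcd(m, n) == 1]
        return min(window, key=lambda n: abs(m / n - math.pi))

    best_so_far = float("inf")
    for m in range(1, 10001):
        n = best_denominator(m)
        err = abs(m / n - math.pi) / math.pi            # relative error
        flag = "improved" if err < best_so_far else ""
        best_so_far = min(best_so_far, err)
        print(f"{m:6d} {n:6d} {m / n:.12f} {err:.12f} {flag}")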

next best after 22/7

After `22/7`, the next improvement is `179/57`.

Of all 10,000 approximations, the best is `355/113`, which is good to 7 digits (6 decimal places).

      pi = 3.1415926
 355/113 = 3.1415929

I've scanned beyond `m = 1,000,000` and `355/113` remains the only approximation that returns more correct digits than you need to remember it.

increasingly accurate approximations

Here is a sequence of approximations that improve on all previous ones.

    1     1 1.000000000000 0.681690113816 improved
    2     1 2.000000000000 0.363380227632 improved
    3     1 3.000000000000 0.045070341449 improved
   13     4 3.250000000000 0.034507130097 improved
   16     5 3.200000000000 0.018591635788 improved
   19     6 3.166666666667 0.007981306249 improved
   22     7 3.142857142857 0.000402499435 improved
  179    57 3.140350877193 0.000395269704 improved
  201    64 3.140625000000 0.000308013704 improved
  223    71 3.140845070423 0.000237963113 improved
  245    78 3.141025641026 0.000180485705 improved
  267    85 3.141176470588 0.000132475164 improved
  289    92 3.141304347826 0.000091770575 improved
  311    99 3.141414141414 0.000056822190 improved
  333   106 3.141509433962 0.000026489630 improved
  355   113 3.141592920354 0.000000084914 improved

With one exception, these approximations aren't good value for your digits.

For example, `179/57` requires you to remember 5 digits but only gets you 3 digits of `\pi` correct (3.14).

Only `355/113` gets you more digits than you need to remember—you need to memorize 6 but get 7 (3.141592) out of the approximation!

You could argue that `22/7` and `355/113` are the only approximations worth remembering. In fact, go ahead and do so.
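
The "value for your digits" comparison is easy to make concrete. Here is a short Python sketch (the digit-counting convention is mine: count the leading digits of the decimal expansion, ignoring the decimal point):

    # Compare the digits you must memorize (numerator plus denominator)
    # with the number of leading digits of pi the fraction reproduces.
    PI_DIGITS = "31415926535897932384"    # pi with the decimal point removed

    def digits_correct(m, n):
        value = f"{m / n:.18f}".replace(".", "")
        count = 0
        for a, b in zip(value, PI_DIGITS):
            if a != b:
                break
            count += 1
        return count

    def digits_to_remember(m, n):
        return len(str(m)) + len(str(n))

    for m, n in [(22, 7), (179, 57), (355, 113)]:
        print(f"{m}/{n}: remember {digits_to_remember(m, n)}, get {digits_correct(m, n)} correct")

Running it reproduces the counts quoted above: `22/7` costs 3 digits and gets 3 right, `179/57` costs 5 and gets 3, and `355/113` costs 6 and gets 7.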

approximations for large `m` and `n`

It's remarkable that there is no better `m/n` approximation after `355/113` for all `m \le 10000`.

What do we find for `m > 10000`?

Well, we have to push the values of `m` all the way up to 52,163 to find `52163/16604`. And for all this searching, the improvement in accuracy is minuscule: just 0.2%!

                pi 3.141592653589793238
    
       m        n  m/n              relative_error
      355      113 3.1415929203     0.00000008491
    52163    16604 3.1415923873     0.00000008474

After `52163/16604` there is a slew of improvements to the approximation:

   104348    33215 3.1415926539     0.000000000106
   208341    66317 3.1415926534     0.0000000000389
   312689    99532 3.1415926536     0.00000000000927
   833719   265381 3.141592653581   0.00000000000277
  1146408   364913 3.14159265359    0.000000000000513
  3126535   995207 3.141592653588   0.000000000000364
  4272943  1360120 3.1415926535893  0.000000000000129
  5419351  1725033 3.1415926535898  0.00000000000000705
 42208400 13435351 3.1415926535897  0.00000000000000669
 47627751 15160384 3.14159265358977 0.00000000000000512
 53047102 16885417 3.14159265358978 0.00000000000000388
 58466453 18610450 3.14159265358978 0.00000000000000287

I stopped looking after `m=58,466,453`.

Despite their accuracy, all of these approximations require you to remember at least as many digits as they return. The last one above requires you to memorize 16 digits (8 + 8) and returns only 14 digits of `\pi`.

The only exception to this is `355/113`, which returns 7 digits for its 6.

You can download the first 175 increasingly accurate approximations, calculated to extended precision (up to `58,466,453/18,610,450`).
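
At these error magnitudes a double-precision float can no longer reliably separate neighbouring approximations, which is why the list was computed at extended precision. A minimal way to reproduce the check in Python, using the standard `decimal` module with a hard-coded value of `\pi` (a sketch, not the code used to produce the download):

    # Compare approximations at 40 significant digits using the decimal module.
    from decimal import Decimal, getcontext

    getcontext().prec = 40
    PI = Decimal("3.141592653589793238462643383279502884197")   # pi to 39 decimal places

    def relative_error(m, n):
        return abs(Decimal(m) / Decimal(n) - PI) / PI

    for m, n in [(355, 113), (52163, 16604), (58466453, 18610450)]:
        print(f"{m}/{n}: relative error {relative_error(m, n):.3E}")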


news + thoughts

Machine learning: supervised methods (SVM & kNN)

Thu 18-01-2018
Supervised learning algorithms extract general principles from observed examples guided by a specific prediction objective.

We examine two very common supervised machine learning methods: linear support vector machines (SVM) and k-nearest neighbors (kNN).

SVM is often less computationally demanding than kNN and is easier to interpret, but it can identify only a limited set of patterns. On the other hand, kNN can find very complex patterns, but its output is more challenging to interpret.

Nature Methods Points of Significance column: Machine learning: supervised methods (SVM & kNN). (read)

We illustrate SVM using a data set in which points fall into two categories, which are separated in SVM by a straight line "margin". SVM can be tuned using a parameter that influences the width and location of the margin, permitting points to fall within the margin or on the wrong side of the margin. We then show how kNN relaxes explicit boundary definitions, such as the straight line in SVM, and how kNN too can be tuned to create more robust classification.
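
For readers who want to experiment, here is a minimal scikit-learn sketch contrasting the two classifiers on a toy two-class data set (the data and parameter values are illustrative, not those used in the column):

    # Two Gaussian clouds classified with a linear SVM and with kNN.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
    y = np.repeat([0, 1], 50)

    # C controls how strictly points are kept outside the SVM margin;
    # n_neighbors controls how local (and how complex) the kNN boundary is.
    svm = SVC(kernel="linear", C=1.0).fit(X, y)
    knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

    print("SVM training accuracy:", svm.score(X, y))
    print("kNN training accuracy:", knn.score(X, y))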

Bzdok, D., Krzywinski, M. & Altman, N. (2018) Points of Significance: Machine learning: supervised methods. Nature Methods 15:5–6.

Background reading

Bzdok, D., Krzywinski, M. & Altman, N. (2017) Points of Significance: Machine learning: a primer. Nature Methods 14:1119–1120.

...more about the Points of Significance column

Human Versus Machine

Tue 16-01-2018
Balancing subjective design with objective optimization.

In a Nature graphics blog article, I present my process behind designing the stark black-and-white Nature 10 cover.

Nature 10, 18 December 2017

Machine learning: a primer

Thu 18-01-2018
Machine learning extracts patterns from data without explicit instructions.

In this primer, we focus on essential ML principles: a modeling strategy that lets the data speak for themselves, to the extent possible.

The benefits of ML arise from its use of a large number of tuning parameters or weights, which control the algorithm’s complexity and are estimated from the data using numerical optimization. Often ML algorithms are motivated by heuristics such as models of interacting neurons or natural evolution—even if the underlying mechanism of the biological system being studied is substantially different. The utility of ML algorithms is typically assessed empirically by how well extracted patterns generalize to new observations.

Nature Methods Points of Significance column: Machine learning: a primer. (read)

We present a data scenario in which we fit to a model with 5 predictors using polynomials and show what to expect from ML when noise and sample size vary. We also demonstrate the consequences of excluding an important predictor or including a spurious one.
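
As a much simpler stand-in for that scenario, the sketch below fits one-predictor polynomials of increasing degree to noisy data and reports the error against the noise-free truth (illustrative only; not the column's simulation):

    # Fit polynomials of increasing degree to noisy samples of a smooth curve
    # and measure how far each fit strays from the underlying truth.
    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0, 1, 30)
    y_true = np.sin(2 * np.pi * x)

    for noise in (0.1, 0.5):
        y = y_true + rng.normal(0, noise, x.size)
        for degree in (3, 9):
            coeffs = np.polyfit(x, y, degree)
            rmse = np.sqrt(np.mean((np.polyval(coeffs, x) - y_true) ** 2))
            print(f"noise={noise}  degree={degree}  RMSE vs truth={rmse:.3f}")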

Bzdok, D., Krzywinski, M. & Altman, N. (2017) Points of Significance: Machine learning: a primer. Nature Methods 14:1119–1120.

...more about the Points of Significance column

Snowflake simulation

Tue 16-01-2018
Symmetric, beautiful and unique.

Just in time for the season, I've simulated a snow-pile of snowflakes based on the Gravner-Griffeath model.

A few of the beautiful snowflakes generated by the Gravner-Griffeath model. (explore)

The work is described in the wintertime tale In Silico Flurries: Computing a world of snow, co-authored with Jake Lever on the Scientific American SA Blog.

Gravner, J. & Griffeath, D. (2007) Modeling Snow Crystal Growth II: A mesoscopic lattice map with plausible dynamics.

Genes that make us sick

Wed 22-11-2017
Where disease hides in the genome.

My illustration of the location of genes in the human genome that are implicated in disease appears in The Objects that Power the Global Economy, a book by Quartz.

The location of genes implicated in disease in the human genome, shown here as a spiral. (more...)

Ensemble methods: Bagging and random forests

Wed 22-11-2017
Many heads are better than one.

We introduce two common ensemble methods: bagging and random forests. Both of these methods repeat a statistical analysis on a bootstrap sample to improve the accuracy of the predictor. Our column shows these methods as applied to Classification and Regression Trees.

Nature Methods Points of Significance column: Ensemble methods: Bagging and random forests. (read)

For example, we can sample the space of values more finely when using bagging with regression trees because each sample has potentially different boundaries at which the tree splits.

Random forests generate a large number of trees by not only generating bootstrap samples but also randomly choosing which predictor variables are considered at each split in the tree.
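
Here is a minimal scikit-learn sketch of the two ensemble ideas on a toy regression problem, evaluated by cross-validation (illustrative data and parameters, not the column's example):

    # Compare a single regression tree, bagged trees and a random forest
    # on simulated data with 5 predictors, only the first of which matters.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(2)
    X = rng.uniform(0, 1, (200, 5))
    y = np.sin(2 * np.pi * X[:, 0]) + 0.3 * rng.normal(size=200)

    models = {
        "single tree": DecisionTreeRegressor(),
        "bagged trees": BaggingRegressor(DecisionTreeRegressor(), n_estimators=100),
        "random forest": RandomForestRegressor(n_estimators=100, max_features="sqrt"),
    }
    for name, model in models.items():
        score = cross_val_score(model, X, y, cv=5).mean()
        print(f"{name}: mean cross-validated R^2 = {score:.3f}")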

Krzywinski, M. & Altman, N. (2017) Points of Significance: Ensemble methods: bagging and random forests. Nature Methods 14:933–934.

Background reading

Krzywinski, M. & Altman, N. (2017) Points of Significance: Classification and regression trees. Nature Methods 14:757–758.

...more about the Points of Significance column