latest news

Distractions and amusements, with a sandwich and coffee.


On March 14th celebrate `\pi` Day. Hug `\pi`—find a way to do it.

Those who favour `\tau=2\pi` will have to postpone celebrations until June 28th. That's what you get for thinking that `\pi` is wrong.

If you're not into details, you may opt to party on July 22nd, which is `\pi` approximation day (`\pi` ≈ 22/7). It's 20% more accurate than the official `\pi` day!
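
To back that up, here is a quick sanity check (a throwaway Python sketch, not from the original post) comparing the relative errors of the two approximations:

```python
# Compare the relative error of 3.14 (official pi day)
# with that of 22/7 (pi approximation day).
import math

err_day = abs(3.14 - math.pi) / math.pi       # ~5.07e-4
err_approx = abs(22 / 7 - math.pi) / math.pi  # ~4.02e-4
print(f"3.14 error: {err_day:.2e}, 22/7 error: {err_approx:.2e}")
print(f"22/7's error is {1 - err_approx / err_day:.0%} smaller")
```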

Finally, if you believe that `\pi = 3`, you should read why `\pi` is not equal to 3.

2013 was the first year in which I made `\pi` day art. It was a year of dots and love.

René Hansen has created an interactive version of this year's posters! Why not go to the Feynman point directly?

The posters explore the relationship between adjacent digits in `\pi`, which are encoded by color using the scheme shown above. The design appears to shimmer because of differences in luminance between adjacent colors. In some versions of the poster, adjacent identical (or similar) digits are connected by lines.

The recipe for each poster is included in its figure legend. It gives the colors of the `i`th outer and inner circles, with `\pi_i` denoting the `i`th digit of `\pi`. For example, the recipe

`\pi_i` / `\pi_{i+1}`

corresponds to the case where the outer circle color encodes the `i`th digit and the inner circle color encodes the next digit, `\pi_{i+1}`. In this scheme, the inner circle at position `i` has the same color as the outer circle at position `i+1`.

The posters were generated automatically with a Perl script that produced SVG files; post-processing and layout were done in Illustrator. If you are interested in depicting your favourite number this way, let me know.
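
The original Perl script isn't reproduced here, but a minimal Python sketch of the encoding (with a hypothetical ten-color palette standing in for the actual scheme) might look like this:

```python
# Minimal sketch of the poster encoding: each digit of pi maps to a
# color (this palette is hypothetical), and each position i is drawn
# as an outer circle (digit i) containing an inner circle (digit i+1),
# written out as SVG.
PALETTE = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00",
           "#ffff33", "#a65628", "#f781bf", "#999999", "#66c2a5"]

digits = "31415926535897932384626433832795"  # first digits of pi

parts = ['<svg xmlns="http://www.w3.org/2000/svg" width="1000" height="40">']
for i in range(len(digits) - 1):
    x = 15 + 30 * i
    outer = PALETTE[int(digits[i])]        # pi_i  -> outer circle
    inner = PALETTE[int(digits[i + 1])]    # pi_{i+1} -> inner circle
    parts.append(f'<circle cx="{x}" cy="20" r="12" fill="{outer}"/>')
    parts.append(f'<circle cx="{x}" cy="20" r="6" fill="{inner}"/>')
parts.append("</svg>")

with open("pi_dots.svg", "w") as f:
    f.write("".join(parts))
```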

The design was inspired by the beautiful AIDS posters by Elena Miska.

I calculated `\pi` to 13,099,586 digits and then I found love.

It's fun to look for digit sequences, or even words, in `\pi`.

Just don't get carried away. If `\pi` is normal in base 10, as is widely believed but still unproven, then every word and every pattern appears in it somewhere.

I wanted to know the first time that "*love*" appears in `\pi`. When encoded using the scheme a=0, b=1, ..., z=25, "*love*" is the digit sequence 1114214.

This sequence appears first at position 13,099,586 (...8921991631**1114214**8187311392...). And, of course, infinitely many times after that.

Curiously, "hate" (0700194) appears well before love, at digit 514,717. In the first 200,000,000 digits, "hate" appears 23 times, six more times than "love".

If you use the scheme a=1, b=2, ..., z=26, then "*love*" becomes 1215225. This is first seen at 6,317,696 (...6103119129**1215225**6606850141...).
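
For anyone who wants to play along, here is a small Python sketch of the search (not the code used for the original hunt). It uses mpmath to generate a modest number of digits; reproducing the 13,099,586 result requires reading at least that many digits from a precomputed file instead.

```python
# Encode a word as a digit string and search for it in the digits of
# pi. A match at index i of the string corresponds to the i-th digit
# after the decimal point (index 0 holds the leading "3").
from mpmath import mp

def encode(word, offset=0):
    """Map a word to digits: a=0..z=25, or a=1..z=26 with offset=1."""
    return "".join(str(ord(c) - ord("a") + offset) for c in word)

mp.dps = 100_000                        # demo size; "love" needs ~13M digits
digits = str(mp.pi).replace(".", "")    # "31415926..."

for word, offset in [("no", 0), ("love", 0), ("love", 1)]:
    pattern = encode(word, offset)
    pos = digits.find(pattern)
    hit = f"digit {pos}" if pos > 0 else "not in the first 100,000 digits"
    print(f"{word} (a={offset}): {pattern} -> {hit}")
```

A short word like "no" (1314) turns up within the first few thousand digits; the seven-digit patterns above need millions.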

We examine two very common supervised machine learning methods: linear support vector machines (SVM) and k-nearest neighbors (kNN).

SVM is often less computationally demanding than kNN and is easier to interpret, but it can identify only a limited set of patterns. On the other hand, kNN can find very complex patterns, but its output is more challenging to interpret.

We illustrate SVM using a data set in which points fall into two categories, which SVM separates with a straight-line boundary surrounded by a margin. SVM can be tuned with a parameter that influences the width and location of the margin, permitting points to fall within the margin or on its wrong side. We then show how kNN relaxes explicit boundary definitions, such as SVM's straight line, and how kNN, too, can be tuned to create more robust classification.
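
As a rough illustration of the two methods and their tuning parameters, here is a minimal scikit-learn sketch (not anything from the column; the data set and parameter values are arbitrary):

```python
# Linear SVM vs. kNN on a toy two-class data set.
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated clusters, one per class.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C controls the soft margin: small C widens the margin and tolerates
# points inside it; large C penalizes margin violations heavily.
svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)

# k controls the smoothness of the kNN decision boundary: larger k
# averages over more neighbours, giving a less wiggly boundary.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

print("SVM accuracy:", svm.score(X_test, y_test))
print("kNN accuracy:", knn.score(X_test, y_test))
```

Varying `C` and `n_neighbors` and plotting the decision regions reproduces the qualitative trade-off described above.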

Bzdok, D., Krzywinski, M. & Altman, N. (2018) Points of Significance: Machine learning: supervised methods. *Nature Methods* **15**:5–6.

Bzdok, D., Krzywinski, M. & Altman, N. (2017) Points of Significance: Machine learning: a primer. *Nature Methods* **14**:1119–1120.

In a Nature graphics blog article, I describe the process behind designing the stark black-and-white Nature 10 cover.

Nature 10, 18 December 2017

In this primer, we focus on essential ML principles: a modeling strategy that lets the data speak for themselves, to the extent possible.

The benefits of ML arise from its use of a large number of tuning parameters or weights, which control the algorithm’s complexity and are estimated from the data using numerical optimization. Often ML algorithms are motivated by heuristics such as models of interacting neurons or natural evolution—even if the underlying mechanism of the biological system being studied is substantially different. The utility of ML algorithms is typically assessed empirically by how well extracted patterns generalize to new observations.

We present a data scenario in which we fit a model with five predictors using polynomials and show what to expect from ML when noise and sample size vary. We also demonstrate the consequences of excluding an important predictor or of including a spurious one.
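
The sketch below gives a flavour of this kind of experiment; it follows the column only loosely (one predictor rather than five, arbitrary noise level and sample size):

```python
# Fit polynomials of increasing degree to noisy data and watch test
# error diverge from training error as model complexity grows.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * x[:, 0]) + rng.normal(scale=0.3, size=200)  # signal + noise

x_tr, x_te, y_tr, y_te = train_test_split(x, y, random_state=0)
for degree in (1, 3, 10, 20):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_tr, y_tr)
    print(degree,
          round(mean_squared_error(y_tr, model.predict(x_tr)), 3),  # train
          round(mean_squared_error(y_te, model.predict(x_te)), 3))  # test
```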

Bzdok, D., Krzywinski, M. & Altman, N. (2017) Points of Significance: Machine learning: a primer. Nature Methods 14:1119–1120.

Just in time for the season, I've simulated a snow-pile of snowflakes based on the Gravner-Griffeath model.

The work is described in a wintertime tale, In Silico Flurries: Computing a world of snow, co-authored with Jake Lever on the Scientific American blog.

Gravner, J. & Griffeath, D. (2007) Modeling Snow Crystal Growth II: A mesoscopic lattice map with plausible dynamics.
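
The full Gravner-Griffeath model tracks diffusing vapour mass with several physical parameters and won't fit in a few lines. As a toy stand-in that still grows six-fold shapes, here is Packard's classic rule (freeze a cell when exactly one of its neighbours is frozen) on a hexagonal lattice:

```python
# Toy snowflake growth, NOT the Gravner-Griffeath model: Packard's
# one-neighbour rule on a hexagonal lattice in axial coordinates.
HEX_NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def grow(steps):
    ice = {(0, 0)}                        # seed crystal at the origin
    for _ in range(steps):
        frontier = {}                     # non-ice cells -> ice-neighbour count
        for q, r in ice:
            for dq, dr in HEX_NEIGHBOURS:
                cell = (q + dq, r + dr)
                if cell not in ice:
                    frontier[cell] = frontier.get(cell, 0) + 1
        # Packard rule: attach only cells with exactly one frozen neighbour.
        ice |= {cell for cell, n in frontier.items() if n == 1}
    return ice

print(len(grow(30)), "frozen cells after 30 steps")
```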

My illustration of the location of genes in the human genome that are implicated in disease appears in The Objects that Power the Global Economy, a book by Quartz.

We introduce two common ensemble methods: bagging and random forests. Both methods repeat a statistical analysis on bootstrap samples of the data and combine the results to improve the accuracy of the predictor. Our column shows these methods applied to Classification and Regression Trees.

For example, bagging with regression trees samples the space of predictor values more finely because each bootstrap sample yields a tree with potentially different split boundaries.

Random forests generate a large number of trees by not only drawing bootstrap samples but also randomly choosing which predictor variables are considered at each split in the tree.
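
For a concrete feel of the difference, here is a minimal scikit-learn sketch (not from the column; the data set and settings are arbitrary). The random forest differs from plain bagging only in the per-split feature restriction (`max_features`):

```python
# Single tree vs. bagged trees vs. a random forest on toy data.
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=8, noise=10, random_state=0)

single = DecisionTreeRegressor(random_state=0)
# Bagging: average many trees, each fit to a bootstrap sample.
bagged = BaggingRegressor(DecisionTreeRegressor(),
                          n_estimators=100, random_state=0)
# Random forest: bagging plus random feature selection at each split.
forest = RandomForestRegressor(n_estimators=100, max_features="sqrt",
                               random_state=0)

for name, model in [("tree", single), ("bagging", bagged), ("forest", forest)]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```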

Krzywinski, M. & Altman, N. (2017) Points of Significance: Ensemble methods: bagging and random forests. *Nature Methods* **14**:933–934.

Krzywinski, M. & Altman, N. (2017) Points of Significance: Classification and regression trees. *Nature Methods* **14**:757–758.