
Numbers are a lot of fun. They can start conversations—the interesting number paradox is a party favourite: every number must be interesting because the first number that wasn't would be very interesting! Of course, in the wrong company they can just as easily end conversations.

I debunk the proof that `\pi = 3` by proving, once and for all, that `\pi` can be any number you like!

Periodically I receive kooky emails from people who claim to know more. Not more than me—which makes me feel great—but more than everybody—which makes me feel suspicious. A veritable fount of crazy is The Great Design Book, Integration of the Cosmic, Atomic & Darmic (Dark Matter) Systems by R.A. Forde.

Look at the margin of error. Archimedes' value for `\pi` (3.14) is an approximation - not an exact value. Would you accept an approximation or errors for your bank account balance? Then, why do you accept it for `\pi`? What else may be wrong? —R.A. Forde

What else may be wrong? Everything!

Here is a "proof" I recently received that π = 3. The main thrust of the proof is that "God said so." QED? Not quite.

Curiously the proof was sent to me as a bitmap.

Given that it claims to show that π has the exact value of 3, it begins reasonably humbly—that I "may find this information ... interesting." Actually, if this were true, I would find this information *staggering*.

Because mathematics is the language of physical reality, there's only so far you can go with wrong math. If you build something based on wrong math, it will break.

Given that math is axiomatic and not falsifiable, its arguments are a kind of argument from authority—the authority of the axioms. You must accept the axioms for the rest to make sense.

Religion also makes its arguments from authority (a kind of divine authority by proxy), though its "axioms" are nowhere near as compelling nor its conclusions as useful. Normally, the deception in religion's arguments from authority is not obvious. The arguments have been inoculated over time, through ambiguity, hedging and the appeal to faith, to be immune to criticism.

When these arguments include demonstrably incorrect math, the curtain falls. The stage, props and other machinery of the scheme become apparent. Here you can see this machinery in action. Or, should I say, inaction.

If you're 5 years old: (1) draw a reasonably good circle, (2) lay out a piece of string along the circle and measure the length of the string (circumference), (3) measure the diameter of the circle, (4) divide circumference by diameter. You should get a value close to the actual value of π ≈ 3.14. If you're older, read on.
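For readers without string handy, here is a minimal Python sketch of the same experiment. It is only a simulation: the "string" is modeled as short straight segments joining equally spaced points on the circle, and the function name `measured_pi` is made up for this illustration. The points are placed using the true value of π, so this checks the measuring procedure rather than deriving π from scratch.

```python
import math

def measured_pi(n_points, radius=1.0):
    """Estimate pi as (length of string around a circle) / diameter.

    The string is approximated by n_points straight segments that join
    equally spaced points on the circle.
    """
    points = [
        (radius * math.cos(2 * math.pi * k / n_points),
         radius * math.sin(2 * math.pi * k / n_points))
        for k in range(n_points)
    ]
    # Sum the segment lengths, wrapping around to close the loop.
    circumference = sum(
        math.dist(points[k], points[(k + 1) % n_points])
        for k in range(n_points)
    )
    return circumference / (2 * radius)

for n in (6, 12, 100, 1000):
    print(n, measured_pi(n))
```

With only six straight segments (a hexagon) the ratio comes out as 3, and it climbs toward 3.14159... as the string follows the circle more closely.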

The book purports "real" (why the quotes?) life experiments to demonstrate that π is 3. I'll take a look at one below, since it makes use of a coffee cup and I don't like to see coffee cups besmirched through hucksterish claims.

What appears below is a critique of a wrong proof. It constitutes the right proof of the fact that the original proof is wrong. It is not a proof that `\pi = 3`!

The proof begins with some horrendous notation. But, since notation has never killed anyone (though frustration is a kind of death, of patience), let's go with it. We're asked to consider the following equation, which is used by the proof to show that `\pi = 3`. $$ \sin^{-1} \Delta \theta^c = \frac{\pi}{6} \frac{\theta^{\circ}}{y}\tag{1} $$

where $$ \begin{array}{lll} \Delta \theta^c = \frac{2\pi}{12} & \theta^{\circ} = \frac{360^\circ}{12} & y = \frac{1}{2} \end{array} $$

At this point you might already suspect that we're asked to consider a statement which is an **inequality**. The proof might as well have started by saying "We will use `6 = 2\pi` to show that `\pi = 3`." In fact, this is the exact approach I use below to prove that `\pi` is any number. But let's continue with examining the proof.

Nothing so simple as equation (1) should look so complicated. Let's clean it up a little bit. $$ \sin^{-1} a = \tfrac{\pi}{3} b\tag{2} $$

where $$ \begin{array}{ll} a = \frac{2\pi}{12} & b = \frac{360^\circ}{12} \end{array} $$

The fact that we're being asked to take the inverse sine of a quantity that is explicitly indicated to be an angle should make you suspicious. Although an angle is a dimensionless quantity and we can write $$ \sin(\pi \; \text{rad}) = \sin(\pi) = 0 $$

using an angle as an argument to `\sin^{-1}()` suggests that we don't actually know what the function does.

If we go back to (2) and substitute the values we're being asked to use, $$ \sin^{-1} \tfrac{\pi}{6} = \tfrac{\pi}{3} 30 = 10 \pi \tag{3} $$

we get $$ 0.551 = 31.416 \tag{4} $$

That's as good an inequality as you're going to get. An ounce of reason would be enough for us to stop here, backtrack and find our error. Short of that, we press ahead to see how we can manipulate this to our advantage.

In the next step, the proof treats the left-hand side as a quantity in radians (a completely bogus step, but let's go with it) and converts it to degrees to obtain $$ 0.551 \times \tfrac{360}{2 \pi} = 31.574 $$

Yes, we just multiplied only one side of equation (4) by a value that is not one. Sigh.

After committing this crime, the proof attempts to shock you into confusion by stating that $$ 31.574 \neq 31.416 $$

And, given that these numbers aren't the same (they weren't the same in equation (4) either, so the additional bogus multiplication by `\tfrac{360}{2 \pi}` wasn't actually needed), the proof states that this inequality must be due to the fact that we used the wrong value for `\pi` in equation (1).

The proof fails to distinguish between an incorrect identity (e.g. `1 = 2` is not correct) and the concept of a variable (e.g. `1 = 2 x` may be correct, depending on the value of `x`). Guided by the dim headlamp of unreason, it suggests that we right our delusion that `\pi = 3.1415...` and instead use `\pi = 3`. Doing so in equation (1), we get $$ \sin^{-1} \tfrac{1}{2} = 30 $$

which is true, because `\sin(30^\circ) = \tfrac{1}{2}`. Therefore, `\pi = 3`.

The entire proof is bogus because it starts with an equality that is not true. In equation (1), the left-hand side is not equal to the right-hand side.
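To make the sleight of hand concrete, here is a small Python sketch that evaluates both sides of equation (1) with a variable `x` standing in for `\pi`. The function name `equation_1_sides` is made up for this illustration, and one assumption is labeled in the code: the radian-to-degree conversion is done with the true value of π, which is what the proof appears to do when it multiplies by 360/(2π).

```python
import math

def equation_1_sides(x):
    """Evaluate both sides of equation (1) with x standing in for pi.

    Left side:  inverse sine of 2x/12, expressed in degrees.
    Right side: (x / 6) * (360 / 12) / (1 / 2), which simplifies to 10x.
    """
    # Assumption: math.degrees converts radians to degrees using the
    # true value of pi, mirroring the conversion step in the "proof".
    left = math.degrees(math.asin(2 * x / 12))
    right = (x / 6) * (360 / 12) / (1 / 2)
    return left, right

for x in (math.pi, 3, 0):
    left, right = equation_1_sides(x)
    balanced = math.isclose(left, right, abs_tol=1e-9)
    print(f"x = {x:.4f}: left = {left:.3f}, right = {right:.3f}, balanced: {balanced}")
```

The two sides balance at `x = 3`, but they also balance at `x = 0`. All this says is that 3 and 0 are among the solutions of an equation in `x`. It says nothing about the ratio of a circle's circumference to its diameter.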

To illustrate explicitly what just happened, here's a proof that `\pi = 4` using the exact same approach.

Consider the equation, $$ 4 = \pi \tag{5} $$

if we substitute the conventionally accepted value of `\pi` we find $$ 4 = 3.1415... $$

which isn't true! But if we use `\pi = 4` then $$ 4 = 4 $$

which is true! Therefore, `\pi = 4`. QED.

This only demonstrates that I'm an idiot, not that `\pi = 4`.

But why stop at 4? Everyone can have their own value of `\pi`. In equation (5) in the above "proof", set 4 to any number you like and use it to prove that `\pi` is any number you like.

Isn't misunderstanding math fun?

The history of the value of π is rich. There is good evidence for `\pi = (16/9)^2` in the Egyptian Rhind Papyrus (circa 1650 BC). Archimedes (287-212 BC) estimated `\pi \approx 3.1418` using the inequality `\tfrac{223}{71} \lt \pi \lt \tfrac{22}{7}`.
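Archimedes' bounds are usually described as coming from polygons inscribed in and circumscribed around a circle, doubling the number of sides from 6 up to 96. The sketch below reproduces that idea in floating point using the standard harmonic-mean and geometric-mean recurrences. It is an illustration, not a reconstruction of his hand calculation (he worked with rational approximations of square roots), and the function name `archimedes_bounds` is made up here.

```python
import math

def archimedes_bounds(doublings=4):
    """Bound pi using regular polygons around a unit circle.

    Starts from hexagons and doubles the number of sides `doublings`
    times. Returns (inscribed, circumscribed) half-perimeters, which
    are lower and upper bounds on pi.
    """
    inscribed = 3.0                    # half-perimeter of inscribed hexagon
    circumscribed = 2 * math.sqrt(3)   # half-perimeter of circumscribed hexagon
    for _ in range(doublings):
        # Harmonic mean gives the circumscribed polygon with twice the sides.
        circumscribed = 2 * circumscribed * inscribed / (circumscribed + inscribed)
        # Geometric mean gives the inscribed polygon with twice the sides.
        inscribed = math.sqrt(circumscribed * inscribed)
    return inscribed, circumscribed

lower, upper = archimedes_bounds(4)    # 6 -> 12 -> 24 -> 48 -> 96 sides
print(f"{223/71:.6f} < {lower:.6f} < pi < {upper:.6f} < {22/7:.6f}")
```

The 96-sided polygons give bounds that sit just inside `\tfrac{223}{71}` and `\tfrac{22}{7}`, which is consistent with the inequality quoted above.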

One thing is certain: the precision to which the number is known is always increasing. At this point, it is known to about 12 trillion digits.

So it might seem that `\pi \approx 3` is ancient history. Not to some.

Approximations are fantastic—they allow us to get the job done early. We use the best knowledge available to us today to solve today's problems. Tomorrow's problems might require tomorrow's knowledge—an improvement in the approximations of today.

`\pi = 3` is an approximation that is about 2,000 years old (not the best of its time, either). It's comical to consider it as today's best knowledge.

One of the "real" life experiments proposed in the book (pp. 65-68) uses a coffee cup. The experiment is a great example of failing to identify your wrong assumptions.

First you take measurements of your coffee cup. The author finds that the inner radius is `r = 4 \; \mathrm{cm}` and the depth is `d = 8.6 \; \mathrm{cm}`. Using the volume of a cylinder, the author finds that the volume is either `412.8 \; \mathrm{cm}^3 = 14.0 \; \mathrm{fl.oz.}` if `\pi=3` or `432.3 \; \mathrm{cm}^3 = 14.6 \; \mathrm{fl.oz.}` if `\pi=3.14...`.

You're next instructed to fill a measuring cup to 14.6 fl.oz. (good luck there, since measuring cups usually come in 1/2 cup (4 fl.oz.) or 1/3 cup (about 2.7 fl.oz.) increments).

The author supposedly does this and finds that he could fill the cup to the brim using only 13.7 fl.oz, with the remaining 0.9 fl.oz. spilling.

And now, for some reason, he concludes that this is proof that `\pi = 3`, even though this value of `\pi` gave a calculated cup volume of 14.0 fl.oz., not 13.7 fl.oz.

Sloppiness aside, the most likely problem is that the original assumption that the inside of the coffee cup is a perfect cylinder is wrong. The inside of the cup is probably smooth and perhaps even slightly tapered. Using the maximum radius and depth dimensions will yield a volume larger than the cup's. This is why water spilled out.
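The arithmetic is easy to check. The Python sketch below recomputes the two cylinder volumes from the book's measurements and then, purely as an illustration, the volume of a slightly tapered cup modeled as a conical frustum. Two things here are assumptions and not from the book: the conversion uses US fluid ounces, and the bottom radius of 3.74 cm is a hypothetical value picked to show how little taper is needed.

```python
import math

ML_PER_US_FL_OZ = 29.5735  # assumption: US fluid ounces; 1 cm^3 = 1 mL

def cylinder_volume(radius_cm, depth_cm, pi_value=math.pi):
    """Volume of a cylinder in cm^3, for whatever 'pi' you believe in."""
    return pi_value * radius_cm ** 2 * depth_cm

def tapered_cup_volume(top_radius_cm, bottom_radius_cm, depth_cm):
    """Volume of a conical frustum in cm^3 (a cup that narrows toward the bottom)."""
    r_top, r_bottom = top_radius_cm, bottom_radius_cm
    return math.pi * depth_cm * (r_top ** 2 + r_top * r_bottom + r_bottom ** 2) / 3

def to_fl_oz(volume_cm3):
    return volume_cm3 / ML_PER_US_FL_OZ

r, d = 4.0, 8.6  # measurements quoted from the book
print(f"cylinder, pi = 3:    {to_fl_oz(cylinder_volume(r, d, 3)):.1f} fl.oz.")
print(f"cylinder, true pi:   {to_fl_oz(cylinder_volume(r, d)):.1f} fl.oz.")
# Hypothetical bottom radius, chosen only for illustration.
print(f"tapered to 3.74 cm:  {to_fl_oz(tapered_cup_volume(r, 3.74, d)):.1f} fl.oz.")
```

Under these assumptions, a taper of about a quarter of a centimeter in radius is enough to account for the measured 13.7 fl.oz. without touching the value of π.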

In this primer, we focus on essential ML principles: a modeling strategy that lets the data speak for themselves, to the extent possible.

The benefits of ML arise from its use of a large number of tuning parameters or weights, which control the algorithm’s complexity and are estimated from the data using numerical optimization. Often ML algorithms are motivated by heuristics such as models of interacting neurons or natural evolution—even if the underlying mechanism of the biological system being studied is substantially different. The utility of ML algorithms is typically assessed empirically by how well extracted patterns generalize to new observations.

We present a data scenario in which we fit to a model with 5 predictors using polynomials and show what to expect from ML when noise and sample size vary. We also demonstrate the consequences of excluding an important predictor or including a spurious one.

Bzdok, D., Krzywinski, M. & Altman, N. (2017) Points of Significance: Machine learning: a primer. *Nature Methods* **14**:1119–1120.

Just in time for the season, I've simulated a snow-pile of snowflakes based on the Gravner-Griffeath model.

Gravner, J. & Griffeath, D. (2007) Modeling Snow Crystal Growth II: A mesoscopic lattice map with plausible dynamics.

My illustration of the location of genes in the human genome that are implicated in disease appears in The Objects that Power the Global Economy, a book by Quartz.

We introduce two common ensemble methods: bagging and random forests. Both of these methods repeat a statistical analysis on a bootstrap sample to improve the accuracy of the predictor. Our column shows these methods as applied to Classification and Regression Trees.

For example, we can sample the space of values more finely when using bagging with regression trees because each sample has potentially different boundaries at which the tree splits.

Random forests generate a large number of trees by not only generating bootstrap samples but also randomly choosing which predictor variables are considered at each split in the tree.
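As a rough illustration of bagging, the Python sketch below fits one-split regression trees ("stumps") to bootstrap samples and averages their predictions. It is a toy under stated assumptions, not the method from the column: there is a single predictor, so the random choice of predictors that distinguishes a random forest does not come into play, and all function names are made up for this example.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree; returns (threshold, left_mean, right_mean)."""
    best = None
    # Candidate thresholds start at the second distinct x, so both sides are non-empty.
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x in the sample is identical: no split possible
        overall = mean(ys)
        return (xs[0], overall, overall)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean

def bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Fit one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def predict_bagged(stumps, x):
    """Average the predictions of all stumps."""
    return mean(predict_stump(s, x) for s in stumps)

xs = list(range(10))
ys = [0.0] * 5 + [1.0] * 5          # a step between x = 4 and x = 5
stumps = bagged_stumps(xs, ys)
print([round(predict_bagged(stumps, x), 2) for x in (0, 4.5, 9)])
```

Because each bootstrap sample can place its split at a different boundary, the averaged prediction can take intermediate values near the step, which a single stump cannot do.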

Krzywinski, M. & Altman, N. (2017) Points of Significance: Ensemble methods: bagging and random forests. *Nature Methods* **14**:933–934.

Krzywinski, M. & Altman, N. (2017) Points of Significance: Classification and regression trees. *Nature Methods* **14**:757–758.

Decision trees classify data by splitting it along the predictor axes into partitions with homogeneous values of the dependent variable. Unlike logistic or linear regression, CART does not develop a prediction equation. Instead, data are predicted by a series of binary decisions based on the boundaries of the splits. Decision trees are very effective and the resulting rules are readily interpreted.

Trees can be built using different metrics that measure how well the splits divide up the data classes: Gini index, entropy or misclassification error.
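For reference, here is a minimal Python sketch of the three metrics, using their standard textbook definitions on the class labels that fall into one partition. The function names are made up for this example.

```python
import math
from collections import Counter

def class_proportions(labels):
    """Fraction of the partition belonging to each class present in it."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]

def gini_index(labels):
    """1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels))

def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))

def misclassification_error(labels):
    """1 minus the proportion of the most common class."""
    return 1.0 - max(class_proportions(labels))

for partition in (["a", "a", "a", "a"], ["a", "a", "a", "b"], ["a", "a", "b", "b"]):
    print(partition,
          round(gini_index(partition), 3),
          round(entropy(partition), 3),
          round(misclassification_error(partition), 3))
```

All three are zero for a pure partition and largest when the classes are evenly mixed, so a good split is one that lowers them in the resulting partitions.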

When the dependent variable is quantitative and not categorical, regression trees are used. Here, the data are still split but now the dependent variable is estimated by the average within the split boundaries. Tree growth can be controlled using the complexity parameter, a measure of the relative improvement of each new split.

Individual trees can be very sensitive to minor changes in the data and even better prediction can be achieved by exploiting this variability. Using ensemble methods, we can grow multiple trees from the same data.

Krzywinski, M. & Altman, N. (2017) Points of Significance: Classification and regression trees. *Nature Methods* **14**:757–758.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Logistic regression. *Nature Methods* **13**:541–542.

Altman, N. & Krzywinski, M. (2015) Points of Significance: Multiple linear regression. *Nature Methods* **12**:1103–1104.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Classifier evaluation. *Nature Methods* **13**:603–604.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Model selection and overfitting. *Nature Methods* **13**:703–704.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Regularization. *Nature Methods* **13**:803–804.

The artwork was created in collaboration with my colleagues at the Genome Sciences Centre to celebrate the 5-year anniversary of the Personalized Oncogenomics Program (POG).

The Personalized Oncogenomics Program (POG) is a collaborative research study including many BC Cancer Agency oncologists, pathologists and other clinicians along with Canada's Michael Smith Genome Sciences Centre with support from BC Cancer Foundation.

The aim of the program is to sequence, analyze and compare the genome of each patient's cancer (the entire DNA and RNA inside tumor cells) in order to understand what is enabling it to grow and to identify less toxic and more effective treatment options.