
Distractions and amusements, with a sandwich and coffee.

listen; there's a hell of a good universe next door: let's go.

On March 14th celebrate Pi Day. Hug `\pi`—find a way to do it.

Those who favour `\tau=2\pi` will have to postpone celebrations until June 28th. That's what you get for thinking that `\pi` is wrong.

If you're not into details, you may opt to party on July 22nd, which is `\pi` approximation day (`\pi` ≈ 22/7). It's 20% more accurate than the official Pi day!
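The 20% figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes "more accurate" means a smaller absolute error relative to `\pi`; the function and variable names are made up for illustration.

```python
import math

def abs_error(approx: float) -> float:
    """Absolute distance between an approximation and pi."""
    return abs(approx - math.pi)

err_pi_day = abs_error(3.14)        # March 14th
err_approx_day = abs_error(22 / 7)  # July 22nd

# Fraction by which the 22/7 error is smaller than the 3.14 error.
improvement = 1 - err_approx_day / err_pi_day

print(f"|3.14 - pi| = {err_pi_day:.6f}")
print(f"|22/7 - pi| = {err_approx_day:.6f}")
print(f"22/7 error is {improvement:.1%} smaller")
```

Under that definition the error of 22/7 comes out roughly one fifth smaller than the error of 3.14.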

Finally, if you believe that `\pi = 3`, you should read why `\pi` is not equal to 3.

Not a circle in sight in the 2015 `\pi` day art. Try to figure out how up to 612,330 digits are encoded before reading about the method. `\pi`'s irrational friends `\phi` and `e` are there too: golden and natural. Get it?

This year's `\pi` day is particularly special. The digits of `\pi` specify a precise time if the date is encoded in the North American month-day-year convention: 3-14-15 9:26:53.

The art has been featured in Ana Swanson's Wonkblog article at the Washington Post—10 Stunning Images Show The Beauty Hidden in `\pi`.

This year's art has a modern Bauhaus style. Sharp edges, lines and solid colors. Potato farms from space. CPUs from up close. If you think the pieces look like the art of Piet Mondrian, you'd be right.

The digits of `\pi` are encoded in something that looks like a treemap. I explain how this is done in the methods section, but before reading it, try to see if you can figure out how it's done.

I briefly experimented with the 4-color theorem in trying to apply color to the treemap, but it turned out to lack interesting structure. Well, at least some graphs were made.

I experimented with different treemap resolutions. For treemaps that use an outline around each rectangle, I decided to stop at 8 levels, at which 111,469 digits of `\pi` can be encoded.

I also made a level 9 treemap without the outlines, which encoded 612,330 digits. When rendered at 20,833 × 20,833 pixels (I needed the image in bitmap form to provide the posters for sale), some regions are essentially a pixel in size, as seen in the 1:1 crop below.

*It is important to understand both what a classification metric expresses and what it hides.*

We examine various metrics used to assess the performance of a classifier. We show that a single metric is insufficient to capture performance: for any metric, a variety of scenarios yield the same value.
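To see how one metric can hide very different behaviour, here is a small sketch with two confusion matrices. The counts are hypothetical and chosen only so that accuracy is identical; they are not taken from the column.

```python
def metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Common classification metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical classifiers, 100 observations each.
balanced = metrics(tp=40, fn=10, fp=10, tn=40)    # 50 positives, 50 negatives
imbalanced = metrics(tp=5, fn=15, fp=5, tn=75)    # 20 positives, 80 negatives

for name, m in (("balanced", balanced), ("imbalanced", imbalanced)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Both cases have the same accuracy, yet the second classifier finds only a quarter of the positives, which accuracy alone does not reveal.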

We also discuss ROC curves and the area under the curve (AUC), and how their interpretation changes based on class balance.

Altman, N. & Krzywinski, M. (2016) Points of Significance: Classifier evaluation. *Nature Methods* **13**:603-604.

Today is the day and it's hardly an approximation. In fact, `22/7` is a 20% more accurate representation of `\pi` than `3.14`!

Time to celebrate, graphically. This year I do so with perfect packing of circles that embody the approximation.

By warping the circle by 8% along one axis, we can create a shape whose ratio of circumference to diameter, taken as twice the average radius, is 22/7.
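The 8% figure can be checked numerically. The sketch below assumes the warp is a stretch of one semi-axis of a unit circle by a factor `1 + stretch`, with the other axis unchanged, so the shape is an ellipse. That reading, and every name in the code, is an assumption rather than the method used for the art. The perimeter is integrated with the midpoint rule, and bisection finds the stretch at which the ratio reaches 22/7.

```python
import math

def ellipse_perimeter(a: float, b: float, n: int = 2000) -> float:
    """Perimeter of an ellipse with semi-axes a and b (midpoint-rule integration)."""
    step = 2 * math.pi / n
    return step * sum(
        math.hypot(a * math.sin((k + 0.5) * step), b * math.cos((k + 0.5) * step))
        for k in range(n)
    )

def ratio(stretch: float) -> float:
    """Perimeter divided by the 'diameter', taken as twice the average radius."""
    a, b = 1.0, 1.0 + stretch
    return ellipse_perimeter(a, b) / (a + b)

# Bisection: ratio(0) is pi (too small) and ratio(0.5) is larger than 22/7.
lo, hi = 0.0, 0.5
for _ in range(60):
    mid = (lo + hi) / 2
    if ratio(mid) < 22 / 7:
        lo = mid
    else:
        hi = mid
stretch = (lo + hi) / 2

print(f"stretch = {stretch:.4f}, ratio = {ratio(stretch):.6f}, 22/7 = {22 / 7:.6f}")
```

Under these assumptions the required stretch lands between 8% and 9%, consistent with the figure quoted above.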

If you prefer something more accurate, check out art from previous `\pi` days: 2013 `\pi` Day, 2014 `\pi` Day, 2015 `\pi` Day, and 2016 `\pi` Day.

*Regression can be used on categorical responses to estimate probabilities and to classify.*

The next column in our series on regression deals with how to classify categorical data.

We show how linear regression can be used for classification and demonstrate that it can be unreliable in the presence of outliers. Using logistic regression, which fits a linear model to the log odds, improves robustness.

Logistic regression is solved numerically and in most cases, the maximum-likelihood estimates are unique and optimal. However, when the classes are perfectly separable, the numerical approach fails because there is an infinite number of solutions.
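The two situations can be illustrated with a toy fit. The sketch below uses plain gradient ascent on the log-likelihood, a deliberately simple stand-in for the numerical solvers in statistical software, and the data are hypothetical. With overlapping classes the slope settles to a finite value. With perfectly separable classes it keeps growing the longer the fit runs.

```python
import math

def fit_logistic(x, y, steps=5000, rate=0.1):
    """Fit log-odds = b0 + b1 * x by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))  # predicted P(y = 1)
            g0 += yi - p
            g1 += (yi - p) * xi
        b0 += rate * g0 / len(x)
        b1 += rate * g1 / len(x)
    return b0, b1

x = [-2, -1, 0, 1, 2, 3]
y_overlap = [0, 0, 1, 0, 1, 1]     # classes overlap at x = 0 and x = 1
y_separable = [0, 0, 0, 1, 1, 1]   # any threshold between 0 and 1 splits the classes

b0, b1 = fit_logistic(x, y_overlap)
_, slope_short = fit_logistic(x, y_separable, steps=500)
_, slope_long = fit_logistic(x, y_separable, steps=5000)

print(f"overlapping classes: b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"separable classes: slope after 500 steps = {slope_short:.2f}, "
      f"after 5000 steps = {slope_long:.2f}")
```

The ever-increasing slope in the separable case is the symptom of the failure described above: the likelihood keeps improving as the fitted curve approaches a step function, so the iterations never settle.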

Altman, N. & Krzywinski, M. (2016) Points of Significance: Logistic regression. *Nature Methods* **13**:541-542.

Altman, N. & Krzywinski, M. (2016) Points of Significance: Regression diagnostics. *Nature Methods* **13**:385-386.

Altman, N. & Krzywinski, M. (2015) Points of Significance: Multiple linear regression. *Nature Methods* **12**:1103-1104.

Altman, N. & Krzywinski, M. (2015) Points of Significance: Simple linear regression. *Nature Methods* **12**:999-1000.

Genomic instability is one of the defining characteristics of cancer and within a tumor, which is an ever-evolving population of cells, there are many genomes. Mutations accumulate and propagate to create subpopulations and these groups of cells, called clones, may respond differently to treatment.

It is now possible to sequence individual cells within a tumor to create a profile of genomes. This profile changes with time, both in the kinds of mutation that are found and in their proportion in the overall population.

Clone evolution diagrams visualize these data. These diagrams can be qualitative, showing only trends, or quantitative, showing temporal and population changes to scale. In this Molecular Cell forum article I provide guidelines for drawing these diagrams, focusing on how to use color and navigational elements, such as grids, to clarify the relationships between clones.

I'd like to thank Maia Smith and Cydney Nielsen for assistance in preparing some of the figures in the paper.

Krzywinski, M. (2016) Visualizing Clonal Evolution in Cancer. *Mol Cell* **62**:652-656.

*Limitations in print resolution and visual acuity impose limits on data density and detail.*

Your printer can print at 1,200 or 2,400 dots per inch. At reading distance, your reader can resolve about 200–300 lines per inch. This large gap between how finely we can print and how well we can see can create problems when we don't take visual acuity into account.

The column provides some guidelines—particularly relevant when showing whole-genome data, where the scale of elements of interest such as genes is below the visual acuity limit—for binning data so that they are represented by elements that can be comfortably discerned.
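As a rough illustration of the arithmetic behind binning, the sketch below picks a bin size so that each bin maps to one resolvable element. The 250 lines per inch (the midpoint of the range quoted above), the 6-inch figure width and the 3 Gb genome length are all assumptions made for the example, and the helper names are hypothetical.

```python
import math
from collections import Counter

def bin_size_bp(genome_bp: int, width_in: float, lines_per_in: float = 250) -> int:
    """Smallest bin, in bases, that still maps to one resolvable element."""
    n_bins = int(width_in * lines_per_in)  # resolvable elements across the figure
    return math.ceil(genome_bp / n_bins)

def bin_counts(positions, size: int) -> Counter:
    """Count how many features start in each bin of the given size."""
    return Counter(pos // size for pos in positions)

size = bin_size_bp(3_000_000_000, width_in=6.0)
counts = bin_counts([10, 1_500_000, 2_000_000, 5_999_999], size)

print(f"bin size: {size:,} bp")
print(dict(counts))
```

With these numbers a single resolvable element spans about 2 Mb, far larger than a typical gene, which is why individual genes cannot be drawn to scale on a whole-genome figure.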

Krzywinski, M. (2016) Points of view: Binning high-resolution data. *Nature Methods* **13**:463.

*Residual plots can be used to validate assumptions about the regression model.*

Continuing with our series on regression, we look at how you can identify issues in your regression model.

The difference between the observed value and the model's predicted value is the residual, `r_i = y_i - \hat{y}_i`, a very useful quantity for identifying the effects of outliers and trends in the data that might suggest your model is inadequate.
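A minimal sketch of the calculation, using a least-squares line and made-up data in which the last point is an outlier:

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for a simple linear regression."""
    xm, ym = mean(x), mean(y)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    sxx = sum((xi - xm) ** 2 for xi in x)
    slope = sxy / sxx
    return ym - slope * xm, slope

# Hypothetical data: roughly y = x, except for the last observation.
x = [1, 2, 3, 4, 5, 6]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 9.0]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

for xi, ri in zip(x, residuals):
    print(f"x = {xi}: residual = {ri:+.3f}")
```

The outlier has the largest residual, but it also pulls the fitted line towards itself, so its neighbours pick up sizeable negative residuals. That pattern in a residual plot is a hint that one point is distorting the fit.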

We also discuss normal probability plots (or Q-Q plots) and show how these can be used to check that the residuals are normally distributed, which is one of the assumptions of regression (constant variance being another).
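The coordinates of a normal probability plot can be computed with the standard library. This sketch assumes the plotting position `(i + 0.5) / n` for the i-th sorted value (counting from zero), which is one of several conventions in use, and the residuals are hypothetical.

```python
from statistics import NormalDist

def normal_qq(values):
    """Pair each sorted value with the standard normal quantile of its rank."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(values)))

residuals = [0.62, 0.00, -0.12, -0.85, -1.07, 1.41]  # hypothetical residuals
points = normal_qq(residuals)

for q, r in points:
    print(f"theoretical = {q:+.3f}, observed = {r:+.3f}")
```

If the residuals are approximately normal, these points fall close to a straight line when plotted. Systematic curvature or stray points at the ends suggest skew, heavy tails or outliers.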

Altman, N. & Krzywinski, M. (2016) Points of Significance: Analyzing outliers: Influential or nuisance? *Nature Methods* **13**:281-282.

Altman, N. & Krzywinski, M. (2015) Points of Significance: Multiple linear regression. *Nature Methods* **12**:1103-1104.

Altman, N. & Krzywinski, M. (2015) Points of Significance: Simple linear regression. *Nature Methods* **12**:999-1000.