
Distractions and amusements, with a sandwich and coffee.

Sun is on my face ...a beautiful day without you

Functional annotation of gene sequences—a visualization workshop. Poznan, Poland. Dec 12, 2015

Typography geek? If you like the geometry and mathematics of these posters, you may enjoy something more lettered. Visions of type: Type Peep Show: The Private Curves of Letters posters.

On March 14th celebrate Pi Day. Hug `\pi`: find a way to do it. Those who favour `\tau=2\pi` will have to postpone celebrations until July 26th. Some of these folks will argue that `\pi` is wrong. If you're not into details, you may opt to party on July 22nd, which is `\pi` approximation day (`\pi` ≈ 22/7).

2013 was the first year in which I made `\pi` day art. It was a year of dots and love.

Let's explore what `\pi` looks like with something whimsical and pretty and colourful.

Rational art of the highly irrational, a regime where beauty runs with her hair down and lets her "*ribbons to flow confusedly.*" Robert Herrick says it well in Sweet Disorder,

I see a wild civility;—

Do more bewitch me, than when art

Is too precise in every part.

*Choose your own dust adventure!*

Nobody likes dusting but everyone should find dust interesting.

Working with Jeannie Hunnicutt and with Jen Christiansen's art direction, I created this month's Scientific American Graphic Science visualization based on a recent paper, The ecology of microscopic life in household dust.

This was my third information graphic for the Graphic Science page. Unlike the previous ones, it's visually simple and ... interactive. Or, at least, as interactive as a printed page can be.

More of my Scientific American Graphic Science designs

Barberan A et al. (2015) The ecology of microscopic life in household dust. Proc. R. Soc. B 282: 20151139.

A very large list of named colors, generated by combining some of the many lists that already exist (X11, Crayola, Raveling, Resene, Wikipedia, xkcd, etc).

For each color, coordinates in RGB, HSV, XYZ, Lab and LCH space are given, along with its 5 nearest named neighbours, as measured by ΔE.

I also provide a web service. Simply call this URL with an RGB string.
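To make the idea of "nearest named neighbours" concrete, here is a minimal sketch of a ΔE lookup. It rests on several assumptions that are not from the list itself: ΔE is taken to be the CIE76 formula (plain Euclidean distance in Lab), the white point is D65, and the five-colour `NAMED` palette and the function names are hypothetical stand-ins for the full list.

```python
import math

# Hypothetical mini palette (sRGB, 0-255); the real list is far larger.
NAMED = {
    "red": (255, 0, 0),
    "orange": (255, 165, 0),
    "tomato": (255, 99, 71),
    "navy": (0, 0, 128),
    "white": (255, 255, 255),
}

def srgb_to_lab(rgb):
    """Convert an sRGB triple (0-255) to CIE Lab, assuming a D65 white point."""
    def linearize(c):
        c = c / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    # Linear sRGB -> XYZ
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    # Normalize by the D65 reference white before applying f
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def delta_e(lab1, lab2):
    """CIE76 colour difference: Euclidean distance in Lab space (an assumption)."""
    return math.dist(lab1, lab2)

def nearest_named(rgb, k=5):
    """Return the k palette names closest to rgb, nearest first."""
    lab = srgb_to_lab(rgb)
    return sorted(NAMED, key=lambda name: delta_e(lab, srgb_to_lab(NAMED[name])))[:k]

print(nearest_named((250, 100, 70), k=3))
```

Later ΔE formulas (CIE94, CIEDE2000) weight lightness, chroma and hue differently, so the ranking of close neighbours can change with the formula chosen.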

*It is possible to predict the values of unsampled data by using linear regression on correlated sample data.*

This month, we begin our column with a quote, shown here in its full context from Box's paper Science and Statistics.

In applying mathematics to subjects such as physics or statistics we make tentative assumptions about the real world which we know are false but which we believe may be useful nonetheless. The physicist knows that particles have mass and yet certain results, approximating what really happens, may be derived from the assumption that they do not. Equally, the statistician knows, for example, that in nature there never was a normal distribution, there never was a straight line, yet with normal and linear assumptions, known to be false, he can often derive results which match, to a useful approximation, those found in the real world.

—Box, G. J. Am. Stat. Assoc. 71, 791–799 (1976).

This column is our first in the series about regression. We show that regression and correlation are related concepts—they both quantify trends—and that the calculations for simple linear regression are essentially the same as for one-way ANOVA.

While correlation provides a measure of a specific kind of association between variables, regression allows us to fit correlated sample data to a model, which can be used to predict the values of unsampled data.
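As a rough illustration of how a fitted model predicts unsampled values, the sketch below fits a least-squares line and reports the Pearson correlation of the same data. The function names and the height and weight numbers are hypothetical and are not taken from the column.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; also returns Pearson r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

def predict(x_new, slope, intercept):
    """Predicted y at an x value that was not sampled."""
    return intercept + slope * x_new

# Hypothetical sample: height (cm) and weight (kg)
heights = [160, 165, 170, 175, 180]
weights = [55, 60, 62, 70, 73]

slope, intercept, r = fit_line(heights, weights)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
print(f"predicted weight at 172 cm: {predict(172, slope, intercept):.1f} kg")
```

The slope equals r multiplied by the ratio of the standard deviations of y and x, which is one way to see that regression and correlation quantify the same trend.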

Altman, N. & Krzywinski, M. (2015) Points of Significance: Simple linear regression. *Nature Methods* **12**:999-1000.

Altman, N. & Krzywinski, M. (2015) Points of Significance: Association, correlation and causation. *Nature Methods* **12**:899-900.

Krzywinski, M. & Altman, N. (2014) Points of Significance: Analysis of variance (ANOVA) and blocking. *Nature Methods* **11**:699-700.

*Correlation implies association, but not causation. Conversely, causation implies association, but not correlation.*

This month, we distinguish between association, correlation and causation.

Association, also called dependence, is a very general relationship: one variable provides information about the other. Correlation, on the other hand, is a specific kind of association: an increasing or decreasing trend. Not all associations are correlations. Moreover, causality can be connected only to association.

We discuss how correlation can be quantified using correlation coefficients (Pearson, Spearman) and show how spurious correlations can arise in random data as well as in very large independent data sets. For example, per capita cheese consumption is correlated with the number of people who died by becoming tangled in bedsheets.
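The two coefficients and the spurious-correlation effect can be sketched in a few lines. This is an illustration under stated assumptions, not the column's own code: the function names are hypothetical, Spearman is computed as Pearson on average ranks, and the "random data" are seeded Gaussian noise.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation: strength of a linear trend."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson applied to ranks (monotonic trend)."""
    return pearson(ranks(x), ranks(y))

def max_abs_r_in_noise(n_vars=50, n_obs=10, seed=0):
    """Largest |Pearson r| among all pairs of independent noise variables."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    return max(
        abs(pearson(data[i], data[j]))
        for i in range(n_vars)
        for j in range(i + 1, n_vars)
    )

x = list(range(1, 11))
y = [v ** 3 for v in x]  # monotonic but not linear
print(f"Pearson {pearson(x, y):.3f}, Spearman {spearman(x, y):.3f}")
print(f"max |r| among unrelated noise variables: {max_abs_r_in_noise():.3f}")
```

With 50 unrelated variables there are 1,225 pairs to compare, so some pair will show a strong correlation by chance alone when each variable has only 10 observations.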

Altman, N. & Krzywinski, M. (2015) Points of Significance: Association, correlation and causation. *Nature Methods* **12**:899-900.

*For making probabilistic inferences, a graph is worth a thousand words.*

This month we continue with the theme of Bayesian statistics and look at Bayesian networks, which combine network analysis with Bayesian statistics.

In a Bayesian network, nodes represent entities, such as genes, and the influence that one gene has over another is represented by an edge and a probability table (or function). Bayes' Theorem is used to calculate the probability of a state for any entity.

In our previous columns about Bayesian statistics, we saw how new information (likelihood) can be incorporated into the probability model (prior) to update our belief of the state of the system (posterior). In the context of a Bayesian network, relationships called conditional dependencies can arise between nodes when information is added to the network. Using a small gene regulation network we show how these dependencies may connect nodes along different paths.
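A conditional dependency of this kind can be reproduced with a three-node toy network. The sketch below is not the network from the column: genes A and B are assumed to independently regulate gene C (A → C ← B), and every probability in it is a made-up value chosen only for illustration.

```python
from itertools import product

# Hypothetical network A -> C <- B; all numbers are illustrative assumptions.
P_A = 0.3  # P(A is active)
P_B = 0.4  # P(B is active)
P_C_GIVEN = {  # P(C is active | A, B)
    (0, 0): 0.05,
    (0, 1): 0.60,
    (1, 0): 0.70,
    (1, 1): 0.90,
}

def joint(a, b, c):
    """Joint probability of one full state, factored along the edges."""
    pa = P_A if a else 1 - P_A
    pb = P_B if b else 1 - P_B
    pc = P_C_GIVEN[(a, b)] if c else 1 - P_C_GIVEN[(a, b)]
    return pa * pb * pc

def prob(query, evidence=None):
    """P(query | evidence), computed by enumerating all eight states."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for a, b, c in product((0, 1), repeat=3):
        state = {"A": a, "B": b, "C": c}
        if any(state[name] != value for name, value in evidence.items()):
            continue
        p = joint(a, b, c)
        denominator += p
        if all(state[name] == value for name, value in query.items()):
            numerator += p
    return numerator / denominator

print("P(A)          =", round(prob({"A": 1}), 3))
print("P(A | B)      =", round(prob({"A": 1}, {"B": 1}), 3))
print("P(A | C)      =", round(prob({"A": 1}, {"C": 1}), 3))
print("P(A | C, B)   =", round(prob({"A": 1}, {"C": 1, "B": 1}), 3))
```

Without information about C, learning the state of B does not change the probability of A. Once C is observed to be active, B becomes informative about A: knowing that B is active already accounts for C, so A becomes less likely.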

Puga, J.L., Krzywinski, M. & Altman, N. (2015) Points of Significance: Bayesian statistics. *Nature Methods* **12**:377-378.

Puga, J.L., Krzywinski, M. & Altman, N. (2015) Points of Significance: Bayes' theorem. *Nature Methods* **12**:277-278.

The Points of Significance column is on vacation this month.

Meanwhile, we're showing you how to manage small multiple plots in the Points of View column Unentangling Complex Plots.

Data in small multiples can vary in range, noise level and trend. Gregor McInerny and I show how to deal with this by cropping and scaling the multiples to a different range to emphasize relative changes, while preserving the context of the full data range to show absolute changes.

McInerny, G. & Krzywinski, M. (2015) Points of View: Unentangling complex plots. *Nature Methods* **12**:591.