
This talk happened on Thursday, Mar 21st 2013 at VIZBI 2013 at the Broad Institute in Boston.

How often people speak of art and science as though they were two entirely different things, with no interconnection. An artist is emotional, they think, and uses only his intuition; he sees all at once and has no need of reason. A scientist is cold, they think, and uses only his reason; he argues carefully step by step, and needs no imagination. That is all wrong. The true artist is quite rational as well as imaginative and knows what he is doing; if he does not, his art suffers. The true scientist is quite imaginative as well as rational, and sometimes leaps to solutions where reason can follow only slowly; if he does not, his science suffers. —Isaac Asimov (The Roving Mind)

For more visualization and design resources, see my VIZBI 2012 tutorials, Nature Methods Points of View column, and rant about colors.

The video will be posted at vizbi.org.

Slides are available as PDF and Keynote (zipped).

A poet is, after all, a sort of scientist, but engaged in a qualitative science in which nothing is measurable. He lives with data that cannot be numbered, and his experiments can be done only once. The information in a poem is, by definition, not reproducible. He becomes an equivalent of scientist, in the act of examining and sorting the things popping in [to his head], finding the marks of remote similarity, points of distant relationship, tiny irregularities that indicate that this one is really the same as that one over there only more important. Gauging the fit, he can meticulously place pieces of the universe together, in geometric configurations that are as beautiful and balanced as crystals. —Lewis Thomas (The Medusa and the Snail: More Notes of a Biology Watcher)

If you're asking how to visualize big data, first make sure you're doing a good job on small and medium data. Each scale requires good design.

Do not expect to use one way to tell many stories

Also consider that there is a very large number of combinations of data sets, hypotheses and possible patterns. Because of this, you cannot expect to use one way to tell many stories. There is no Holy Grail of big data visualization. But there are many good questions to ask and practices to follow that make up a process which can help you get there.

Building on last month's column about Bayes' Theorem, we introduce Bayesian inference and contrast it with frequentist inference.

Given a hypothesis and a model, the frequentist calculates the probability of different data generated by the model, *P*(data|model). When the probability of obtaining the observed data from the model is small (e.g. below α = 0.05), the frequentist rejects the hypothesis.

In contrast, the Bayesian makes direct probability statements about the model by calculating *P*(model|data): the probability that the model is correct, given the observed data. With this approach it is possible to compare the probabilities of different models and identify the one that is most compatible with the data.

The Bayesian approach is actually more intuitive. From the frequentist point of view, the probability used to assess the veracity of a hypothesis, *P*(data|model), commonly referred to as the *P* value, does not tell us the probability that the model is correct. In fact, the *P* value is commonly misinterpreted as the probability that the hypothesis is right. This is the so-called "prosecutor's fallacy", which mistakes the conditional probability *P*(data|model) for *P*(model|data). It is the latter quantity that is more directly useful, and it is the one the Bayesian calculates.
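To make the difference between the two conditional probabilities concrete, here is a minimal sketch using a coin-toss example. The data (8 heads in 10 tosses), the single alternative model (a coin with heads probability 0.75) and the equal prior weights are all hypothetical choices made for illustration; they do not come from the column. The tail probability stands in for the *P* value, and the posterior is obtained with Bayes' theorem.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n tosses with heads probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail(k, n, p):
    """Frequentist quantity: P(k or more heads | model with heads probability p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def posterior_fair(k, n, p_biased=0.75, prior_fair=0.5):
    """Bayesian quantity: P(fair model | data), assuming the only
    alternative is a coin with heads probability p_biased (hypothetical)."""
    like_fair = binom_pmf(k, n, 0.5)
    like_biased = binom_pmf(k, n, p_biased)
    evidence = like_fair * prior_fair + like_biased * (1 - prior_fair)
    return like_fair * prior_fair / evidence

n, k = 10, 8  # hypothetical data: 8 heads in 10 tosses
print(f"P(data at least this extreme | fair) = {upper_tail(k, n, 0.5):.4f}")
print(f"P(fair | data)                       = {posterior_fair(k, n):.4f}")
```

The two printed numbers answer different questions and are not interchangeable. Note also that the posterior depends on which alternative models and priors are assumed, which is why those assumptions are stated explicitly here.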

Puga, J.L, Krzywinski, M. & Altman, N. (2015) Points of Significance: Bayes' Theorem *Nature Methods* **12**:277-278.

In our first column on Bayesian statistics, we introduce conditional probabilities and Bayes' theorem:

*P*(B|A) = *P*(A|B) × *P*(B) / *P*(A)

This relationship between conditional probabilities *P*(B|A) and *P*(A|B) is central in Bayesian statistics. We illustrate how Bayes' theorem can be used to quickly calculate useful probabilities that are more difficult to conceptualize within a frequentist framework.

Using Bayes' theorem, we can incorporate our beliefs and prior experience about a system and update them when data are collected.
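As a rough illustration of updating a prior belief when data arrive, the sketch below applies Bayes' theorem to a diagnostic test. The function name `bayes_update` and all of the numbers (1% prevalence, 95% sensitivity, 10% false-positive rate) are hypothetical and chosen only for this example. The second update also assumes the two test results are independent given the true disease status.

```python
def bayes_update(prior, p_data_given_h, p_data_given_not_h):
    """Return P(H | data) from the prior P(H) and the two likelihoods,
    using P(H|data) = P(data|H) * P(H) / P(data)."""
    evidence = p_data_given_h * prior + p_data_given_not_h * (1 - prior)
    return p_data_given_h * prior / evidence

# Hypothetical numbers: 1% prevalence, 95% sensitivity, 10% false-positive rate.
prior = 0.01
after_one = bayes_update(prior, 0.95, 0.10)      # one positive test
after_two = bayes_update(after_one, 0.95, 0.10)  # a second, independent positive test

print(f"prior:               {prior:.3f}")
print(f"after one positive:  {after_one:.3f}")
print(f"after two positives: {after_two:.3f}")
```

The posterior from the first test is used as the prior for the second, which is the updating step the paragraph above describes.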

Puga, J.L, Krzywinski, M. & Altman, N. (2015) Points of Significance: Bayes' Theorem *Nature Methods* **12**:277-278.

Oldford, R.W. & Cherry, W.H. Picturing probability: the poverty of Venn diagrams, the richness of eikosograms. (University of Waterloo, 2006)

Celebrate π Day (March 14th) by splitting its digits endlessly. This year I use a treemap approach to encode the digits in the style of Piet Mondrian.

The art has been featured in Ana Swanson's Wonkblog article at the Washington Post: 10 Stunning Images Show The Beauty Hidden in π.

I also have art from 2013 π Day and 2014 π Day.

The split plot design originated in agriculture, where applying some factors on a small scale is more difficult than others. For example, it's harder to cost-effectively irrigate a small piece of land than a large one. These differences are also present in biological experiments. For example, temperature and housing conditions are easier to vary for groups of animals than for individuals.

The split plot design is an expansion on the concept of blocking—all split plot designs include at least one randomized complete block design. The split plot design is also useful for cases where one wants to increase the sensitivity in one factor (sub-plot) more than another (whole plot).
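The two levels of randomization in a split plot design can be sketched in a few lines. This is only an illustrative layout generator, not the analysis described in the column: the function name, the factor levels ("irrigated"/"dry" for the whole-plot factor and "A"/"B"/"C" for the sub-plot factor) and the fixed seed are assumptions made for the example.

```python
import random

def split_plot_layout(n_blocks, whole_levels, sub_levels, seed=0):
    """Randomize a split plot layout.

    Within each block, the whole-plot factor levels are assigned to whole
    plots in random order; within each whole plot, the sub-plot factor
    levels are assigned to sub-plots in random order.
    """
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    layout = []
    for block in range(1, n_blocks + 1):
        whole_order = rng.sample(whole_levels, k=len(whole_levels))
        for whole_plot, whole in enumerate(whole_order, start=1):
            sub_order = rng.sample(sub_levels, k=len(sub_levels))
            for sub_plot, sub in enumerate(sub_order, start=1):
                layout.append((block, whole_plot, whole, sub_plot, sub))
    return layout

# Hypothetical factors: irrigation on whole plots, three varieties on sub-plots.
layout = split_plot_layout(2, ["irrigated", "dry"], ["A", "B", "C"])
for row in layout:
    print(row)
```

Each block contains every whole-plot level once, which is the randomized complete block structure mentioned above, and the hard-to-vary factor changes only between whole plots.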

Altman, N. & Krzywinski, M. (2015) Points of Significance: Split Plot Design *Nature Methods* **12**:165-166.

1. Krzywinski, M. & Altman, N. (2014) Points of Significance: Designing Comparative Experiments *Nature Methods* **11**:597-598.

2. Krzywinski, M. & Altman, N. (2014) Points of Significance: Analysis of variance (ANOVA) and blocking *Nature Methods* **11**:699-700.

3. Blainey, P., Krzywinski, M. & Altman, N. (2014) Points of Significance: Replication *Nature Methods* **11**:879-880.