Tango is a sad thought that is danced.
•
• think & dance

Bioinformatics and Genome Analysis Course. Izmir International Biomedicine and Genome Institute, Izmir, Turkey. May 2–14, 2016

Access all columns for free at Statistics for Biologists Nature Collection.

The Points of Significance column was launched in September 2013 as an educational resource to authors and to provide practical suggestions about best practices in statistical analysis and reporting.

This month we launch a new column "Points of Significance" devoted to statistics, a topic of profound importance for biological research, but one that often doesn’t receive the attention it deserves.

The "aura of exactitude" that often surrounds statistics is one of the main notions that the Points of Significance column will attempt to dispel, while providing useful pointers on using and evaluating statistical measures.

—Dan Evanko, Let's Give Statistics the Attention it Deserves in Biological Research

The column is co-authored with Naomi Altman (Pennsylvania State University). Paul Blainey (Broad) is a contributing co-author.

In February 2015, Nature Methods announced that the entire Points of Significance collection will be free.

When Nature Methods launched the Points of Significance column over a year ago we were hopeful that those biologists with a limited background in statistics, or who just needed a refresher, would find it accessible and useful for helping them improve the statistical rigor of their research. We have since received comments from researchers and educators in fields ranging from biology to meteorology who say they read the column regularly and use it in their courses. Hearing that the column has had a wider impact than we anticipated has been very encouraging and we hope the column continues for quite some time.

—Dan Evanko, Points of Significance now free access

Also, in a recent post on the ofschemesandmemes blog, a new statistics collection for biologists was announced.

The pieces range from comments, to advice on very specific experimental approaches, to the entire collection of the Points of Significance columns that address basic concepts in statistics in an experimental biology context. These columns, originally published in Nature Methods thanks to Martin Krzywinski and guest editor Naomi Altman, have already proven very popular with readers and teachers. Finally, the collection presents a web tool to create box plots among other resources.

—Veronique Kiermer, Statistics for biologists—A free Nature Collection

Each column is written with continuity and consistency in mind. Our goal is to never rely on concepts that we have not previously discussed. We do not assume previous statistical knowledge—only basic math. Concepts are illustrated using practical examples that embody the ideas without extraneous complicated details. All of the figures are designed with the same approach—as simple and self-contained as possible.

I was commissioned by Scientific American to create an information graphic based on Figure 9 in the landmark Nature Integrative analysis of 111 reference human epigenomes paper.

The original figure details the relationships between more than 100 sequenced epigenomes and genetic traits, including disease like Crohn's and Alzheimer's. These relationships were shown as a heatmap in which the epigenome-trait cell depicted the *P* value associated with tissue-specific H3K4me1 epigenetic modification in regions of the genome associated with the trait.

As much as I distrust network diagrams, in this case this was the right way to show the data. The network was meticulously laid out by hand to draw attention to the layered groups of diseases of traits.

This was my second information graphic for the Graphic Science page. Last year, I illustrated the extent of differences in the gene sequence of humans, Denisovans, chimps and gorillas.

The bootstrap is a computational method that simulates new sample from observed data. These simulated samples can be used to determine how estimates from replicate experiments might be distributed and answer questions about precision and bias.

We discuss both parametric and non-parametric bootstrap. In the former, observed data are fit to a model and then new samples are drawn using the model. In the latter, no model assumption is made and simulated samples are drawn with replacement from the observed data.

Kulesa, A., Krzywinski, M., Blainey, P. & Altman, N (2015) Points of Significance: Sampling distributions and the bootstrap *Nature Methods* **12**:477-478.

Krzywinski, M. & Altman, N. (2013) Points of Significance: Importance of being uncertain. *Nature Methods* **10**:809-810.

Building on last month's column about Bayes' Theorem, we introduce Bayesian inference and contrast it to frequentist inference.

Given a hypothesis and a model, the frequentist calculates the probability of different data generated by the model, *P*(data|model). When this probability to obtain the observed data from the model is small (e.g. `alpha` = 0.05), the frequentist rejects the hypothesis.

In contrast, the Bayesian makes direct probability statements about the model by calculating P(model|data). In other words, given the observed data, the probability that the model is correct. With this approach it is possible to relate the probability of different models to identify one that is most compatible with the data.

The Bayesian approach is actually more intuitive. From the frequentist point of view, the probability used to assess the veracity of a hypothesis, P(data|model), commonly referred to as the *P* value, does not help us determine the probability that the model is correct. In fact, the *P* value is commonly misinterpreted as the probability that the hypothesis is right. This is the so-called "prosecutor's fallacy", which confuses the two conditional probabilities *P*(data|model) for *P*(model|data). It is the latter quantity that is more directly useful and calculated by the Bayesian.

Puga, J.L, Krzywinski, M. & Altman, N. (2015) Points of Significance: Bayes' Theorem *Nature Methods* **12**:277-278.

Puga, J.L, Krzywinski, M. & Altman, N. (2015) Points of Significance: Bayes' Theorem *Nature Methods* **12**:277-278.

In our first column on Bayesian statistics, we introduce conditional probabilities and Bayes' theorem

*P*(B|A) = *P*(A|B) × *P*(B) / *P*(A)

This relationship between conditional probabilities *P*(B|A) and *P*(A|B) is central in Bayesian statistics. We illustrate how Bayes' theorem can be used to quickly calculate useful probabilities that are more difficult to conceptualize within a frequentist framework.

Using Bayes' theorem, we can incorporate our beliefs and prior experience about a system and update it when data are collected.

Puga, J.L, Krzywinski, M. & Altman, N. (2015) Points of Significance: Bayes' Theorem *Nature Methods* **12**:277-278.

Oldford, R.W. & Cherry, W.H. Picturing probability: the poverty of Venn diagrams, the richness of eikosograms. (University of Waterloo, 2006)

Celebrate `pi` Day (March 14th) with splitting its digit endlessly. This year I use a treemap approach to encode the digits in the style of Piet Mondrian.

The art has been featured in Ana Swanson's Wonkblog article at the Washington Post—10 Stunning Images Show The Beauty Hidden in `pi`.

I also have art from 2013 `pi` Day and 2014 `pi` Day.

The split plot design originated in agriculture, where applying some factors on a small scale is more difficult than others. For example, it's harder to cost-effectively irrigate a small piece of land than a large one. These differences are also present in biological experiments. For example, temperature and housing conditions are easier to vary for groups of animals than for individuals.

The split plot design is an expansion on the concept of blocking—all split plot designs include at least one randomized complete block design. The split plot design is also useful for cases where one wants to increase the sensitivity in one factor (sub-plot) more than another (whole plot).

Altman, N. & Krzywinski, M. (2015) Points of Significance: Split Plot Design *Nature Methods* **12**:165-166.

1. Krzywinski, M. & Altman, N. (2014) Points of Significance: Designing Comparative Experiments *Nature Methods* **11**:597-598.

2. Krzywinski, M. & Altman, N. (2014) Points of Significance: Analysis of variance (ANOVA) and blocking *Nature Methods* **11**:699-700.

3. Blainey, P., Krzywinski, M. & Altman, N. (2014) Points of Significance: Replication *Nature Methods* **11**:879-880.

In an audience of 8 men and 8 women, chances are 50% that at least one has some degree of color blindness^{1}. When encoding information or designing content, use colors that is color-blind safe.

Nature Methods has announced the launch of a new statistics collection for biologists.

As part of that collection, announced that the entire Points of Significance collection is now open access.

This is great news for educators—the column can now be freely distributed in classrooms.