Tango is a sad thought that is danced.think & dance

# science: exciting

Bioinformatics and Genome Analysis Course. Izmir International Biomedicine and Genome Institute, Izmir, Turkey. May 2–14, 2016

# Nature Methods: Points of Significance

Points of Significance column in Nature Methods. (Launch of Points of Significance)

## A Statistics Primer and Best Practices

The Points of Significance column was launched in September 2013 as an educational resource to authors and to provide practical suggestions about best practices in statistical analysis and reporting.

This month we launch a new column "Points of Significance" devoted to statistics, a topic of profound importance for biological research, but one that often doesn’t receive the attention it deserves.

The "aura of exactitude" that often surrounds statistics is one of the main notions that the Points of Significance column will attempt to dispel, while providing useful pointers on using and evaluating statistical measures.
—Dan Evanko, Let's Give Statistics the Attention it Deserves in Biological Research

The column is co-authored with Naomi Altman (Pennsylvania State University). Paul Blainey (Broad) is a contributing co-author.

## Free Access

In February 2015, Nature Methods announced that the entire Points of Significance collection will be free.

When Nature Methods launched the Points of Significance column over a year ago we were hopeful that those biologists with a limited background in statistics, or who just needed a refresher, would find it accessible and useful for helping them improve the statistical rigor of their research. We have since received comments from researchers and educators in fields ranging from biology to meteorology who say they read the column regularly and use it in their courses. Hearing that the column has had a wider impact than we anticipated has been very encouraging and we hope the column continues for quite some time.
—Dan Evanko, Points of Significance now free access

Also, in a recent post on the ofschemesandmemes blog, a new statistics collection for biologists was announced.

The pieces range from comments, to advice on very specific experimental approaches, to the entire collection of the Points of Significance columns that address basic concepts in statistics in an experimental biology context. These columns, originally published in Nature Methods thanks to Martin Krzywinski and guest editor Naomi Altman, have already proven very popular with readers and teachers. Finally, the collection presents a web tool to create box plots among other resources.
—Veronique Kiermer, Statistics for biologists—A free Nature Collection

## continuity and consistency

Each column is written with continuity and consistency in mind. Our goal is to never rely on concepts that we have not previously discussed. We do not assume previous statistical knowledge—only basic math. Concepts are illustrated using practical examples that embody the ideas without extraneous complicated details. All of the figures are designed with the same approach—as simple and self-contained as possible.

# Gene Volume Control

Thu 11-06-2015

I was commissioned by Scientific American to create an information graphic based on Figure 9 in the landmark Nature Integrative analysis of 111 reference human epigenomes paper.

The original figure details the relationships between more than 100 sequenced epigenomes and genetic traits, including disease like Crohn's and Alzheimer's. These relationships were shown as a heatmap in which the epigenome-trait cell depicted the P value associated with tissue-specific H3K4me1 epigenetic modification in regions of the genome associated with the trait.

Figure 9 from Integrative analysis of 111 reference human epigenomes (Nature (2015) 518 317–330). (details)

As much as I distrust network diagrams, in this case this was the right way to show the data. The network was meticulously laid out by hand to draw attention to the layered groups of diseases of traits.

Network diagram redesign of the heatmap for a select set of traits. Only relationships with –log P > 3.9 are displayed. Appears on Graphic Science page in June 2015 issue of Scientific American. (details)

This was my second information graphic for the Graphic Science page. Last year, I illustrated the extent of differences in the gene sequence of humans, Denisovans, chimps and gorillas.

# Sampling distributions and the bootstrap

Thu 11-06-2015

The bootstrap is a computational method that simulates new sample from observed data. These simulated samples can be used to determine how estimates from replicate experiments might be distributed and answer questions about precision and bias.

Nature Methods Points of Significance column: Sampling distributions and the bootstrap. (read)

We discuss both parametric and non-parametric bootstrap. In the former, observed data are fit to a model and then new samples are drawn using the model. In the latter, no model assumption is made and simulated samples are drawn with replacement from the observed data.

Kulesa, A., Krzywinski, M., Blainey, P. & Altman, N (2015) Points of Significance: Sampling distributions and the bootstrap Nature Methods 12:477-478.

Krzywinski, M. & Altman, N. (2013) Points of Significance: Importance of being uncertain. Nature Methods 10:809-810.

# Bayesian statistics

Thu 30-04-2015

Building on last month's column about Bayes' Theorem, we introduce Bayesian inference and contrast it to frequentist inference.

Given a hypothesis and a model, the frequentist calculates the probability of different data generated by the model, P(data|model). When this probability to obtain the observed data from the model is small (e.g. alpha = 0.05), the frequentist rejects the hypothesis.

Nature Methods Points of Significance column: Bayesian Statistics. (read)

In contrast, the Bayesian makes direct probability statements about the model by calculating P(model|data). In other words, given the observed data, the probability that the model is correct. With this approach it is possible to relate the probability of different models to identify one that is most compatible with the data.

The Bayesian approach is actually more intuitive. From the frequentist point of view, the probability used to assess the veracity of a hypothesis, P(data|model), commonly referred to as the P value, does not help us determine the probability that the model is correct. In fact, the P value is commonly misinterpreted as the probability that the hypothesis is right. This is the so-called "prosecutor's fallacy", which confuses the two conditional probabilities P(data|model) for P(model|data). It is the latter quantity that is more directly useful and calculated by the Bayesian.

Puga, J.L, Krzywinski, M. & Altman, N. (2015) Points of Significance: Bayes' Theorem Nature Methods 12:277-278.

Puga, J.L, Krzywinski, M. & Altman, N. (2015) Points of Significance: Bayes' Theorem Nature Methods 12:277-278.

# Bayes' Theorem

Wed 22-04-2015

In our first column on Bayesian statistics, we introduce conditional probabilities and Bayes' theorem

P(B|A) = P(A|B) × P(B) / P(A)

This relationship between conditional probabilities P(B|A) and P(A|B) is central in Bayesian statistics. We illustrate how Bayes' theorem can be used to quickly calculate useful probabilities that are more difficult to conceptualize within a frequentist framework.

Nature Methods Points of Significance column: Bayes' Theorem. (read)

Using Bayes' theorem, we can incorporate our beliefs and prior experience about a system and update it when data are collected.

Puga, J.L, Krzywinski, M. & Altman, N. (2015) Points of Significance: Bayes' Theorem Nature Methods 12:277-278.

Oldford, R.W. & Cherry, W.H. Picturing probability: the poverty of Venn diagrams, the richness of eikosograms. (University of Waterloo, 2006)

# Happy 2015 Pi Day—can you see pi through the treemap?

Sat 14-03-2015

Celebrate pi Day (March 14th) with splitting its digit endlessly. This year I use a treemap approach to encode the digits in the style of Piet Mondrian.

Digits of pi, phi and e. (details)

The art has been featured in Ana Swanson's Wonkblog article at the Washington Post—10 Stunning Images Show The Beauty Hidden in pi.

I also have art from 2013 pi Day and 2014 pi Day.

# Split Plot Design

Tue 03-03-2015

The split plot design originated in agriculture, where applying some factors on a small scale is more difficult than others. For example, it's harder to cost-effectively irrigate a small piece of land than a large one. These differences are also present in biological experiments. For example, temperature and housing conditions are easier to vary for groups of animals than for individuals.

Nature Methods Points of Significance column: Split plot design. (read)

The split plot design is an expansion on the concept of blocking—all split plot designs include at least one randomized complete block design. The split plot design is also useful for cases where one wants to increase the sensitivity in one factor (sub-plot) more than another (whole plot).

Altman, N. & Krzywinski, M. (2015) Points of Significance: Split Plot Design Nature Methods 12:165-166.

1. Krzywinski, M. & Altman, N. (2014) Points of Significance: Designing Comparative Experiments Nature Methods 11:597-598.

2. Krzywinski, M. & Altman, N. (2014) Points of Significance: Analysis of variance (ANOVA) and blocking Nature Methods 11:699-700.

3. Blainey, P., Krzywinski, M. & Altman, N. (2014) Points of Significance: Replication Nature Methods 11:879-880.

# Color palettes for color blindness

Tue 03-03-2015

In an audience of 8 men and 8 women, chances are 50% that at least one has some degree of color blindness1. When encoding information or designing content, use colors that is color-blind safe.

A 12-color palette safe for color blindness

# Points of Significance Column Now Open Access

Tue 10-02-2015

Nature Methods has announced the launch of a new statistics collection for biologists.

Nature Methods Points of Significance column is now open access. (column archive)

As part of that collection, announced that the entire Points of Significance collection is now open access.

This is great news for educators—the column can now be freely distributed in classrooms.