Martin Krzywinski / Genome Sciences Center / mkweb.bcgsc.ca

data: fun



EMBO Practical Course: Bioinformatics and Genome Analysis, 5–17 June 2017.


statistics + data

Nature Methods: Points of Significance

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Points of Significance column in Nature Methods. (Launch of Points of Significance)

Access all columns for free at Statistics for Biologists Nature Collection.

A Statistics Primer and Best Practices

The Points of Significance column was launched in September 2013 as an educational resource for authors, offering practical suggestions about best practices in statistical analysis and reporting.

This month we launch a new column "Points of Significance" devoted to statistics, a topic of profound importance for biological research, but one that often doesn’t receive the attention it deserves.

The "aura of exactitude" that often surrounds statistics is one of the main notions that the Points of Significance column will attempt to dispel, while providing useful pointers on using and evaluating statistical measures.
—Dan Evanko, Let's Give Statistics the Attention it Deserves in Biological Research

The column is co-authored with Naomi Altman (Pennsylvania State University). Paul Blainey (Broad) is a contributing co-author.

Free Access

In February 2015, Nature Methods announced that the entire Points of Significance collection would be free to access.

When Nature Methods launched the Points of Significance column over a year ago we were hopeful that those biologists with a limited background in statistics, or who just needed a refresher, would find it accessible and useful for helping them improve the statistical rigor of their research. We have since received comments from researchers and educators in fields ranging from biology to meteorology who say they read the column regularly and use it in their courses. Hearing that the column has had a wider impact than we anticipated has been very encouraging and we hope the column continues for quite some time.
—Dan Evanko, Points of Significance now free access

A recent post on the ofschemesandmemes blog also announced a new statistics collection for biologists.

The pieces range from comments, to advice on very specific experimental approaches, to the entire collection of the Points of Significance columns that address basic concepts in statistics in an experimental biology context. These columns, originally published in Nature Methods thanks to Martin Krzywinski and guest editor Naomi Altman, have already proven very popular with readers and teachers. Finally, the collection presents a web tool to create box plots among other resources.
—Veronique Kiermer, Statistics for biologists—A free Nature Collection

continuity and consistency

Each column is written with continuity and consistency in mind. Our goal is to never rely on concepts that we have not previously discussed. We do not assume previous statistical knowledge—only basic math. Concepts are illustrated using practical examples that embody the ideas without extraneous complicated details. All of the figures are designed with the same approach—as simple and self-contained as possible.


news + thoughts

Happy 2017 `\pi` Day—Star Charts, Creatures Once Living and a Poem

Tue 14-03-2017


on a brim of echo,

capsized chamber
drawn into our constellation, and cooling.
—Paolo Marcazzan

Celebrate `\pi` Day (March 14th) with a star chart of the digits. The charts draw 40,000 stars generated from the first 12 million digits.

12,000,000 digits of `\pi` interpreted as a star catalogue. (details)

The 80 constellations are extinct animals and plants. Here you'll find old friends and new stories. Read about how Desmodus is always trying to escape or how Megalodon terrorizes the poor Tecopa! Most constellations have a story.

Find friends and stories among the 80 constellations of extinct animals and plants. Oh look, a Dodo guarding his eggs! (details)

This year I collaborate with Paolo Marcazzan, a Canadian poet, who contributes a poem, Of Black Body, about space and things we might find and lose there.

Check out art from previous years: 2013 `\pi` Day, 2014 `\pi` Day, 2015 `\pi` Day and 2016 `\pi` Day.

Data in New Dimensions: convergence of art, genomics and bioinformatics

Tue 07-03-2017

Art is science in love.
— E.F. Weisslitz

A behind-the-scenes look at the making of our stereoscopic images, which were on display at the AGBT 2017 Conference in February. The art is a creative collaboration with Becton Dickinson and The Linus Group.

Its creation began with the concept of differences. My writeup of the creative and design process focuses on storytelling and on how the concept of differences is incorporated into the art.

Oh, and this might be a good time to pick up some red-blue 3D glasses.

BD Genomics 3D art exhibit - AGBT 2017
A stereoscopic image and its interpretive panel of single-cell transcriptomes of blood cells: diseased versus healthy control.

Interpreting P values

Thu 02-03-2017
A P value measures a sample’s compatibility with a hypothesis, not the truth of the hypothesis.

This month we continue our discussion of `P` values and focus on the fact that a `P` value is a probability statement about the observed sample in the context of a hypothesis, not about the hypothesis being tested.

Nature Methods Points of Significance column: Interpreting P values. (read)

Given that we are always interested in making inferences about hypotheses, we discuss how `P` values can be used to do this by way of the Benjamin-Berger bound, `\bar{B}`, on the Bayes factor, `B`.

Heuristics such as these are valuable in helping to interpret `P` values, though we stress that `P` values vary from sample to sample and hence many sources of evidence need to be examined before drawing scientific conclusions.
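To make the Benjamin-Berger bound concrete, here is a minimal Python sketch. The formula itself is not given in the text above; the sketch assumes the form commonly stated for this bound, `\bar{B} = -1/(e P \ln P)` for `P < 1/e`, and the standard relation that posterior odds equal the Bayes factor times the prior odds. The function names and the 1:1 default prior odds are illustrative choices, not part of the column.

```python
import math


def benjamin_berger_bound(p: float) -> float:
    """Upper bound on the Bayes factor in favor of the alternative hypothesis.

    Assumes the commonly stated form -1 / (e * p * ln p), valid for 0 < p < 1/e.
    """
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("the bound is defined only for 0 < p < 1/e")
    return -1.0 / (math.e * p * math.log(p))


def max_posterior_probability(p: float, prior_odds: float = 1.0) -> float:
    """Largest posterior probability of the alternative implied by the bound.

    prior_odds is the assumed odds of the alternative before seeing the data.
    """
    posterior_odds = benjamin_berger_bound(p) * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.005):
        bound = benjamin_berger_bound(p)
        post = max_posterior_probability(p)
        print(f"P = {p}: bound = {bound:.2f}, max posterior probability = {post:.2f}")
```

Under these assumptions, `P = 0.05` gives a bound of about 2.5, so even at 1:1 prior odds the data can raise the probability of the alternative to roughly 0.71 at most.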

Altman, N. & Krzywinski, M. (2017) Points of Significance: Interpreting P values. Nature Methods 14:213–214.

Background reading

Krzywinski, M. & Altman, N. (2017) Points of Significance: P values and the search for significance. Nature Methods 14:3–4.

Krzywinski, M. & Altman, N. (2013) Points of Significance: Significance, P values and t–tests. Nature Methods 10:1041–1042.

...more about the Points of Significance column

Snellen Charts—Typography to Really Look at

Sat 18-02-2017

Another collection of typographical posters. These ones really ask you to look.

Snellen charts designed using physical constants, Braille and elemental abundances in the universe and human body.

The charts show a variety of interesting symbols and operators found in science and math. The design is in the style of a Snellen chart and is typeset in the Rockwell font.

Essentials of Data Visualization—8-part video series

Fri 17-02-2017

In collaboration with Phil Poronnik and Kim Bell-Anderson at the University of Sydney, I'm delighted to share our 8-part video series about drawing data and communicating science.

Essentials of Data Visualization: Thinking about drawing data and communicating science.

We've created 8 videos, each focusing on a different essential idea in data visualization: encoding, shapes, color, uncertainty, design, drawing missing or unobserved data, labels and process.

The videos were designed as teaching materials. Each video comes with a slide deck and exercises.

P values and the search for significance

Mon 16-01-2017
Little P value
What are you trying to say
Of significance?
—Steve Ziliak

We've written about P values before and warned readers about common misconceptions about them, which are so rife that the American Statistical Association itself has a long statement about them.

This month's column is the first of a two-part article about P values. Here we look at 'P value hacking' and 'data dredging', which are questionable practices that invalidate the correct interpretation of P values.

Nature Methods Points of Significance column: P values and the search for significance. (read)

We also illustrate how P values can lead us astray by asking "What is the smallest P value we can expect if the null hypothesis is true but we have done many tests, either explicitly or implicitly?"
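The question above can be explored with a short simulation. This is a sketch under stated assumptions, not the column's own calculation: it assumes the tests are independent and that each P value is uniformly distributed on (0, 1) when the null hypothesis is true. In that case the smallest of `n` P values has expected value `1/(n + 1)`, and the chance that at least one falls below `\alpha` is `1 - (1 - \alpha)^n`. All function names are illustrative.

```python
import random


def prob_any_below(alpha: float, n_tests: int) -> float:
    """Chance that at least one of n independent null P values is below alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_min_p(n_tests: int) -> float:
    """Expected smallest of n independent Uniform(0, 1) P values."""
    return 1.0 / (n_tests + 1)


def simulate_min_p(n_tests: int, n_repeats: int = 20000, seed: int = 1) -> float:
    """Average smallest P value over many simulated batches of null tests."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_repeats):
        # Under the null, each P value is modeled as a Uniform(0, 1) draw.
        total += min(rng.random() for _ in range(n_tests))
    return total / n_repeats


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(
            f"n = {n}: expected min P = {expected_min_p(n):.4f}, "
            f"simulated = {simulate_min_p(n):.4f}, "
            f"P(any < 0.05) = {prob_any_below(0.05, n):.3f}"
        )
```

With 10 such tests, the chance of at least one P value below 0.05 is already about 40% even though every null hypothesis is true.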

Incidentally, this is our first column in which the standfirst is a haiku.

Altman, N. & Krzywinski, M. (2017) Points of Significance: P values and the search for significance. Nature Methods 14:3–4.

Background reading

Krzywinski, M. & Altman, N. (2013) Points of Significance: Significance, P values and t–tests. Nature Methods 10:1041–1042.

...more about the Points of Significance column

Intuitive Design

Thu 03-11-2016

Appeal to intuition when designing with value judgments in mind.

Figure clarity and concision are improved when the selection of shapes and colors is grounded in the Gestalt principles, which describe how we visually perceive and organize information.

One of the Gestalt principles tells us that the magenta and green shapes will be perceived as two groups, overriding the fact that the shapes within each group might be different. What the principle does not tell us is how the reader is likely to value each group. (read)

The Gestalt principles are value free. For example, they tell us how we group objects but do not speak to any meaning that we might intuitively infer from visual characteristics.

Nature Methods Points of View column: Intuitive Design. (read)

This month, we discuss how appealing to such intuitions (related to shapes, colors and spatial orientation) can help us add information to a figure as well as anticipate and encourage useful interpretations.

Krzywinski, M. (2016) Points of View: Intuitive Design. Nature Methods 13:895.

...more about the Points of View column

Regularization

Fri 04-11-2016

Constraining the magnitude of parameters of a model can control its complexity.

This month we continue our discussion about model selection and evaluation and address how to choose a model that avoids both overfitting and underfitting.

Ideally, we want to avoid having either an underfitted model, which is usually a poor fit to the training data, or an overfitted model, which is a good fit to the training data but not to new data.

Nature Methods Points of Significance column: Regularization. (read)

Regularization is a process that penalizes the magnitude of model parameters. An ordinary fit minimizes only the SSE, `\mathrm{SSE} = \sum_i (y_i - \hat{y}_i)^2`. A regularized fit instead minimizes the SSE plus a multiple of the sum of the model's squared parameters, `\mathrm{SSE} + \lambda \sum_i \hat{\beta}^2_i`, where `\lambda` sets the strength of the penalty.
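The effect of the penalty is easiest to see in the simplest possible case. The sketch below is an illustration, not the column's example: it assumes a one-parameter model `y = \beta x` with no intercept, for which minimizing `\mathrm{SSE} + \lambda \beta^2` has the closed-form solution `\hat{\beta} = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda)`. The data values and function names are made up for the demonstration.

```python
def sse(beta, xs, ys):
    """Sum of squared errors for the model y = beta * x."""
    return sum((y - beta * x) ** 2 for x, y in zip(xs, ys))


def penalized_objective(beta, xs, ys, lam):
    """SSE plus the squared-parameter penalty, lam * beta**2."""
    return sse(beta, xs, ys) + lam * beta ** 2


def ridge_slope(xs, ys, lam):
    """Slope that minimizes SSE + lam * beta**2 for y = beta * x.

    Setting the derivative to zero gives beta = sum(x*y) / (sum(x*x) + lam).
    With lam = 0 this reduces to the ordinary least-squares slope.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


if __name__ == "__main__":
    # Hypothetical toy data, roughly y = 2x with noise.
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.1, 3.9, 6.2, 7.8]
    for lam in (0.0, 1.0, 10.0, 100.0):
        beta = ridge_slope(xs, ys, lam)
        print(f"lambda = {lam}: beta = {beta:.3f}, SSE = {sse(beta, xs, ys):.3f}")
```

As `\lambda` grows, the fitted slope shrinks toward zero and the SSE on the training data rises, which is the trade between fit and complexity that regularization controls.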

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Regularization. Nature Methods 13:803–804.

Background reading

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Model Selection and Overfitting. Nature Methods 13:703–704.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Classifier evaluation. Nature Methods 13:603–604.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Logistic regression. Nature Methods 13:541–542.

...more about the Points of Significance column