Martin Krzywinski / Genome Sciences Center / mkweb.bcgsc.ca

science: exciting



Circos at British Library Beautiful Science exhibit—Feb 20–May 26


communication + science


Nature Methods: Points of View

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
The full collection of 35 Points of View columns is now available. (3 years of Points of View)

Practical Tips for Effective Figures

Points of View — History

In its 2.5-year history, the PoV column has established a significant legacy: it is one of the most frequently accessed parts of Nature Methods. The reason, I think, is clear. The community sees the value in clear and effective visual communication and acknowledges the need for a forum in which best practices in the field are presented practically and accessibly.

Bang Wong, in collaboration with visiting authors (Noam Shoresh, Nils Gehlenborg, Cydney Nielsen and Rikke Schmidt Kjærgaard), has penned 29 columns in the period of August 2010 to December 2012, covering broad topics such as salience, Gestalt principles, color, typography, negative space, layout, and data integration.

When it was A.C. Grayling's turn to speak at a debate in which Christopher Hitchens and Richard Dawkins had already made their points, Grayling said:

When one gets up to speak this late in a debate, one is a bit tempted to quote that Hungarian M.P. who after a long, long, long discussion in the parliament in Budapest stood up and said, "Everything has been said but not everybody said it yet." (watch on YouTube)

Indeed, this is quite how I feel after being invited to be the new author of the Nature Methods Points of View column. Both Bang and Hitchens provide significant inspiration for me, so Grayling's words are particularly fitting.

To improve on the column is impossible. My challenge is to identify useful topics that have not yet been covered. I will be working closely with Nature Methods and Bang to ensure that the columns strike the right balance of topic, tone and timbre.

Don't hesitate to let me know whether PoV continues to hold your interest.

Nature Editors Announce Return of Points of View

The announcement of the return of the column, together with its history and a description of me, the new author, is available at the Nature Methods methagora blog.

Humor is kept by repeated reference to my now-dead-but-once-famous pet rat.

Points of View Collection now Open Access

For the month of August 2013, the entire set of 35 columns is available for free.

Common Challenges in Figure Design

Andreas Dahlin runs a figure-making course at Uppsala University. He was kind enough to share with me common questions and concerns that his students have when creating figures (emphasis is mine).

I face problems for using the tools in power point to make nice illustration figures, and in addition how one can enhance the resolution of the figures to print it in a high quality mode.

In my opinion, the most difficult thing is how to draw the good-looking pictures and design the structure of slide to make it simple and substantial in content.

I find it difficult to find the right software to draw pictures.

The most difficult thing for me, when I make a figure, is to arrange the parts of the figure in a way they look nice and understandable.

I think the most difficult part is creating the concept, how to make a figure easy and fast to understand but not lacking all essential parts.

Stepping outside of my own knowledge of what the picture presents and viewing it as someone who sees it for the first time. It's easy to assume that some things are self evident and not making them clear enough in the pictures.

Figures that not are plots can also be tricky to get to look nice.

Anytime you have to draw something in paint, gimp, or other image program it requires a lot of work to make it look even slightly better than crap.

The most difficult thing (in general) is to include as much information as possible and display it in a way that is easy to understand. Figures should be intuitive for the reader, which is sometimes difficult to achieve. There might also be technical difficulties in achieving what you've visualized.

I think the most difficult part for me is to highlight the main idea I would like to express.

For me the most difficult part is making 3-D figures. Also while making figures its hard to decide on the good colors to choose for the figure.

In my opinion, the most difficult part when making a figure is don't know which software we can use and how to use.

The most difficult part for me is to start it! Because I am so meticulous and I am a painter, then it is not so easy to make decision about my figures and which one is better and so on, then finally I give up and put just one figure which of course I don't like...

I think it is difficult to put together my ideas to something that is connected and makes it easier for the viewer to understand.

It is so easy to just get an image from internet. I don’t know what is ok to do. There seems to be different rules in different communities.

To come up with a figure that does not simplify the concept too much at the same time as it does not overwhelm the viewer. To get some ideas for this is the reason why I take the course. ;-)

To me, how to make it easy to understand is the difficult part.

I think it is to save it in the correct format: Raster or vector, png or jpg or pdf... especially if I want to make some changes in the future to the figure.

I think is to choose the most appropriate figure that really help to transmit the information we want. Then, how many words can be good enough for been part of the message. At the beginning I used to use too many.

Apart from the difficulty of making the figure clear and easy to understand, the biggest problem I'm having is the captions. How long and detailed description is appropriate, so it neither steals attention from the figure nor leaves out too much important information.

I think the most difficult part is to have high resolution image once we want to save it. My experience is when finish with drawing, the file size sometimes to large for high quality image and if we downgrade it, the image becomes bad.

The most difficult part when i making a figure is the software using part, I'm not good at computer so that part is annoying for me all the time.

I think the most difficult is to find out how to condensate many ideas in one picture without making it difficult to understand.

The most difficult part is the get the image to not look too amateurish that people focus on that instead of the message.

The most difficult part when doing a figure is to let it speak for itself, i.e. to not have long caption text.

To be able to depict all the desirable results on a single figure is sometimes not that easy. It becomes more critical when a figure is to be fitted within a certain size frame. An exact placing of a figure in some text editors often comes along with difficulties.

The most difficult part when making a figure is to make it simple and still be informative.

Depends a lot on the kind of figure, but generally it is to get clarity in the design, such that the idea is conceived easily. This requires some good outline (usually an iterative process).

The most difficult part to make a figure is the need to express abstract concepts into drawings.

The compromise between include detailed information and at the same time be readable (figures in articles)

To compress all information and ideas you have in your head into short and clear message.

I feel the difficulty in choosing a right resolution of the picture and the angle that could visualize all the details. And also choosing right test/label colour, size, font. Another difficulty for me is continuation from one slide to another.

I believe that my biggest problem would be making nice flux charts. Generally the ones I draw look too crude, it does not look beautiful. I have no concern about making an image that can represent an idea, but making a beautiful image makes it more pleasing to the eyes of the people who will read my work.

It is very difficult to make the figure delicate. I am still not get used to put all the small components together to integrate the figure by the vector software, instead of drawing it out directly.

I think the most difficult part is to make the image simple but yet informative.

I find it very difficult to make an original clarity picture in a particular format after dimensioning it according to the requirement.

Some times it is difficult to limit the size (Bytes) of the picture when going for high clarity remake.

Making the figure as informative as you want while keeping it simple enough to grasp quickly.

For me, the more difficult part is to create a figure that contains or tells all the information that I want to transmit, but keeping the figure simple, clean and not overloaded.

The most difficult for me is make it easily to be understood meanwhile containing the essential information.

The most difficult thing when developing a figure is ... to remove the bloat but keep the message. (Besides the very most difficult: finding out what I want to tell.)

For me the most difficult part is to choose colors with right contrast and to make it more attractive and catchy.

points of view — bibliography

1. Streit M, Gehlenborg N 2014 Bar charts and box plots Nat Methods 11:117.
2. Krzywinski M, Cairo A 2013 Storytelling Nat Methods 10:687.
3. Krzywinski M, Savig E 2013 Multidimensional data Nat Methods 10:595.
4. Krzywinski M, Wong B 2013 Plotting symbols Nat Methods 10:451.
5. Krzywinski M 2013 Elements of visual style Nat Methods 10:371.
6. Krzywinski M 2013 Labels and callouts Nat Methods 10:275.
7. Krzywinski M 2013 Axes, ticks and grids Nat Methods 10:183.
8. Wong B 2012 Visualizing biological data Nat Methods 9:1131.
9. Wong B, Kjærgaard RS 2012 Pencil and paper Nat Methods 9:1037.
10. Gehlenborg N, Wong B 2012 Power of the plane Nat Methods 9:935.
11. Gehlenborg N, Wong B 2012 Into the third dimension Nat Methods 9:851.
12. Gehlenborg N, Wong B 2012 Mapping quantitative data to color Nat Methods 9:769.
13. Nielsen C, Wong B 2012 Representing genomic structural variation Nat Methods 9:631.
14. Nielsen C, Wong B 2012 Managing deep data in genome browsers Nat Methods 9:521.
15. Nielsen C, Wong B 2012 Representing the genome Nat Methods 9:423.
16. Gehlenborg N, Wong B 2012 Integrating data Nat Methods 9:315.
17. Gehlenborg N, Wong B 2012 Heat maps Nat Methods 9:213.
18. Gehlenborg N, Wong B 2012 Networks Nat Methods 9:115.
19. Shoresh N, Wong B 2012 Data exploration Nat Methods 9:5.
20. Wong B 2011 The design process Nat Methods 8:987.
21. Wong B 2011 Salience to relevance Nat Methods 8:889.
22. Wong B 2011 Layout Nat Methods 8:783.
23. Wong B 2011 Arrows Nat Methods 8:701.
24. Wong B 2011 Simplify to clarify Nat Methods 8:611.
25. Wong B 2011 Avoiding color Nat Methods 8:525.
26. Wong B 2011 Color blindness Nat Methods 8:441.
27. Wong B 2011 The overview figure Nat Methods 8:365.
28. Wong B 2011 Typography Nat Methods 8:277.
29. Wong B 2011 Points of review (part 2) Nat Methods 8:189.
30. Wong B 2011 Points of review (part 1) Nat Methods 8:101.
31. Wong B 2011 Negative space Nat Methods 8:5.
32. Wong B 2010 Gestalt principles (part 2) Nat Methods 7:941.
33. Wong B 2010 Gestalt principles (part 1) Nat Methods 7:863.
34. Wong B 2010 Salience Nat Methods 7:773.
35. Wong B 2010 Design of data figures Nat Methods 7:665.
36. Wong B 2010 Color coding Nat Methods 7:573.

Visualization + Design Resources

Projects

Circos — circular whole-genome information graphics

Circos table viewer — display of tabular data in circular form

Hive plots — rational, quantitative and reproducible network visualization

High dynamic time range photography (HDTR) — imaging the flow of time

Instruction, Tutorials and Talks

Visual Design Principles keynote at VIZBI 2013.

Science design talk at Bloomberg Design Conference 2013 (Bloomberg TV video, and video conversation with Alberto Cairo, moderated by Sam Grobart.)

Needles in stacks of needles keynote at IEEE International Conference on Data Mining 2012

20 imperatives of information design poster at Biovis 2012

Data Visualization: Communicating Clearly talk at Schloss Dagstuhl 2012 Data Visualization in Biology workshop

Brewer palettes — benefit of using perceptual color spaces

Color palettes matter (talk) — learn about color, color spaces and why they matter

Visualization principles tutorial at VIZBI 2012 — learn how we visually interpret and organize information and how to apply these principles to creating figures and software interfaces

Effect of resolution on sequence visualization (handout) — understand how output resolution affects display of highly textured genomic annotations

PSA Genomics Workshop 2011: Designing Effective Visualizations in Biology and Circos and Hive Plots: Challenging visualization paradigms in genomics and network analysis.

news + thoughts

Happy Pi Approximation Day— π, roughly speaking 10,000 times

Wed 23-07-2014

Celebrate Pi Approximation Day (July 22nd) with the art of arm waving. This year I take the first 10,000 most accurate approximations (m/n, m = 1..10,000) and look at their accuracy.

Accuracy of the first 10,000 m/n approximations of Pi. (details)
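
One way to reproduce numbers of this kind is sketched below. It assumes that "most accurate approximation" means, for each numerator m, the denominator n that brings m/n closest to π. That reading is my assumption, and the function names are hypothetical, not taken from the original analysis.

```python
import math

def best_denominator(m):
    """Return the positive integer n for which m/n is closest to pi."""
    # m/n decreases as n grows, so the best n is one of the two
    # integers that bracket m/pi (never smaller than 1).
    n_low = max(1, math.floor(m / math.pi))
    return min((n_low, n_low + 1), key=lambda n: abs(m / n - math.pi))

def approximation_error(m):
    """Absolute error of the best m/n approximation for this m."""
    return abs(m / best_denominator(m) - math.pi)

# Error of the best approximation for every numerator m = 1..10,000.
errors = {m: approximation_error(m) for m in range(1, 10_001)}
best_m = min(errors, key=errors.get)
print(best_m, best_denominator(best_m), errors[best_m])
```

The familiar 22/7 and 355/113 fall out of this directly: best_denominator(22) is 7 and best_denominator(355) is 113.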

I turned to the spiral again after applying it to stacked ring plots of frequency distributions in Pi for the 2014 Pi Day.

Frequency distribution of digits of Pi in groups of 4 up to digit 4,988. (details)

Analysis of Variance (ANOVA) and Blocking—Accounting for Variability in Multi-factor Experiments

Mon 07-07-2014

Our 10th Points of Significance column! Continuing with our previous discussion about comparative experiments, we introduce ANOVA and blocking. Although this column appears to introduce two new concepts (ANOVA and blocking), you've seen both before, though under a different guise.

Nature Methods Points of Significance column: Analysis of variance (ANOVA) and blocking. (read)

If you know the t-test you've already applied analysis of variance (ANOVA), though you probably didn't realize it. In ANOVA we ask whether the variation within our samples is compatible with the variation between our samples (sample means). If the samples don't all have the same mean then we expect the latter to be larger. The ANOVA test statistic (F) assigns significance to the ratio of these two quantities. When we have only two samples and apply the t-test, $t^2 = F$.
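
The link between the two tests can be checked numerically. The sketch below assumes the equal-variance (pooled) two-sample t-test and a one-way ANOVA. The data are made up for illustration and the function names are my own.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic using the pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))

def anova_f(*groups):
    """One-way ANOVA F: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    ms_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    ms_within = sum((len(g) - 1) * variance(g) for g in groups) / (n_total - k)
    return ms_between / ms_within

# Illustrative data only.
a = [4.1, 5.0, 5.6, 4.8, 5.2]
b = [5.9, 6.4, 5.7, 6.8, 6.1]
print(pooled_t(a, b) ** 2, anova_f(a, b))
```

The two printed values agree up to floating-point rounding. With three or more groups only anova_f applies.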

ANOVA naturally incorporates and partitions sources of variation—the effects of variables on the system are determined based on the amount of variation they contribute to the total variation in the data. If this contribution is large, we say that the variation can be "explained" by the variable and infer an effect.

We discuss how data collection can be organized using a randomized complete block design to account for sources of uncertainty in the experiment. This process is called blocking because we are blocking the variation from a known source of uncertainty from interfering with our measurements. You've already seen blocking in the paired t-test example, in which the subject (or experimental unit) was the block.

We've worked hard to bring you 20 pages of statistics primers (though it feels more like 200!). The column is taking a month off in August, as we shrink our error bars.

Krzywinski, M. & Altman, N. (2014) Points of Significance: Analysis of Variance (ANOVA) and Blocking Nature Methods 11:699-700.

Background reading

Krzywinski, M. & Altman, N. (2014) Points of Significance: Designing Comparative Experiments Nature Methods 11:597-598.

Krzywinski, M. & Altman, N. (2014) Points of Significance: Comparing Samples — Part I — t-tests Nature Methods 11:215-216.

Krzywinski, M. & Altman, N. (2013) Points of Significance: Significance, P values and t-tests Nature Methods 10:1041-1042.

...more about the Points of Significance column

Designing Experiments—Coping with Biological and Experimental Variation

Thu 29-05-2014

This month, Points of Significance begins a series of articles about experimental design. We start by returning to the two-sample and paired t-tests for a discussion of biological and experimental variability.

Nature Methods Points of Significance column: Designing Comparative Experiments. (read)

We introduce the concept of blocking using the paired t-test as an example and show how biological and experimental variability can be related using the correlation coefficient, ρ, and how its value impacts the relative performance of the paired and two-sample t-tests.
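
A small numeric sketch of that effect follows. The before/after measurements are invented so that subjects differ a lot from each other but respond similarly, which gives a high ρ. The function names are hypothetical and the two-sample test is assumed to be the pooled-variance version.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled_var * (1 / na + 1 / nb))

def paired_t(a, b):
    """Paired t statistic computed from the within-subject differences."""
    d = [y - x for x, y in zip(a, b)]
    return mean(d) / sqrt(variance(d) / len(d))

def correlation(a, b):
    """Sample Pearson correlation between the paired measurements."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)
    return cov / sqrt(variance(a) * variance(b))

# Illustrative data: large subject-to-subject spread, small consistent shift.
before = [10.0, 12.0, 14.0, 16.0, 18.0]
after = [11.0, 13.5, 15.0, 17.5, 19.0]

print("rho          ", correlation(before, after))
print("two-sample t ", two_sample_t(before, after))
print("paired t     ", paired_t(before, after))
```

Because the subject-to-subject spread cancels in the differences, the paired statistic is much larger than the two-sample one for these data. As ρ approaches zero that advantage disappears.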

We also emphasize that when reporting data analyzed with the paired t-test, differences in sample means (and their associated 95% CI error bars) should be shown—not the original samples—because the correlation in the samples (and its benefits) cannot be gleaned directly from the sample data.

Krzywinski, M. & Altman, N. (2014) Points of Significance: Designing Comparative Experiments Nature Methods 11:597-598.

Background reading

Krzywinski, M. & Altman, N. (2014) Points of Significance: Comparing Samples — Part I — t-tests Nature Methods 11:215-216.

Krzywinski, M. & Altman, N. (2013) Points of Significance: Significance, P values and t-tests Nature Methods 10:1041-1042.

Have skew, will test

Wed 28-05-2014

Our May Points of Significance Nature Methods column jumps straight into dealing with skewed data with Non Parametric Tests.

Nature Methods Points of Significance column: Non Parametric Testing. (read)

We introduce non-parametric tests and simulate data scenarios to compare their performance to the t-test. You might be surprised: the t-test is extraordinarily robust to distribution shape, as we've discussed before. When data are highly skewed, non-parametric tests perform better and with higher power. However, if sample sizes are small, these tests are limited to a small number of possible P values, of which none may be less than 0.05!
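
The small-sample limit is easy to quantify for one specific case. The sketch below assumes the exact two-sided Wilcoxon rank-sum (Mann-Whitney) test with no ties. This choice of test is my assumption for illustration, and the function name is hypothetical.

```python
from math import comb

def smallest_two_sided_p(n1, n2):
    """Smallest attainable two-sided P value of the exact rank-sum test.

    With no ties, every assignment of ranks to the first sample is equally
    likely under the null, and only the two most extreme assignments
    (all lowest ranks or all highest ranks) reach the minimum.
    """
    return min(1.0, 2 / comb(n1 + n2, n1))

for n in (3, 4, 5):
    print(n, smallest_two_sided_p(n, n))
```

With three observations per sample the smallest possible P value is 2/20 = 0.1, so the test can never reach 0.05 no matter how well separated the samples are. Four per sample is the first balanced design that can.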

Krzywinski, M. & Altman, N. (2014) Points of Significance: Non Parametric Testing Nature Methods 11:467-468.

Background reading

Krzywinski, M. & Altman, N. (2014) Points of Significance: Comparing Samples — Part I — t-tests Nature Methods 11:215-216.

Krzywinski, M. & Altman, N. (2013) Points of Significance: Significance, P values and t-tests Nature Methods 10:1041-1042.

Mind your p's and q's

Sat 29-03-2014

In the April Points of Significance Nature Methods column, we continue our discussion of comparing samples and consider what happens when we run a large number of tests.

Nature Methods Points of Significance column: Comparing Samples — Part II — Multiple Testing. (read)

Observing statistically rare test outcomes is expected if we run enough tests. These are statistically, not biologically, significant. For example, if we run $N$ tests, the smallest P value that we have a 50% chance of observing is $1 - \exp(-\ln 2 / N)$. For $N = 10^k$ this P value is $P_k \approx 10^{-k} \ln 2$ (e.g. for $10^4 = 10{,}000$ tests, $P_4 = 6.9 \times 10^{-5}$).
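
The expression can be checked directly. The sketch assumes independent tests with all null hypotheses true, so that the N P values are uniform on [0, 1]. The function name is hypothetical.

```python
import math

def median_smallest_p(n_tests):
    """P value p such that the smallest of n_tests null P values
    falls at or below p with probability 0.5.

    Solves 1 - (1 - p)**n_tests = 0.5, i.e. p = 1 - exp(-ln 2 / n_tests).
    expm1 keeps precision when ln 2 / n_tests is tiny.
    """
    return -math.expm1(-math.log(2) / n_tests)

for k in range(1, 7):
    n = 10 ** k
    # Exact value next to the approximation 10^-k * ln 2.
    print(n, median_smallest_p(n), math.log(2) / n)
```

For 10,000 tests both columns round to 6.9e-05, and the approximation improves as N grows.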

We discuss common correction schemes such as Bonferroni, Holm, Benjamini & Hochberg and Storey's q and show how they impact the false positive rate (FPR), false discovery rate (FDR) and power of a batch of tests.
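
As a rough sketch of how three of these schemes adjust a batch of P values (Storey's q is left out because it also requires estimating the proportion of true nulls), the functions below use the standard textbook definitions. The function names and the example P values are my own.

```python
def bonferroni(pvals):
    """Multiply every P value by the number of tests (capped at 1)."""
    n = len(pvals)
    return [min(1.0, p * n) for p in pvals]

def holm(pvals):
    """Step-down Holm adjustment; controls the family-wise error rate."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest P value is multiplied by n, the next by n - 1, ...
        running_max = max(running_max, (n - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

def benjamini_hochberg(pvals):
    """Step-up Benjamini-Hochberg adjustment; controls the FDR."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n - 1, -1, -1):
        i = order[rank]
        # The P value with 1-based rank r is scaled by n / r.
        running_min = min(running_min, pvals[i] * n / (rank + 1))
        adjusted[i] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.005]  # illustrative values only
print(bonferroni(pvals))
print(holm(pvals))
print(benjamini_hochberg(pvals))
```

For any input, the Benjamini-Hochberg values are never larger than the Holm values, which in turn are never larger than the Bonferroni values. This ordering reflects the trade of stricter false-positive control for lower power.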

Krzywinski, M. & Altman, N. (2014) Points of Significance: Comparing Samples — Part II — Multiple Testing Nature Methods 11:355-356.

Background reading

Krzywinski, M. & Altman, N. (2014) Points of Significance: Comparing Samples — Part I — t-tests Nature Methods 11:215-216.

Krzywinski, M. & Altman, N. (2013) Points of Significance: Significance, P values and t-tests Nature Methods 10:1041-1042.