Martin Krzywinski / Canada's Michael Smith Genome Sciences Centre / mkweb.bcgsc.ca



data visualization + art

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
The BC Cancer Agency’s Personalized Oncogenomics Program (POG) is a clinical research initiative applying genomic sequencing to the diagnosis and treatment of patients with incurable cancers.

Art of the Personalized Oncogenomics Program

Nature uses only the longest threads to weave her patterns, so that each small piece of her fabric reveals the organization of the entire tapestry.
— Richard Feynman

Art is Science in Love
— E.F. Weisslitz

what do the circles mean?

The legend can be printed at 4" × 6". The bitmap resolution is 600 dpi.


Quick legend. 5 Years of Personalized Oncogenomics Project at Canada's Michael Smith Genome Sciences Centre. The poster shows 545 cancer cases.

a case for a visual case summary

For every case, we sequence the DNA to study the genome structure and the RNA to discover which genes are expressed and to what extent. The analysis is quite complex and brings together many steps: sequence alignment, structural variation detection, expression profiling, pathway analysis and so on. Every case is "summarized" by a lengthy report, such as the one below, which can run to over 40 pages.

Personalized Oncogenomics Program at Canada's Michael Smith Genome Sciences Center / Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
A report for a typical POG case is about 40–50 pages.

One goal of the 5-year anniversary art was to represent the cases in a way that clearly shows their number, classification and diversity. Many metrics could be used for this; I chose each case's correlation to other cancer types.

correlation to TCGA cancer database

For every POG case, the gene expression of 1,744 key genes is compared to that of thousands of cases in the TCGA database of cancer samples. For a given cancer type in the TCGA database (e.g. BRCA), we visualize the correlations using box plots, which are ideal for showing the distribution of values in a sample.

Every case is compared to a database of thousands of cases. Shown here are box plots for the Spearman correlation coefficient between the gene expression of the POG case and cancers of a specific type (e.g. BRCA, LUAD, etc).
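
As a minimal sketch of this comparison, the Spearman coefficient can be computed as the Pearson correlation of the ranks. This assumes expression values are available as plain lists of numbers; the helper names and the toy data are hypothetical and are not taken from the POG analysis.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Toy expression values for five genes (made-up numbers).
pog_case = [2.1, 8.4, 0.3, 5.5, 7.0]
tcga_sample = [1.0, 8.8, 0.2, 4.1, 9.9]
rho = spearman(pog_case, tcga_sample)
```

Because only ranks enter the calculation, the coefficient is insensitive to monotonic rescaling of the expression values.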

The 10 largest Spearman correlation coefficients for the case shown above are

case    corr   type      tissue
-----------------------------------------------
POG661  0.436  BRCA      Breast
POG661  0.371  PRAD      Urologic
POG661  0.295  OV        Gynecologic
POG661  0.257  UCEC      Gynecologic
POG661  0.244  LUAD      Thoracic
POG661  0.235  CESC_CAD  Gynecologic
POG661  0.225  MB_Adult  Central Nervous System
POG661  0.222  KICH      Urologic
POG661  0.219  THCA      Endocrine
POG661  0.208  UCS       Gynecologic
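
A ranking like the one above could be produced along these lines. This is a sketch under assumptions: the per-sample correlations for each cancer type are summarized by their median (as the figure caption for this case suggests), and the dictionary, its values and the function name are hypothetical.

```python
from statistics import median

def top_types(correlations_by_type, k=10):
    """Rank cancer types by the median of their per-sample correlations."""
    medians = {t: median(v) for t, v in correlations_by_type.items()}
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)[:k]

# Made-up per-sample Spearman correlations for three TCGA types.
example = {
    "BRCA": [0.40, 0.44, 0.47],
    "PRAD": [0.30, 0.37, 0.41],
    "LUAD": [0.20, 0.24, 0.31],
}
top2 = top_types(example, k=2)
```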

The figure below shows how the correlations are encoded. First, only the top three correlations are kept, since using more makes the design look busy and diminishes its visual impact. The correlations are then encoded as concentric rings.

Because the differences among the top 3 correlations are relatively small in most cases, they are emphasized by scaling the encoding non-linearly: each correlation `r` is first transformed to `r^3`.

Case POG661. Median gene expression correlations with different cancer types from the TCGA database. (A) Top 10 correlations shown as a bar plot. Color coding is by source tissue associated with the cancer type. (B) Top 10 correlations encoded as concentric rings. The width of the ring is proportional to the correlation. (C) Top 3 correlations. (D) Top 3 correlations scaled with a power to emphasize differences.
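
The ring encoding can be sketched as follows. Only the `r^3` scaling and the top three values for POG661 come from the text; the normalization to a fixed outer radius, the choice to place the largest correlation innermost, and the function name are my assumptions.

```python
def ring_radii(correlations, power=3, outer_radius=1.0):
    """Return (inner, outer) radii of concentric rings.

    Ring widths are proportional to correlation ** power and are normalized
    so that the rings together fill outer_radius. The largest correlation is
    drawn innermost (an assumption, not stated in the source).
    """
    scaled = [r ** power for r in sorted(correlations, reverse=True)]
    total = sum(scaled)
    radii, inner = [], 0.0
    for s in scaled:
        width = outer_radius * s / total
        radii.append((inner, inner + width))
        inner += width
    return radii

top3 = [0.436, 0.371, 0.295]          # top three correlations for POG661
linear = ring_radii(top3, power=1)    # widths proportional to r
cubed = ring_radii(top3, power=3)     # widths proportional to r^3
```

With linear scaling the widest ring is about 1.5 times the width of the narrowest; after cubing, the ratio grows to about 3.2, which is what makes the differences visible.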

The typeface is Proxima Nova. The RGB colors for each tissue source are

         Gastrointestinal  234,62,144
                   Breast  237,75,51
                 Thoracic  242,130,56
              Gynecologic  253,188,61
              Soft tissue  244,217,59
                     Skin  193,216,51
                 Urologic  114,197,49
              Hematologic  29,166,68
            Head and neck  43,168,224
                Endocrine  71,82,178
   Central nervous system  127,65,146
                    Other  150,150,150
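
For reuse in plotting code, the palette can be kept as a lookup table. This sketch copies the RGB triplets listed above; the dictionary and helper names are hypothetical.

```python
# Tissue source -> (R, G, B), values copied from the list above.
TISSUE_RGB = {
    "Gastrointestinal": (234, 62, 144),
    "Breast": (237, 75, 51),
    "Thoracic": (242, 130, 56),
    "Gynecologic": (253, 188, 61),
    "Soft tissue": (244, 217, 59),
    "Skin": (193, 216, 51),
    "Urologic": (114, 197, 49),
    "Hematologic": (29, 166, 68),
    "Head and neck": (43, 168, 224),
    "Endocrine": (71, 82, 178),
    "Central nervous system": (127, 65, 146),
    "Other": (150, 150, 150),
}

def to_hex(rgb):
    """Format an (R, G, B) triplet of 0-255 integers as a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)

breast_hex = to_hex(TISSUE_RGB["Breast"])
```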


news + thoughts

Graphical Abstract Design Guidelines

Fri 13-11-2020

Clear, concise, legible and compelling.

Making a scientific graphical abstract? Refer to my practical design guidelines and redesign examples to improve organization, design and clarity of your graphical abstracts.

Graphical Abstract Design Guidelines — Clear, concise, legible and compelling.

"This data might give you a migraine"

Tue 06-10-2020

An in-depth look at my process of reacting to a bad figure — how I design a poster and tell data stories.

A poster of high BMI and obesity prevalence for 185 countries.

He said, he said — a word analysis of the 2020 Presidential Debates

Thu 01-10-2020

Building on the method I used to analyze the 2008, 2012 and 2016 U.S. Presidential and Vice Presidential debates, I explore word usage in the 2020 Debates between Donald Trump and Joe Biden.

Analysis of word usage by parts of speech for Trump and Biden reveals insight into each candidate.

Points of Significance celebrates 50th column

Mon 24-08-2020

We are celebrating the publication of our 50th column!

To all our coauthors — thank you and see you in the next column!

Nature Methods Points of Significance: Celebrating 50 columns of clear explanations of statistics.

Uncertainty and the management of epidemics

Mon 24-08-2020

When modelling epidemics, some uncertainties matter more than others.

Public health policy is always hampered by uncertainty. During a novel outbreak, nearly everything will be uncertain: the mode of transmission, the duration and population variability of latency, infection and protective immunity and, critically, whether the outbreak will fade out or turn into a major epidemic.

The uncertainty may be structural (which model?), parametric (what is `R_0`?), and/or operational (how well do masks work?).

This month, we continue our exploration of epidemiological models and look at how uncertainty affects forecasts of disease dynamics and optimization of intervention strategies.

Nature Methods Points of Significance column: Uncertainty and the management of epidemics.

We show how the impact of uncertainty on the choice of strategy can be expressed using the Expected Value of Perfect Information (EVPI): the potential improvement in outcomes that could be obtained if the uncertainty were resolved before deciding on the intervention strategy. In other words, by how much could we increase the effectiveness of our choice (e.g. lowering total disease burden) if we knew which model best reflects reality?
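
As a worked illustration of this definition, EVPI for a discrete set of models and strategies is the expected benefit of choosing the best strategy per model minus the expected benefit of the single best strategy chosen under uncertainty. The sketch below uses made-up numbers and hypothetical names; it is not the column's supplemental code.

```python
def evpi(prob, benefit):
    """Expected Value of Perfect Information for a discrete decision problem.

    prob[m] is the probability that model m is correct; benefit[a][m] is the
    benefit of strategy a if model m is correct (larger is better).
    """
    # Best we can do now: the one strategy with the highest expected benefit.
    best_under_uncertainty = max(
        sum(p * b for p, b in zip(prob, row)) for row in benefit
    )
    # Best we could do with perfect information: the best strategy per model,
    # averaged over how likely each model is.
    best_with_information = sum(
        p * max(row[m] for row in benefit) for m, p in enumerate(prob)
    )
    return best_with_information - best_under_uncertainty

# Made-up example: two candidate models, two intervention strategies.
prob = [0.5, 0.5]
benefit = [
    [100, 20],  # strategy A: cases averted under model 1, model 2
    [40, 60],   # strategy B: cases averted under model 1, model 2
]
value = evpi(prob, benefit)
```

Here strategy A is the better choice under uncertainty (expected benefit 60 versus 50), but knowing the true model in advance would raise the expected benefit to 80, so the EVPI is 20 cases averted.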

This column has an interactive supplemental component that lets you explore how uncertainty in `R_0` and immunity duration affects the timing and size of epidemic waves and the total burden of the outbreak, and calculate EVPI for various outbreak models and scenarios.

Nature Methods Points of Significance column: Uncertainty and the management of epidemics. (Interactive supplemental materials)

Bjørnstad, O.N., Shea, K., Krzywinski, M. & Altman, N. (2020) Points of significance: Uncertainty and the management of epidemics. Nature Methods 17.

Background reading

Bjørnstad, O.N., Shea, K., Krzywinski, M. & Altman, N. (2020) Points of significance: Modeling infectious epidemics. Nature Methods 17:455–456.

Bjørnstad, O.N., Shea, K., Krzywinski, M. & Altman, N. (2020) Points of significance: The SEIRS model for infectious disease dynamics. Nature Methods 17:557–558.

Cover of Nature Genetics August 2020

Mon 03-08-2020

Our design on the cover of the August 2020 issue of Nature Genetics is “Dichotomy of Chromatin in Color”. Thanks to Dr. Andy Mungall for suggesting this terrific title.

Dichotomy of Chromatin in Color. Nature Genetics, August 2020 issue.

The cover design accompanies our report in the issue: Gagliardi, A., Porter, V.L., Zong, Z. et al. (2020) Analysis of Ugandan cervical carcinomas identifies human papillomavirus clade–specific epigenome and transcriptome landscapes. Nature Genetics 52:800–810.