Martin Krzywinski / Canada's Michael Smith Genome Sciences Centre / mkweb.bcgsc.ca

art: revealing




data visualization + art

The BC Cancer Agency’s Personalized Oncogenomics Program (POG) is a clinical research initiative applying genomic sequencing to the diagnosis and treatment of patients with incurable cancers.

Art of the Personalized Oncogenomics Program

Nature uses only the longest threads to weave her patterns, so that each small piece of her fabric reveals the organization of the entire tapestry.
— Richard Feynman

Art is Science in Love
— E.F. Weisslitz

what do the circles mean?

The legend can be printed at 4" × 6". The bitmap resolution is 600 dpi.


Quick legend. 5 Years of Personalized Oncogenomics Project at Canada's Michael Smith Genome Sciences Centre. The poster shows 545 cancer cases.

a case for a visual case summary

For every case, we sequence the DNA to study the genome structure and the RNA to discover which genes are expressed and to what extent. The analysis is quite complex and brings together many steps: sequence alignment, structural variation detection, expression profiling, pathway analysis and so on. Every case is "summarized" by a lengthy report, such as the one below, which can run to over 40 pages.

A report for a typical POG case is about 40–50 pages.

One goal of the 5-year anniversary art was to represent the cases in a way that clearly shows their number, classification and diversity. Many metrics could serve this purpose; I chose each case's correlation to other cancer types.

correlation to TCGA cancer database

For every POG case, the expression of 1,744 key genes is compared to that of thousands of cases in the TCGA database of cancer samples. For a given cancer type in the TCGA database (e.g. BRCA), we visualize the correlations using box plots, which are ideal for showing the distribution of values in a sample.

Every case is compared to a database of thousands of cases. Shown here are box plots of the Spearman correlation coefficient between the gene expression of the POG case and cancers of a specific type (e.g. BRCA, LUAD, etc.).
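The comparison described above can be sketched in a few lines. This is a minimal illustration, not the POG pipeline: the function names, the layout of the `reference` dictionary and the toy expression values are all assumptions, and the median per cancer type is used because the figure captions report median correlations.

```python
from statistics import median

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def median_correlation_by_type(case_expr, reference):
    """reference (hypothetical layout) maps cancer type -> list of expression vectors."""
    return {
        cancer_type: median(spearman(case_expr, sample) for sample in samples)
        for cancer_type, samples in reference.items()
    }

# Toy data: 5 genes instead of 1,744, and made-up expression values.
case = [1.0, 2.0, 3.0, 4.0, 5.0]
reference = {
    "BRCA": [[1.1, 2.5, 2.9, 4.2, 6.0], [2.0, 1.0, 3.0, 4.0, 5.0]],
    "LUAD": [[5.0, 4.0, 3.0, 2.0, 1.0]],
}
summary = median_correlation_by_type(case, reference)
top = sorted(summary.items(), key=lambda kv: kv[1], reverse=True)
print(top)
```

Sorting the per-type medians in descending order gives a ranked list of the kind used to pick the largest correlations for a case.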

The 10 largest Spearman correlation coefficients for the case shown above are

case    corr   type      tissue
-----------------------------------------------
POG661  0.436  BRCA      Breast
POG661  0.371  PRAD      Urologic
POG661  0.295  OV        Gynecologic
POG661  0.257  UCEC      Gynecologic
POG661  0.244  LUAD      Thoracic
POG661  0.235  CESC_CAD  Gynecologic
POG661  0.225  MB_Adult  Central Nervous System
POG661  0.222  KICH      Urologic
POG661  0.219  THCA      Endocrine
POG661  0.208  UCS       Gynecologic

The figure below shows how the correlations are encoded. First, only the top three correlations are kept; using more makes the image busy and diminishes its visual impact. The correlations are then encoded as concentric rings.

Because the top three correlations usually differ only slightly, the encoding is scaled non-linearly to emphasize the differences: each correlation `r` is first transformed to `r^3`.

Case POG661. Median gene expression correlations with different cancer types from TCGA database. (A) Top 10 correlations shown as a bar plot. Color coding is by source tissue associated with the cancer type. (B) Top 10 correlations encoded as concentric rings. The width of the ring is proportional to the correlation. (C) Top 3 correlations. (D) Top 3 correlations scaled with a power to emphasize differences.
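The ring encoding can be sketched as follows. This is an illustration only: the source states that ring width is proportional to the correlation and that correlations are scaled as `r^3`, but the function name `ring_radii`, the normalization to a unit outer radius and the choice to place the largest correlation innermost are assumptions. The sketch also assumes the top correlations are positive.

```python
def ring_radii(correlations, top=3, power=3):
    """Return (inner, outer) radii of concentric rings, largest correlation first.

    Widths are proportional to r**power and normalized so the outermost
    ring ends at radius 1 (an assumption made for illustration).
    """
    top_corrs = sorted(correlations, reverse=True)[:top]
    scaled = [r ** power for r in top_corrs]
    total = sum(scaled)
    rings = []
    inner = 0.0
    for s in scaled:
        outer = inner + s / total
        rings.append((inner, outer))
        inner = outer
    return rings

# Top 10 correlations for case POG661, taken from the table above.
pog661 = [0.436, 0.371, 0.295, 0.257, 0.244, 0.235, 0.225, 0.222, 0.219, 0.208]

linear = ring_radii(pog661, power=1)   # panel (C): widths proportional to r
cubed = ring_radii(pog661, power=3)    # panel (D): widths proportional to r^3

for label, rings in (("linear", linear), ("cubed", cubed)):
    widths = [round(outer - inner, 3) for inner, outer in rings]
    print(label, widths)
```

Comparing the two sets of widths shows the effect of the power: the ring for the largest correlation takes a visibly larger share of the radius once the values are cubed.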

The typeface is Proxima Nova. The RGB colors for each tissue source are

         Gastrointestinal  234,62,144
                   Breast  237,75,51
                 Thoracic  242,130,56
              Gynecologic  253,188,61
              Soft tissue  244,217,59
                     Skin  193,216,51
                 Urologic  114,197,49
              Hematologic  29,166,68
            Head and neck  43,168,224
                Endocrine  71,82,178
   Central nervous system  127,65,146
                    Other  150,150,150
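For anyone who wants to reuse the palette, the values above can be held in a small lookup table. The RGB triplets are copied from the list; the names `TISSUE_RGB` and `rgb_to_hex` are hypothetical helpers introduced here for illustration.

```python
# RGB colors for each tissue source, copied from the list above.
TISSUE_RGB = {
    "Gastrointestinal": (234, 62, 144),
    "Breast": (237, 75, 51),
    "Thoracic": (242, 130, 56),
    "Gynecologic": (253, 188, 61),
    "Soft tissue": (244, 217, 59),
    "Skin": (193, 216, 51),
    "Urologic": (114, 197, 49),
    "Hematologic": (29, 166, 68),
    "Head and neck": (43, 168, 224),
    "Endocrine": (71, 82, 178),
    "Central nervous system": (127, 65, 146),
    "Other": (150, 150, 150),
}

def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of 0-255 integers to a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)

def tissue_color(tissue):
    """Look up a tissue color as hex, falling back to the 'Other' gray."""
    return rgb_to_hex(TISSUE_RGB.get(tissue, TISSUE_RGB["Other"]))

print(tissue_color("Breast"))
```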


news + thoughts

Virus Mutations Reveal How COVID-19 Really Spread

Mon 04-05-2020

Genetic sequences of the coronavirus tell the story of when the virus arrived in each country and where it came from.

Our graphic in Scientific American's Graphic Science section in the June 2020 issue shows a phylogenetic tree based on a snapshot of the data model from Nextstrain as of 31 March 2020.

Virus Mutations Reveal How COVID-19 Really Spread. Text by Mark Fischetti (Senior Editor), art direction by Jen Christiansen (Senior Graphics Editor), source: Nextstrain (enabled by data from GISAID).

Cover of Nature Cancer April 2020

Mon 27-04-2020

Our design on the cover of Nature Cancer's April 2020 issue shows mutation spectra of patients from the POG570 cohort of 570 individuals with advanced metastatic cancer.

Each ellipse system represents the mutation spectrum of an individual patient. Individual ellipses in the system correspond to the number of base changes in a given class and are layered by mutation count. Ellipse angle is controlled by the proportion of mutations in a class within the sample and its size is determined by a sigmoid mapping of mutation count scaled within the layer. The opacity of each system represents the duration since the diagnosis of advanced disease.

The cover design accompanies our report in the issue: Pleasance, E., Titmuss, E., Williamson, L. et al. (2020) Pan-cancer analysis of advanced patient tumors reveals interactions between therapy and genomic landscapes. Nat Cancer 1:452–468.

Modeling infectious epidemics

Wed 06-05-2020

Every day sadder and sadder news of its increase. In the City died this week 7496; and of them, 6102 of the plague. But it is feared that the true number of the dead this week is near 10,000 ....
—Samuel Pepys, 1665

This month, we begin a series of columns on epidemiological models. We start with the basic SIR model, which models the spread of an infection between three groups in a population: susceptible, infected and recovered.

Nature Methods Points of Significance column: Modeling infectious epidemics. (read)

We discuss the conditions under which an outbreak occurs, estimates of spread characteristics and the effects that mitigation can have on disease trajectories. We show the trends that arise when "flattening the curve" by decreasing `R_0`.
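To make the effect of `R_0` concrete, here is a minimal sketch of a closed-population SIR model integrated with a simple Euler step. It is not the code from the column: the function name `simulate_sir`, the 5-day infectious period, the initial infected fraction and the step size are all assumed values, and the model omits births, deaths and vaccination.

```python
def simulate_sir(r0, infectious_period=5.0, i0=1e-4, days=300, dt=0.1):
    """Integrate dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I.

    S, I and R are population fractions. Returns (peak infected fraction,
    day of the peak, final recovered fraction).
    """
    gamma = 1.0 / infectious_period   # recovery rate per day
    beta = r0 * gamma                 # transmission rate, since R_0 = beta / gamma
    s, i, r = 1.0 - i0, i0, 0.0
    peak_i, peak_day = i, 0.0
    steps = int(days / dt)
    for step in range(1, steps + 1):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        if i > peak_i:
            peak_i, peak_day = i, step * dt
    return peak_i, peak_day, r

for r0 in (3.0, 1.5, 0.8):
    peak, day, recovered = simulate_sir(r0)
    print(f"R_0={r0}: peak infected fraction {peak:.3f} on day {day:.0f}")
```

In this sketch a lower `R_0` gives a smaller and later peak, and for `R_0` below 1 the infected fraction never rises above its starting value, so no outbreak occurs.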


This column has an interactive supplemental component that allows you to explore how the model curves change with parameters such as infectious period, basic reproduction number and vaccination level.

Nature Methods Points of Significance column: Modeling infectious epidemics. (Interactive supplemental materials)

Bjørnstad, O.N., Shea, K., Krzywinski, M. & Altman, N. (2020) Points of significance: Modeling infectious epidemics. Nature Methods 17:455–456.

The Outbreak Poems

Sat 04-04-2020

I'm writing poetry daily to put my feelings into words more often during the COVID-19 outbreak.

That moment
when
you know a moment.
Branch to branch,
flit,
look everywhere,
chirp.
Memory,
scent
of thought fleeting.
Distant pasts
all
ways in plural
form.

Read the poems and learn what a piku is.

Deadly Genomes: Genome Structure and Size of Harmful Bacteria and Viruses

Tue 17-03-2020

A poster full of epidemiological worry and statistics. Now updated with the genome of SARS-CoV-2 and COVID-19 case statistics as of 3 March 2020.

Deadly Genomes: Genome Structure and Size of Harmful Bacteria and Viruses

Bacterial and viral genomes of various diseases are drawn as paths with color encoding local GC content and curvature encoding local repeat content. Position of the genome encodes prevalence and mortality rate.
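Local GC content, the quantity mapped to color here, can be computed with a sliding window. The sketch below is a generic illustration rather than the method used for the poster: the function name `local_gc_content`, the window size and the step are assumptions, and any base other than G or C (including N) is counted as non-GC.

```python
def local_gc_content(sequence, window=100, step=None):
    """Return (start, gc_fraction) for each window along the sequence.

    By default the windows do not overlap; pass a smaller `step` to slide.
    """
    step = step or window
    seq = sequence.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        values.append((start, gc / window))
    return values

# Toy sequence; a real genome would be read from a FASTA file.
toy = "GGCCAATTGCGCATAT"
for start, gc in local_gc_content(toy, window=4):
    print(start, gc)
```

Each fraction could then be mapped to a color ramp along the drawn path.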

The deadly genomes collection has been updated with posters of the genomes of SARS-CoV-2, the novel coronavirus that causes COVID-19.

Genomes of 56 SARS-CoV-2 coronaviruses that cause COVID-19.
Ball of 56 SARS-CoV-2 coronaviruses that cause COVID-19.
The first SARS-CoV-2 genome (MT019529) to be sequenced appears first on the poster.

Using Circos in Galaxy Australia Workshop

Wed 04-03-2020

A workshop on using the Circos Galaxy wrapper by Hiltemann and Rasche. The event was organized by Australian Biocommons.

Using Circos in Galaxy Australia workshop.

Download workshop slides.

Galaxy wrapper training materials: Saskia Hiltemann and Helena Rasche (2020) Visualisation with Circos (Galaxy Training Materials).