Martin Krzywinski / Canada's Michael Smith Genome Sciences Centre / mkweb.bcgsc.ca Martin Krzywinski / Canada's Michael Smith Genome Sciences Centre / mkweb.bcgsc.ca - contact me Martin Krzywinski / Canada's Michael Smith Genome Sciences Centre / mkweb.bcgsc.ca on Twitter Martin Krzywinski / Canada's Michael Smith Genome Sciences Centre / mkweb.bcgsc.ca - Lumondo Photography Martin Krzywinski / Canada's Michael Smith Genome Sciences Centre / mkweb.bcgsc.ca - Pi Art Martin Krzywinski / Canada's Michael Smith Genome Sciences Centre / mkweb.bcgsc.ca - Hilbertonians - Creatures on the Hilbert CurveMartin Krzywinski / Canada's Michael Smith Genome Sciences Centre / mkweb.bcgsc.ca - Pi Day 2020 - Piku
Where am I supposed to go? Where was I supposed to know?Violet Indianaget lost in questionsmore quotes


Scientific graphical abstracts — design guidelines


visualization + design

Creating the Genome Research November 2012 Cover

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Cover image accompanying Spark: A navigational paradigm for genomic data exploration. Genome Research 22 (11). (zoom, Genome Research)

The Genome Research cover design takes a fun and illustrative approach to visualization. It's both art and science — in a 4:1 ratio.

The cover image accompanies the article by Cydney Nielsen from our visualization group, describing her Spark tool for visualization epigenetics data.

Nielsen CB, Younesy H, O'Geen H, Xu X, Jackson AR, et al. (2012) Spark: A navigational paradigm for genomic data exploration. Genome Res 22: 2262-2269.

Instead of a literal depiction of output from Spark, the final design presents what appears to be necklaces of the kind of tiles that Spark uses for its visual presentation. I took a chance that Genome Research had a sense of humor. Luckily, they did and accepted the design for the cover.

Colored tiles are playfully suspended on vertical strings to illustrate how Spark, presented in this issue, uses clustering to group genomic regions (tiles) with similar data patterns (colored heatmaps) and facilitates genome-wide data exploration.Genome Research 22 (11)

The image was published on the November 2012 issue of cover of Genome Research.

Tools

Illustrator CS5, and a cup (or two) of Galileo coffee from a Rancilio Epoca.

Other Covers

I had two other covers published this year: the PNAS cover accompanied our manuscript about mouse vasculature development and the Trends in Genetics cover was commissioned.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Cover image accompanying our article on mouse vasculature development. Biology turns astrophysical. PNAS 1 May 2012; 109 (18) (zoom, how it was made, PNAS)
Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Cover image for the human genetics special issue. Trends in Genetics October 2012, 28 (10) (lowres, hires, how it was made, Trends in Genetics)

source of design

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
To lower this computational barrier, particularly in the early data exploration phases, Spark was developed as an interactive pattern discovery and visualization tool for epigenomic data. (Spark)

Thinking about design ideas for the cover, I looked to the kind of visual motifs that Spark used for inspiration. Immediately the colorful tiles, which represent clustered data tracks, stood out.

Spark's output is very stylized, colorful and high contrast. It was important to preserve this aesthetic in the design. I also wanted to incorporate the idea of clustering in the design, as well as the concept that the clusters represented data from different parts of the genome.

While it was not important to illustrate how Spark organizes and analyzed data explicitly — in fact, I wanted these aspects to be subtle — it was important that the cover illustration had connections to Spark at several levels.

Spark

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Many genomics techniques produce measurements that have both a value and a position on a reference genome, for example ChIP-sequencing.

Spark was created by Cydney Nielsen, who works with me at the Genome Sciences Center. It is designed to mitigate the difficulties arising from the fact that genome-wide data is typically scattered across thousands of points of interest.

Genome browsers integrate diverse data sets by plotting them as vertically stacked tracks across a common genomic x-axis. Genome browsers are designed for viewing local regions of interest (e.g. an individual gene) and are frequently used during the initial data inspection and exploration phases.

Most genome browsers support zooming along the genome coordinate. This type of overview is not always useful because it produces a summary across a continuous genomic range (e.g. chromosome 1) and not across the subset of regions that are of interest (e.g. genes on chromosome 1). Spark addresses this shortcoming and provides a way to help answer questions like: What are the common data patterns across genes start sites in my data set?

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Spark's approach to analysis and display of epigenetic data.

Spark's visualization is driven by clustering data tracks (e.g. ChIP-seq coverage) from across equivalent regions (e.g. gene start sites). The clustered tracks are displayed as heatmaps, with each row being a data track and each column a windowed region of the genome.

early comps

With fond memories of Monte Carlo simulations from my physics days, I set out to simulate some realistic-looking, but entirely synthetic, Spark cluster tiles.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
A collection of synthetic Spark tiles, each 7x20.

My first idea was a design which would show these tiles falling, perhaps accumulating on a pile on the ground. Quick prototypes of this idea were disappointing. The tiles appeared flimsy and too complex, while the image was largely empty. I spent several hours messing around with the rotation and pseudo-3D layout, but could not find anything that was satisfying.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Spark tiles, falling.
Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Early attempt at a design. Meh.

I thought to do this right would require a proper simulation within a 3D system.

refining the design

To address the fact that the tiles felt flimsy and overly complicated and the design lacked depth, I simplified the tile simulation to generate 5x5 tiles. These simpler representations still embodied how Spark displayed data, but did so minimally.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
A second attempt at simulating Spark clusters.

To keep with the idea that the clusters come from different regions of the genome, I thought of arranging them along line segments. Unlike the design in which the tiles were falling, this constrained the layout significantly and allowed me to play with the design to make it look like the clusters were draped over it. By casting a light shadow behind each string of tiles, a subtle 3D effect could be achieved while still keeping the design within a plane.

There are 11 orientations of tiles created by rotating a thin square around the vertical axis with a slight forward tilt. There are 5 rotations to the left and right at angles 10, 26, 46, 66 and 80 degrees. The rotation was achieved using Illustrator's Extrude and Bevel 3D filter.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Layout of tiles.
Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Rotated tiles with Spark clusters.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Flight and Fall by Rachel Nottingham. (artist's site)

The layout and rotation of the tiles was inspired by Flight and Fall by Rachel Nottingham, a mobile of paper birds.

I wanted to keep the layout of the spark tiles pleasant, without being too organized. I find this to be a difficult balance to achieve — natural randomness is deceptively difficult to create by hand.

final image

Four different versions of the design were submitted to Genome Research. I was happiest with the treatment in which the tiles maintained their color and the Spark clusters were projected as tones of white. This designed felt more solid and punchy — I feel like you can reach out and touch one of those strings.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Final Spark cover designs. The top left one was chosen by Genome Research.

VIEW ALL

news + thoughts

Music for the Moon: Flunk's 'Down Here / Moon Above'

Sat 29-05-2021

The Sanctuary Project is a Lunar vault of science and art. It includes two fully sequenced human genomes, sequenced and assembled by us at Canada's Michael Smith Genome Sciences Centre.

The first disc includes a song composed by Flunk for the (eventual) trip to the Moon.

But how do you send sound to space? I describe the inspiration, process and art behind the work.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
The song 'Down Here / Moon Above' from Flunk's new album History of Everything Ever is our song for space. It appears on the Sanctuary genome discs, which aim to send two fully sequenced human genomes to the Moon. (more)

Browse the genome discs.

Happy 2021 `\pi` Day—
A forest of digits

Sun 14-03-2021

Celebrate `\pi` Day (March 14th) and finally see the digits through the forest.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
The 26th tree in the digit forest of `\pi`. Why is there a flower on the ground?. (details)

This year is full of botanical whimsy. A Lindenmayer system forest – deterministic but always changing. Feel free to stop and pick the flowers from the ground.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
The first 46 digits of `\pi` in 8 trees. There are so many more. (details)

And things can get crazy in the forest.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
A forest of the digits of '\pi`, by ecosystem. (details)

Check out art from previous years: 2013 `\pi` Day and 2014 `\pi` Day, 2015 `\pi` Day, 2016 `\pi` Day, 2017 `\pi` Day, 2018 `\pi` Day and 2019 `\pi` Day.

Testing for rare conditions

Sun 30-05-2021

All that glitters is not gold. —W. Shakespeare

The sensitivity and specificity of a test do not necessarily correspond to its error rate. This becomes critically important when testing for a rare condition — a test with 99% sensitivity and specificity has an even chance of being wrong when the condition prevalence is 1%.

We discuss the positive predictive value (PPV) and how practices such as screen can increase it.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Nature Methods Points of Significance column: Testing for rare conditions. (read)

Altman, N. & Krzywinski, M. (2021) Points of significance: Testing for rare conditions. Nature Methods 18:224–225.

Standardization fallacy

Tue 09-02-2021

We demand rigidly defined areas of doubt and uncertainty! —D. Adams

A popular notion about experiments is that it's good to keep variability in subjects low to limit the influence of confounding factors. This is called standardization.

Unfortunately, although standardization increases power, it can induce unrealistically low variability and lead to results that do not generalize to the population of interest. And, in fact, may be irreproducible.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Nature Methods Points of Significance column: Standardization fallacy. (read)

Not paying attention to these details and thinking (or hoping) that standardization is always good is the "standardization fallacy". In this column, we look at how standardization can be balanced with heterogenization to avoid this thorny issue.

Voelkl, B., Würbel, H., Krzywinski, M. & Altman, N. (2021) Points of significance: Standardization fallacy. Nature Methods 18:5–6.

Graphical Abstract Design Guidelines

Fri 13-11-2020

Clear, concise, legible and compelling.

Making a scientific graphical abstract? Refer to my practical design guidelines and redesign examples to improve organization, design and clarity of your graphical abstracts.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Graphical Abstract Design Guidelines — Clear, concise, legible and compelling.