Martin Krzywinski / Genome Sciences Center / Martin Krzywinski / Genome Sciences Center / - contact me Martin Krzywinski / Genome Sciences Center / on Twitter Martin Krzywinski / Genome Sciences Center / - Lumondo Photography Martin Krzywinski / Genome Sciences Center / - Pi Art Martin Krzywinski / Genome Sciences Center / - Hilbertonians - Creatures on the Hilbert Curve
Tango is a sad thought that is danced.Enrique Santos Discépolothink & dancemore quotes

spark: exciting

EMBO Practical Course: Bioinformatics and Genome Analysis, 5–17 June 2017.

visualization + design

Creating the Genome Research November 2012 Cover

Martin Krzywinski @MKrzywinski
Cover image accompanying Spark: A navigational paradigm for genomic data exploration. Genome Research 22 (11). (zoom, Genome Research)

The Genome Research cover design takes a fun and illustrative approach to visualization. It's both art and science — in a 4:1 ratio.

The cover image accompanies the article by Cydney Nielsen from our visualization group, describing her Spark tool for visualization epigenetics data.

Nielsen CB, Younesy H, O'Geen H, Xu X, Jackson AR, et al. (2012) Spark: A navigational paradigm for genomic data exploration. Genome Res 22: 2262-2269.

Instead of a literal depiction of output from Spark, the final design presents what appears to be necklaces of the kind of tiles that Spark uses for its visual presentation. I took a chance that Genome Research had a sense of humor. Luckily, they did and accepted the design for the cover.

Colored tiles are playfully suspended on vertical strings to illustrate how Spark, presented in this issue, uses clustering to group genomic regions (tiles) with similar data patterns (colored heatmaps) and facilitates genome-wide data exploration.Genome Research 22 (11)

The image was published on the November 2012 issue of cover of Genome Research.


Illustrator CS5, and a cup (or two) of Galileo coffee from a Rancilio Epoca.

Other Covers

I had two other covers published this year: the PNAS cover accompanied our manuscript about mouse vasculature development and the Trends in Genetics cover was commissioned.

Martin Krzywinski @MKrzywinski
Cover image accompanying our article on mouse vasculature development. Biology turns astrophysical. PNAS 1 May 2012; 109 (18) (zoom, how it was made, PNAS)
Martin Krzywinski @MKrzywinski
Cover image for the human genetics special issue. Trends in Genetics October 2012, 28 (10) (lowres, hires, how it was made, Trends in Genetics)

source of design

Martin Krzywinski @MKrzywinski
To lower this computational barrier, particularly in the early data exploration phases, Spark was developed as an interactive pattern discovery and visualization tool for epigenomic data. (Spark)

Thinking about design ideas for the cover, I looked to the kind of visual motifs that Spark used for inspiration. Immediately the colorful tiles, which represent clustered data tracks, stood out.

Spark's output is very stylized, colorful and high contrast. It was important to preserve this aesthetic in the design. I also wanted to incorporate the idea of clustering in the design, as well as the concept that the clusters represented data from different parts of the genome.

While it was not important to illustrate how Spark organizes and analyzed data explicitly — in fact, I wanted these aspects to be subtle — it was important that the cover illustration had connections to Spark at several levels.


Martin Krzywinski @MKrzywinski
Many genomics techniques produce measurements that have both a value and a position on a reference genome, for example ChIP-sequencing.

Spark was created by Cydney Nielsen, who works with me at the Genome Sciences Center. It is designed to mitigate the difficulties arising from the fact that genome-wide data is typically scattered across thousands of points of interest.

Genome browsers integrate diverse data sets by plotting them as vertically stacked tracks across a common genomic x-axis. Genome browsers are designed for viewing local regions of interest (e.g. an individual gene) and are frequently used during the initial data inspection and exploration phases.

Most genome browsers support zooming along the genome coordinate. This type of overview is not always useful because it produces a summary across a continuous genomic range (e.g. chromosome 1) and not across the subset of regions that are of interest (e.g. genes on chromosome 1). Spark addresses this shortcoming and provides a way to help answer questions like: What are the common data patterns across genes start sites in my data set?

Martin Krzywinski @MKrzywinski
Spark's approach to analysis and display of epigenetic data.

Spark's visualization is driven by clustering data tracks (e.g. ChIP-seq coverage) from across equivalent regions (e.g. gene start sites). The clustered tracks are displayed as heatmaps, with each row being a data track and each column a windowed region of the genome.

early comps

With fond memories of Monte Carlo simulations from my physics days, I set out to simulate some realistic-looking, but entirely synthetic, Spark cluster tiles.

Martin Krzywinski @MKrzywinski
A collection of synthetic Spark tiles, each 7x20.

My first idea was a design which would show these tiles falling, perhaps accumulating on a pile on the ground. Quick prototypes of this idea were disappointing. The tiles appeared flimsy and too complex, while the image was largely empty. I spent several hours messing around with the rotation and pseudo-3D layout, but could not find anything that was satisfying.

Martin Krzywinski @MKrzywinski
Spark tiles, falling.
Martin Krzywinski @MKrzywinski
Early attempt at a design. Meh.

I thought to do this right would require a proper simulation within a 3D system.

refining the design

To address the fact that the tiles felt flimsy and overly complicated and the design lacked depth, I simplified the tile simulation to generate 5x5 tiles. These simpler representations still embodied how Spark displayed data, but did so minimally.

Martin Krzywinski @MKrzywinski
A second attempt at simulating Spark clusters.

To keep with the idea that the clusters come from different regions of the genome, I thought of arranging them along line segments. Unlike the design in which the tiles were falling, this constrained the layout significantly and allowed me to play with the design to make it look like the clusters were draped over it. By casting a light shadow behind each string of tiles, a subtle 3D effect could be achieved while still keeping the design within a plane.

There are 11 orientations of tiles created by rotating a thin square around the vertical axis with a slight forward tilt. There are 5 rotations to the left and right at angles 10, 26, 46, 66 and 80 degrees. The rotation was achieved using Illustrator's Extrude and Bevel 3D filter.

Martin Krzywinski @MKrzywinski
Layout of tiles.
Martin Krzywinski @MKrzywinski
Rotated tiles with Spark clusters.

Martin Krzywinski @MKrzywinski
Flight and Fall by Rachel Nottingham. (artist's site)

The layout and rotation of the tiles was inspired by Flight and Fall by Rachel Nottingham, a mobile of paper birds.

I wanted to keep the layout of the spark tiles pleasant, without being too organized. I find this to be a difficult balance to achieve — natural randomness is deceptively difficult to create by hand.

final image

Four different versions of the design were submitted to Genome Research. I was happiest with the treatment in which the tiles maintained their color and the Spark clusters were projected as tones of white. This designed felt more solid and punchy — I feel like you can reach out and touch one of those strings.

Martin Krzywinski @MKrzywinski
Final Spark cover designs. The top left one was chosen by Genome Research.

news + thoughts

Tabular Data

Tue 11-04-2017
Tabulating the number of objects in categories of interest dates back to the earliest records of commerce and population censuses.

After 30 columns, this is our first one without a single figure. Sometimes a table is all you need.

In this column, we discuss nominal categorical data, in which data points are assigned to categories in which there is no implied order. We introduce one-way and two-way tables and the `\chi^2` and Fisher's exact tests.

Altman, N. & Krzywinski, M. (2017) Points of Significance: Tabular data. Nature Methods 14:329–330.

...more about the Points of Significance column

Happy 2017 `\pi` Day—Star Charts, Creatures Once Living and a Poem

Tue 14-03-2017

on a brim of echo,

capsized chamber
drawn into our constellation, and cooling.
—Paolo Marcazzan

Celebrate `\pi` Day (March 14th) with star chart of the digits. The charts draw 40,000 stars generated from the first 12 million digits.

Martin Krzywinski @MKrzywinski
12,000,000 digits of `\pi` interpreted as a star catalogue. (details)

The 80 constellations are extinct animals and plants. Here you'll find old friends and new stories. Read about how Desmodus is always trying to escape or how Megalodon terrorizes the poor Tecopa! Most constellations have a story.

Martin Krzywinski @MKrzywinski
Find friends and stories among the 80 constellations of extinct animals and plants. Oh look, a Dodo guardings his eggs! (details)

This year I collaborate with Paolo Marcazzan, a Canadian poet, who contributes a poem, Of Black Body, about space and things we might find and lose there.

Check out art from previous years: 2013 `\pi` Day and 2014 `\pi` Day, 2015 `\pi` Day and and 2016 `\pi` Day.

Data in New Dimensions: convergence of art, genomics and bioinformatics

Tue 07-03-2017

Art is science in love.
— E.F. Weisslitz

A behind-the-scenes look at the making of our stereoscopic images which were at display at the AGBT 2017 Conference in February. The art is a creative collaboration with Becton Dickinson and The Linus Group.

Its creation began with the concept of differences and my writeup of the creative and design process focuses on storytelling and how concept of differences is incorporated into the art.

Oh, and this might be a good time to pick up some red-blue 3D glasses.

BD Genomics 3D art exhibit - AGBT 2017 / Martin Krzywinski @MKrzywinski
A stereoscopic image and its interpretive panel of single-cell transcriptomes of blood cells: diseased versus healthy control.

Interpreting P values

Thu 02-03-2017
A P value measures a sample’s compatibility with a hypothesis, not the truth of the hypothesis.

This month we continue our discussion about `P` values and focus on the fact that `P` value is a probability statement about the observed sample in the context of a hypothesis, not about the hypothesis being tested.

Martin Krzywinski @MKrzywinski
Nature Methods Points of Significance column: Interpreting P values. (read)

Given that we are always interested in making inferences about hypotheses, we discuss how `P` values can be used to do this by way of the Benjamin-Berger bound, `\bar{B}` on the Bayes factor, `B`.

Heuristics such as these are valuable in helping to interpret `P` values, though we stress that `P` values vary from sample to sample and hence many sources of evidence need to be examined before drawing scientific conclusions.

Altman, N. & Krzywinski, M. (2017) Points of Significance: Interpreting P values. Nature Methods 14:213–214.

Background reading

Krzywinski, M. & Altman, N. (2017) Points of significance: P values and the search for significance. Nature Methods 14:3–4.

Krzywinski, M. & Altman, N. (2013) Points of significance: Significance, P values and t–tests. Nature Methods 10:1041–1042.

...more about the Points of Significance column

Snellen Charts—Typography to Really Look at

Sat 18-02-2017

Another collection of typographical posters. These ones really ask you to look.

Martin Krzywinski @MKrzywinski
Snellen charts designed using physical constants, Braille and elemental abundances in the universe and human body.

The charts show a variety of interesting symbols and operators found in science and math. The design is in the style of a Snellen chart and typset with the Rockwell font.