Martin Krzywinski / Genome Sciences Center / mkweb.bcgsc.ca

visualization + design

Creating the Genome Research November 2012 Cover

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Cover image accompanying Spark: A navigational paradigm for genomic data exploration. Genome Research 22 (11).

The Genome Research cover design takes a fun and illustrative approach to visualization. It's both art and science — in a 4:1 ratio.

The cover image accompanies the article by Cydney Nielsen from our visualization group, describing her Spark tool for visualizing epigenetics data.

Nielsen CB, Younesy H, O'Geen H, Xu X, Jackson AR, et al. (2012) Spark: A navigational paradigm for genomic data exploration. Genome Res 22: 2262-2269.

Instead of a literal depiction of output from Spark, the final design presents what appears to be necklaces of the kind of tiles that Spark uses for its visual presentation. I took a chance that Genome Research had a sense of humor. Luckily, they did and accepted the design for the cover.

Colored tiles are playfully suspended on vertical strings to illustrate how Spark, presented in this issue, uses clustering to group genomic regions (tiles) with similar data patterns (colored heatmaps) and facilitates genome-wide data exploration. (Genome Research 22 (11))

The image was published on the cover of the November 2012 issue of Genome Research.

Tools

Illustrator CS5, and a cup (or two) of Galileo coffee from a Rancilio Epoca.

Other Covers

I had two other covers published this year: the PNAS cover accompanied our manuscript about mouse vasculature development and the Trends in Genetics cover was commissioned.

Cover image accompanying our article on mouse vasculature development. Biology turns astrophysical. PNAS 1 May 2012; 109 (18)
Cover image for the human genetics special issue. Trends in Genetics October 2012, 28 (10)

source of design

To lower this computational barrier, particularly in the early data exploration phases, Spark was developed as an interactive pattern discovery and visualization tool for epigenomic data. (Spark)

Thinking about design ideas for the cover, I looked for inspiration to the kinds of visual motifs that Spark uses. Immediately the colorful tiles, which represent clustered data tracks, stood out.

Spark's output is very stylized, colorful and high contrast. It was important to preserve this aesthetic in the design. I also wanted to incorporate the idea of clustering in the design, as well as the concept that the clusters represented data from different parts of the genome.

While it was not important to illustrate explicitly how Spark organizes and analyzes data (in fact, I wanted these aspects to be subtle), it was important that the cover illustration connect to Spark at several levels.

Spark

Many genomics techniques produce measurements that have both a value and a position on a reference genome, for example ChIP-sequencing.

Spark was created by Cydney Nielsen, who works with me at the Genome Sciences Center. It is designed to mitigate the difficulties arising from the fact that genome-wide data is typically scattered across thousands of points of interest.

Genome browsers integrate diverse data sets by plotting them as vertically stacked tracks across a common genomic x-axis. Genome browsers are designed for viewing local regions of interest (e.g. an individual gene) and are frequently used during the initial data inspection and exploration phases.

Most genome browsers support zooming along the genome coordinate. This type of overview is not always useful because it produces a summary across a continuous genomic range (e.g. chromosome 1) and not across the subset of regions that are of interest (e.g. genes on chromosome 1). Spark addresses this shortcoming and provides a way to help answer questions like: What are the common data patterns across gene start sites in my data set?

Spark's approach to analysis and display of epigenetic data.

Spark's visualization is driven by clustering data tracks (e.g. ChIP-seq coverage) from across equivalent regions (e.g. gene start sites). The clustered tracks are displayed as heatmaps, with each row being a data track and each column a windowed region of the genome.
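
As a rough illustration of the clustering step described above, here is a minimal sketch that groups a few made-up regions by the similarity of their windowed signal. Everything in it is an assumption for the sake of the example: the data are invented, the function names are hypothetical, and the small k-means loop with a farthest-first start is just one convenient clustering method, not a description of Spark's own code.

```python
def distance2(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_profile(profiles):
    """Column-wise mean of a non-empty list of profiles."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]

def init_centers(profiles, k):
    """Farthest-first start: deterministic, and spreads the initial centers out."""
    centers = [profiles[0]]
    while len(centers) < k:
        centers.append(max(profiles, key=lambda p: min(distance2(p, c) for c in centers)))
    return centers

def kmeans(profiles, k, iterations=20):
    """Assign each profile to one of k clusters; returns a list of labels."""
    centers = init_centers(profiles, k)
    labels = [0] * len(profiles)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: distance2(p, centers[c])) for p in profiles]
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = mean_profile(members)
    return labels

# Hypothetical regions: 2 tracks x 4 windows each, flattened into one row per region.
regions = [
    [9, 9, 0, 0,  1, 0, 1, 0],  # first track has signal in the left windows
    [8, 8, 1, 1,  0, 1, 0, 1],
    [0, 0, 9, 9,  1, 1, 0, 0],  # first track has signal in the right windows
    [1, 1, 8, 8,  0, 0, 1, 1],
]
labels = kmeans(regions, k=2)
print(labels)
```

Regions that share a label would be drawn together as one tile, with the cluster's average signal shown as the heatmap.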

early comps

With fond memories of Monte Carlo simulations from my physics days, I set out to simulate some realistic-looking, but entirely synthetic, Spark cluster tiles.

A collection of synthetic Spark tiles, each 7x20.
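
For readers curious what such a simulation might look like, below is a minimal sketch that fills a 7x20 grid with one noisy peak per row. It is not the code used for the cover: the peak shape, the noise level, the reading of 7x20 as 7 rows by 20 columns, and the `synthetic_tile` name are all assumptions chosen to give something that looks plausibly like a heatmap.

```python
import math
import random

def synthetic_tile(rows=7, cols=20, noise=0.1, seed=None):
    """Return a rows x cols grid of values in [0, 1], one smooth peak per row."""
    rng = random.Random(seed)
    tile = []
    for _ in range(rows):
        center = rng.uniform(0, cols - 1)   # where the peak sits in this row
        width = rng.uniform(1.0, cols / 4)  # how broad the peak is
        height = rng.uniform(0.3, 1.0)      # peak intensity
        row = []
        for x in range(cols):
            value = height * math.exp(-((x - center) ** 2) / (2 * width ** 2))
            value += rng.gauss(0, noise)    # random jitter on top of the peak
            row.append(min(1.0, max(0.0, value)))  # clip to the color scale
        tile.append(row)
    return tile

tile = synthetic_tile(seed=1)
```

Each value can then be mapped to a color, one row per simulated data track.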

My first idea was a design which would show these tiles falling, perhaps accumulating in a pile on the ground. Quick prototypes of this idea were disappointing. The tiles appeared flimsy and too complex, while the image was largely empty. I spent several hours messing around with the rotation and pseudo-3D layout, but could not find anything that was satisfying.

Spark tiles, falling.
Early attempt at a design. Meh.

I thought that doing this right would require a proper simulation within a 3D system.

refining the design

To address the fact that the tiles felt flimsy and overly complicated and the design lacked depth, I simplified the tile simulation to generate 5x5 tiles. These simpler representations still embodied how Spark displayed data, but did so minimally.

A second attempt at simulating Spark clusters.

To keep with the idea that the clusters come from different regions of the genome, I thought of arranging them along line segments. Unlike the design in which the tiles were falling, this constrained the layout significantly and allowed me to play with the design to make it look like the clusters were draped over it. By casting a light shadow behind each string of tiles, a subtle 3D effect could be achieved while still keeping the design within a plane.

There are 11 orientations of tiles, created by rotating a thin square around the vertical axis with a slight forward tilt. There are 5 rotations each to the left and to the right, at angles of 10, 26, 46, 66 and 80 degrees. The rotation was achieved using Illustrator's Extrude and Bevel 3D filter.
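
To get a feel for how much each rotation narrows a tile, the sketch below lists the orientations and the apparent width of a unit square at each angle. Two assumptions are made here: that the eleventh orientation is the unrotated, face-on tile, and that a simple orthographic projection (ignoring the forward tilt and any perspective that Illustrator applies) is a fair approximation. The function names are hypothetical.

```python
import math

ANGLES = [10, 26, 46, 66, 80]  # degrees, as listed in the text

def orientations(angles=ANGLES):
    """Signed rotation angles: left (negative), face-on (0, assumed), right (positive)."""
    return [-a for a in reversed(angles)] + [0] + list(angles)

def apparent_width(angle_deg):
    """Projected width of a unit square rotated about the vertical axis
    (orthographic projection, ignoring the forward tilt)."""
    return math.cos(math.radians(angle_deg))

for angle in orientations():
    print(f"{angle:+4d} deg  width {apparent_width(angle):.2f}")
```

Under these assumptions, a tile turned by 80 degrees appears less than a fifth as wide as a face-on tile.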

Layout of tiles.
Rotated tiles with Spark clusters.

Flight and Fall by Rachel Nottingham.

The layout and rotation of the tiles were inspired by Flight and Fall by Rachel Nottingham, a mobile of paper birds.

I wanted to keep the layout of the Spark tiles pleasant, without being too organized. I find this balance difficult to achieve: natural randomness is deceptively hard to create by hand.

final image

Four different versions of the design were submitted to Genome Research. I was happiest with the treatment in which the tiles maintained their color and the Spark clusters were projected as tones of white. This design felt more solid and punchy; I feel like you can reach out and touch one of those strings.

Final Spark cover designs. The top left one was chosen by Genome Research.

news + thoughts

`k` index: a weightlifting and Crossfit performance measure

Wed 07-06-2017

Similar to the `h` index in publishing, the `k` index is a measure of fitness performance.

To achieve a `k` index for a movement you must perform `k` unbroken reps at `k`% 1RM.
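
The definition can be turned into a small calculation. The sketch below is only an illustration: the `k_index` function and the training log are hypothetical, and it assumes that an unbroken set of `r` reps at `p`% 1RM also certifies every `k` up to the smaller of `p` and `r`, on the grounds that fewer reps at a lighter load should be no harder.

```python
def k_index(attempts):
    """Largest k certified by a list of (percent_of_1rm, unbroken_reps) sets.

    Assumes a set of r reps at p% also certifies every k <= min(p, r),
    since fewer reps at a lighter load should be no harder.
    """
    return max((min(int(percent), reps) for percent, reps in attempts), default=0)

# Hypothetical training log for one movement.
log = [(50, 20), (40, 28), (30, 33), (25, 40)]
print(k_index(log))  # 30
```

In this made-up log, the set of 33 reps at 30% is the best evidence, giving `k = 30`.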

The expected value for the `k` index is probably somewhere in the range of `k = 26` to `k = 35`, with higher values progressively more difficult to achieve.

In my `k` index introduction article I provide a detailed explanation, a rep scheme table and a WOD example.

Dark Matter of the English Language—the unwords

Wed 07-06-2017

I've applied the char-rnn recurrent neural network to generate new words, names of drugs and countries.

The effect is intriguing and facetious—yes, those are real words.

But these are not: necronology, abobionalism, gabdologist, and nonerify.

These places only exist in the mind: Conchar and Pobacia, Hzuuland, New Kain, Rabibus and Megee Islands, Sentip and Sitina, Sinistan and Urzenia.

And these are the imaginary afflictions of the imagination: ictophobia, myconomascophobia, and talmatomania.

And these, of the body: ophalosis, icabulosis, mediatopathy and bellotalgia.

Want to name your baby? Or someone else's baby? Try Ginavietta Xilly Anganelel or Ferandulde Hommanloco Kictortick.

When taking new therapeutics, never mix salivac and labromine. And don't forget that abadarone is best taken on an empty stomach.

And nothing increases the chance of getting that grant funded more than proposing the study of a new –ome! We really need someone to look into the femome and manome.

Dark Matter of the Genome—the nullomers

Wed 31-05-2017

An exploration of things that are missing in the human genome. The nullomers.

Julia Herold, Stefan Kurtz and Robert Giegerich. Efficient computation of absent words in genomic sequences. BMC Bioinformatics (2008) 9:167

Clustering

Wed 31-05-2017
Clustering finds patterns in data—whether they are there or not.

We've already seen how data can be grouped into classes in our series on classifiers. In this column, we look at how data can be grouped by similarity in an unsupervised way.

Nature Methods Points of Significance column: Clustering.

We look at two common clustering approaches: `k`-means and hierarchical clustering. All clustering methods share the same approach: they first calculate similarity and then use it to group objects into clusters. The details of the methods, and outputs, vary widely.

Altman, N. & Krzywinski, M. (2017) Points of Significance: Clustering. Nature Methods 14:545–546.

Background reading

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Logistic regression. Nature Methods 13:541-542.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Classifier evaluation. Nature Methods 13:603-604.


What's wrong with pie charts?

Thu 25-05-2017

In this redesign of a pie chart figure from a Nature Medicine article [1], I look at how to organize and present a large number of categories.

I first discuss some of the benefits of a pie chart—there are few and specific—and its shortcomings—there are few but fundamental.

I then walk through the redesign process by showing how the tumor categories can be shown more clearly if they are first aggregated into a small number of groups.

(bottom left) Figure 2b from Zehir et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. (2017) Nature Medicine doi:10.1038/nm.4333