Lips that taste of tears, they say, are the best for kissing.get crankymore quotes

spark: exciting

DNA on 10th — street art, wayfinding and font

Creating the Genome Research November 2012 Cover

Cover image accompanying Spark: A navigational paradigm for genomic data exploration. Genome Research 22 (11). (zoom, Genome Research)

The Genome Research cover design takes a fun and illustrative approach to visualization. It's both art and science — in a 4:1 ratio.

The cover image accompanies the article by Cydney Nielsen from our visualization group, describing her Spark tool for visualization epigenetics data.

Nielsen CB, Younesy H, O'Geen H, Xu X, Jackson AR, et al. (2012) Spark: A navigational paradigm for genomic data exploration. Genome Res 22: 2262-2269.

Instead of a literal depiction of output from Spark, the final design presents what appears to be necklaces of the kind of tiles that Spark uses for its visual presentation. I took a chance that Genome Research had a sense of humor. Luckily, they did and accepted the design for the cover.

Colored tiles are playfully suspended on vertical strings to illustrate how Spark, presented in this issue, uses clustering to group genomic regions (tiles) with similar data patterns (colored heatmaps) and facilitates genome-wide data exploration.Genome Research 22 (11)

The image was published on the November 2012 issue of cover of Genome Research.

Tools

Illustrator CS5, and a cup (or two) of Galileo coffee from a Rancilio Epoca.

Other Covers

I had two other covers published this year: the PNAS cover accompanied our manuscript about mouse vasculature development and the Trends in Genetics cover was commissioned.

Cover image accompanying our article on mouse vasculature development. Biology turns astrophysical. PNAS 1 May 2012; 109 (18) (zoom, how it was made, PNAS)
Cover image for the human genetics special issue. Trends in Genetics October 2012, 28 (10) (lowres, hires, how it was made, Trends in Genetics)

source of design

To lower this computational barrier, particularly in the early data exploration phases, Spark was developed as an interactive pattern discovery and visualization tool for epigenomic data. (Spark)

Thinking about design ideas for the cover, I looked to the kind of visual motifs that Spark used for inspiration. Immediately the colorful tiles, which represent clustered data tracks, stood out.

Spark's output is very stylized, colorful and high contrast. It was important to preserve this aesthetic in the design. I also wanted to incorporate the idea of clustering in the design, as well as the concept that the clusters represented data from different parts of the genome.

While it was not important to illustrate how Spark organizes and analyzed data explicitly — in fact, I wanted these aspects to be subtle — it was important that the cover illustration had connections to Spark at several levels.

Spark

Many genomics techniques produce measurements that have both a value and a position on a reference genome, for example ChIP-sequencing.

Spark was created by Cydney Nielsen, who works with me at the Genome Sciences Center. It is designed to mitigate the difficulties arising from the fact that genome-wide data is typically scattered across thousands of points of interest.

Genome browsers integrate diverse data sets by plotting them as vertically stacked tracks across a common genomic x-axis. Genome browsers are designed for viewing local regions of interest (e.g. an individual gene) and are frequently used during the initial data inspection and exploration phases.

Most genome browsers support zooming along the genome coordinate. This type of overview is not always useful because it produces a summary across a continuous genomic range (e.g. chromosome 1) and not across the subset of regions that are of interest (e.g. genes on chromosome 1). Spark addresses this shortcoming and provides a way to help answer questions like: What are the common data patterns across genes start sites in my data set?

Spark's approach to analysis and display of epigenetic data.

Spark's visualization is driven by clustering data tracks (e.g. ChIP-seq coverage) from across equivalent regions (e.g. gene start sites). The clustered tracks are displayed as heatmaps, with each row being a data track and each column a windowed region of the genome.

early comps

With fond memories of Monte Carlo simulations from my physics days, I set out to simulate some realistic-looking, but entirely synthetic, Spark cluster tiles.

A collection of synthetic Spark tiles, each 7x20.

My first idea was a design which would show these tiles falling, perhaps accumulating on a pile on the ground. Quick prototypes of this idea were disappointing. The tiles appeared flimsy and too complex, while the image was largely empty. I spent several hours messing around with the rotation and pseudo-3D layout, but could not find anything that was satisfying.

Spark tiles, falling.
Early attempt at a design. Meh.

I thought to do this right would require a proper simulation within a 3D system.

refining the design

To address the fact that the tiles felt flimsy and overly complicated and the design lacked depth, I simplified the tile simulation to generate 5x5 tiles. These simpler representations still embodied how Spark displayed data, but did so minimally.

A second attempt at simulating Spark clusters.

To keep with the idea that the clusters come from different regions of the genome, I thought of arranging them along line segments. Unlike the design in which the tiles were falling, this constrained the layout significantly and allowed me to play with the design to make it look like the clusters were draped over it. By casting a light shadow behind each string of tiles, a subtle 3D effect could be achieved while still keeping the design within a plane.

There are 11 orientations of tiles created by rotating a thin square around the vertical axis with a slight forward tilt. There are 5 rotations to the left and right at angles 10, 26, 46, 66 and 80 degrees. The rotation was achieved using Illustrator's Extrude and Bevel 3D filter.

Layout of tiles.
Rotated tiles with Spark clusters.

Flight and Fall by Rachel Nottingham. (artist's site)

The layout and rotation of the tiles was inspired by Flight and Fall by Rachel Nottingham, a mobile of paper birds.

I wanted to keep the layout of the spark tiles pleasant, without being too organized. I find this to be a difficult balance to achieve — natural randomness is deceptively difficult to create by hand.

final image

Four different versions of the design were submitted to Genome Research. I was happiest with the treatment in which the tiles maintained their color and the Spark clusters were projected as tones of white. This designed felt more solid and punchy — I feel like you can reach out and touch one of those strings.

Final Spark cover designs. The top left one was chosen by Genome Research.
VIEW ALL

Scientific data visualization: Aesthetic for diagrammatic clarity

Mon 13-01-2020

The scientific process works because all its output is empirically constrained.

My chapter from The Aesthetics of Scientific Data Representation, More than Pretty Pictures, in which I discuss the principles of data visualization and connect them to the concept of "quality" introduced by Robert Pirsig in Zen and the Art of Motorcycle Maintenance.

Yearning for the Infinite — Aleph 2

Mon 18-11-2019

Discover Cantor's transfinite numbers through my music video for the Aleph 2 track of Max Cooper's Yearning for the Infinite (album page, event page).

Yearning for the Infinite, Max Cooper at the Barbican Hall, London. Track Aleph 2. Video by Martin Krzywinski. Photo by Michal Augustini. (more)

I discuss the math behind the video and the system I built to create the video.

Hidden Markov Models

Mon 18-11-2019

Everything we see hides another thing, we always want to see what is hidden by what we see.
—Rene Magritte

A Hidden Markov Model extends a Markov chain to have hidden states. Hidden states are used to model aspects of the system that cannot be directly observed and themselves form a Markov chain and each state may emit one or more observed values.

Hidden states in HMMs do not have to have meaning—they can be used to account for measurement errors, compress multi-modal observational data, or to detect unobservable events.

Nature Methods Points of Significance column: Hidden Markov Models. (read)

In this column, we extend the cell growth model from our Markov Chain column to include two hidden states: normal and sedentary.

We show how to calculate forward probabilities that can predict the most likely path through the HMM given an observed sequence.

Grewal, J., Krzywinski, M. & Altman, N. (2019) Points of significance: Hidden Markov Models. Nature Methods 16:795–796.

Altman, N. & Krzywinski, M. (2019) Points of significance: Markov Chains. Nature Methods 16:663–664.

Hola Mundo Cover

Sat 21-09-2019

My cover design for Hola Mundo by Hannah Fry. Published by Blackie Books.

Hola Mundo by Hannah Fry. Cover design is based on my 2013 $\pi$ day art. (read)

Curious how the design was created? Read the full details.

Markov Chains

Tue 30-07-2019

You can look back there to explain things,
but the explanation disappears.
You'll never find it there.
Things are not explained by the past.
They're explained by what happens now.
—Alan Watts

A Markov chain is a probabilistic model that is used to model how a system changes over time as a series of transitions between states. Each transition is assigned a probability that defines the chance of the system changing from one state to another.

Nature Methods Points of Significance column: Markov Chains. (read)

Together with the states, these transitions probabilities define a stochastic model with the Markov property: transition probabilities only depend on the current state—the future is independent of the past if the present is known.

Once the transition probabilities are defined in matrix form, it is easy to predict the distribution of future states of the system. We cover concepts of aperiodicity, irreducibility, limiting and stationary distributions and absorption.

This column is the first part of a series and pairs particularly well with Alan Watts and Blond:ish.

Grewal, J., Krzywinski, M. & Altman, N. (2019) Points of significance: Markov Chains. Nature Methods 16:663–664.