The Genome Research cover design takes a fun and illustrative approach to visualization. It's both art and science — in a 4:1 ratio.
Nielsen CB, Younesy H, O'Geen H, Xu X, Jackson AR, et al. (2012) Spark: A navigational paradigm for genomic data exploration. Genome Res 22: 2262-2269.
Instead of a literal depiction of output from Spark, the final design presents what appears to be necklaces of the kind of tiles that Spark uses for its visual presentation. I took a chance that Genome Research had a sense of humor. Luckily, they did and accepted the design for the cover.
Colored tiles are playfully suspended on vertical strings to illustrate how Spark, presented in this issue, uses clustering to group genomic regions (tiles) with similar data patterns (colored heatmaps) and facilitates genome-wide data exploration. — Genome Research 22 (11)
The image was published on the November 2012 issue of cover of Genome Research.
Thinking about design ideas for the cover, I looked to the kind of visual motifs that Spark used for inspiration. Immediately the colorful tiles, which represent clustered data tracks, stood out.
Spark's output is very stylized, colorful and high contrast. It was important to preserve this aesthetic in the design. I also wanted to incorporate the idea of clustering in the design, as well as the concept that the clusters represented data from different parts of the genome.
While it was not important to illustrate how Spark organizes and analyzed data explicitly — in fact, I wanted these aspects to be subtle — it was important that the cover illustration had connections to Spark at several levels.
Spark was created by Cydney Nielsen, who works with me at the Genome Sciences Center. It is designed to mitigate the difficulties arising from the fact that genome-wide data is typically scattered across thousands of points of interest.
Genome browsers integrate diverse data sets by plotting them as vertically stacked tracks across a common genomic x-axis. Genome browsers are designed for viewing local regions of interest (e.g. an individual gene) and are frequently used during the initial data inspection and exploration phases.
Most genome browsers support zooming along the genome coordinate. This type of overview is not always useful because it produces a summary across a continuous genomic range (e.g. chromosome 1) and not across the subset of regions that are of interest (e.g. genes on chromosome 1). Spark addresses this shortcoming and provides a way to help answer questions like: What are the common data patterns across genes start sites in my data set?
Spark's visualization is driven by clustering data tracks (e.g. ChIP-seq coverage) from across equivalent regions (e.g. gene start sites). The clustered tracks are displayed as heatmaps, with each row being a data track and each column a windowed region of the genome.
With fond memories of Monte Carlo simulations from my physics days, I set out to simulate some realistic-looking, but entirely synthetic, Spark cluster tiles.
My first idea was a design which would show these tiles falling, perhaps accumulating on a pile on the ground. Quick prototypes of this idea were disappointing. The tiles appeared flimsy and too complex, while the image was largely empty. I spent several hours messing around with the rotation and pseudo-3D layout, but could not find anything that was satisfying.
I thought to do this right would require a proper simulation within a 3D system.
To address the fact that the tiles felt flimsy and overly complicated and the design lacked depth, I simplified the tile simulation to generate 5x5 tiles. These simpler representations still embodied how Spark displayed data, but did so minimally.
To keep with the idea that the clusters come from different regions of the genome, I thought of arranging them along line segments. Unlike the design in which the tiles were falling, this constrained the layout significantly and allowed me to play with the design to make it look like the clusters were draped over it. By casting a light shadow behind each string of tiles, a subtle 3D effect could be achieved while still keeping the design within a plane.
There are 11 orientations of tiles created by rotating a thin square around the vertical axis with a slight forward tilt. There are 5 rotations to the left and right at angles 10, 26, 46, 66 and 80 degrees. The rotation was achieved using Illustrator's Extrude and Bevel 3D filter.
The layout and rotation of the tiles was inspired by Flight and Fall by Rachel Nottingham, a mobile of paper birds.
I wanted to keep the layout of the spark tiles pleasant, without being too organized. I find this to be a difficult balance to achieve — natural randomness is deceptively difficult to create by hand.
Four different versions of the design were submitted to Genome Research. I was happiest with the treatment in which the tiles maintained their color and the Spark clusters were projected as tones of white. This designed felt more solid and punchy — I feel like you can reach out and touch one of those strings.
Outliers can degrade the fit of linear regression models when the estimation is performed using the ordinary least squares. The impact of outliers can be mitigated with methods that provide robust inference and greater reliability in the presence of anomalous values.
We discuss MM-estimation and show how it can be used to keep your fitting sane and reliable.
Greco, L., Luta, G., Krzywinski, M. & Altman, N. (2019) Points of significance: Analyzing outliers: Robust methods to the rescue. Nature Methods 16:275–276.
Altman, N. & Krzywinski, M. (2016) Points of significance: Analyzing outliers: Influential or nuisance. Nature Methods 13:281–282.
Two-level factorial experiments, in which all combinations of multiple factor levels are used, efficiently estimate factor effects and detect interactions—desirable statistical qualities that can provide deep insight into a system.
They offer two benefits over the widely used one-factor-at-a-time (OFAT) experiments: efficiency and ability to detect interactions.
Since the number of factor combinations can quickly increase, one approach is to model only some of the factorial effects using empirically-validated assumptions of effect sparsity and effect hierarchy. Effect sparsity tells us that in factorial experiments most of the factorial terms are likely to be unimportant. Effect hierarchy tells us that low-order terms (e.g. main effects) tend to be larger than higher-order terms (e.g. two-factor or three-factor interactions).
Smucker, B., Krzywinski, M. & Altman, N. (2019) Points of significance: Two-level factorial experiments Nature Methods 16:211–212.
Krzywinski, M. & Altman, N. (2014) Points of significance: Designing comparative experiments.. Nature Methods 11:597–598.
Celebrate `\pi` Day (March 14th) and set out on an exploration explore accents unknown (to you)!
This year is purely typographical, with something for everyone. Hundreds of digits and hundreds of languages.
A special kids' edition merges math with color and fat fonts.
One moment you're
:) and the next you're
Make sense of it all with my Tree of Emotional life—a hierarchical account of how we feel.