Martin Krzywinski / Genome Sciences Center / mkweb.bcgsc.ca

EMBO Practical Course: Bioinformatics and Genome Analysis, 5–17 June 2017.


visualization + design

Brewer Palettes

Brewer Palettes at a Glance

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
All the Brewer palettes: qualitative, sequential and diverging. For each palette (e.g. spectral) the source colors are shown as well as all its n-color subsets.

Presentation About Color and Brewer Palettes

If you're new to Brewer palettes, or color, catch up with this presentation: Color palettes matter - Brewer palettes and perceptual uniformity, by Martin Krzywinski.

COLOR NAME DATABASE

I maintain a comprehensive database of named colors (3,116 colors), compiled from a variety of color name lists.

Visualization and Perception

Why Should Engineers and Scientists Be Worried About Color? by Bernice E. Rogowitz and Lloyd A. Treinish (IBM Thomas J. Watson Research Center, Yorktown Heights, NY).

Perception in Visualization by Christopher G. Healey (Department of Computer Science, North Carolina State University)

LAB and LCH gradient picker

Interactively create LAB and LCH color gradients interpolated across any number of colors.

Lch and Lab colour and gradient picker is a great tool by David Johnstone. It's a great way to generate color ramps—go ahead, go crazy!—and compare how the ramps look in different color spaces. Shame on you, HSV!

PaletteView — create continuous Brewer palettes

PaletteView is an exceptional tool by Magnaview to create continuous Brewer palettes. This tool is described in [1] and operationalizes Cynthia Brewer's color selection method into an algorithm that selects customizable color palettes from LCH space.

[1] Wijffelaars, M., Vliegen, R., Van Wijk, J.J. et al. (2008) Generating Color Palettes using Intuitive Parameters. Computer Graphics Forum 27:743-750.

Brewer Palette Adobe Swatch Files

You can import Brewer palettes into Adobe applications such as Illustrator, Photoshop and InDesign using either the .ase or .ai swatch files.

download

  • Brewer palette .ase swatch file for Adobe Illustrator
  • Brewer palette .ai swatch file for Adobe Illustrator
  • Brewer palette .pdf color file
  • Brewer palette .txt color file

install

In Illustrator, load the swatches from the swatch window menu. The swatch window can be accessed using Window > Swatches.

Select Open swatch library

then choose Other library...

and load either the .ase or .ai file — both contain the same content.

Brewer palettes are color combinations selected for their special properties for use in data visualization and information design.

The challenge

Selecting effective colors for bar plots, pie charts, and heat maps is made more difficult by the fact that the way we select color in software does not reflect how we perceive the color.

There are many examples of poor color combinations in published figures. For example, if categories are encoded with a combination of bright and dark colors, the bright colors will dominate the reader's attention. On the other hand, if two colors appear similar, the reader will instinctively perceive them as belonging to a group and infer that the underlying variables are related.

Colors with poor contrast (colors with similar perceived brightness) or simultaneous contrast (pure colors) also interfere with interpreting figures.

Selecting Colors in RGB and HSV

Most people select colors using RGB sliders, which is just about the worst way to pick a color! Consider the fact that when we look at a color, we cannot easily decompose it into its red, green and blue components. This limits the usefulness of RGB for color selection.

HSV is a better color space, which defines a color based on hue, saturation and value. These are three properties that we intuitively assess when we see a color. We think of a "dark rich blue" and "light faded red", making HSV a reasonably useful model for color selection. Unfortunately, HSV has a nagging problem — although it is based on intuitive parameters, it is not perceptually uniform.

Perceptual Uniformity

A color space that is perceptually uniform defines colors based on how we perceive them. Distances between colors in the space are proportional to their perceived difference.

Above, we saw that HSV is not perceptually uniform. Moving the hue slider by 60° can have a small or large effect on a color, depending on where the slider is positioned.
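To put rough numbers on this, here is a minimal sketch that steps around the HSV hue wheel in 60° increments (at S=V=100%) and measures how far apart each pair of colors lands in LAB. It assumes sRGB with a D65 white point and uses plain Euclidean distance in LAB (CIE76 delta E), which is itself only an approximation of perceived difference. The function names are mine, chosen for illustration.

```python
import colorsys
import math

WHITE_D65 = (0.95047, 1.0, 1.08883)  # assumed reference white

def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def lab_f(t):
    """Nonlinearity used in the CIE LAB definition."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

def srgb_to_lab(r, g, b):
    """Convert sRGB channels in [0, 1] to (L, a, b)."""
    r, g, b = (srgb_to_linear(c) for c in (r, g, b))
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    fx, fy, fz = (lab_f(v / w) for v, w in zip((x, y, z), WHITE_D65))
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def delta_e(hue_a, hue_b):
    """LAB distance between two fully saturated HSV hues (in degrees)."""
    lab_a = srgb_to_lab(*colorsys.hsv_to_rgb(hue_a / 360, 1.0, 1.0))
    lab_b = srgb_to_lab(*colorsys.hsv_to_rgb(hue_b / 360, 1.0, 1.0))
    return math.dist(lab_a, lab_b)

# Same 60-degree hue step, taken from six different starting positions
steps = {(h, h + 60): delta_e(h, h + 60) for h in range(0, 360, 60)}
for (start, end), d in steps.items():
    print(f"hue {start:3d} -> {end:3d}: delta E = {d:6.1f}")
```

The same slider movement produces quite different distances depending on where it starts, which is what a lack of perceptual uniformity looks like in practice.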

Consider the following example. You have a chart that uses two colors, orange and green. Both were chosen with S=V=100%. You now need to select a second color for each that is brighter. You cannot directly use HSV because both orange and green colors are already at full value. How do you intuitively increase brightness?

The reason why you cannot do this in HSV is because V does not directly correspond to the color's perceived brightness. You are stuck fiddling with the saturation and value to try to select a brighter pairing.
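A small sketch makes the mismatch between V and perceived brightness concrete. It computes the CIE lightness L* (0 to 100) of the orange and green from the example, assuming sRGB with a D65 white point and assuming hues of 30° and 120° for "orange" and "green" (the exact hues are my choice, not part of the example).

```python
import colorsys

def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def lightness(r, g, b):
    """CIE L* of an sRGB color, computed from its relative luminance Y."""
    y = (0.2126 * srgb_to_linear(r)
         + 0.7152 * srgb_to_linear(g)
         + 0.0722 * srgb_to_linear(b))
    delta = 6 / 29
    f = y ** (1 / 3) if y > delta ** 3 else y / (3 * delta ** 2) + 4 / 29
    return 116 * f - 16

# Both colors have S = V = 100%; only the hue differs (assumed hues)
orange = colorsys.hsv_to_rgb(30 / 360, 1.0, 1.0)
green = colorsys.hsv_to_rgb(120 / 360, 1.0, 1.0)

print(f"orange: V = 100%, L* = {lightness(*orange):.1f}")
print(f"green:  V = 100%, L* = {lightness(*green):.1f}")
```

Both colors report the same value, yet the green comes out noticeably lighter than the orange.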

What would be useful here is a color space which uses the intuitive parameters of HSV, but is perceptually based. In other words, instead of value, the space would define a color based on its perceived brightness. Luckily, this space exists: LCH, which defines color based on its lightness (perceived brightness), chroma (purity) and hue. Unfortunately, design and presentation software do not have LCH sliders and we cannot easily take advantage of this color space.
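LCH is LAB expressed in polar coordinates: L is kept as-is, chroma is the distance from the neutral axis and hue is the angle around it. A minimal sketch of that relationship follows (function names are illustrative, and hue is in degrees).

```python
import math

def lab_to_lch(L, a, b):
    """Convert LAB to LCH: chroma is the radius, hue the angle in degrees."""
    chroma = math.hypot(a, b)
    hue = math.degrees(math.atan2(b, a)) % 360
    return L, chroma, hue

def lch_to_lab(L, chroma, hue):
    """Convert LCH (hue in degrees) back to LAB."""
    h = math.radians(hue)
    return L, chroma * math.cos(h), chroma * math.sin(h)

# A ramp that changes only L while holding chroma and hue fixed
ramp = [lch_to_lab(L, 40, 270) for L in (30, 50, 70, 90)]
```

With this parameterization, "make it brighter but keep the same hue and purity" becomes a change to a single number, which is the operation HSV could not offer in the orange and green example.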

This is where the Brewer palettes come in.

Brewer Palettes

Brewer palettes were selected for their perceptual properties. These palettes were created by Cynthia Brewer for use in cartography, but have found use in other fields.

Types of Brewer Palettes

There are three types of Brewer palettes:

  • qualitative — colors do not have a perceived order
  • sequential — colors have a perceived order and perceived difference between successive colors is uniform
  • diverging — two back-to-back sequential palettes starting from a common color

Swatches of Brewer Palettes

I have prepared Brewer palette swatches in .ase or .ai format. For programming, use the plain-text version.

The image below shows all the Brewer palettes.

Brewer palette colors - all swatches

Uses of Brewer Palettes

Qualitative palettes are excellent for bar plots and pie charts, where colors correspond to categories.

Grayscale Brewer palettes are available and are perfect for achieving good tone separation in black-and-white figures.

Sequential and diverging palettes are useful for heatmaps.

Brewer Palettes and Color Blindness

Some Brewer palettes are safe for color blindness — the pink-yellow-green (piyg) is one. For others, see colorbrewer.

I have designed 15-color palettes for color blindness for each of the three common types of color blindness.


news + thoughts

Classification and regression trees

Fri 28-07-2017
Decision trees are a powerful but simple prediction method.

Decision trees classify data by splitting it along the predictor axes into partitions with homogeneous values of the dependent variable. Unlike logistic or linear regression, CART does not develop a prediction equation. Instead, data are predicted by a series of binary decisions based on the boundaries of the splits. Decision trees are very effective and the resulting rules are readily interpreted.

Trees can be built using different metrics that measure how well the splits divide up the data classes: Gini index, entropy or misclassification error.
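As a minimal sketch, the three metrics can be computed from the class labels in a partition using their standard textbook definitions. The function names and the weighted `split_impurity` helper are illustrative, not taken from the column.

```python
import math
from collections import Counter

def class_proportions(labels):
    """Fraction of observations in each class."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini(labels):
    """Gini index: 1 - sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels))

def entropy(labels):
    """Entropy in bits: -sum of p * log2(p)."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))

def misclassification(labels):
    """Error made by always predicting the majority class."""
    return 1.0 - max(class_proportions(labels))

def split_impurity(left, right, measure):
    """Impurity of a binary split, weighted by partition size."""
    n = len(left) + len(right)
    return (len(left) * measure(left) + len(right) * measure(right)) / n

mixed = ["a", "a", "b", "b"]
print(gini(mixed), entropy(mixed), misclassification(mixed))
print(split_impurity(["a", "a"], ["b", "b"], gini))
```

All three measures are zero for a partition that contains a single class and largest when the classes are evenly mixed, so a good split is one that lowers the weighted impurity.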

Nature Methods Points of Significance column: Classification and decision trees.

When the dependent variable is quantitative and not categorical, regression trees are used. Here, the data are still split but now the dependent variable is estimated by the average within the split boundaries. Tree growth can be controlled using the complexity parameter, a measure of the relative improvement of each new split.
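A single regression split can be sketched as follows: try each boundary between adjacent values of one predictor, predict the average on each side, and keep the boundary with the smallest sum of squared errors. Using squared error as the criterion is a common choice that I am assuming here, and `best_split` is a hypothetical helper for a single predictor only.

```python
def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, left_mean, right_mean) with the lowest total SSE,
    or None if all predictor values are identical."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between identical predictor values
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, threshold, mean(left), mean(right))
    return None if best is None else best[1:]

x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
threshold, left_mean, right_mean = best_split(x, y)
print(threshold, left_mean, right_mean)
```

A full tree repeats this search inside each resulting partition until a stopping rule, such as the complexity parameter, says the improvement is no longer worth another split.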

Individual trees can be very sensitive to minor changes in the data and even better prediction can be achieved by exploiting this variability. Using ensemble methods, we can grow multiple trees from the same data.

Krzywinski, M. & Altman, N. (2017) Points of Significance: Classification and regression trees. Nature Methods 14:757–758.

Background reading

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Logistic regression. Nature Methods 13:541-542.

Altman, N. & Krzywinski, M. (2015) Points of Significance: Multiple linear regression. Nature Methods 12:1103-1104.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Classifier evaluation. Nature Methods 13:603-604.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Model Selection and Overfitting. Nature Methods 13:703-704.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Regularization. Nature Methods 13:803-804.

...more about the Points of Significance column

Personalized Oncogenomics Program 5 Year Anniversary Art

Wed 26-07-2017

The artwork was created in collaboration with my colleagues at the Genome Sciences Center to celebrate the 5 year anniversary of the Personalized Oncogenomics Program (POG).

5 Years of Personalized Oncogenomics Program at Canada's Michael Smith Genome Sciences Centre. The poster shows 545 cancer cases. (left) Cases ordered chronologically by case number. (right) Cases grouped by diagnosis (tissue type) and then by similarity within group.

The Personalized Oncogenomics Program (POG) is a collaborative research study including many BC Cancer Agency oncologists, pathologists and other clinicians along with Canada's Michael Smith Genome Sciences Centre with support from BC Cancer Foundation.

The aim of the program is to sequence, analyze and compare the genome of each patient's cancer (the entire DNA and RNA inside tumor cells) in order to understand what is enabling it and to identify less toxic and more effective treatment options.

Principal component analysis

Thu 06-07-2017
PCA helps you interpret your data, but it will not always find the important patterns.

Principal component analysis (PCA) simplifies the complexity in high-dimensional data by reducing its number of dimensions.

Nature Methods Points of Significance column: Principal component analysis.

To retain trends and patterns in the reduced representation, PCA finds linear combinations of canonical dimensions that maximize the variance of the projection of the data.
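For two-dimensional data the variance-maximizing direction has a closed form, which makes for a compact sketch without any linear algebra library. The function below is illustrative only: it handles exactly two dimensions and uses the sample covariance. Real analyses would typically use an eigendecomposition or SVD on data with many more dimensions.

```python
import math

def pca_2d(points):
    """Return (pc1, pc2, var1, var2, scores) for a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Angle of the direction along which the projected variance is largest
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))

    def projected_variance(u):
        return sxx * u[0] ** 2 + 2 * sxy * u[0] * u[1] + syy * u[1] ** 2

    # Coordinates of each centered point along the first component
    scores = [(p[0] - mx) * pc1[0] + (p[1] - my) * pc1[1] for p in points]
    return pc1, pc2, projected_variance(pc1), projected_variance(pc2), scores

points = [(0, 0), (1, 1), (2, 2), (3, 3)]
pc1, pc2, var1, var2, scores = pca_2d(points)
print(pc1, var1, var2)
```

Here `pc1` is a linear combination of the original x and y dimensions, and keeping only `scores` reduces the data from two dimensions to one while retaining as much variance as possible.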

PCA is helpful in visualizing high-dimensional data and scatter plots based on 2-dimensional PCA can reveal clusters.

Altman, N. & Krzywinski, M. (2017) Points of Significance: Principal component analysis. Nature Methods 14:641–642.

Background reading

Altman, N. & Krzywinski, M. (2017) Points of Significance: Clustering. Nature Methods 14:545–546.


`k` index: a weightlifting and Crossfit performance measure

Wed 07-06-2017

Similar to the `h` index in publishing, the `k` index is a measure of fitness performance.

To achieve a `k` index for a movement you must perform `k` unbroken reps at `k`% 1RM.
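As a rough sketch of the definition, the functions below compute the weight required for a given `k` and the best `k` supported by a log of unbroken sets. The names `target_weight` and `k_index` are hypothetical, and I am assuming that a set counts toward `k` if it has at least `k` reps at no less than `k`% of 1RM.

```python
import math

def target_weight(k, one_rep_max):
    """Weight to lift for k unbroken reps: k% of the one-rep max."""
    return one_rep_max * k / 100

def k_index(one_rep_max, sets):
    """Largest k such that some (weight, unbroken_reps) set has at least
    k reps at a weight of at least k% of the one-rep max."""
    best = 0
    for weight, reps in sets:
        percent = math.floor(100 * weight / one_rep_max)
        best = max(best, min(reps, percent))
    return best

# Example log for a movement with a 1RM of 100 (units are arbitrary)
sets = [(30, 30), (50, 20), (28, 40)]
print(k_index(100, sets))
```

The `min(reps, percent)` step mirrors the `h` index: a long set with a light weight is capped by the weight, and a heavy set with few reps is capped by the reps.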

The expected value for the `k` index is probably somewhere in the range of `k = 26` to `k = 35`, with higher values progressively more difficult to achieve.

In my `k` index introduction article I provide a detailed explanation, a rep scheme table and a WOD example.