
If you're new to Brewer palettes, or color, catch up with this presentation.

I maintain a comprehensive database of named colors (3,116 colors), compiled from a variety of color name lists.

Why Should Engineers and Scientists Be Worried About Color? by Bernice E. Rogowitz and Lloyd A. Treinish (IBM Thomas J. Watson Research Center, Yorktown Heights, NY).

Perception in Visualization by Christopher G. Healey (Department of Computer Science, North Carolina State University)

Lch and Lab colour and gradient picker is a great tool by David Johnstone. It's a great way to generate color ramps—go ahead, go crazy!—and compare how the ramps look in different color spaces. Shame on you, HSV!

PaletteView is an exceptional tool by Magnaview to create continuous Brewer palettes. This tool is described in [1] and operationalizes Cynthia Brewer's color selection method into an algorithm that selects customizable color palettes from LCH space.

[1] (2008) Generating Color Palettes using Intuitive Parameters. *Computer Graphics Forum* **27**:743-750.

You can import Brewer palettes into Adobe applications such as Illustrator, Photoshop and InDesign using either the .ase or .ai swatch files.

In Illustrator, load the swatches from the swatch window menu. The swatch window can be accessed using Window > Swatches.

Select *Open swatch library*

then choose *Other library...*

and load either the .ase or .ai file — both contain the same content.

Brewer palettes are color combinations selected for their special properties for use in data visualization and information design.

Selecting effective colors for bar plots, pie charts, and heat maps is made more difficult by the fact that the way we select color in software does not reflect how we perceive the color.

There are many examples of poor color combinations in published figures. For example, if categories are encoded with a combination of bright and dark colors, the bright colors will dominate the reader's attention. On the other hand, if two colors appear similar, the reader will instinctively perceive them as belonging to a group and infer that the underlying variables are related.

Colors with poor contrast (colors with similar perceived brightness) or simultaneous contrast (pure colors) also interfere with interpreting figures.

Most people select colors using RGB sliders, which is just about the worst way to pick a color! When we look at a color, we cannot easily decompose it into its red, green and blue components. This limits the usefulness of RGB for color selection.

HSV is a better color space, which defines a color based on hue, saturation and value. These are three properties that we intuitively assess when we see a color. We think of a "dark rich blue" and "light faded red", making HSV a reasonably useful model for color selection. Unfortunately, HSV has a nagging problem — although it is based on intuitive parameters, it is not perceptually uniform.

A color space that is perceptually uniform defines colors based on how we perceive them. Distances between colors in the space are proportional to their perceived difference.

Above, we saw that HSV was not perceptually uniform. Moving the hue slider by 60 degrees can have a small or large effect on a color, depending on where the slider is positioned.
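To make the non-uniformity concrete, here is a minimal sketch that measures how far apart two colors are in CIE L\*a\*b\* after a 60 degree hue step at S=V=100%. It assumes sRGB input with a D65 white point and uses plain Euclidean distance in Lab as a rough stand-in for perceived difference; the function names are mine, not from any tool mentioned here.

```python
import colorsys
import math


def srgb_to_lab(r, g, b):
    """Convert sRGB components in [0, 1] to CIE L*a*b* (D65 white point)."""
    def linearize(c):
        # Undo the sRGB transfer curve
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = linearize(r), linearize(g), linearize(b)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def hue_step_difference(hue_deg, step_deg=60):
    """Lab distance between HSV(hue, 1, 1) and HSV(hue + step, 1, 1)."""
    first = colorsys.hsv_to_rgb((hue_deg % 360) / 360, 1, 1)
    second = colorsys.hsv_to_rgb(((hue_deg + step_deg) % 360) / 360, 1, 1)
    return math.dist(srgb_to_lab(*first), srgb_to_lab(*second))


# One distance per starting hue: red, yellow, green, cyan, blue, magenta
differences = {hue: hue_step_difference(hue) for hue in range(0, 360, 60)}
for hue, diff in differences.items():
    print(f"hue {hue:3d} -> {hue + 60:3d}: Lab distance {diff:6.1f}")
```

The same 60 degree slider move produces Lab distances that differ by more than a factor of two, depending on where it starts.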

Consider the following example. You have a chart that uses two colors, an orange and a green. Both were chosen with S=V=100%. You now need to select a second, brighter color for each. You cannot do this directly in HSV because both the orange and the green are already at full value. How do you intuitively increase brightness?

You cannot do this in HSV because V does not directly correspond to the color's perceived brightness. You are stuck fiddling with the saturation and value to try to select a brighter pairing.
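A small sketch of the mismatch between V and perceived brightness: both colors below have V=100%, yet their CIE L\* lightness differs substantially. The specific hues (30 degrees for orange, 120 degrees for green) are assumptions chosen for illustration, and the conversion assumes sRGB.

```python
import colorsys


def perceived_lightness(r, g, b):
    """CIE L* (0 to 100) of an sRGB color with components in [0, 1]."""
    def linearize(c):
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    # Relative luminance Y, then the CIE lightness function
    y = 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)
    if y > (6 / 29) ** 3:
        return 116 * y ** (1 / 3) - 16
    return (29 / 3) ** 3 * y


# Hypothetical chart colors, both at full saturation and full value
orange = colorsys.hsv_to_rgb(30 / 360, 1, 1)
green = colorsys.hsv_to_rgb(120 / 360, 1, 1)

orange_lightness = perceived_lightness(*orange)
green_lightness = perceived_lightness(*green)
print(f"orange L* = {orange_lightness:.1f}, green L* = {green_lightness:.1f}")
```

Even though the V slider reads the same for both, the green is perceptually much lighter than the orange.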

What would be useful here is a color space that uses the intuitive parameters of HSV but is perceptually based. In other words, instead of value, the space would define a color based on its perceived brightness. Luckily, this space exists: LCH, which defines a color by its lightness (perceived brightness), chroma (purity) and hue. Unfortunately, most design and presentation software does not offer LCH sliders, so we cannot easily take advantage of this color space.
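For readers who want to experiment, LCH is simply the polar form of CIE L\*a\*b\*: chroma is the distance from the gray axis and hue is the angle around it. The sketch below shows the conversion in both directions; the sample Lab value is arbitrary, and the helper names are mine.

```python
import math


def lab_to_lch(lightness, a, b):
    """Convert CIE L*a*b* to LCH (hue in degrees, 0 to 360)."""
    chroma = math.hypot(a, b)
    hue = math.degrees(math.atan2(b, a)) % 360
    return lightness, chroma, hue


def lch_to_lab(lightness, chroma, hue):
    """Convert LCH (hue in degrees) back to CIE L*a*b*."""
    angle = math.radians(hue)
    return lightness, chroma * math.cos(angle), chroma * math.sin(angle)


# Arbitrary sample color in Lab
lightness, chroma, hue = lab_to_lch(50.0, 30.0, 40.0)

# Brighten by 10 units of lightness while holding chroma and hue fixed
brighter_lab = lch_to_lab(lightness + 10, chroma, hue)
print(lightness, chroma, round(hue, 2), brighter_lab)
```

Raising the lightness while keeping chroma and hue fixed is the operation that HSV sliders cannot express directly. Note that the result may fall outside the range a display can show, which this sketch does not check.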

This is where the Brewer palettes come in.

Brewer palettes were selected for their perceptual properties. These palettes were created by Cynthia Brewer for use in cartography, but have found use in other fields.

There are three types of Brewer palettes:

*qualitative*: colors do not have a perceived order

*sequential*: colors have a perceived order and the perceived difference between successive colors is uniform

*diverging*: two back-to-back sequential palettes starting from a common color
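As a rough check of what "perceived order" means for a sequential palette, the sketch below computes CIE L\* lightness for a 5-class blue palette. The hex values are my recollection of the ColorBrewer 5-class Blues scheme and are an assumption here; verify them against the swatch files before relying on them.

```python
def hex_to_lightness(hex_color):
    """CIE L* (0 to 100) of an sRGB color given as '#rrggbb'."""
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5)]

    def linearize(c):
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in channels)
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    if y > (6 / 29) ** 3:
        return 116 * y ** (1 / 3) - 16
    return (29 / 3) ** 3 * y


# Assumed values for a 5-class sequential blue palette (light to dark)
blues = ["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"]

lightness_values = [hex_to_lightness(color) for color in blues]
for color, value in zip(blues, lightness_values):
    print(f"{color}: L* = {value:.1f}")
```

The lightness falls steadily from the first color to the last, which is what gives the palette its perceived order.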

I have prepared Brewer palette swatches in .ase or .ai format. For programming, use the plain-text version.

The image below shows all the Brewer palettes.

Qualitative palettes are excellent for bar plots and pie charts, where colors correspond to categories.

Grayscale Brewer palettes are available and are perfect for achieving good tone separation in black-and-white figures.

Sequential and diverging palettes are useful for heatmaps.

Some Brewer palettes are safe for color blindness — the pink-yellow-green (piyg) is one. For others, see colorbrewer.

I have designed 15-color palettes for each of the three common types of color blindness.


*With four parameters I can fit an elephant and with five I can make him wiggle his trunk. —John von Neumann.*

By increasing the complexity of a model, it is easy to make it fit to data perfectly. Does this mean that the model is perfectly suitable? No.

When a model has a relatively large number of parameters, it is likely to be influenced by the noise in the data, which varies across observations, as much as any underlying trend, which remains the same. Such a model is overfitted—it matches training data well but does not generalize to new observations.

We discuss the use of training, validation and testing data sets and how they can be used, with methods such as cross-validation, to avoid overfitting.
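As a minimal sketch of the bookkeeping behind k-fold cross-validation (not the column's own code), the helper below splits sample indices into k folds so that each sample is held out for validation exactly once. The function name is hypothetical, and in practice the indices would usually be shuffled first.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


splits = list(k_fold_splits(10, 3))
for train, validation in splits:
    print("train:", train, "validation:", validation)
```

A model is fitted on each training set and scored on the matching validation set; averaging the validation scores estimates how well the model generalizes, which training error alone cannot show.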

Altman, N. & Krzywinski, M. (2016) Points of Significance: Model Selection and Overfitting. *Nature Methods* **13**:703-704.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Classifier evaluation. *Nature Methods* **13**:603-604.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Logistic regression. *Nature Methods* **13**:541-542.

*It is important to understand both what a classification metric expresses and what it hides.*

We examine various metrics used to assess the performance of a classifier. We show that a single metric is insufficient to capture performance: for any metric, a variety of scenarios yield the same value.
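A small illustration of that point, using two made-up confusion matrices (the counts are hypothetical, not taken from the column): both classifiers have the same accuracy, but their precision and recall are very different.

```python
def classifier_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Hypothetical scenario A: balanced classes, errors spread evenly
scenario_a = classifier_metrics(tp=40, fp=10, fn=10, tn=40)

# Hypothetical scenario B: rare positive class, most positives missed
scenario_b = classifier_metrics(tp=5, fp=5, fn=15, tn=75)

print(scenario_a)
print(scenario_b)
```

Reporting accuracy alone would make these two classifiers look identical.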

We also discuss ROC curves and the area under the curve (AUC), and how their interpretation changes based on class balance.
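One way to build intuition for AUC is its ranking interpretation: it equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative. The sketch below computes it directly from that definition. The scores are invented for illustration, and this brute-force pairwise loop is only practical for small samples.

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores
auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
perfect = auc_from_scores([0.9, 0.8], [0.2, 0.1])
print(auc, perfect)
```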

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Classifier evaluation. *Nature Methods* **13**:603-604.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Logistic regression. *Nature Methods* **13**:541-542.

Today is the day and it's hardly an approximation. In fact, `22/7` is a 20% more accurate representation of `\pi` than `3.14`!
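The 20% figure can be checked with a few lines of arithmetic, taking "more accurate" to mean a smaller absolute error (my reading of the claim):

```python
import math

error_22_7 = abs(22 / 7 - math.pi)
error_3_14 = abs(3.14 - math.pi)

# Fraction by which the error shrinks when 22/7 replaces 3.14
improvement = 1 - error_22_7 / error_3_14

print(f"|22/7 - pi| = {error_22_7:.6f}")
print(f"|3.14 - pi| = {error_3_14:.6f}")
print(f"error reduced by {improvement:.1%}")
```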

Time to celebrate, graphically. This year I do so with perfect packing of circles that embody the approximation.

By warping the circle by 8% along one axis, we can create a shape whose ratio of circumference to diameter, taken as twice the average radius, is 22/7.
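A rough numerical check of this idea, under assumptions of my own: the warped circle is treated as an ellipse with semi-axes 1 and 1.08, the "average radius" is taken as the mean of the two semi-axes, and the perimeter uses Ramanujan's approximation, which is very accurate for nearly circular ellipses.

```python
import math


def ellipse_perimeter(a, b):
    """Ramanujan's approximation to the perimeter of an ellipse."""
    h = ((a - b) / (a + b)) ** 2
    return math.pi * (a + b) * (1 + 3 * h / (10 + math.sqrt(4 - 3 * h)))


def circumference_to_diameter(stretch):
    """Perimeter divided by twice the mean semi-axis for a stretched unit circle."""
    a, b = 1.0, 1.0 + stretch
    diameter = a + b  # twice the average of the two semi-axes
    return ellipse_perimeter(a, b) / diameter


ratio = circumference_to_diameter(0.08)
print(f"ratio = {ratio:.6f}, 22/7 = {22 / 7:.6f}, pi = {math.pi:.6f}")
```

With an 8% stretch the ratio moves from `\pi` most of the way to 22/7. Under these assumptions, an exact match needs a slightly larger stretch.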

If you prefer something more accurate, check out art from previous `\pi` days: 2013 `\pi` Day, 2014 `\pi` Day, 2015 `\pi` Day, and 2016 `\pi` Day.

*Regression can be used on categorical responses to estimate probabilities and to classify.*

The next column in our series on regression deals with how to classify categorical data.

We show how linear regression can be used for classification and demonstrate that it can be unreliable in the presence of outliers. Using a logistic regression, which fits a linear model to the log odds ratio, improves robustness.
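To illustrate what "fits a linear model to the log odds" means, the sketch below maps a linear predictor to a probability with the logistic function and then recovers the linear predictor by taking the log odds. The intercept and slope are made-up values, not fitted estimates.

```python
import math


def sigmoid(z):
    """Logistic function: maps a log-odds value to a probability."""
    return 1 / (1 + math.exp(-z))


def log_odds(p):
    """Inverse of the logistic function."""
    return math.log(p / (1 - p))


def predict_probability(x, intercept, slope):
    """Probability of the positive class under a logistic model."""
    return sigmoid(intercept + slope * x)


# Hypothetical coefficients
intercept, slope = -1.0, 0.5

p_boundary = predict_probability(2.0, intercept, slope)  # linear predictor is 0
p_above = predict_probability(4.0, intercept, slope)     # linear predictor is 1
print(p_boundary, p_above, log_odds(p_above))
```

The predicted probability always stays between 0 and 1, while the log odds remain a straight-line function of `x`. Classification then amounts to thresholding the probability, commonly at 0.5.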

Logistic regression is solved numerically and in most cases, the maximum-likelihood estimates are unique and optimal. However, when the classes are perfectly separable, the numerical approach fails because there is an infinite number of solutions.
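The separability problem can be seen in a toy example. With the invented data below, every negative has `x < 0` and every positive has `x > 0`, so any positive slope separates the classes and a steeper slope always gives a higher likelihood. The sketch assumes a model with no intercept to keep the arithmetic simple.

```python
import math


def log_sigmoid(z):
    """log(1 / (1 + exp(-z))); safe for the non-negative z used below."""
    return -math.log1p(math.exp(-z))


def log_likelihood(slope, xs, ys):
    """Log-likelihood of a no-intercept logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = slope * x
        total += log_sigmoid(z) if y == 1 else log_sigmoid(-z)
    return total


# Perfectly separable toy data (hypothetical)
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

slopes = (1.0, 10.0, 100.0)
values = [log_likelihood(s, xs, ys) for s in slopes]
for s, v in zip(slopes, values):
    print(f"slope {s:6.1f}: log-likelihood {v:.3e}")
```

The log-likelihood keeps climbing toward zero as the slope grows, so no finite slope maximizes it and an iterative fit has nothing to converge to.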

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Logistic regression. *Nature Methods* **13**:541-542.

Altman, N. & Krzywinski, M. (2016) Points of Significance: Regression diagnostics. *Nature Methods* **13**:385-386.

Altman, N. & Krzywinski, M. (2015) Points of Significance: Multiple Linear Regression. *Nature Methods* **12**:1103-1104.

Altman, N. & Krzywinski, M. (2015) Points of Significance: Simple Linear Regression. *Nature Methods* **12**:999-1000.