
Some of the images in this writeup are from Ana Swanson's Wonkblog post "How a dog sees a rainbow, and 12 other images that explain how we see color" at the Washington Post.

In an audience of 8 men and 8 women, chances are 50% that at least one has some degree of color blindness^{1,2}. When encoding information or designing content, use colors that are color-blind safe.

^{1}About 8% of males and 0.5% of females are affected by some kind of color blindness in populations of European descent (wikipedia, Worldwide prevalence of red-green color deficiency, JOSAA). The rate is lower in populations of Asian and African descent (Caucasian Boys Show Highest Prevalence of Color Blindness Among Preschoolers, AAO).

^{2}The probability that among `N=8` men and `N=8` women at least one person is affected by color blindness is `P(men,women) = P(8,8) = 1 - (1-0.08)^8 * (1-0.005)^8 = 0.51`. For `N=34` (i.e., 68 people in total), this probability is `P(34,34)=0.95`. Because the rate of color blindness in women is so low, for most groups of mixed gender we can approximate the probability by only counting the men. For example, in a group of 17 women the probability that at least one of them is color blind is `P(0,17) = 0.082`, which is about the same as the probability for 1 man, `P(1,0) = 0.08`.
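The calculation in this footnote can be reproduced with a few lines of Python. This is a minimal sketch that assumes each person is affected independently, at the 8% and 0.5% rates quoted above; the function name `p_at_least_one` is only illustrative.

```python
def p_at_least_one(n_men, n_women, rate_men=0.08, rate_women=0.005):
    """Probability that at least one person in the group is color blind."""
    # Probability that nobody is affected, assuming independence between people.
    p_none = (1 - rate_men) ** n_men * (1 - rate_women) ** n_women
    return 1 - p_none


# The group sizes discussed in the footnote, as (men, women).
for men, women in [(8, 8), (34, 34), (0, 17), (1, 0)]:
    print(f"P({men},{women}) = {p_at_least_one(men, women):.3f}")
```

Changing `rate_men` and `rate_women` gives the corresponding estimate for populations with a different prevalence.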

Color Oracle is a good and free color blindness simulator for Windows, Mac and Linux.

You can download the RGB transformation table for deuteranopia, protanopia and tritanopia. It is available for all (r,g,b) colors in steps of 5 in each of the channels. The mapping for all other RGB colors can be interpolated.

Transformations for *all* 16.8 million RGB colors (interpolated from the table above) are also available independently for each type of color blindness:
deuteranopia,
protanopia,
and
tritanopia.
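One plausible way to interpolate between table entries is trilinear interpolation over the 8 grid colors that surround a given RGB value. The sketch below is an assumption about how this could be done, not a description of how the downloadable files were generated. It assumes the grid runs from 0 to 255 in steps of 5, and `lookup` is a hypothetical callable that returns the transformed color for a grid color (in practice it would wrap the downloaded table, whose file format is not described here).

```python
from itertools import product

STEP = 5  # grid spacing of the table in each channel


def interpolate(rgb, lookup, step=STEP, top=255):
    """Trilinearly interpolate a transformed color from the 8 surrounding grid colors.

    `lookup` is a hypothetical callable: grid (r, g, b) -> transformed (r, g, b).
    """
    lows, fracs = [], []
    for v in rgb:
        # Lower grid corner; clamp so that the upper corner never exceeds `top`.
        lo = min((v // step) * step, top - step)
        lows.append(lo)
        fracs.append((v - lo) / step)

    out = [0.0, 0.0, 0.0]
    for corner in product((0, 1), repeat=3):
        # The weight of a corner is the product of its per-channel weights.
        weight = 1.0
        for is_upper, frac in zip(corner, fracs):
            weight *= frac if is_upper else 1 - frac
        grid_color = tuple(lo + c * step for lo, c in zip(lows, corner))
        for i, value in enumerate(lookup(grid_color)):
            out[i] += weight * value
    return tuple(round(x) for x in out)


# Stand-in table for demonstration only: maps every grid color to itself.
identity = lambda color: color
print(interpolate((12, 200, 253), identity))
```

With a real table, `lookup` could be a dictionary's `__getitem__` keyed by `(r, g, b)` tuples.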

The normal human eye is a 3-channel color detector^{3}. There are three types of photoreceptors, each sensitive to a different part of the spectrum. Their combined response to a given wavelength produces a unique response that is the basis of the perception of color.

^{3}Compared to hearing, color vision is a primitive detector. While we can hear thousands of distinct frequencies and process them simultaneously, we have only three independent color inputs. And while the ear can distinguish pure tones from complex sounds that have multiple frequencies, the eye is relatively unsophisticated in separating a color sensation into its three constituent primary stimuli.

People with color blindness have one of the photoreceptor groups either reduced in number or entirely missing. With only two groups of photoreceptors, the perception of hue is drastically altered.

For example, in *deuteranopia*, the most common type of color blindness, the medium (M) wavelength photoreceptors are reduced in number or missing. This results in the loss of perceived difference between reds and greens because only one group of photoreceptors (L) is sensitive to the wavelengths of these colors. The spectrum appears to be split into two hues along the blue-green boundary (see figure below).

Visible light is in the range of 390-700 nm. The exact definition of the upper limit varies, with some sources giving as high as 760 nm. Shorter wavelengths are absorbed by the cornea (<295nm) and lens (315-390nm). Some near infrared light also reaches the retina (760-1400nm).

The opposite condition to color blindness exists too—tetrachromacy. In this case, an individual has an extra type of color receptor which improves discrimination in the red part of the spectrum. While the anatomy of their retina can be described, how true tetrachromats subjectively perceive color is unknown. And, perhaps, even unknowable.

Tetrachromacy is common in other animals, such as fish (e.g. goldfish, zebrafish) and birds (e.g. finch, starling). The dimensionality of the perceived color space isn't necessarily proportional to the number of different receptors. If the signals from 3 color receptors are combined by the brain and each receptor has a weighted response to a broad range of wavelengths, then a color can be modeled by a point in 3-dimensional space, in which the receptors are the axes. This system can perceive a large number of colors.

In the extreme case where each receptor responds to a very narrow range and none of the ranges overlap, a color is one of three points in a 1-dimensional space. This system can perceive only 3 colors.

For example, although the mantis shrimp has 12 different color receptors, the receptors work independently, so its color discrimination is poorer than ours.
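The difference between broad, overlapping receptors and narrow, independent ones can be illustrated with a toy model. Everything in this sketch is an assumption made for illustration: the receptors have Gaussian sensitivity curves, the peak wavelengths are rough stand-ins rather than measured values, and a "hue" is taken to be the set of response ratios rounded to three decimals.

```python
from math import exp

PEAKS = (440, 540, 570)  # illustrative peak wavelengths in nm, not measured values


def responses(wavelength, width):
    """Gaussian response of each receptor to a single wavelength."""
    return tuple(exp(-((wavelength - p) / width) ** 2 / 2) for p in PEAKS)


def hue_signature(wavelength, width, floor=1e-3):
    """Relative receptor responses; None if no receptor responds above `floor`."""
    r = [x if x >= floor else 0.0 for x in responses(wavelength, width)]
    total = sum(r)
    if total == 0:
        return None
    return tuple(round(x / total, 3) for x in r)


def count_hues(width):
    """Number of distinct signatures across 400-650 nm in 1 nm steps."""
    signatures = {hue_signature(w, width) for w in range(400, 651)}
    signatures.discard(None)
    return len(signatures)


print("broad receptors: ", count_hues(40))
print("narrow receptors:", count_hues(2))
```

In this toy model the broad receptors yield many distinguishable signatures because the ratios change continuously with wavelength, while the narrow receptors can only report which one of them fired.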

If you use Color Oracle to transform your screen colors to simulate color blindness, you can see that none of the equivalent swatches in one kind of color blindness are equivalent in another. This is particularly interesting when applied to a duotone image which is drawn using equivalent colors. In the figure below^{4}, a row of Mr. Spocks disappears (or is difficult to see) to people with color blindness.

^{4}In tribute to Leonard Nimoy, 1931–2015

To people with color blindness, some colors appear the same. This equivalence can be used to identify colors that remain distinct for people with both normal and color-blind vision.

The seven colors (and black) in the figure below are perceived as reasonably distinct by both normal and color blind individuals. The table on the left is reproduced from the Nature Methods Points of View column Color blindness by Bang Wong.
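For use in plotting code, the palette can be written down as a small lookup table. The RGB values below are the ones commonly quoted for this palette; they are not listed in the text here, so treat them as an assumption and check them against the original table before relying on them. The helper `to_hex` is only illustrative.

```python
# Color-blind-safe palette as commonly quoted from Wong's Points of View column.
# Values are (red, green, blue) in the range 0-255.
PALETTE = {
    "black": (0, 0, 0),
    "orange": (230, 159, 0),
    "sky blue": (86, 180, 233),
    "bluish green": (0, 158, 115),
    "yellow": (240, 228, 66),
    "blue": (0, 114, 178),
    "vermillion": (213, 94, 0),
    "reddish purple": (204, 121, 167),
}


def to_hex(rgb):
    """Format an (r, g, b) triple as a hex color string."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


for name, rgb in PALETTE.items():
    print(f"{name:15s} {to_hex(rgb)}")
```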

For more tips about designing with color blindness in mind, see Color Universal Design (CUD) — How to make figures and presentations that are friendly to Colorblind people.

The figure below shows the mapping of different colors to six different grades of each of the two hues seen by deuteranopes. It offers more distinct options than the 7-color palette above.

Even more color choices for color blindness, including colors that map onto greys.

If you're looking to encode quantitative information, I suggest using the subset of Brewer palettes that are safe for color blindness (e.g. pink-yellow-green, brown-blue-green).

We introduce two common ensemble methods: bagging and random forests. Both of these methods repeat a statistical analysis on a bootstrap sample to improve the accuracy of the predictor. Our column shows these methods as applied to Classification and Regression Trees.

For example, we can sample the space of values more finely when using bagging with regression trees because each sample has potentially different boundaries at which the tree splits.

Random forests generate a large number of trees by not only generating bootstrap samples but also randomly choosing which predictor variables are considered at each split in the tree.
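Bagging can be sketched in a few lines. The example below is not the column's implementation: it uses a one-split regression tree (a "stump") on a single predictor as a stand-in for a full tree, and the names `fit_stump` and `bagged_predictor` are illustrative. Because there is only one predictor, it shows bagging only; a random forest would additionally pick a random subset of predictors at each split.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimizing the squared error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate boundary between adjacent x values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # the sample has a single distinct x value
        m = mean(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def bagged_predictor(xs, ys, n_trees=50, seed=0):
    """Average the predictions of stumps fit to bootstrap samples."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(s(x) for s in stumps)


# Step-shaped toy data: y jumps from 0 to 1 at x = 10.
xs = list(range(20))
ys = [0] * 10 + [1] * 10
predict = bagged_predictor(xs, ys)
print([round(predict(x), 2) for x in (2, 9, 9.5, 10, 17)])
```

Each bootstrap sample places its boundary slightly differently, so the averaged prediction changes in small steps near the jump instead of at a single boundary.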

Krzywinski, M. & Altman, N. (2017) Points of Significance: Ensemble methods: bagging and random forests. *Nature Methods* **14**:933–934.

Krzywinski, M. & Altman, N. (2017) Points of Significance: Classification and regression trees. *Nature Methods* **14**:757–758.

Decision trees classify data by splitting it along the predictor axes into partitions with homogeneous values of the dependent variable. Unlike logistic or linear regression, CART does not develop a prediction equation. Instead, data are predicted by a series of binary decisions based on the boundaries of the splits. Decision trees are very effective and the resulting rules are readily interpreted.

Trees can be built using different metrics that measure how well the splits divide up the data classes: Gini index, entropy or misclassification error.
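The three metrics can be written directly from their standard definitions in terms of the class proportions within a partition. This is a minimal sketch; the function names are illustrative, and `split_impurity` assumes the usual convention of weighting each side of a split by its size.

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Fraction of observations in each class."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    return 1 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    return -sum(p * log2(p) for p in class_proportions(labels))


def misclassification(labels):
    return 1 - max(class_proportions(labels))


def split_impurity(left, right, metric=gini):
    """Size-weighted impurity of the two partitions produced by a split."""
    n = len(left) + len(right)
    return len(left) / n * metric(left) + len(right) / n * metric(right)


mixed = ["a", "a", "b", "b"]
print(gini(mixed), entropy(mixed), misclassification(mixed))
print(split_impurity(["a", "a"], ["b", "b"]))
```

All three metrics are zero for a partition that contains a single class and largest when the classes are evenly mixed.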

When the dependent variable is quantitative and not categorical, regression trees are used. Here, the data are still split but now the dependent variable is estimated by the average within the split boundaries. Tree growth can be controlled using the complexity parameter, a measure of the relative improvement of each new split.
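A single regression-tree split can be sketched as follows. The code picks the boundary that most reduces the sum of squared errors, estimates each side by its average, and reports the relative improvement. Comparing that improvement against a threshold `cp` is one plausible reading of how a complexity parameter limits growth; the names `best_split` and `cp`, and the threshold value, are assumptions made for illustration.

```python
from statistics import mean


def sse(ys):
    """Sum of squared deviations from the mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean, relative_improvement), or None."""
    total = sse(ys)
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Fraction of the parent's squared error removed by this split.
        gain = (total - sse(left) - sse(right)) / total if total else 0.0
        if best is None or gain > best[3]:
            best = (t, mean(left), mean(right), gain)
    return best


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
threshold, left_mean, right_mean, gain = best_split(xs, ys)

cp = 0.01  # hypothetical complexity parameter
print(threshold, round(left_mean, 2), round(right_mean, 2), gain > cp)
```

A full tree would apply `best_split` recursively to each partition and stop when no split improves the fit by more than `cp`.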

Individual trees can be very sensitive to minor changes in the data and even better prediction can be achieved by exploiting this variability. Using ensemble methods, we can grow multiple trees from the same data.

Krzywinski, M. & Altman, N. (2017) Points of Significance: Classification and regression trees. *Nature Methods* **14**:757–758.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Logistic regression. *Nature Methods* **13**:541–542.

Altman, N. & Krzywinski, M. (2015) Points of Significance: Multiple Linear Regression. *Nature Methods* **12**:1103–1104.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Classifier evaluation. *Nature Methods* **13**:603–604.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Model Selection and Overfitting. *Nature Methods* **13**:703–704.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Regularization. *Nature Methods* **13**:803–804.

The artwork was created in collaboration with my colleagues at the Genome Sciences Centre to celebrate the 5-year anniversary of the Personalized Oncogenomics Program (POG).

The Personalized Oncogenomics Program (POG) is a collaborative research study including many BC Cancer Agency oncologists, pathologists and other clinicians along with Canada's Michael Smith Genome Sciences Centre with support from BC Cancer Foundation.

The aim of the program is to sequence, analyze and compare the genome of each patient's cancer (the entire DNA and RNA inside tumor cells) in order to understand what is enabling it and to identify less toxic and more effective treatment options.

Principal component analysis (PCA) simplifies the complexity in high-dimensional data by reducing its number of dimensions.

To retain trends and patterns in the reduced representation, PCA finds linear combinations of canonical dimensions that maximize the variance of the projection of the data.

PCA is helpful in visualizing high-dimensional data and scatter plots based on 2-dimensional PCA can reveal clusters.
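For 2-dimensional data the first principal component has a closed form, which makes the idea easy to sketch without a linear algebra library. The function below is illustrative only: it computes the sample covariance matrix, finds the direction of maximum variance, and projects the centered data onto it. Real analyses would normally use an established PCA implementation.

```python
from math import atan2, cos, sin, sqrt


def pca_2d(points):
    """Return (direction, variance, scores) for the first principal component."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix.
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Leading eigenvector and eigenvalue of a symmetric 2x2 matrix.
    theta = 0.5 * atan2(2 * sxy, sxx - syy)
    direction = (cos(theta), sin(theta))
    variance = (sxx + syy) / 2 + sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    # 1-dimensional representation: projection of the centered data.
    scores = [(x - mx) * direction[0] + (y - my) * direction[1] for x, y in points]
    return direction, variance, scores


points = [(0, 0), (1, 1), (2, 2), (3, 3)]
direction, variance, scores = pca_2d(points)
print(direction, variance, scores)
```

For these points, which lie on the line y = x, the direction has equal x and y components and the single retained dimension carries all of the variance.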

Altman, N. & Krzywinski, M. (2017) Points of Significance: Principal component analysis. *Nature Methods* **14**:641–642.

Altman, N. & Krzywinski, M. (2017) Points of Significance: Clustering. *Nature Methods* **14**:545–546.