Martin Krzywinski / Genome Sciences Center / mkweb.bcgsc.ca

science: beautiful



EMBO Practical Course: Bioinformatics and Genome Analysis, 5–17 June 2017.


art + science activism

Watch the video of this project, which features the participants who have a BRCA mutation and their interaction with the piece. The video also highlights the design and construction of the mural.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Science and art and personal stories of cancer survivors combine into this beautiful depiction of the complexity and individuality of the genome. (Free the Data)

Human Genome Art by Humans with Genomes

I recently took part in a deeply meaningful collaboration of science, art and personal stories of cancer survivors.

Together with Joanna Rudnick and Aaron De La Cruz, I sought to create a work of art that combines the science of cancer genomics with the stories of individuals whose lives are affected by genetic mutations in the BRCA1 and BRCA2 genes, where genomic changes drastically increase one's chances of breast and ovarian cancer.

We wanted to make something that is scientifically accurate, artistically beautiful and emotionally engaging. The complexity of the genome, the multitudes of other genes and possible mutations and the millions of personal stories of hardship and survival were just a few of the elements we wanted to include in the piece.

My role was to provide the scientific direction behind the design and incorporate it into the aesthetic of Aaron De La Cruz, a street artist from San Francisco whose work echoes information, complexity, interaction and continuity. We all have a genome, a different genome. The ways in which our genomes differ give us traits like hair and eye color, but also make some of us predisposed to diseases like cancer.

The mural, which includes elements drawn by the cancer survivors, is part of the Free the Data campaign, which is advocating for an open access model of genome mutation databases so that scientists everywhere can analyze the data and help women make informed choices about their breast-cancer risk.

The piece Importance of Data Sharing by Nature Methods illustrated the point:

Imagine you are a physician or researcher and seek to get more confirmation on the clinical impact of particular genetic variants. If your search of public databases comes up empty this does not necessarily mean that nothing is known about the mutations in question. Rather, the information may be locked away as a trade secret in a genetic testing company’s proprietary database.

The New York Times article DNA Project Aims to Make Public a Company’s Data on Cancer Genes captures the current state of the situation.

The mural was constructed on location at InVitae in San Francisco.

A video of the project is available.

Beautiful, meaningful and personal

This work will be, as far as I know, the first human annotation of mutations in the human genome by humans whose genomes have the mutations. That's quite a term!

I've always been mindful of the necessity of the mingling of art and science. In my work I have tried to add what I feel about the science to what I think about it, creating work that combines our objective understanding of the world we live in with the subjective experience of living in it. Of all my projects, this one has been by far the most keenly felt.

Adding emotion, keeping the science. (Free the Data)

the design

The mural was created in San Francisco on Saturday, July 13th, 2013. We started with an 11' x 6' wood canvas. These dimensions reflect the ratio of the lengths of the BRCA1 and BRCA2 proteins (1,863 and 3,418 amino acids, respectively).
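The match between the canvas and the proteins can be checked with a few lines of arithmetic. This is only a sketch: it assumes the short side of the canvas (6') stands for BRCA1 and the long side (11') for BRCA2, which the text implies but does not state outright. The variable names are mine.

```python
# Protein lengths in amino acids, as quoted in the text.
BRCA1_LENGTH = 1863
BRCA2_LENGTH = 3418

# Canvas dimensions in feet, as quoted in the text.
CANVAS_LONG_SIDE_FT = 11
CANVAS_SHORT_SIDE_FT = 6

# Assumption: short side corresponds to BRCA1, long side to BRCA2.
protein_ratio = BRCA1_LENGTH / BRCA2_LENGTH
canvas_ratio = CANVAS_SHORT_SIDE_FT / CANVAS_LONG_SIDE_FT

# How far apart the two ratios are, relative to the protein ratio.
relative_difference = abs(protein_ratio - canvas_ratio) / protein_ratio

print(f"BRCA1 / BRCA2 length ratio: {protein_ratio:.4f}")
print(f"canvas short / long ratio:  {canvas_ratio:.4f}")
print(f"relative difference:        {relative_difference:.2%}")
```

Under that assumption the two ratios agree to within about a tenth of a percent.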

The canvas aspect ratio reflects the ratio of BRCA1 and BRCA2 protein lengths. The proteins are represented on the canvas as lines. (Free the Data)

The BRCA1 and BRCA2 proteins are drawn on the canvas as straight-line sections.

The genes are depicted on the canvas as their protein products. (Free the Data)

The locations of the participants' mutations are positioned on the protein lines as circles. For individuals with large deletions, the circle is placed at the first affected amino acid. Because BRCA1 is located on the opposite strand (anti-sense), its start on the canvas is on the right.
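The placement rule can be sketched as a small function. This is an illustration, not the procedure used on the day: the linear scaling, the function name `position_on_line`, the line length and the residue number in the usage example are all assumptions of mine, not values from the project.

```python
def position_on_line(residue, protein_length, line_length, start_on_right=False):
    """Distance from the LEFT end of a protein line to a residue's circle.

    residue        -- 1-based amino acid position (for a large deletion,
                      the first affected amino acid)
    protein_length -- total number of amino acids in the protein
    line_length    -- drawn length of the protein line, in any unit
    start_on_right -- True for a gene on the anti-sense strand, so that
                      residue 1 sits at the right end of the line
    """
    if not 1 <= residue <= protein_length:
        raise ValueError("residue must lie within the protein")
    # Fraction of the way along the protein, 0.0 at the start, 1.0 at the end.
    fraction = (residue - 1) / (protein_length - 1)
    if start_on_right:
        fraction = 1.0 - fraction
    return fraction * line_length


# Hypothetical example: residue 500 on a line 100 units long.
print(position_on_line(500, 1863, 100.0, start_on_right=True))  # BRCA1, reversed
print(position_on_line(500, 3418, 100.0))                       # BRCA2, left to right
```

Flipping the fraction rather than the coordinates keeps the same function usable for both proteins, whatever length the lines are drawn at.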

11 mutations, one for each of the cancer previvor and survivor participants, are placed on the protein lines as circles. The start of BRCA1 is on the right to reflect that this gene is on the anti-sense strand. (Free the Data)

The rest of the genome is now drawn. Aaron's style is perfect for depicting information and the endless complexity of the genome and its interacting elements. We were careful to include elements that indicate that the story told today is not complete. Millions of others have mutations in thousands of other genes, each potentially life-threatening. Just as the stories of our participants will continue to evolve, other stories are waiting to be told.

BRCA1 and BRCA2 proteins and their mutations, together with the rest of the genome. Other lines and circles hint at other genes, other mutations, as well as the biochemical interactions in the cells and personal interactions of those affected by the mutations. (Free the Data)

Once the "reference" genome is depicted, participants with BRCA1 and BRCA2 mutations will complete the art work by individually marking the positions of their mutations on the art using personalized colors. With Aaron's help, everyone created their own color by mixing primary colors.

Participants fill in their mutation circles with their personalized color. (Free the Data)

From base pair, to genome, to person, to life. All it takes is one tiny change in the genome to change a life forever.

The mutations of 11 people in the vastness of the genome. What's your story? (Free the Data)

creation of free the data mural

The BRCA1 and BRCA2 lines were placed on the canvas by first pinning two pieces of string, marked with the positions of the mutations.

String was used to mark the placing of lines and mutations. (Free the Data)

After drawing the protein lines, it was time to fill the canvas.

Aaron De La Cruz creating the art work. Here, he is filling the space in the canvas around the BRCA1 and BRCA2 segments with his design. The project was shot with a Red Camera—this is a sequence from its render application. (Free the Data)

Over the next 4 hours, Aaron filled in the canvas with the "rest" of the genome.


Participants

Lucy, Karen, Steve, Ghecemy, Joanna, Jill, Lisa, Lynn, Ruth, Jenica, Susan

Cancer previvors and survivors who have been diagnosed with a mutation in the BRCA1 or BRCA2 gene.

Joanna Rudnick (director/producer)

Joanna made her directorial debut with the Emmy-nominated In the Family, a deeply personal film about coming to terms with testing positive for a mutation in the breast cancer gene BRCA1, which also follows the stories of other women and families facing the same hard choices. In the Family premiered at Silverdocs in 2008, was broadcast nationally on PBS P.O.V. the same year and was a finalist for the NIHCM Foundation’s Health Care Radio and Television Journalism Award.

Joanna received a master’s degree in Science and Environmental Journalism from New York University and a bachelor’s degree in English from Northwestern University. Joanna loves the opportunity to teach and mentor and served as an adjunct professor at Northwestern University’s Medill School of Journalism in the past.

She has written for several publications including Audubon Magazine, The Artful Mind, The Berkshire Record and Humanities. Before finding her way to the wonderful world of documentaries, Joanna served as an Americorps volunteer, implementing project-based environmental curricula in the San Francisco Public School System.

Joanna is one of the cancer survivors whose mutations are encoded in the art.

http://kartemquin.com/about/joanna-rudnick

Aaron De La Cruz (artist)

Aaron De La Cruz's work, though minimal and direct at first, tends to overcome barriers of separation and freely steps in and out of the realms of design, graffiti, and illustration.

The parameters he has chosen to work within actually allow him to free himself and react to the very limitations he has created. This overriding structure, and the lack of deliberation while moving within it, creates a tension when encountering his work, due to the almost computer-generated, grid-like systems he creates by unplanned mark-making. The act and the marks themselves are very primal in nature but tend to take on distinct and sometimes higher meanings in the broad range of mediums and contexts they appear in and on.

Work by Aaron De La Cruz. (Aaron De La Cruz)

His work finds strength in the reduction of his interests in life to minimal information. De La Cruz gains from the idea of exclusion: just because you don't literally see it doesn't mean that it's not there.

http://www.aarondelacruz.com


news + thoughts

Machine learning: a primer

Tue 05-12-2017
Machine learning extracts patterns from data without explicit instructions.

In this primer, we focus on essential ML principles: a modeling strategy to let the data speak for themselves, to the extent possible.

The benefits of ML arise from its use of a large number of tuning parameters or weights, which control the algorithm’s complexity and are estimated from the data using numerical optimization. Often ML algorithms are motivated by heuristics such as models of interacting neurons or natural evolution—even if the underlying mechanism of the biological system being studied is substantially different. The utility of ML algorithms is typically assessed empirically by how well extracted patterns generalize to new observations.

Nature Methods Points of Significance column: Machine learning: a primer. (read)

We present a data scenario in which we fit a model with 5 predictors using polynomials and show what to expect from ML when noise and sample size vary. We also demonstrate the consequences of excluding an important predictor or including a spurious one.

Bzdok, D., Krzywinski, M. & Altman, N. (2017) Points of Significance: Machine learning: a primer. Nature Methods 14:1119–1120.

...more about the Points of Significance column

Snowflake simulation

Tue 14-11-2017
Symmetric, beautiful and unique.

Just in time for the season, I've simulated a snow-pile of snowflakes based on the Gravner-Griffeath model.

A few of the beautiful snowflakes generated by the Gravner-Griffeath model. (explore)

Gravner, J. & Griffeath, D. (2007) Modeling Snow Crystal Growth II: A mesoscopic lattice map with plausible dynamics.

Genes that make us sick

Thu 02-11-2017
Where disease hides in the genome.

My illustration of the location of genes in the human genome that are implicated in disease appears in The Objects that Power the Global Economy, a book by Quartz.

The location of genes implicated in disease in the human genome, shown here as a spiral. (more...)

Ensemble methods: Bagging and random forests

Mon 16-10-2017
Many heads are better than one.

We introduce two common ensemble methods: bagging and random forests. Both of these methods repeat a statistical analysis on a bootstrap sample to improve the accuracy of the predictor. Our column shows these methods as applied to Classification and Regression Trees.

Nature Methods Points of Significance column: Ensemble methods: Bagging and random forests. (read)

For example, we can sample the space of values more finely when using bagging with regression trees because each sample has potentially different boundaries at which the tree splits.

Random forests generate a large number of trees by not only generating bootstrap samples but also randomly choosing which predictor variables are considered at each split in the tree.
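The two ingredients described above, bootstrap resampling and random selection of predictors, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the column's code: the function names are mine, `fit` stands in for any tree-growing routine, and `fit_mean` is a deliberately trivial placeholder learner used only to make the example runnable. Averaging suits regression; a classifier would typically take a majority vote instead.

```python
import random
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement from the original data."""
    return [rng.choice(rows) for _ in rows]


def candidate_predictors(n_predictors, n_candidates, rng):
    """Random-forest step: pick which predictors a split may consider."""
    return rng.sample(range(n_predictors), n_candidates)


def bagged_predict(rows, fit, x, n_models=100, seed=0):
    """Fit one model per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    models = [fit(bootstrap_sample(rows, rng)) for _ in range(n_models)]
    return mean(model(x) for model in models)


def fit_mean(sample):
    """Placeholder learner (NOT a tree): always predicts the sample mean of y."""
    average = mean(y for _, y in sample)
    return lambda x: average


# Hypothetical toy data: (x, y) pairs.
rows = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 4.0)]
print(bagged_predict(rows, fit_mean, x=1.5))
print(candidate_predictors(n_predictors=5, n_candidates=2, rng=random.Random(0)))
```

In a random forest, `candidate_predictors` would be called afresh at every split of every tree, so that different trees explore different predictors.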

Krzywinski, M. & Altman, N. (2017) Points of Significance: Ensemble methods: bagging and random forests. Nature Methods 14:933–934.

Background reading

Krzywinski, M. & Altman, N. (2017) Points of Significance: Classification and regression trees. Nature Methods 14:757–758.

...more about the Points of Significance column

Classification and regression trees

Mon 16-10-2017
Decision trees are a powerful but simple prediction method.

Decision trees classify data by splitting it along the predictor axes into partitions with homogeneous values of the dependent variable. Unlike logistic or linear regression, CART does not develop a prediction equation. Instead, data are predicted by a series of binary decisions based on the boundaries of the splits. Decision trees are very effective and the resulting rules are readily interpreted.

Trees can be built using different metrics that measure how well the splits divide up the data classes: Gini index, entropy or misclassification error.
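The three metrics have standard textbook definitions in terms of the class proportions p in a partition: Gini index 1 − Σp², entropy −Σp·log₂p, and misclassification error 1 − max p. The sketch below computes them for a list of class labels; the function names are mine and the labels in the example are made up.

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Fraction of the partition belonging to each class."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return [count / len(labels) for count in counts.values()]


def gini_index(labels):
    """1 - sum(p^2): zero for a pure partition."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """-sum(p * log2(p)), in bits: zero for a pure partition."""
    return sum(-p * log2(p) for p in class_proportions(labels))


def misclassification_error(labels):
    """1 - max(p): error rate when predicting the majority class."""
    return 1.0 - max(class_proportions(labels))


# Hypothetical partitions: one evenly mixed, one pure.
mixed = ["a", "a", "b", "b"]
pure = ["a", "a", "a", "a"]
for name, labels in (("mixed", mixed), ("pure", pure)):
    print(name, gini_index(labels), entropy(labels), misclassification_error(labels))
```

All three are largest when the classes are evenly mixed and fall to zero when a partition contains a single class, which is why a good split is one that lowers them.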

Nature Methods Points of Significance column: Classification and regression trees. (read)

When the dependent variable is quantitative and not categorical, regression trees are used. Here, the data are still split but now the dependent variable is estimated by the average within the split boundaries. Tree growth can be controlled using the complexity parameter, a measure of the relative improvement of each new split.
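A single split of a regression tree can be sketched as follows. This is a minimal illustration that assumes one quantitative predictor and uses the sum of squared errors to score splits, a common choice but not one the text specifies. The function names and the toy data are mine, and the complexity parameter is not modeled.

```python
from statistics import mean


def sum_squared_error(values):
    """Squared deviation of the values from their own mean."""
    if not values:
        return 0.0
    center = mean(values)
    return sum((value - center) ** 2 for value in values)


def best_split(xs, ys):
    """Find the threshold on x that minimizes the total squared error.

    Returns (threshold, left_mean, right_mean); the two means are the
    values the tree would predict on either side of the threshold.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        error = sum_squared_error(left) + sum_squared_error(right)
        if best is None or error < best[0]:
            best = (error, threshold, mean(left), mean(right))
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2], best[3]


# Hypothetical toy data with an obvious step between x = 2 and x = 3.
threshold, left_mean, right_mean = best_split([1, 2, 3, 4], [1.0, 1.0, 5.0, 5.0])
print(threshold, left_mean, right_mean)
```

A full tree would apply `best_split` recursively to each side and stop once a further split no longer improves the fit enough to be worth keeping.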

Individual trees can be very sensitive to minor changes in the data and even better prediction can be achieved by exploiting this variability. Using ensemble methods, we can grow multiple trees from the same data.

Krzywinski, M. & Altman, N. (2017) Points of Significance: Classification and regression trees. Nature Methods 14:757–758.

Background reading

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Logistic regression. Nature Methods 13:541–542.

Altman, N. & Krzywinski, M. (2015) Points of Significance: Multiple linear regression. Nature Methods 12:1103–1104.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Classifier evaluation. Nature Methods 13:603–604.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Model selection and overfitting. Nature Methods 13:703–704.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Regularization. Nature Methods 13:803–804.

...more about the Points of Significance column