Martin Krzywinski / Genome Sciences Center / mkweb.bcgsc.ca

data: beautiful



See you at Shonan Meeting 167 — Formalizing Biomedical Visualization


genomics + data mining

ICDM2012 Keynote

Needles in Stacks of Needles: genomics + data mining

Download talk

visual abstract

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
The talk introduces genomics and cancer biology to computer scientists and outlines areas in which data mining methods are being used to further our understanding of the genome. The theme is one of complexity and relevance: computers manage the former, but we are the ultimate judges of the latter.

abstract

In 2001, the first human genome sequence was published. Now, just over 10 years later, we are capable of sequencing a genome in just a few days. Massively parallel sequencing projects now make it possible to study the cancers of thousands of individuals. New data mining approaches are required to robustly interrogate the data for causal relationships among the inherently noisy biology. How does one identify genetic changes that are specific and causal to a disease within the rich variation that is either natural or merely correlated? The problem is one of finding a needle in a stack of needles. I will provide a non-specialist introduction to data mining methods and challenges in genomics, with a focus on the role visualization plays in the exploration of the underlying data.

references

The title of the talk was drawn from the paper

Cooper, G.M. & Shendure, J. (2011) Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nature Reviews Genetics 12:628–640.

I will be posting a full list of references for the talk shortly.


news + thoughts

Using Circos in Galaxy Australia Workshop

Thu 20-02-2020

A workshop on using the Circos Galaxy wrapper by Rasche and Hiltemann. Event organized by Australian Biocommons.

Using Circos in Galaxy Australia workshop.

Download workshop slides.

Galaxy wrapper training materials: Hiltemann, S. & Rasche, H. (2020) Visualisation with Circos (Galaxy Training Materials).

Essence of Data Visualization in Bioinformatics Webinar

Thu 20-02-2020

My webinar on the fundamentals of data visualization and the visual communication of scientific data and concepts. Event organized by Australian Biocommons.

Essence of Data Visualization in Bioinformatics webinar.

Download webinar slides.

Markov models — training and evaluation of hidden Markov models

Thu 20-02-2020

With one eye you are looking at the outside world, while with the other you are looking within yourself.
—Amedeo Modigliani

Following up on our Markov Chain column and Hidden Markov model column, this month we look at how Markov models are trained, using the example of a biased coin.

We introduce the concepts of forward and backward probabilities and explicitly show how they are calculated in the training process using the Baum-Welch algorithm. We also discuss the value of ensemble models and the use of pseudocounts for cases where rare observations are expected but not necessarily seen.
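To make the forward and backward probabilities concrete, here is a minimal Python sketch for a two-state coin model. It is not taken from the column: the state names, the start, transition and emission probabilities, and the function names are all illustrative assumptions. The last function shows only the emission part of a Baum-Welch style update, with a pseudocount added so that an unseen outcome keeps a nonzero probability.

```python
# Hypothetical two-state HMM: a fair coin (F) and a biased coin (B).
# Every number below is an illustrative assumption, not a value from the column.
STATES = ("F", "B")
SYMBOLS = ("H", "T")
START = {"F": 0.5, "B": 0.5}
TRANS = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
EMIT = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.8, "T": 0.2}}


def forward(obs):
    """alpha[t][s]: probability of obs[0..t] with the model in state s at t."""
    alpha = [{s: START[s] * EMIT[s][obs[0]] for s in STATES}]
    for symbol in obs[1:]:
        prev = alpha[-1]
        alpha.append({
            s: EMIT[s][symbol] * sum(prev[r] * TRANS[r][s] for r in STATES)
            for s in STATES
        })
    return alpha


def backward(obs):
    """beta[t][s]: probability of obs[t+1..] given state s at t."""
    beta = [{s: 1.0 for s in STATES}]
    for symbol in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, {
            s: sum(TRANS[s][r] * EMIT[r][symbol] * nxt[r] for r in STATES)
            for s in STATES
        })
    return beta


def posteriors(obs):
    """gamma[t][s]: probability of state s at t given all observations."""
    alpha, beta = forward(obs), backward(obs)
    likelihood = sum(alpha[-1].values())
    return [
        {s: a[s] * b[s] / likelihood for s in STATES}
        for a, b in zip(alpha, beta)
    ]


def reestimate_emissions(obs, pseudocount=1.0):
    """Emission update only (one part of a Baum-Welch step), with pseudocounts."""
    gamma = posteriors(obs)
    new_emit = {}
    for s in STATES:
        # Start every symbol at the pseudocount so unseen symbols stay possible.
        counts = {sym: pseudocount for sym in SYMBOLS}
        for g, symbol in zip(gamma, obs):
            counts[symbol] += g[s]
        total = sum(counts.values())
        new_emit[s] = {sym: c / total for sym, c in counts.items()}
    return new_emit
```

For example, calling reestimate_emissions("HHHH") on a sequence with no tails still returns a nonzero tails probability for both coins, which is the situation pseudocounts are meant to handle. A full training loop would also update the start and transition probabilities and iterate until the likelihood stops improving.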

Nature Methods Points of Significance column: Markov models — training and evaluation of hidden Markov models.

Grewal, J., Krzywinski, M. & Altman, N. (2020) Points of significance: Markov models — training and evaluation of hidden Markov models. Nature Methods 17:121–122.

Background reading

Altman, N. & Krzywinski, M. (2019) Points of significance: Hidden Markov models. Nature Methods 16:795–796.

Altman, N. & Krzywinski, M. (2019) Points of significance: Markov Chains. Nature Methods 16:663–664.

Genome Sciences Center 20th Anniversary Clothing, Music, Drinks and Art

Tue 28-01-2020

Science. Timeliness. Respect.

Read about the design of the clothing, music, drinks and art for the Genome Sciences Center 20th Anniversary Celebration, held on 15 November 2019.

Luke and Mayia wearing limited edition volunteer t-shirts. The pattern reproduces the human genome with chromosomes as spirals.

As part of the celebration and with the help of our engineering team, we framed 48 flow cells from the lab.

Precisely engineered frame mounts of flow cells used to sequence genomes in our laboratory.

Each flow cell was accompanied by an interpretive plaque explaining the technology behind the flow cell and describing the sample and its sequence content.

The plaque at the back of one of the framed Illumina flow cells. This one has sequence from the lymph node of a patient diagnosed with Burkitt's lymphoma.

Scientific data visualization: Aesthetic for diagrammatic clarity

Mon 13-01-2020

The scientific process works because all its output is empirically constrained.

My chapter from The Aesthetics of Scientific Data Representation: More than Pretty Pictures, in which I discuss the principles of data visualization and connect them to the concept of "quality" introduced by Robert Pirsig in Zen and the Art of Motorcycle Maintenance.