Tango is a sad thought that is danced.think & dancemore quotes

EMBO Practical Course: Bioinformatics and Genome Analysis, 5–17 June 2017.

Needles in Stacks of Needles: genomics + data mining

visual abstract

The talk introduces genomics and cancer biology to computer scientists and outlines areas in which data mining methods are being used to further our understanding of the genome. The theme is one of complexity and relevance — computers manage the former, but we are the ultimate judges of the latter. (download talk, ICDM2012)

abstract

In 2001, the first human genome sequence was published. Now, just over 10 years later, we capable of sequencing a genome in just a few days. Massive parallel sequencing projects now make it possible to study the cancers of thousands of individuals. New data mining approaches are required to robustly interrogate the data for causal relationships among the inherently noisy biology. How does one identify genetic changes that are specific and causal to a disease within the rich variation that is either natural or merely correlated? The problem is one of finding a needle in a stack of needles. I will provide a non-specialist introduction to data mining methods and challenges in genomics, with a focus on the role visualization plays in the exploration of the underlying data.

references

The title of the talk was drawn from the paper

Gregory M. Cooper & Jay Shendure Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data Nature Reviews Genetics 12, 628-640 (September 2011)

I will be posting a full list of references for the talk shortly.

VIEW ALL

Happy 2017 $\pi$ Day—Star Charts, Creatures Once Living and a Poem

Tue 14-03-2017

on a brim of echo,

capsized chamber
drawn into our constellation, and cooling.
—Paolo Marcazzan

Celebrate $\pi$ Day (March 14th) with star chart of the digits. The charts draw 40,000 stars generated from the first 12 million digits.

12,000,000 digits of $\pi$ interpreted as a star catalogue. (details)

The 80 constellations are extinct animals and plants. Here you'll find old friends and new stories. Read about how Desmodus is always trying to escape or how Megalodon terrorizes the poor Tecopa! Most constellations have a story.

Find friends and stories among the 80 constellations of extinct animals and plants. Oh look, a Dodo guardings his eggs! (details)

This year I collaborate with Paolo Marcazzan, a Canadian poet, who contributes a poem, Of Black Body, about space and things we might find and lose there.

Check out art from previous years: 2013 $\pi$ Day and 2014 $\pi$ Day, 2015 $\pi$ Day and and 2016 $\pi$ Day.

Data in New Dimensions: convergence of art, genomics and bioinformatics

Tue 07-03-2017

Art is science in love.
— E.F. Weisslitz

A behind-the-scenes look at the making of our stereoscopic images which were at display at the AGBT 2017 Conference in February. The art is a creative collaboration with Becton Dickinson and The Linus Group.

Its creation began with the concept of differences and my writeup of the creative and design process focuses on storytelling and how concept of differences is incorporated into the art.

Oh, and this might be a good time to pick up some red-blue 3D glasses.

A stereoscopic image and its interpretive panel of single-cell transcriptomes of blood cells: diseased versus healthy control.

Interpreting P values

Thu 02-03-2017
A P value measures a sample’s compatibility with a hypothesis, not the truth of the hypothesis.

This month we continue our discussion about $P$ values and focus on the fact that $P$ value is a probability statement about the observed sample in the context of a hypothesis, not about the hypothesis being tested.

Nature Methods Points of Significance column: Interpreting P values. (read)

Given that we are always interested in making inferences about hypotheses, we discuss how $P$ values can be used to do this by way of the Benjamin-Berger bound, $\bar{B}$ on the Bayes factor, $B$.

Heuristics such as these are valuable in helping to interpret $P$ values, though we stress that $P$ values vary from sample to sample and hence many sources of evidence need to be examined before drawing scientific conclusions.

Altman, N. & Krzywinski, M. (2017) Points of Significance: Interpreting P values. Nature Methods 14:213–214.

Krzywinski, M. & Altman, N. (2017) Points of significance: P values and the search for significance. Nature Methods 14:3–4.

Krzywinski, M. & Altman, N. (2013) Points of significance: Significance, P values and t–tests. Nature Methods 10:1041–1042.

Snellen Charts—Typography to Really Look at

Sat 18-02-2017

Another collection of typographical posters. These ones really ask you to look.

Snellen charts designed using physical constants, Braille and elemental abundances in the universe and human body.

The charts show a variety of interesting symbols and operators found in science and math. The design is in the style of a Snellen chart and typset with the Rockwell font.

Essentials of Data Visualization—8-part video series

Fri 17-02-2017

In collaboration with the Phil Poronnik and Kim Bell-Anderson at the University of Sydney, I'm delighted to share with you our 8-part video series project about thinking about drawing data and communicating science.

Essentials of Data Visualization: Thinking about drawing data and communicating science.

We've created 8 videos, each focusing on a different essential idea in data visualization: encoding, shapes, color, uncertainty, design, drawing missing or unobserved data, labels and process.

The videos were designed as teaching materials. Each video comes with a slide deck and exercises.