This tutorial took place on Monday, March 5th, 2012, at VIZBI 2012 in Heidelberg, Germany.
Jessie Kennedy · We will present fundamental principles of graphic design and visual communication that will help you create more effective interactive and print visualizations. You will learn how the purposeful use of salience, color, consistency and layout can help communicate large data sets and complex ideas with greater immediacy and clarity.
Cydney Nielsen · We will illustrate how these principles were implemented in ABySS-Explorer to visualize genome assemblies, an example to show you ways to apply design ideas to your own project.
Martin Krzywinski · At the end of the tutorial, you will apply what you have learned in an interactive group session in which you will design a figure illustrating a biological process.
|9:30 – 10:15||45 min||Jessie Kennedy|
|10:15 – 10:25||10 min||break|
|10:25 – 11:10||45 min||Cydney Nielsen|
|11:10 – 11:20||10 min||form teams + select figure to critique|
|11:20 – 11:30||10 min||break|
|11:30 – 12:00||30 min||Martin Krzywinski|
Practical — Breakout session
|12:00 – 13:00||60 min||team presentations|
It is not necessary to read the paper from which your figure was selected. I have included the papers only if you are interested in learning about the figure's context.
Designing effective visualizations in the biological sciences (PSA Genomics Workshop, Seattle, 12 July 2011)
Designing effective visualizations in the biological sciences (Genome Sciences Center bioinformatics seminar, 26 August 2011)
Drawing Data: Creating information-rich, informative and appealing figures for publication and presentation (BCCA workshop, 8 Jun 2011)
Visualizing Quantitative Information (Genome Sciences Center bioinformatics seminar)
Look for my chapter on visualization principles in the upcoming Visualizing Biological Data — a Practical Guide. This book is being written by VIZBI 2011 participants and edited by Sean O'Donoghue and Jim Procter.
In this primer, we focus on essential ML principles: a modeling strategy that, to the extent possible, lets the data speak for themselves.
The benefits of ML arise from its use of a large number of tuning parameters, or weights, which control the algorithm's complexity and are estimated from the data by numerical optimization. ML algorithms are often motivated by heuristics, such as models of interacting neurons or of natural evolution, even when the underlying mechanism of the biological system being studied is substantially different. The utility of an ML algorithm is typically assessed empirically, by how well the patterns it extracts generalize to new observations.
We present a data scenario in which we fit polynomial models to data with five predictors and show what to expect from ML as noise level and sample size vary. We also demonstrate the consequences of excluding an important predictor or of including a spurious one.
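The column illustrates these effects with figures rather than code. As a rough sketch of the same idea (assuming, for brevity, a one-predictor sine function instead of the column's five-predictor scenario), fitting polynomials of increasing flexibility to noisy data shows how the gap between training and test error depends on model complexity, noise, and sample size:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate(n, noise_sd, degree):
    """Fit a polynomial of the given degree to n noisy training points
    and report mean squared error on training and test halves."""
    x = rng.uniform(-1, 1, size=2 * n)
    y = np.sin(np.pi * x) + rng.normal(0, noise_sd, size=2 * n)
    # split into training and test halves
    x_tr, y_tr = x[:n], y[:n]
    x_te, y_te = x[n:], y[n:]
    coef = np.polyfit(x_tr, y_tr, degree)
    mse = lambda xs, ys: np.mean((np.polyval(coef, xs) - ys) ** 2)
    return mse(x_tr, y_tr), mse(x_te, y_te)

# higher noise or an overly flexible model widens the gap
# between training error and test (generalization) error
for degree in (1, 3, 9):
    tr, te = simulate(n=30, noise_sd=0.3, degree=degree)
    print(f"degree {degree}: train MSE {tr:.3f}, test MSE {te:.3f}")
```

Rerunning with larger `n` or smaller `noise_sd` narrows the train–test gap, which is the empirical generalization check described above.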
Bzdok, D., Krzywinski, M. & Altman, N. (2017) Points of Significance: Machine learning: a primer. Nature Methods 14:1119–1120.
Just in time for the season, I've simulated a snow-pile of snowflakes based on the Gravner-Griffeath model.
Gravner, J. & Griffeath, D. (2007) Modeling Snow Crystal Growth II: A mesoscopic lattice map with plausible dynamics.
We introduce two common ensemble methods: bagging and random forests. Both repeat a statistical analysis on multiple bootstrap samples and aggregate the results to improve the accuracy of the predictor. Our column shows these methods as applied to Classification and Regression Trees.
For example, when bagging regression trees, we can sample the space of predicted values more finely because each bootstrap sample yields a tree with potentially different split boundaries.
Random forests grow a large number of trees, not only by drawing bootstrap samples but also by randomly choosing which predictor variables are considered at each split in the tree.
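The column presents these methods with diagrams rather than code. As a minimal sketch (using scikit-learn's implementations and simulated data, neither of which comes from the column), a single tree, a bagged ensemble, and a random forest can be compared on held-out data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 5))            # 5 predictors
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.2, size=300)

X_tr, X_te = X[:200], X[200:]
y_tr, y_te = y[:200], y[200:]

# a single regression tree: one set of split boundaries, high variance
tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)

# bagging: average many trees, each fit to a bootstrap sample
bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100,
                       random_state=0).fit(X_tr, y_tr)

# random forest: bootstrap samples plus a random subset of
# predictors considered at each split (max_features)
rf = RandomForestRegressor(n_estimators=100, max_features=2,
                           random_state=0).fit(X_tr, y_tr)

for name, model in [("tree", tree), ("bagging", bag), ("forest", rf)]:
    mse = np.mean((model.predict(X_te) - y_te) ** 2)
    print(f"{name}: test MSE {mse:.3f}")
```

Averaging over bootstrap samples typically reduces the variance of the single tree, and restricting the predictors considered at each split decorrelates the trees further.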
Krzywinski, M. & Altman, N. (2017) Points of Significance: Ensemble methods: bagging and random forests. Nature Methods 14:933–934.
Krzywinski, M. & Altman, N. (2017) Points of Significance: Classification and regression trees. Nature Methods 14:757–758.