Circos is software that generates circularly composited views of genomic data and annotations.
Figures created by Circos are engaging, pretty and informative.
Circos is particularly suited for visualizing alignments, conservation and intra and inter-chromosomal relationships. (presentations on Circos; drawn heavily from Tufte's Visual Display of Quantitative Information)
Hive plots are a type of layout algorithm that is designed to make sense out of very large networks. The method is quantitative — placement of nodes depends only on network properties.
Hive plots are an answer to the challenge of uninformative network hairball visualization.
I had the opportunity to design the cover of the Genome Informatics 2010 Conference program book. The cover shows sequences of some of the genes and viruses that appear in this conference's abstracts and uses the genome path algorithm previously used in the Deadly Genomes poster.
The Deadly Genomes is a visualization of the size and structure of genomes of viruses and bacteria that are agents of prevalent human diseases. Their genomes are visualized as a path, and each organism is spaced on the poster according to the incidence and mortality of the disease.
This image reached the finalist stage at the 2009 National Science Foundation Visualization Challenge.
December 2009 saw the 10th Anniversary of the Genome Sciences Center. Some commemorative swag was handed out, among which was a stainless steel water bottle with the following image.
The image contains a barcode called QR Code (learn more) which encodes the names of all current employees at the Center.
Lexical analysis of the 2008 US Presidential and Vice-Presidential Debates indicates that the speech patterns of the candidates (especially those paired in a debate) are extremely similar and that the speech of the vice-presidential candidates is less complex than that of the presidential candidates (uniqueness is lower, repetition is higher).
Palin has the longest sentences, Biden repeats himself the most and has the smallest vocabulary, while patterns for Obama and McCain are eerily similar.
carpalx is a keyboard optimizer which rearranges letter positions on a keyboard to minimize typing effort. Discover the magical XBUL keyboard layout, which minimizes the effort of typing English text. Or, if you dare, venture into the land of the disfigured TNWCLR keyboard layout, which makes typing English text excruciatingly painful.
High Dynamic Time Range images (HDTR) are single-frame composites of a set of time-lapse photos.
The bioinformatics Perl workshop offers courses to help you learn Perl and apply it to your work. We have courses on introductory Perl, intermediate Perl, and others. Learn how to use map, grep and sort more efficiently or how to perform data analysis at the command line. The workshop is open to the public (given at the GSC 570 W 7th location) and all slides from each lecture are available online.
schemaball generates circularly composited views of SQL database schemas
High-resolution 32k BAC array for aCGH studies of human genome.
clusterpunch is a mini-benchmarker for clusters designed to monitor availability of resources
portknocking is a network authentication method in which a client establishes a connection to a host which presents no open ports
alex is a very famous pet rat, who had appearances in Genome Research and Maximum PC.
color encoding of vectors: Color::TupleEncode - Mapping tuples to colors and visually comparing numbers
short-read sequencing genome coverage tables: tables of read coverage for haploid, diploid and triploid genomes for a given sequencing redundancy
genome coverage simulator: explore whole genome shotgun statistics
Image color summarizer produces statistics about an image's mean/median hue, saturation and intensity values. It's fun to play with and can be (eventually) used to auto-tag images based on color content.
Lumondo Photography is my commercial front-end.
Canon EF Lenses A f/ vs mm chart of all Canon EF lenses, and a few links to useful lens resources.
UBC model rocket launch competition was not without accidents.
We look at how the uncertainties of two variables combine and how this increases the uncertainty when two samples are compared, and highlight the differences between the two-sample and paired t-tests.
When performing any statistical test, it's important to understand and satisfy its requirements. The t-test is very robust with respect to some of its assumptions, but not others. We explore which.
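A minimal sketch of the difference between the two tests, using only the Python standard library. The function names and the toy data are made up for illustration, and the two-sample statistic uses the unpooled (Welch) form, which is an assumption here rather than something taken from the column. With independent samples the variances of the two means add; with paired samples only the variance of the per-pair differences matters.

```python
import math
import statistics

def two_sample_t(x, y):
    """t statistic for independent samples: variances of the two means add."""
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

def paired_t(x, y):
    """t statistic for paired samples: work on the per-pair differences."""
    d = [a - b for a, b in zip(x, y)]
    se = statistics.stdev(d) / math.sqrt(len(d))
    return statistics.mean(d) / se

# Hypothetical measurements on the same five subjects, before and after.
before = [10.0, 12.0, 14.0, 16.0, 18.0]
after = [11.0, 13.5, 15.0, 17.5, 19.0]

print(two_sample_t(after, before))
print(paired_t(after, before))
```

For this toy data the two-sample statistic is roughly 0.6 while the paired statistic is roughly 9.8: the large spread between subjects swamps the two-sample comparison, but cancels out when each subject is compared with itself.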
Krzywinski, M. & Altman, N. (2014) Points of Significance: Comparing Samples — Part I Nature Methods 11:215-216.
Krzywinski, M. & Altman, N. (2013) Points of Significance: Significance, P values and t-tests Nature Methods 10:1041-1042.
Beautiful Science explores how our understanding of ourselves and our planet has evolved alongside our ability to represent, graph and map the mass data of the time. The exhibit runs from 20 February to 26 May 2014 and is free to the public. There is a good Nature blog writeup about it, a piece in The Guardian, and a great video that explains the exhibit, narrated by Johanna Kieniewicz, the curator.
I am privileged to contribute an information graphic to the exhibit in the Tree of Life section. The piece shows how sequence similarity varies across species as a function of evolutionary distance. The installation is a set of 6 30x30 cm backlit panels. They look terrific.
Quick, name three chart types. Line, bar and scatter come to mind. Perhaps you said pie too—tsk tsk. Nobody ever thinks of the box plot.
Box plots reveal details about data without overloading a figure with a full frequency distribution histogram. They're easy to compare and now easy to make with BoxPlotR (try it). In our fifth Points of Significance column, we take a break from the theory to explain this plot type and, I hope, convince you that it's worth thinking about.
The February issue of Nature Methods kicks the bar chart two more times: Dan Evanko's Kick the Bar Chart Habit editorial and a Points of View: Bar charts and box plots column by Mark Streit and Nils Gehlenborg.
Krzywinski, M. & Altman, N. (2014) Points of Significance: Visualizing samples with box plots Nature Methods 11:119-120.
For specialists, visualizations should expose detail to allow for exploration and inspiration. For enthusiasts, they should provide context, integrate facts and inform. For the layperson, they should capture the essence of the topic, narrate a story and delight.
Wired's Brandon Keim wrote up a short article about me and some of my work—Circle of Life: The Beautiful New Way to Visualize Biological Data.
Experimental designs that lack power cannot reliably detect real effects. Power of statistical tests is largely unappreciated and many underpowered studies continue to be published.
This month, Naomi and I explain what power is, how it relates to Type I and Type II errors and sample size. By understanding the relationship between these quantities you can design a study that has both low false positive rate and high power.
Krzywinski, M. & Altman, N. (2013) Points of Significance: Power and Sample Size Nature Methods 10:1139-1140.
20 Tips for Interpreting Scientific Claims is a wonderful comment in Nature warning us about the limits of evidence.
Sutherland WJ, Spiegelhalter D & Burgman M (2013) Policy: Twenty tips for interpreting scientific claims. Nature 503:335–337.
Have you wondered how statistical tests work? Why does everyone want such a small P value?
This month, Naomi and I explain how significance is measured in statistics and remind you that it does not imply biological significance. You'll also learn why the t-distribution is so important and why its shape is similar to that of a normal distribution, but not quite.
Krzywinski, M. & Altman, N. (2013) Points of Significance: Significance, P values and t-tests Nature Methods 10:1041-1042.
Your slides are not your presentation. They are a representation of your presentation.
Effective presentations require that you have a clear narrative—control detail and emphasis to deliver your message. Engage the audience early. Don't dump on them.
Effective slides are visual cues. Show only what you can't easily say. Text should act as emphasis. Don't read.
Error bar overlap does not imply significance. Error bar gap does not imply lack of significance. Chances are you find these statements surprising.
You've seen and used error bars. But do you understand how to interpret them in the context of statistical significance? This month we address the most common (and commonly misunderstood) method of visualizing uncertainty.
We discuss error bars based on standard deviation, standard error of the mean and confidence intervals. It turns out that none of these behave as our intuition would wish.
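As a rough illustration of how different these three kinds of error bar are for the same data, here is a small sketch using only the Python standard library. The function name, the sample values and the hard-coded critical value are assumptions made for this example, not material from the column.

```python
import math
import statistics

def error_bars(sample, t_crit):
    """Return the half-lengths of three common error bars for one sample.

    t_crit is the two-sided critical value of the t-distribution for
    len(sample) - 1 degrees of freedom; it is passed in rather than
    computed because the standard library has no t-distribution.
    """
    n = len(sample)
    sd = statistics.stdev(sample)   # spread of the data
    sem = sd / math.sqrt(n)         # uncertainty in the mean
    ci = t_crit * sem               # half-width of the confidence interval
    return {"sd": sd, "sem": sem, "ci": ci}

# Hypothetical sample of n = 5; 2.776 is the 95% two-sided critical
# value of t for 4 degrees of freedom.
sample = [9.0, 10.0, 11.0, 12.0, 13.0]
bars = error_bars(sample, t_crit=2.776)
print(bars)
```

For this sample the SD bar is about 1.58, the SEM bar about 0.71 and the 95% CI bar about 1.96, so the same data can be drawn with bars that differ in length by almost a factor of three.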
Krzywinski, M. & Altman, N. (2013) Points of Significance: Error Bars Nature Methods 10:921-922.
This month, Nature Methods is launching Points of Significance, a new column to educate, enlighten and, if possible, entertain bench scientists about statistics.
Our first publication — The Importance of Being Uncertain — acknowledges not only the imperative of being right about how we're wrong, but also our appreciation for Oscar Wilde.
Krzywinski, M. & Altman, N. (2013) Points of Significance: Importance of Being Uncertain Nature Methods 10:809-810.
Interested in data visualization? The Points of View columns are an excellent way to learn practical tips and design principles that help you communicate clearly. All the columns are now available as a collection, and open access during August 2013.
The columns were written by Bang Wong, Martin Krzywinski, Nils Gehlenborg, Cydney Nielsen, Noam Shoresh, Rikke Schmidt Kjærgaard, Erica Savig and Alberto Cairo.
Instead of "explain, not merely show," seek to "narrate, not merely explain."
The distinction between the specialist and the communicator was made by Alberto Cairo at the 2013 Bloomberg Design Conference. I have used this principle to structure my talk to the UBC Tableau Users Group.
Design is algorithmics for the page. Use its principles to inform how to choose from among the options offered by your software. Recognize the limitations of your tool, as well as those features that are ineffective.
Don't practise visual intuitics—use shapes whose size and proportion can be well judged.
The science of cancer genomics will be interpreted by individuals whose lives are affected by genomic mutations using the art style of Aaron De La Cruz.
Beautiful, meaningful and personal.
This month, Erica Savig and I look at the design process for a figure from her paper Multiplexed mass cytometry profiling of cellular states perturbed by small-molecule regulators. The underlying data set has 1.2 billion individual observations, categorized by drug, cell line, protein and stimulation condition.
2012 Multiplexed mass cytometry profiling of cellular states perturbed by small-molecule regulators Nature Biotechnology 30:858-867.
Although spatial encoding is the most perceptually accurate, in this case it's not the best channel to display quantitative information. Instead, the x/y position on the page is used to organize small multiples of the network of affected proteins.
Choose symbols that overlap without ambiguity and communicate relationships in data.
Using Strunk's Elements of Style as an example of writing guidelines, I look at how these can be translated to creating figures.
When we create figures, we must communicate and design. In my talk I discuss some of the rules that turn graphical improvisation into a structured and reproducible process.
Celebrate Pi Day (March 14th) with funky modern posters. Transcend, don't repeat, yourself and watch the dots shimmer.
I am always drawn to type and periodically I must do something about it.
I take over from Bang Wong as primary contributor to the Points of View column, a monthly advice and opinion piece about data visualization and information and figure design in molecular biology.
Together with Alberto Cairo, and then in conversation with Sam Grobart, I presented about science and design at Bloomberg's Businessweek Design Conference in San Francisco.
Creating strings of genome jewellery. Read about how it was done.
Building on the method I used to analyze the 2008 debates, I look at the 2012 Debates between Obama and Romney, lexically speaking. Obama speaks to "folks", while Romney fearmongers with "kill" and "hurt".
Making things round, not square. Read about how it was done.
And usually, really long and funny ones.
My neologisms were picked up by James Gorman of the New York Times in an article Ome, the sound of the scientific universe expanding.
Biology or astrophysics? Read about how it was done.
The image was published on the cover of PNAS (PNAS 1 May 2012; 109 (18))
Numerology is bogus but art based on numbers has a beautiful random quality. Oh, and none of the metaphysical baggage.
The quantity formed by the overlap of two or more numbers.
How much 4ness does π have?
Compare the iness of π to that of the other famous transcendental number, e, and the mysterious but attractive Golden Ratio, φ.
I have found a way to combine my curiosity about space, fear of large sequence assemblies and love of typography in a single illustration. Inspired by typographical portraits, I wanted to automate representing an image with multiple font weights, while sampling characters from a quote or debate transcripts.
If you made widgets, you could be justified in campaigning for a widget of the year. Business acumen suggests it should be one of your widgets. Pantone has done exactly that, naming their 17-1463 color (tangerine tango) as color of the year 2012.
I prefer green—green jive.
I really like the world's most expensive photograph, Rhein II by Andreas Gursky. Cautious use of the word "expensive" should be practised — in this case, merely meaning that only one person saw the $4.3 million price tag. Others saw lower prices, or no price tag at all.
Here's my own attempt at such compositions.
I could not find Illustrator swatch files for this awesome color resource, so I created them myself.
If you're interested in color and design and don't know about Brewer palettes, see my presentation.
World-wide Google searches, categorized by one of 21 languages, are visualized with WebGL, available from Chrome Experiments. The data invites some fascinating questions: (a) In what two places in the US are Google searches in Chinese performed? (b) What are the most remote locations from which Google searches were detected? (c) Why is Istanbul the 3rd top location for searches? Why is Miami in the top 10?
In a recent conversation, I was challenged to name as many organisms with the same genus and species as I could. Being neither a biologist nor, especially, a taxonomist, my responses were limited to organisms with sequenced genomes I had come across in the literature. Immediately to mind sprang Gallus gallus (chicken) and ... nothing else. Well, that was embarrassing.
I was suddenly taken up by the urge to find all instances of this occurrence. Using resources at the NCBI Taxonomy Browser I downloaded the NCBI taxonomy table which contains 1,097,405 entries in the names.dmp file (not all of these are unique genus/species combinations).
To my surprise I discovered that my performance in this challenge was beyond dismal. In fact, there are 380 genera which contain organisms that have the same genus and species name. Most of them (317) include a single organism, but some have many. For example the genus Salamandra has 14 organisms with the species salamandra, including Salamandra salamandra, Salamandra salamandra crespoi and Salamandra salamandra morenica. The genus Regulus has 13 organisms, including Regulus regulus azoricus, Regulus regulus japonensis and Regulus regulus regulus (these are all Goldcrests).
In total, there are 546 unique entries, when organisms with a unique subspecies name are considered distinct. If subspecies is not considered, the number of organisms with the same genus as species (i.e., regardless of subspecies) is 383. Here are organisms whose genus/species name is shorter than 6 letters (82 entries).
The nematode worm Macropostrongylus macropostrongylus has the honour of being the longest genus/species duplicate organism. Given this distinction, it is surprising that PubMed returns only 2 papers that refer to it.
Download the full list. The number next to each ENTRY field is the NCBI Taxonomy ID for the organism. In a small number of cases there are ambiguities in parsing the data file (e.g. Troglodytes cf. troglodytes PS-2, Troglodytes sp. troglodytes PS-1). I left these in.
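The search described above can be sketched in a few lines of Python. This is not the script that produced the list. It assumes the usual taxdump layout for names.dmp (fields separated by a tab, a pipe and a tab: taxonomy ID, name, unique name, name class), keeps only rows whose class is "scientific name", and treats a name as a match when its first two words are equal. The function names and the sample rows (including their IDs) are made up for illustration, and the ambiguous "cf." and "sp." cases mentioned above are not handled.

```python
def parse_names(lines):
    """Yield (tax_id, name, name_class) from names.dmp-style lines."""
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 4:
            yield fields[0], fields[1], fields[3]

def is_genus_species_duplicate(name):
    """True when the genus (word 1) equals the species (word 2)."""
    words = name.split()
    return len(words) >= 2 and words[0].lower() == words[1].lower()

def find_duplicates(lines):
    """Return (tax_id, name) pairs for scientific names that match."""
    return [(tax_id, name)
            for tax_id, name, name_class in parse_names(lines)
            if name_class == "scientific name"
            and is_genus_species_duplicate(name)]

# Made-up rows in the assumed format; the IDs are placeholders.
SAMPLE = [
    "1\t|\tGallus gallus\t|\t\t|\tscientific name\t|\n",
    "1\t|\tchicken\t|\t\t|\tgenbank common name\t|\n",
    "2\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n",
    "3\t|\tRegulus regulus regulus\t|\t\t|\tscientific name\t|\n",
]

for tax_id, name in find_duplicates(SAMPLE):
    print(tax_id, name)
```

To run it on the real file, pass an open file handle for names.dmp in place of SAMPLE. Counting subspecies separately or together then comes down to whether the full name or only its first two words is used as the key.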
Visual acuity limits of the human eye restrict the resolution at which we can comfortably visualize data.
In this short guide, I explain why dividing a scale into no more than 500 divisions is a good idea.
Recently, I was surprised to find out that the following domains were available
All these now point to the Circos site.
ee spammings are spam edited into a format reminiscent of the poetry of ee cummings. Unwanted solicitations for questionable endeavours and products suddenly turn into heady words of the new literature. Art suddenly freed from the husk of spam.
Literature 2.0 — from unlikely origins.
Here's one example that emphasizes that today is ok.
i got to touch you i like us and know the more. believe recontact me today ok! but matters waiting for happy
I now have over 20 ee spammings — enjoy them all.
What do inconversible, mystific, postpetizer, prenopsis and suscitate have in common?
They are words that don't exist, but should. Learn new words.
What are the world's top questions?
Using Google's autocomplete feature, I have tabulated the world's most popular questions. By combining an interrogative term, such as what, who or why, with a term from a related set, such as do I, can I, and can't I, it is possible to sample the space of questions and obtain the most popular for a given start word combination.
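The sampling step can be sketched as a simple cross product of the two term sets. The lists below contain only the example terms given above, the function name is hypothetical, and the sketch stops at generating the start-word combinations: how each one is submitted to autocomplete is not shown here.

```python
from itertools import product

# Example terms taken from the text; the full sets are not listed there.
INTERROGATIVES = ["what", "who", "why"]
FOLLOWERS = ["do I", "can I", "can't I"]

def question_prefixes(interrogatives, followers):
    """Combine every interrogative with every follower term."""
    return [f"{q} {f}" for q, f in product(interrogatives, followers)]

prefixes = question_prefixes(INTERROGATIVES, FOLLOWERS)
for prefix in prefixes:
    print(prefix)
```

Three interrogatives and three followers give nine start-word combinations, from "what do I" to "why can't I".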
I have tabulated the most popular questions by category.
| general | limits & desires |
| career & education | health |
| sizes & extremes | religion & faith |
What kind of questions about science are people asking? From the Career & Education section,
What are the strangest questions? I'll let you explore, but these have me wondering:
1,000s of tables have already been visualized. Has yours?
Hive plots are excellent at visualizing ratios. They're not just an anti-hairball network visualization agent.
Below, 3 x 8 x 27 = 648 ratios (axes x ribbons x plots) are visualized.
The image above compares the relative ratios of region annotations in human, mouse and dog genomes.
Cáceres is a small city of 100,000 inhabitants in western Spain, where the city government is promoting Cáceres Creativa, a project in which citizens collaboratively build a sustainable future for the city by activating the creative capacity of the population.
The project has been published as a book (excerpt), which provides a basis for working with city residents and businesses in this collaborative design.
Circos proved useful in showing the complex relationships that are established in an environment such as a city, which combines flows of energy and resources, physical items and intellectual concepts. The online Circos tableviewer was used to generate the images.
Taking photos of inanimate objects is rewarding. Your subject doesn't complain, nor move, and a coffee break fits naturally into the workflow at any time. In this case, the inanimate object is over 3 PB (3,000 TB) of storage composed of a variety of NetApp appliances.
Using three gelled Hensel Integras (500 Ws monoheads — here I'm using only the modelling light for illumination along with red, blue and green filters) (lighting details), I spent some time getting to know the components up close.
See more photos.
All photos by Martin Krzywinski (Lumondo Photography).
Our new compute cluster has been released to the user community.
This cluster consists of 420 compute nodes, each with 12 cores and 48 GB RAM, totaling 5,040 cores and 20 TB RAM. Each node has 160 GB of local /tmp space and all nodes are tied together over a 40 Gb/s InfiniBand network.
The nodes all have access to a dedicated storage system over the InfiniBand network running GPFS, with a total of 700 TB of usable scratch space. The filesystem is served by 8 IBM x3850 servers. All nodes are running CentOS 5.4 and are using open source Grid Engine 6.2u5 as their scheduler.
All photos by Martin Krzywinski (Lumondo Photography).
1 First the server room was expanded
2 It was empty and without racks, and the lights were dim. Sysadmins scurried about and unpacked equipment
3 The circuit was closed and there were electrons
4 IT staff were pleased and accounts were handed out to users
5 Who had work they called "important"
6 But which the IT staff merely called "jobs".
Periodically, I take my camera and point it at things. Here, I'll share a favourite from my creations.
This image — I will keep the subject a mystery — gives me the same feeling as some of the Hubble images. For this shot, I didn't need to reach orbit.
Other images in this series are available on flickr.
and an assortment of baggage carts at St Pancras station (London) which catches the eye.
I like to collect time in a photo, be it uniformly as in this diptych of street and traffic lights from a moving car
or blended, as in this skyline of Vancouver showing the flow of time from 5.30pm to 9.30pm.
DNA is composed of two strands, which are complementary. Given a sequence, its reverse complement is created by swapping A/T and G/C and writing the remapped sequence backwards (e.g. ATGC is first remapped to TACG and then reversed to GCAT).
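The DNA case takes only a few lines of Python. This is a minimal sketch: the function name is an arbitrary choice and only the four unambiguous bases are handled.

```python
# Swap A/T and G/C, as described above.
DNA_COMPLEMENT = str.maketrans("ATGC", "TACG")

def reverse_complement(seq):
    """Complement each base, then write the result backwards."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]

print(reverse_complement("ATGC"))  # GCAT
```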
Consider the corresponding concept applied to English words (or any language, for that matter). First, construct the complementarity map, which assigns to the nth letter of the alphabet the (N-n+1)th letter, given an alphabet of N letters.
abcdefghijklmnopqrstuvwxyz
||||||||||||||||||||||||||
zyxwvutsrqponmlkjihgfedcba
For example, a becomes z, b becomes y, and so on. To create a reverse complement of a word, apply this mapping and then reverse the new word (e.g. 'dog' is remapped to 'wlt' and then reversed to obtain 'tlw').
So far, that's not very exciting.
But consider the question: What is the longest English word that is a palindrome under this set of rules (reverse complementarity)? In other words, it's the same forward and backward after complementing the letters. Clearly "dog" is not such a palindrome since its reverse complement is "tlw".
The answer? wizard and hovels.
wizard
||||||
draziw -> 'wizard' backwards
It's an amazingly fitting answer, since a wizard is someone with special powers.
A few interesting 4-letter words that are their own reverse complement palindromes are bevy, grit, trig and wold. Common surnames that match are Ghrist, Elizarov and Prawdzik. Female first name Zola and male first name Iver are also reverse complement palindromes, as are trolig (Norwegian for 'likely', as well as an IKEA curtain product) and aviverez (2nd person plural future of 'aviver', French for 'brighten').
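The word version of the rule is easy to check in Python. This sketch assumes a plain 26-letter lowercase alphabet; the function names and the short word list are made up for illustration, and a real search would read candidates from a dictionary file instead.

```python
import string

LETTERS = string.ascii_lowercase
# a <-> z, b <-> y, ... : each letter maps to its mirror in the alphabet.
WORD_COMPLEMENT = str.maketrans(LETTERS, LETTERS[::-1])

def word_reverse_complement(word):
    """Complement each letter, then reverse the word."""
    return word.lower().translate(WORD_COMPLEMENT)[::-1]

def is_rc_palindrome(word):
    """True when a word is its own reverse complement."""
    return word.lower() == word_reverse_complement(word)

def longest_rc_palindromes(words):
    """Return the longest reverse complement palindromes, sorted."""
    hits = [w for w in words if is_rc_palindrome(w)]
    longest = max((len(w) for w in hits), default=0)
    return sorted(w for w in hits if len(w) == longest)

candidates = ["dog", "bevy", "grit", "trig", "wold", "wizard", "hovels"]
print(word_reverse_complement("dog"))      # tlw
print(longest_rc_palindromes(candidates))  # ['hovels', 'wizard']
```

Letters outside a to z (accents, hyphens) are left unchanged by the mapping, so words containing them would need extra handling.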
Finding just the right font is hard work. There are so many to choose from. Or are there?
You'll notice a rotating image of type faces at the top of this page. Here's the full list.
I love Gotham and have used it in visualization projects. It's more rational than Helvetica and still enjoys a freshness that has evaporated from Helvetica after near-ubiquitous use. Don't get me wrong, there is still not enough Helvetica in the world, but more Gotham would be nice.
Anyone who has met me quickly learns that I have a personal and antagonistic relationship with Comic Sans, the type face that shouldn't have been.
In a recent article in the journal Cognition, Fortune favours the bold (and the italicized): Effects of disfluency on educational outcomes, Diemand-Yauman et al. suggest that rendering educational materials in a hard-to-read font, and thereby recruiting the effects of disfluency ("the subjective experience of difficulty associated with cognitive operations"), improves retention of material.
Regardless of whether the effect is real, there must be better ways to improve education than through bad design.
In a cosmically improbable confluence of multidisciplinary pursuits, my work on keyboard layouts, which as one of its fruits has produced the TNWMLC keyboard layout — the most difficult for English typing — has been incorporated into the eponymously named Brazilian fashion line by Julia Valle.