Joint Cold Spring Harbor Laboratory/Wellcome Trust Conference
genome informatics
September 15-19, 2010 / Hinxton, UK v 0.11 / Sep 16 09.26BST


design by Martin Krzywinski, BCCA (Genome Sciences Center)

The program cover shows sequences of some of the genes and viruses that appear in this conference's abstracts.

sequence as a path

Each sequence is represented by a continuous path whose length is proportional to the length of the sequence. At each point on the path, color shows the GC content computed over the 20 bases at that position.
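The windowed GC calculation described above can be sketched as follows. This is a minimal illustration, not the code used for the cover: the function names, the `step` parameter and the choice of a window that starts at each position (rather than one centered on it) are assumptions.

```python
def gc_content(window: str) -> float:
    """Fraction of G and C bases in a window (case-insensitive)."""
    if not window:
        return 0.0
    window = window.upper()
    return (window.count("G") + window.count("C")) / len(window)


def gc_profile(sequence: str, window_size: int = 20, step: int = 1) -> list:
    """GC content of each full window of `window_size` bases, sliding by `step`.

    Assumption: the window starts at the sampled position. A sequence shorter
    than the window yields an empty profile.
    """
    last_start = len(sequence) - window_size
    return [
        gc_content(sequence[i:i + window_size])
        for i in range(0, last_start + 1, step)
    ]
```

Calling `gc_profile(seq)` returns one GC value per position along the path, which can then be turned into a color.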

Because the GC content doesn't vary greatly, values in the range 0.2-0.6 are mapped onto hues 0-300. GC values below 0.2 are assigned the start hue and values above 0.6 the end hue. To smooth the color mapping, a running average is calculated across 10 adjacent samples.
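A sketch of this color mapping is shown below. The text gives the GC range, the hue range and the 10-sample average. The rest is assumed for illustration: a linear mapping, hues in degrees, a trailing (rather than centered) average applied to the GC values before mapping, and full saturation and brightness when converting to RGB.

```python
import colorsys

GC_MIN, GC_MAX = 0.2, 0.6      # GC range stated in the text
HUE_MIN, HUE_MAX = 0.0, 300.0  # hue range stated in the text (assumed degrees)


def gc_to_hue(gc: float) -> float:
    """Clamp GC to [GC_MIN, GC_MAX], then map it linearly onto the hue range."""
    clamped = min(max(gc, GC_MIN), GC_MAX)
    fraction = (clamped - GC_MIN) / (GC_MAX - GC_MIN)
    return HUE_MIN + fraction * (HUE_MAX - HUE_MIN)


def running_average(values, width: int = 10):
    """Trailing average over up to `width` of the most recent samples."""
    smoothed = []
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= width:
            total -= values[i - width]  # drop the sample that left the window
        smoothed.append(total / min(i + 1, width))
    return smoothed


def hue_to_rgb(hue_degrees: float):
    """Convert a hue in degrees to an RGB triple at full saturation and value."""
    return colorsys.hsv_to_rgb(hue_degrees / 360.0, 1.0, 1.0)
```

For example, `[hue_to_rgb(gc_to_hue(g)) for g in running_average(gc_values)]` gives one color per sample along the path.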

Path direction is determined by the GC content relative to the average GC content of the human genome. Path curvature is informed by the repeat content near that location, calculated as the average frequency of 10-mers sampled within a window of 200 bases relative to their frequency in the human exon sequence. This quantity is expressed relative to the chance of observing these 10-mers randomly and is used to set the angle of the path. Regions composed of relatively rare 10-mers are straighter than regions that contain repetitive sequence.
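The text does not give the exact formula that turns these quantities into an angle, so the following is only one possible reading. The mean k-mer frequency of a window, looked up in a reference table, is divided by the uniform chance of a k-mer (4^-k) to give a score, and the score sets the size of the turn. Everything else is hypothetical: the function names, the saturating map `score / (1 + score)`, the 30 degree maximum turn, using the sign of the GC difference as the turning side, and the approximate human genome GC average of 0.41.

```python
import math
from collections import Counter


def kmer_frequencies(reference: str, k: int) -> dict:
    """Relative frequency of every k-mer in a reference sequence."""
    counts = Counter(reference[i:i + k] for i in range(len(reference) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def repeat_score(window: str, ref_freq: dict, k: int) -> float:
    """Mean reference frequency of the window's k-mers, relative to the
    chance of a k-mer under a uniform random model (4**-k)."""
    kmers = [window[i:i + k] for i in range(len(window) - k + 1)]
    if not kmers:
        return 0.0
    mean_freq = sum(ref_freq.get(kmer, 0.0) for kmer in kmers) / len(kmers)
    return mean_freq * 4 ** k


def turn_angle(gc: float, score: float, mean_gc: float = 0.41,
               max_turn: float = math.radians(30)) -> float:
    """Hypothetical mapping: GC above or below the mean picks the turning
    side, and the repeat score sets how sharply the path bends."""
    side = 1.0 if gc >= mean_gc else -1.0
    return side * max_turn * score / (1.0 + score)


def trace_path(gc_values, scores, step: float = 1.0):
    """Accumulate turn angles into (x, y) points, one fixed step per sample."""
    x = y = heading = 0.0
    points = [(x, y)]
    for gc, score in zip(gc_values, scores):
        heading += turn_angle(gc, score)
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        points.append((x, y))
    return points
```

With this mapping, a window whose k-mers never occur in the reference scores 0 and the path continues straight, while a window of common k-mers bends the path, which matches the qualitative behaviour described above. The cover uses 10-mers and 200-base windows; a smaller `k` is convenient for toy inputs.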

The path is confined within a circular area to keep it compact, at the cost of losing translational and rotational invariance of the representation: the shape of each path segment depends on the angle and position at which the path approaches the circular boundary.

interpreting structure

For genes, the transcribed sequence is shown, which includes both introns and exons.

The overall effect of the path encoding is a qualitative, artistic interpretation of local sequence structure. Two paths can be compared directly to explore differences between their corresponding sequences.

genome as a path

The Deadly Genomes poster demonstrates how entire genomes appear when encoded as paths. The poster compares the incidence rates and mortality of diseases caused by harmful pathogens, such as malaria, syphilis, AIDS and SARS.

The poster was a finalist in the 2009 National Science Foundation Visualization Challenge.