If you are using Circos, please cite as

Krzywinski, M. et al. Circos: an Information Aesthetic for Comparative Genomics. Genome Res (2009) 19:1639-1645.

Use Circos to create concise, explanatory, unique and print-ready visualizations of your data.

Current version is 0.52.

Try the online version.





Version 0.52 is a bug-fix release. Issues with ideogram ordering and axis breaks have been addressed.
Current tools version is v0.13.
New in this version are categoryviewer, used to visualize categorical data, and a significantly updated tableviewer. Read about the theory and practice of visualizing tabular data.


Circos Online

An online version of Circos is now available to visualize tabular data. Turn your tables into informative images!


3x3 table

4x4 table

6x6 table

8x8 table


Circos in Conde Nast Portfolio

Conde Nast Portfolio features an article on personalized genome sequencing. An image by Circos appears in a double-page spread. The art direction called for something primarily artistic and visually appealing, but also connected to the content of the article.


Circos in American Scientist

An image created by Circos appears on the cover of the Sept/Oct issue of American Scientist.

The image accompanies an article by Elaine Ostrander about dog genetics and illustrates the deep sequence similarity between the human and dog genomes. Read about the figure.


Circos in New York Times

Circos is used to generate a visualization of the 2007 Democratic and Republican Debates. The image was created by Jonathan Corum and Farhana Hossain, who discuss how the image was conceptualized.

Circos is featured in the Science section of the New York Times (22 Jan 2007).


Circos Presentations

Circos - a data viewer with comparative genomics in mind Circos presentation - Martin Krzywinski

Visualizing quantitative information - featuring work of Tufte Circos presentation - Martin Krzywinski


Circos Posters

A variety of Circos posters are available.

American Scientist Cover Image

Figure | Cover tearsheet.

Below is a short description of the data processing and design of the American Scientist Sept/Oct 2007 cover image. The image accompanies the article Genetics and the Shape of Dogs by Elaine Ostrander.

Although many differences between dogs and humans exist, such as the curious lack of dignity in the canine species (as anyone with a dog can attest), the genomes of human and dog show similarity. This similarity, called synteny when the comparison is made across species, is due to the fact that the dog and human share a distant common ancestor. Examination of the genomic sequence suggests that the dog and human diverged from a common ancestor about 90-100 million years ago, in the Cretaceous period (Springer MS, Murphy WJ, Eizirik E, O'Brien SJ: Placental mammal diversification and the Cretaceous-Tertiary boundary. Proc Natl Acad Sci U S A 2003, 100:1056-1061).

The magazine's cover image illustrates a distilled set of similarity relationships between the human genome, whose chromosomes are coded in blue, and the dog genome, whose chromosomes are coded in orange. Regions of synteny (sequence similarity) are linked using grey ribbons. To simplify the final image, neighbouring individual regions of synteny, which are relatively small (on the order of kb), are bundled into a single ribbon.

Syntenic relationships between dog chromosome 15 and the human genome are highlighted in colour. This chromosome is of specific interest in the Ostrander lab (Ostrander EA, Wayne RK: The canine genome. Genome Res 2005, 15:1706-1716).

Visualizing Comparative Genomic Data

Figure | Early versions of the cover image. I was searching for visually interesting patterns in the bundles of homology. The first panel is extremely complex and, although colourful, carries little interpretable information. The next two panels are more concise but inefficient and contain a lot of empty space. The last panel illustrates that color can be a powerful way to encode data, but too much color (first panel) overwhelms the eye.

Visually exploring comparative genomic data is difficult. The number of genomes that have been sequenced, or are in the process of being sequenced, is very large; the genomes themselves are large; and the similarity data is sparse. There have been many efforts to generate visual representations of genome-to-genome relationships. Circos is one such project.

The difficulty in generating graphical representations of comparative data quickly becomes apparent when one explores the data itself. According to the UCSC Genome Browser's Table Browser, regions of sequence similarity between dog and human number over 3,700,000. In total, these nearly 4 million pairs of related regions provide more than 1-fold coverage of the dog and human genomes; this is possible because the coordinates of similarity pairs overlap. The vast majority of pairs comprise small regions (90% of regions are <400bp on dog and <330bp on human). Adjacent bundles of such pairs are frequent, indicating similarity across large regions of the genomes. However, many gaps in similarity exist within the bundles, and sometimes short runs of similarity to other regions break up the bundles.

Below is an example of a small fraction of such data, which relates regions on a dog chromosome (cfN, N=1..38,X) to regions on human chromosomes (hsM, M=1..22,X). Notice that two of the similarity pairs indicate that the same region on cf1 shows sequence similarity to regions of both hs19 and hs3. Further downstream, a single region of homology between cf1 and hs2 breaks up what appears to be a bundle of homology between cf1:112Mb and hs19:51Mb.

cf1 112324823 112331694 6872 hs19 51713100 51718054 4955
cf1 112328159 112330938 2780 hs19 51715464 51716592 1129
cf1 112328700 112329092 393 hs3 198134679 198134817 139
cf1 112418235 112463291 45057 hs19 51320175 51443212 123038
cf1 112582829 112601354 18526 hs19 51115819 51121833 6015
cf1 112852364 112853418 1055 hs19 50799219 50801066 1848
cf1 113508037 113509480 1444 hs19 49956706 49958137 1432
cf1 113638063 113642450 4388 hs2 184178863 184181447 2585
cf1 113900245 113901596 1352 hs19 49504154 49505831 1678
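The rows above can be read programmatically. Below is a minimal Python sketch, assuming the eight whitespace-separated columns are dog chromosome, start, end, size, then human chromosome, start, end, size. The column meanings are inferred from the rows themselves, and the names `SimilarityPair` and `parse_pair` are hypothetical.

```python
from typing import NamedTuple


class SimilarityPair(NamedTuple):
    """One dog-human similarity pair (field names are illustrative)."""
    dog_chr: str
    dog_start: int
    dog_end: int
    dog_size: int
    human_chr: str
    human_start: int
    human_end: int
    human_size: int


def parse_pair(line: str) -> SimilarityPair:
    """Parse one whitespace-separated row into a SimilarityPair."""
    f = line.split()
    return SimilarityPair(f[0], int(f[1]), int(f[2]), int(f[3]),
                          f[4], int(f[5]), int(f[6]), int(f[7]))


# Three rows copied from the listing above.
ROWS = """\
cf1 112324823 112331694 6872 hs19 51713100 51718054 4955
cf1 112328700 112329092 393 hs3 198134679 198134817 139
cf1 113638063 113642450 4388 hs2 184178863 184181447 2585
"""

pairs = [parse_pair(row) for row in ROWS.splitlines()]

# In these rows the size column equals end - start + 1 (inclusive coordinates).
for p in pairs:
    assert p.dog_size == p.dog_end - p.dog_start + 1
    assert p.human_size == p.human_end - p.human_start + 1
```

The inclusive-coordinate check holds for these rows; it does not hold for the binned rows shown later, where the size column is the bin width.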

Attempting to draw all of these data results in a jumble that is difficult to interpret. Although zooming into pairs of regions offers detailed accounting of regions of homology, the large picture (the homology bundles) cannot be appreciated at this scale.

The challenge for the American Scientist figure was to depict sequence similarity between the dog and human genomes in a manner that was both informative and visually appealing. Due to the large number of individual regions of homology, across a large range of sizes, some data filtering and collating was necessary to strike a balance between clarity and complexity. Because dog chromosome 15 is of particular interest to Elaine Ostrander, the author of the article, and her group, it was deemed that the figure should also draw attention to the relationship between chromosome 15 and the human genome.

Data Processing

Figure | Effect of bin size on complexity of figure. Shown here is homology between dog chromosome 15 and the human genome. Results with bins of size 5, 10, 25 and 50kb are shown. The cover image used 100kb bins.

I started with the dog vs human sequence similarity data available from the UCSC Table Browser. These data were in pairs

i,j,k l,m,n
indicating sequence homology between region j-k of the ith dog chromosome and region m-n of the lth human chromosome. To limit the complexity in the data, I binned the data pairs by dividing the dog genome into bins of 100kb. Within each bin, I examined each data pair and collated the target human regions that were associated with the dog genome bin. For each human chromosome, I computed coverage by homologous regions and filled in any gaps between regions as long as the gap was smaller than 0.25 of the size of the regions on either side. For a given bin and human chromosome, I created an intermediate list of the largest 5 human homologous regions. For example, here are the 5 largest human regions homologous to a bin on the dog genome at cf15:11.5Mb.
cf15 11500000 11600000 100000 hs18 49966279 49971915 5637
cf15 11500000 11600000 100000 hs6 29604229 29609758 5530
cf15 11500000 11600000 100000 hs19 33858498 33863344 4847
cf15 11500000 11600000 100000 hs6_cox_hap1 952557 957072 4516
cf15 11500000 11600000 100000 hs1 43380571 43384085 3515
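The binning step can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for the cover: a pair is assigned to every 100kb dog bin it overlaps, gap filling is omitted, and the largest 5 human regions are kept per dog bin across all human chromosomes (as in the example rows above). The function name `bin_pairs` and the dog coordinates in the demo are hypothetical; the human coordinates are taken from the listing.

```python
from collections import defaultdict

BIN_SIZE = 100_000  # bin width on the dog genome, as in the text


def bin_pairs(pairs, bin_size=BIN_SIZE, keep=5):
    """Group human regions by dog bin and keep the `keep` largest per bin.

    pairs: iterable of (dog_chr, dog_start, dog_end,
                        human_chr, human_start, human_end).
    Returns {(dog_chr, bin_start): [(human_chr, start, end), ...]}.
    """
    bins = defaultdict(list)
    for dog_chr, dog_start, dog_end, hs_chr, hs_start, hs_end in pairs:
        # Assumption: a pair contributes to every bin it overlaps.
        for b in range(dog_start // bin_size, dog_end // bin_size + 1):
            bins[(dog_chr, b * bin_size)].append((hs_chr, hs_start, hs_end))
    top = {}
    for key, regions in bins.items():
        # Largest human regions first (inclusive coordinates).
        regions.sort(key=lambda r: r[2] - r[1] + 1, reverse=True)
        top[key] = regions[:keep]
    return top


# Hypothetical dog coordinates; human regions copied from the rows above.
demo = [
    ("cf15", 11_520_000, 11_525_000, "hs6", 29_604_229, 29_609_758),
    ("cf15", 11_510_000, 11_515_000, "hs18", 49_966_279, 49_971_915),
    ("cf15", 11_599_000, 11_601_000, "hs1", 43_380_571, 43_384_085),
]
result = bin_pairs(demo)
for (dog_chr, bin_start), regions in sorted(result.items()):
    print(dog_chr, bin_start, regions)
```

The third demo pair straddles a bin boundary, so it appears in both the 11.5Mb and the 11.6Mb bins.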

At this point I had reduced the 3,700,000 data pairs to just over 42,700. I chose a bin size of 100kb because such a bin would cover about 0.0075 degrees (roughly 27 seconds of arc) if the entire dog genome (2.4 Gb) were represented along half of the circle. Thus, if the circle image had a radius of about 8,000 pixels, a 100kb bin would occupy about one pixel along the circumference. This seemed like the right ballpark for the bin size, although the final figure does not significantly change if the bin size is somewhat increased.
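The one-pixel estimate can be checked with a few lines of arithmetic. The sketch below assumes a dog genome of 2.4 Gb (the size quoted later on this page) laid out along exactly half of the circle; the variable names are illustrative.

```python
import math

DOG_GENOME_BP = 2.4e9   # assumed dog genome size, as quoted later on this page
BIN_BP = 100_000        # bin size used for the cover image
RADIUS_PX = 8_000       # image radius considered in the text

# Number of 100kb bins needed to tile the dog genome.
n_bins = DOG_GENOME_BP / BIN_BP

# The dog genome spans half the circle, i.e. 180 degrees.
deg_per_bin = 180.0 / n_bins
arcsec_per_bin = deg_per_bin * 3600

# Arc length of half the circle, in pixels, shared among the bins.
half_circumference_px = math.pi * RADIUS_PX
px_per_bin = half_circumference_px / n_bins

print(f"bins: {n_bins:.0f}")
print(f"angle per bin: {deg_per_bin:.4f} deg ({arcsec_per_bin:.0f} arcsec)")
print(f"pixels per bin at r={RADIUS_PX}: {px_per_bin:.3f}")
```

Under these assumptions each bin spans close to one pixel of arc, which is the sense in which 100kb is the right order of magnitude for an image of this size.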

The next step was to find the homology bundles. I did this by associating adjacent regions of similarity (adjacent on both the dog and human genomes) together. I allowed up to 500kb of gap between regions. This was done to give better illustration of bundles of homology on a larger scale.
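A simplified version of this bundling step might look like the Python sketch below. It is a greedy illustration, not the procedure used for the cover: links are sorted by chromosome pair and dog position, and each link joins the current bundle if the gap to the previous link is at most 500kb on both genomes. The function name `bundle_links` is hypothetical; the demo rows are copied from the cf1 listing earlier on this page.

```python
MAX_GAP = 500_000  # maximum gap allowed between adjacent regions, per the text


def bundle_links(links, max_gap=MAX_GAP):
    """Greedily group links that are adjacent on both genomes.

    links: list of (dog_chr, dog_start, dog_end,
                    human_chr, human_start, human_end).
    Returns a list of bundles, each a list of links.
    """
    bundles = []
    # Sort so links between the same chromosome pair are consecutive.
    for link in sorted(links, key=lambda l: (l[0], l[3], l[1])):
        if bundles:
            prev = bundles[-1][-1]
            same_chrs = prev[0] == link[0] and prev[3] == link[3]
            dog_gap = link[1] - prev[2]
            # Human gap is measured regardless of orientation;
            # it is negative when the two intervals overlap.
            hs_gap = max(link[4], prev[4]) - min(link[5], prev[5])
            if same_chrs and dog_gap <= max_gap and hs_gap <= max_gap:
                bundles[-1].append(link)
                continue
        bundles.append([link])
    return bundles


links = [
    ("cf1", 112418235, 112463291, "hs19", 51320175, 51443212),
    ("cf1", 112582829, 112601354, "hs19", 51115819, 51121833),
    ("cf1", 112852364, 112853418, "hs19", 50799219, 50801066),
    ("cf1", 113508037, 113509480, "hs19", 49956706, 49958137),
    ("cf1", 113638063, 113642450, "hs2", 184178863, 184181447),
    ("cf1", 113900245, 113901596, "hs19", 49504154, 49505831),
]
bundles = bundle_links(links)
for b in bundles:
    print(b[0][0], b[0][3], len(b), "links")
```

Because links are grouped by chromosome pair before comparing neighbours, the isolated cf1-hs2 link does not interrupt the cf1-hs19 bundles in this sketch; it simply forms a bundle of its own.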

Once the bundle structure was computed, I went back to the binned data and for each binned data pair checked which bundle it overlapped with. At this point only data pairs that overlapped with the largest bundle for the region were accepted for drawing in the figure. This limited the number of small, isolated regions of homology within larger runs of regions that linked the same dog-human regions. Links corresponding to regions belonging to smaller bundles were drawn behind (and in lighter grey tone) links associated with larger bundles.

Cover Image

Figure | Final American Scientist cover image. Regions of similarity between human (top, blue [A]) and dog (bottom, orange [C]) chromosomes. One-dimensional similarity mapping between human [B] and dog [D] chromosomes. This mapping provides the chromosome color coding associated with grey ribbons [F]. These grey ribbons are composed of binned homology regions that fall in the same bundle (see above). The level of grey is proportional to the size of the homologous regions. Homology on chromosome 15 is highlighted with colored ribbons [E]. Ribbons that twist such as [F2] indicate inversions, whereas those that don't [F1] indicate regions of homology on the same strand.

The final image is shown on the right. For the cover image, it was decided to select a subset of dog and human chromosomes to limit the visual complexity of the figure.

Starting with dog chromosomes 1,2,3,4 (largest - seemed like a good start) and 15 (Elaine's favourite), I began constructing the figure by adding human chromosomes that formed ribbon connections. By adding more dog and human chromosomes to the set, I obtained a figure in which the ribbons provide near total coverage of all the chromosomes around the circle. As an added bonus, the selected dog chromosomes occupy about 1/2 of the circle (the length scale is the same for each ideogram in the circle).

The circular composition of the ideograms allows for rapid exploration of the data, at least on a large scale such as this. Notice that large-scale inversions are easy to spot because the ribbons appear to twist (ribbons like F2). Ribbons that connect dog and human chromosomes without twisting (F1) indicate similarity between the same strands.
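Whether a bundle would appear twisted can be estimated from the data alone. The Python sketch below is a heuristic of my own framing, not a description of how Circos decides to twist a ribbon: it assumes a bundle is inverted when human coordinates mostly decrease as dog coordinates increase. The function name `is_inverted` is hypothetical; the demo coordinates are the cf1-hs19 start positions from the listing earlier on this page.

```python
def is_inverted(bundle):
    """Return True if human positions mostly decrease along the dog axis.

    bundle: list of (dog_start, human_start) tuples for one bundle.
    """
    # Order links by dog position, then look at the human positions.
    human = [h for _, h in sorted(bundle)]
    steps_down = sum(1 for a, b in zip(human, human[1:]) if b < a)
    steps_up = sum(1 for a, b in zip(human, human[1:]) if b > a)
    return steps_down > steps_up


# cf1 vs hs19 start coordinates from the similarity pairs shown earlier.
cf1_hs19 = [
    (112418235, 51320175),
    (112582829, 51115819),
    (112852364, 50799219),
    (113508037, 49956706),
    (113900245, 49504154),
]
print(is_inverted(cf1_hs19))
```

In the cf1-hs19 rows the human coordinates fall steadily as the dog coordinates rise, so this heuristic flags the bundle as inverted.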

Furthermore, existing color schemes can be easily integrated into Circos' approach to visualizing data. The color scheme used here is the standard chromosome color palette that relates a fixed color to each chromosome for consistent display. Browsers that are founded on a linear representation use this color scheme to indicate the mapping between two regions. With a circular representation the mapping is made explicit by lines, or bundles of lines.

Figure | Circular composition can be expanded to the third dimension. Here a gimbal of two genomes is shown.

The intrepid reader might at this point imagine a figure drawn in three dimensions in which each of the two genomes occupies a full circle. For example, the dog genome could be arranged along one great circle, and the human along another, so as to meet at right angles at the two points of intersection. In a globe analogy, the dog genome might be the equator and the human genome a line of longitude. By rotating the genomes around the resulting sphere, different relationships could be illustrated.

Extended Cover Image

Figure | Cover image with all chromosomes and additional data elements, showing conservation, breed and morphology QTL data.

The balance of complexity and visual appeal seemed right in the figure above, and thus this figure was accepted as a cover image. However, my natural instinct was to throw caution to the wind and extend the figure to include all human and dog chromosomes (human Y was removed). This image is shown on the right.

Not satisfied, I added conservation information [G] as well as dog breed marker data [H] (courtesy of Heidi Parker, see PMID 15155949) and morphology QTL [I] (courtesy of Kevin Chase, see PMID 9987902, PMID 16934357 and PMID 12114542).

The conservation data is shown as two histograms. The blue histogram shows the degree of conservation over bins of 3Mb between dog and human for a given region of the human chromosome. The orange histogram indicates average conservation between human and other vertebrate species. The bins across which conservation was computed are very large (3Mb), and much detail is lost. However, the data track illustrates effective integration of standard plot types (here the histogram) into a Circos image.

Breed Marker and Morphology QTL Data

Figure | Breed cluster data [H] and morphology QTL location [I] are shown as data tracks around the dog genome. Each QTL was associated with a principal component, encoded by the level of grey in [I] glyphs. Format of breed data is described below.

The Science publication by Parker et al. presents data that group 85 domestic dog breeds into four clusters. The author graciously shared her data with me to include in the figure.

The figure shows the result of clustering sequence information derived from markers (a,b,c,...) across different breeds. The clustering resulted in four distinct groups (yellow, blue, green, red - same color scheme as in the Parker et al publication). Each marker (a,b,c,...) was associated with multiple alleles and each allele had a breed cluster frequency (f1...f4). For a given marker (e.g. e), I separated the alleles which had the largest frequency component for the yellow breed cluster (A, ancient breeds, e.g. eA1 eA2 eA3 eA4), from those that had the largest component for the blue cluster (B, bulldog/mastiff types, eB1 eB2 eB3), green cluster (C, wolfhound/collie types, eC1 eC2) and red cluster (D, terriers, eD1). Each allele is represented by a stack of rectangles which represent the frequencies of that allele in each breed cluster. The rectangles are ordered by frequency (e.g. allele fC1 has a large green cluster frequency f1 followed by a smaller blue cluster frequency f2, yellow f3 and finally red f4).

Figure | Organization of breed marker data. Markers (a,b,c,d) contain multiple alleles, which are grouped by breed cluster frequency (A,B,C,D). For a given allele (e.g. eA1), frequency in each breed cluster is shown by stacked rectangles. The rectangle color encodes the breed cluster.
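The allele grouping and rectangle ordering described above can be sketched in Python. The frequencies below are invented for illustration only and are not taken from the Parker et al. data; the function names `dominant_cluster`, `stack_order` and `group_alleles` are hypothetical.

```python
# Breed clusters A, B, C, D in the colors used by the figure.
CLUSTER_COLORS = ("yellow", "blue", "green", "red")


def dominant_cluster(freqs):
    """Cluster in which the allele has its largest frequency."""
    return max(freqs, key=freqs.get)


def stack_order(freqs):
    """Clusters ordered from largest to smallest frequency,
    i.e. the order of the stacked rectangles for one allele."""
    return sorted(freqs, key=freqs.get, reverse=True)


def group_alleles(alleles):
    """Group allele names by their dominant breed cluster."""
    groups = {color: [] for color in CLUSTER_COLORS}
    for name, freqs in alleles.items():
        groups[dominant_cluster(freqs)].append(name)
    return groups


# Hypothetical frequencies for three alleles of one marker.
marker_e = {
    "e1": {"yellow": 0.60, "blue": 0.20, "green": 0.15, "red": 0.05},
    "e2": {"yellow": 0.15, "blue": 0.30, "green": 0.50, "red": 0.05},
    "e3": {"yellow": 0.10, "blue": 0.10, "green": 0.20, "red": 0.60},
}
groups = group_alleles(marker_e)
print(groups)
print(stack_order(marker_e["e2"]))
```

With these made-up numbers, allele e2 falls in the green group and its rectangles would be stacked green, blue, yellow, red, mirroring the ordering described for allele fC1.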

The morphological variation of dogs is one of their most curious qualities. Who can resist chuckling at the notion of a chihuahua riding on a Great Dane? Regions of the dog genome have been associated with morphological traits (limb or skeleton size, for example). These regions affect morphology in a complex way, but can be grouped into four principal components, as described in the Chase et al publication. The authors kindly shared with me the published QTL locations and I encoded them in the figure as ticks near the dog ideograms. The ticks associate the QTL with a principal component (PC1 very dark grey, PC2 dark grey, PC3 grey, PC4 light grey).

dog vs human - one chromosome at a time

Click on the image to obtain a larger version (800 x 800 px). Click on zoom to obtain a very high resolution version.

single dog chromosomes vs human genome

Below you'll find one image for each of the dog chromosomes, showing regions of homology to the human genome. The scale here is finer than in the cover image, since only one dog chromosome is shown at a time. The length scale in each image is adjusted so that the dog chromosome occupies half of the circle.

cf1 zoom cf2 zoom cf3 zoom cf4 zoom cf5 zoom cf6 zoom cf7 zoom cf8 zoom cf9 zoom cf10 zoom cf11 zoom cf12 zoom cf13 zoom cf14 zoom cf15 zoom cf16 zoom cf17 zoom cf18 zoom cf19 zoom cf20 zoom cf21 zoom cf22 zoom cf23 zoom cf24 zoom cf25 zoom cf26 zoom cf27 zoom cf28 zoom cf29 zoom cf30 zoom cf31 zoom cf32 zoom cf33 zoom cf34 zoom cf35 zoom cf36 zoom cf37 zoom cf38 zoom cfX zoom


dog genome vs human genome - by dog chromosome

The dog genome is smaller than the human (2.4 Gb vs 3.1 Gb) and is scaled by a factor of 1.2 in the image to subtend half of the circle.

cf1 zoom cf2 zoom cf3 zoom cf4 zoom cf5 zoom cf6 zoom cf7 zoom cf8 zoom cf9 zoom cf10 zoom cf11 zoom cf12 zoom cf13 zoom cf14 zoom cf15 zoom cf16 zoom cf17 zoom cf18 zoom cf19 zoom cf20 zoom cf21 zoom cf22 zoom cf23 zoom cf24 zoom cf25 zoom cf26 zoom cf27 zoom cf28 zoom cf29 zoom cf30 zoom cf31 zoom cf32 zoom cf33 zoom cf34 zoom cf35 zoom cf36 zoom cf37 zoom cf38 zoom cfX zoom


dog genome vs human genome - by human chromosome

The dog genome is smaller than the human (2.4 Gb vs 3.1 Gb) and is scaled by a factor of 1.2 in the image to subtend half of the circle.

hs1 zoom hs2 zoom hs3 zoom hs4 zoom hs5 zoom hs6 zoom hs7 zoom hs8 zoom hs9 zoom hs10 zoom hs11 zoom hs12 zoom hs13 zoom hs14 zoom hs15 zoom hs16 zoom hs17 zoom hs18 zoom hs19 zoom hs20 zoom hs21 zoom hs22 zoom hsX zoom


afterthought

I have a dog and her name is Bernie. She cares very little about Circos and prefers snacks and petting.