A 2- or 4-day practical mini-course in Circos, command-line parsing and scripting. This material is part of the Bioinformatics and Genome Analysis course held at the Institut Pasteur Tunis.
Tuesday 11 December 2018 — Day 2
9h00 - 10h30 | Lecture 1 — Drawing the human genome
11h00 - 12h30 | Lecture (practical) 2 — Downloading and drawing human genes
14h00 - 15h30 | Lecture (practical) 3 — Downloading and drawing segmental duplications
16h00 - 18h00 | Lecture (practical) 4 — Creating an image montage
drawing the human genome, karyotypes included in Circos distribution, generating random data with randomdata, heatmaps, color lists, interpolating colors with colorinterpolate, drawing human genes, drawing segmental duplications as links, creating image montages
We'll now draw some data on the human genome.
We'll be using the human genome karyotype file that ships with Circos. Karyotype files are stored in data/karyotype in the Circos installation. Here are some of them:
karyotype.arabidopsis.txt
karyotype.chimp.txt
karyotype.drosophila.txt
karyotype.human.txt
karyotype.mouse.txt
karyotype.oryzasativa.txt
karyotype.rat.txt
karyotype.sorghum.txt
karyotype.yeast.txt
karyotype.zeamays.txt
The chromosomes for data/karyotype/karyotype.human.txt are
chr - hs1 1 0 249250621 chr1
chr - hs2 2 0 243199373 chr2
chr - hs3 3 0 198022430 chr3
chr - hs4 4 0 191154276 chr4
chr - hs5 5 0 180915260 chr5
chr - hs6 6 0 171115067 chr6
chr - hs7 7 0 159138663 chr7
chr - hs8 8 0 146364022 chr8
chr - hs9 9 0 141213431 chr9
chr - hs10 10 0 135534747 chr10
chr - hs11 11 0 135006516 chr11
chr - hs12 12 0 133851895 chr12
chr - hs13 13 0 115169878 chr13
chr - hs14 14 0 107349540 chr14
chr - hs15 15 0 102531392 chr15
chr - hs16 16 0 90354753 chr16
chr - hs17 17 0 81195210 chr17
chr - hs18 18 0 78077248 chr18
chr - hs19 19 0 59128983 chr19
chr - hs20 20 0 63025520 chr20
chr - hs21 21 0 48129895 chr21
chr - hs22 22 0 51304566 chr22
chr - hsX x 0 155270560 chrx
chr - hsY y 0 59373566 chry
These sizes match the hg19 (GRCh37) assembly, in which chromosome 1 is 249,250,621 bp long. Coordinates vary between assemblies. You can find assembly-specific human karyotype files for hg16, hg17, hg18, and hg19 as well (e.g. karyotype.human.hg17.txt).
Drawing the human genome is no different from drawing any other genome or sequence. There are simply more chromosomes (e.g. compared to yeast) and they are larger.
You'll notice that the chromosome names have the prefix hs rather than chr. This stands for Homo sapiens. The prefix is necessary when drawing chromosomes from different genomes because Circos requires that each chromosome have a unique name. Thus, chromosome 1 is hs1 for human, mm1 for mouse and rn1 for rat.
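To make the karyotype line format concrete, here is a minimal Python sketch that reads chr lines like the ones listed above. The field names (name, label, start, end, color) are my own labels for the columns, chosen for illustration, and the parser is an assumption about the layout shown here rather than anything Circos itself provides.

```python
from typing import NamedTuple


class Chromosome(NamedTuple):
    name: str   # unique Circos name, e.g. hs1
    label: str  # text label, e.g. 1
    start: int
    end: int
    color: str  # color name, e.g. chr1


def parse_karyotype(lines):
    """Return the chromosome records found in karyotype-formatted lines."""
    chromosomes = []
    for line in lines:
        fields = line.split()
        # Keep only chromosome definitions; skip blanks, comments and band lines.
        if len(fields) < 7 or fields[0] != "chr":
            continue
        _, _, name, label, start, end, color = fields[:7]
        chromosomes.append(Chromosome(name, label, int(start), int(end), color))
    return chromosomes


sample = [
    "chr - hs1 1 0 249250621 chr1",
    "chr - hs2 2 0 243199373 chr2",
    "chr - hsX x 0 155270560 chrx",
]

for c in parse_karyotype(sample):
    print(c.name, c.label, c.end - c.start, c.color)
```

Because the hs prefix is part of the name field, records parsed from a mouse or rat karyotype file could be merged into the same list without name collisions.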
The human chromosome colors are named after the chromosome, such as chr1 (there is also a color named hs1, which has the same definition). These colors follow the chromosome color scheme used by the UCSC genome browser.
There are also luminance-normalized versions of these colors. See etc/colors.ucsc.conf in the Circos installation. For example, for chromosome 15 we have
chr15 = 102,153,255
lum70chr15 = 121,169,255
lum80chr15 = 152,196,255
lum90chr15 = 182,224,255
The same colors are available under the hs prefix as synonyms:
hs15 = chr15
lum70hs15 = lum70chr15
lum80hs15 = lum80chr15
lum90hs15 = lum90chr15
I've generated some random data on the human genome in bins of 10 Mb. This was done using the generate.random.data.sh script which in turn uses the randomdata Circos tool.
> head random.1.txt
hs1 0 9999999 0.0367
hs1 10000000 19999999 -0.0802
hs1 20000000 29999999 -1.3039
hs1 30000000 39999999 -0.4919
hs1 40000000 49999999 0.3342
hs1 50000000 59999999 0.3232
hs1 60000000 69999999 -1.3592
hs1 70000000 79999999 0.5711
hs1 80000000 89999999 0.2533
hs1 90000000 99999999 0.1329
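The file format above (chromosome, start, end, value) is easy to reproduce. The following is a rough Python sketch of 10 Mb binning with standard normal values. It is not the randomdata tool, and the handling of the last, shorter bin on each chromosome is my assumption.

```python
import random


def make_bins(chrom, size, bin_size=10_000_000):
    """Yield (chrom, start, end) for consecutive bins; the last bin is truncated."""
    start = 0
    while start < size:
        # Assumption: the final bin stops at the last base of the chromosome.
        end = min(start + bin_size, size) - 1
        yield chrom, start, end
        start += bin_size


def random_track(chrom_sizes, seed=None):
    """Return data lines with one standard normal value per bin."""
    rng = random.Random(seed)
    lines = []
    for chrom, size in chrom_sizes:
        for name, start, end in make_bins(chrom, size):
            lines.append(f"{name} {start} {end} {rng.gauss(0, 1):.4f}")
    return lines


for line in random_track([("hs21", 48129895)], seed=1):
    print(line)
```

Applied to the 24 chromosome sizes in karyotype.human.txt, this binning gives 322 bins per file, so five files hold 1610 values.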
These tracks are plotted as histograms.
You can just as easily draw them as a scatter plot or line plot.
http://www.circos.ca/documentation/tutorials/2d_tracks/scatter_plots/
http://www.circos.ca/documentation/tutorials/2d_tracks/line_plots/
#!/bin/bash
CTOOLS=/home/martink/work/circos/svn/tools/
# Generate some random data on the human chromosome.
#
# To list predefined rules that define data position and value, see
#
# $CTOOLS/randomdata/bin/randomdata -listrules
#
# To understand how the rules work, see the script's manpage
#
# $CTOOLS/randomdata/bin/randomdata -man
# Make 5 random data sets based on default rule (10 Mb bins, standard normal distribution).
for i in $(seq 1 5) ; do
  $CTOOLS/randomdata/bin/randomdata -karyotype karyotype.human.txt -ruleset default > random.$i.txt
done
# Convince yourself that the values are indeed standard normal (mean 0, sd 1)
cat random.*.txt| cut -d " " -f 4 | ../../../scripts/histogram > histogram.txt
-3.4376> 0 0.000
-3.4376 -3.0932 2 0.001 0.001
-3.0932 -2.7488 3 0.002 0.003
-2.7488 -2.4044 12 0.007 0.011 *
-2.4044 -2.0600 14 0.009 0.019 *
-2.0600 -1.7155 31 0.019 0.039 ***
-1.7155 -1.3711 71 0.044 0.083 *******
-1.3711 -1.0267 109 0.068 0.150 ***********
-1.0267 -0.6823 136 0.085 0.235 **************
-0.6823 -0.3379 183 0.114 0.349 *******************
-0.3379 0.0065 233 0.145 0.493 *************************
0.0065 0.3509 232 0.144 0.638 ************************
0.3509 0.6953 192 0.119 0.757 ********************
0.6953 1.0397 153 0.095 0.852 ****************
1.0397 1.3841 104 0.065 0.917 ***********
1.3841 1.7286 53 0.033 0.950 *****
1.7286 2.0730 40 0.025 0.975 ****
2.0730 2.4174 27 0.017 0.991 **
2.4174 2.7618 8 0.005 0.996
2.7618 3.1062 5 0.003 0.999
3.1062 3.4506 1 0.001 1.000
3.4506< 0 0.000
n 1610
average 0.01853
sd 1.00523
min -3.43760
max 3.45060
sum 29.84100
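If the histogram script is not at hand, the same sanity check can be sketched in Python. This is only an illustration: the function names are hypothetical, and I am assuming the sample standard deviation, which may not be what the histogram script reports.

```python
import statistics


def column_values(lines, column=3):
    """Extract one whitespace-delimited column (0-based index) as floats."""
    return [float(line.split()[column]) for line in lines if line.strip()]


def summarize(values):
    """Return summary statistics similar to the footer of the histogram output."""
    return {
        "n": len(values),
        "average": statistics.fmean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (assumption)
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
    }


# First five lines of random.1.txt as shown earlier.
sample = [
    "hs1 0 9999999 0.0367",
    "hs1 10000000 19999999 -0.0802",
    "hs1 20000000 29999999 -1.3039",
    "hs1 30000000 39999999 -0.4919",
    "hs1 40000000 49999999 0.3342",
]

for key, value in summarize(column_values(sample)).items():
    print(key, value)
```

With only five values the average and sd are far from 0 and 1; the full set of 1610 values is needed to see the standard normal shape.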
Circos comes with several karyotype files for common genomes, such as human, mouse, rat and so on. These can be found in the data/karyotype directory of the Circos installation. In general, when you reference a file and Circos cannot find it relative to your current directory, it will look in its installation directory.
karyotype = data/karyotype/karyotype.human.txt
Define a parameter we'll be using later. This parameter can be recalled using conf(width). Some parameters must be defined to draw an image (e.g. karyotype), but you're free to define any others for later use in the file via conf().
width = 0.08
<plots>
These parameters are inherited by all <plot> blocks and can be overridden in any of them
type = histogram
fill_color = black
stroke_thickness = 0
min = -2 # clip to [-2,2]
max = 2 #
To make the tracks a scatter plot, uncomment the lines below
# type = scatter
# color = black
# glyph_size = 3
To make the tracks a line plot, uncomment the lines below. You can comment out the fill_color above to turn off the fill under the line.
# type = line
# color = black
# stroke_thickness = 1
<plot>
file = random.1.txt
Set a parameter for this track...
r = 0.90
...and calculate the start r0 and end r1 of the track based on its value. We reference the value of r in this block using conf(.,r) and the value of width at the root of the configuration using conf(width). We then format the string with a suffix r to indicate relative positioning. To see how this works, dump the plots configuration block after all the interpolations have been made
circos -cdump plots
r1 = eval(sprintf("%fr",conf(.,r)+conf(width)))
r0 = eval(sprintf("%fr",conf(.,r)))
Add blue/red background. This definition is reused in other tracks, so it's easier to define it in a separate file and include it here
<<include background.conf>>
</plot>
<plot>
file = random.2.txt
r = 0.80
The r0 and r1 definitions above are used by other tracks. We can store them in a file and use include.
<<include r0r1.conf>>
Add blue/red background. This definition is reused in other tracks, so it's easier to define it in a separate file and include it here
<<include background.conf>>
</plot>
See how terse a track definition can be? Always try to inherit parameters and include things from files.
<plot>
file = random.3.txt
r = 0.70
<<include r0r1.conf>>
<<include background.conf>>
</plot>
<plot>
show = no # remove this or set to yes to show the track
file = random.4.txt
r = 0.60
<<include r0r1.conf>>
<<include background.conf>>
</plot>
<plot>
show = no
file = random.5.txt
r = 0.50
<<include r0r1.conf>>
<<include background.conf>>
</plot>
</plots>
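The r0 and r1 expressions used in the tracks above are plain sprintf formatting of a sum. A small Python sketch of the same arithmetic, using the r values and width = 0.08 from this configuration, shows the strings that should appear after interpolation. The track_radii helper is hypothetical and only mimics the expressions; it is not how Circos evaluates them.

```python
def track_radii(r, width):
    """Mimic r0 = sprintf("%fr", r) and r1 = sprintf("%fr", r + width)."""
    r0 = "%fr" % r
    r1 = "%fr" % (r + width)
    return r0, r1


width = 0.08
for r in (0.90, 0.80, 0.70, 0.60, 0.50):
    r0, r1 = track_radii(r, width)
    print(f"r = {r:.2f}  r0 = {r0}  r1 = {r1}")
```

Since the tracks are spaced 0.10 apart and each is 0.08 wide, a gap of 0.02 (in units of the ideogram radius) is left between adjacent tracks.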
Heatmaps are a great way to show data in small spaces. They're also a good way to review how Circos uses colors and get into more details about color definitions.
http://www.circos.ca/documentation/tutorials/2d_tracks/heat_maps/
The heatmap will automatically map a value onto a color, as described on the page above. There is a variety of settings that control this and you should familiarize yourself with them.
The key definition is color, which is now a list rather than a single color. Thus, to map values in a heatmap onto the 9-color spectral Brewer palette, use
type = heatmap
color = spectral-9-div
To see how the list works, dump the <colors> block in the configuration and grep out spectral-9-div.
> circos -cdump colors | grep spectral-9-div
'spectral-9' => 'spectral-9-div-(\\d+)',
'spectral-9-div' => 'spectral-9-div-(\\d+)',
'spectral-9-div-1' => '213,62,79',
'spectral-9-div-2' => '244,109,67',
'spectral-9-div-3' => '253,174,97',
'spectral-9-div-4' => '254,224,139',
'spectral-9-div-5' => '255,255,191',
'spectral-9-div-6' => '230,245,152',
'spectral-9-div-7' => '171,221,164',
'spectral-9-div-8' => '102,194,165',
'spectral-9-div-9' => '50,136,189',
'spectral-9-div-rev' => 'rev(spectral-9-div-(\\d+))',
'spectral-9-rev' => 'rev(spectral-9-div-(\\d+))',
You'll see that individual colors are ones with RGB values and these are named spectral-9-div-N. The definition named spectral-9-div is a regular expression — this is how Circos defines lists, which are composed of all the colors that match the regular expression. A synonym for the list is spectral-9.
The reverse of the list is also available, as spectral-9-div-rev or spectral-9-rev.
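To illustrate the idea of a list defined by a regular expression, here is a hedged Python sketch. It works on an excerpt of the dumped colors, collects the names that match the pattern and orders them by the captured number. The exact matching and ordering rules inside Circos are assumptions on my part.

```python
import re

# Excerpt of the dumped <colors> block (three of the nine colors).
colors = {
    "spectral-9-div": r"spectral-9-div-(\d+)",
    "spectral-9-div-1": "213,62,79",
    "spectral-9-div-2": "244,109,67",
    "spectral-9-div-9": "50,136,189",
}


def resolve_list(pattern, colors, reverse=False):
    """Return the color names matching pattern, ordered by the captured number."""
    regex = re.compile(pattern)
    matches = []
    for name in colors:
        m = regex.fullmatch(name)
        if m:
            matches.append((int(m.group(1)), name))
    return [name for _, name in sorted(matches, reverse=reverse)]


print(resolve_list(colors["spectral-9-div"], colors))
print(resolve_list(colors["spectral-9-div"], colors, reverse=True))
```

The list name itself (spectral-9-div) does not match the pattern because it lacks the trailing number, so only the individual RGB colors end up in the list.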
Heatmaps are pretty cool in that you can define your own color list to map the values onto. For example,
color = black,dred,red,blue,dblue,black
will use these 6 colors. The exact way in which the values are divided up into color bands is described at
http://www.circos.ca/documentation/tutorials/2d_tracks/heat_maps/images
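As a rough mental model of the mapping, the sketch below divides the range [min, max] into equal-width bands, one per color, and clips values outside the range. This uniform banding is an assumption made for illustration; the page linked above describes the rules Circos actually applies, and they may differ.

```python
def value_to_color(value, vmin, vmax, colors):
    """Map value onto one of len(colors) equal-width bands spanning [vmin, vmax]."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Clip out-of-range values so they fall on the first or last color.
    clipped = min(max(value, vmin), vmax)
    index = int((clipped - vmin) / (vmax - vmin) * len(colors))
    # value == vmax lands one past the end, so pull it back to the last color.
    return colors[min(index, len(colors) - 1)]


palette = ["black", "dred", "red", "blue", "dblue", "black"]

for v in (-2.0, -1.3, -0.1, 0.0, 1.9, 5.0):
    print(v, value_to_color(v, -2, 2, palette))
```

With a 6-color list over [-2, 2], each band is about 0.67 wide, so values just below zero map to red and values at or just above zero map to blue.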
You can create your own color ramps with an online tool like
http://davidjohnstone.net/pages/lch-lab-colour-gradient-picker
or with the Circos colorinterpolate tool. I've included an updated version of this tool in this directory, along with its configuration file colorinterpolate.conf.
For example, let's create a 10-color ramp between red and blue (9 steps give 10 colors, indexed 0 to 9)
> ./colorinterpolate -steps 9 -start 255,0,0 -end 0,0,255 -rootname mycolorlab | tee etc/mycolors.lab.conf
mycolorlab000 = 255,0,0 # rgb 255,0,0 rgbhex FF0000 idx 0 deltaE 0.00 lab(53.2,80,67.2)
mycolorlab001 = 245,0,45 # rgb 245,0,45 rgbhex F5002D idx 1 deltaE 19.59 lab(51,80,47.8)
mycolorlab002 = 235,0,73 # rgb 235,0,73 rgbhex EB0049 idx 2 deltaE 19.59 lab(48.6,79.8,28.2)
mycolorlab003 = 223,0,99 # rgb 223,0,99 rgbhex DF0063 idx 3 deltaE 19.59 lab(46.2,79.8,8.8)
mycolorlab004 = 209,0,124 # rgb 209,0,124 rgbhex D1007C idx 4 deltaE 19.59 lab(44,79.6,-10.6)
mycolorlab005 = 193,0,149 # rgb 193,0,149 rgbhex C10095 idx 5 deltaE 19.59 lab(41.6,79.6,-30)
mycolorlab006 = 173,0,175 # rgb 173,0,175 rgbhex AD00AF idx 6 deltaE 19.59 lab(39.2,79.4,-49.6)
mycolorlab007 = 147,0,201 # rgb 147,0,201 rgbhex 9300C9 idx 7 deltaE 19.59 lab(37,79.4,-69)
mycolorlab008 = 109,0,228 # rgb 109,0,228 rgbhex 6D00E4 idx 8 deltaE 19.59 lab(34.6,79.2,-88.4)
mycolorlab009 = 0,0,255 # rgb 0,0,255 rgbhex 0000FF idx 9 deltaE 19.59 lab(32.2,79.2,-107.8)
For the meaning of deltaE see the manpage
./colorinterpolate -man
As you can see, the output of colorinterpolate can be used directly in Circos. The list above has been stored in etc/mycolors.lab.conf and included into the configuration like this
<colors>
<<include mycolors.lab.conf>>
</colors>
We need to wrap the <<include>> directive in a <colors> block because all color definitions are expected to be inside this block, and etc/mycolors.lab.conf contains only the name = rgb definitions and not the block itself.
You can verify that these colors have been read in
> circos -cdump colors | grep mycolorlab
mycolorlab000 => '255,0,0',
mycolorlab001 => '245,0,45',
mycolorlab002 => '235,0,73',
mycolorlab003 => '223,0,99',
mycolorlab004 => '209,0,124',
mycolorlab005 => '193,0,149',
mycolorlab006 => '173,0,175',
mycolorlab007 => '147,0,201',
mycolorlab008 => '109,0,228',
mycolorlab009 => '0,0,255',
You can create a named color list of these colors by defining a regular expression that matches the colors, just like for the Brewer palettes above.
<colors>
<<include mycolors.lab.conf>>
mycolorlablist = mycolorlab(\\d+)
</colors>
For implementation reasons, definitions of color names and lists are cached in the system's /tmp directory. When you redefine colors or lists, either delete the cache files (/tmp/circos.colorlist.*) or run Circos with
circos -color_cache_rebuild
Take a look at the make.color.ramp.sh script in this directory. It calls colorinterpolate and makes two ramps between red and blue. The mycolorlab* colors were interpolated between red and blue in Lab color space, and the mycolorhsv* colors were interpolated in HSV color space.
The last two heatmaps draw the same data from random.4.txt but use these two different ramps. Notice the difference between the interpolations: the Lab ramp looks much smoother because it is perceptually uniform, with a constant deltaE between adjacent colors. The deltaE values in the HSV interpolation vary, which indicates that the HSV-based ramp is not perceptually uniform.
https://en.wikipedia.org/wiki/Color_difference
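One way to check a ramp for uniformity is to read the deltaE field from the colorinterpolate output and compare the steps. The Python sketch below does this for lines in the format shown earlier. I am assuming that deltaE is the distance from the previous color (which is why index 0 reports 0.00); the helper names and the tolerance are hypothetical choices.

```python
import re

STEP = re.compile(r"idx\s+(\d+)\s+deltaE\s+([\d.]+)")


def delta_e_steps(lines):
    """Return the deltaE of every color after the first one in the ramp."""
    steps = []
    for line in lines:
        m = STEP.search(line)
        if m and int(m.group(1)) > 0:
            steps.append(float(m.group(2)))
    return steps


def is_uniform(steps, tolerance=0.01):
    """True if all step sizes agree to within the tolerance."""
    return bool(steps) and max(steps) - min(steps) <= tolerance


# First three lines of the Lab ramp as printed by colorinterpolate.
lab_ramp = [
    "mycolorlab000 = 255,0,0 # rgb 255,0,0 rgbhex FF0000 idx 0 deltaE 0.00 lab(53.2,80,67.2)",
    "mycolorlab001 = 245,0,45 # rgb 245,0,45 rgbhex F5002D idx 1 deltaE 19.59 lab(51,80,47.8)",
    "mycolorlab002 = 235,0,73 # rgb 235,0,73 rgbhex EB0049 idx 2 deltaE 19.59 lab(48.6,79.8,28.2)",
]

steps = delta_e_steps(lab_ramp)
print(steps, is_uniform(steps))
```

Running the same check on the lines in etc/mycolors.hsv.conf should report a non-uniform ramp if the deltaE values there vary.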
For fun, you can run
circos -randomcolor white,black
to randomize all color definitions except, in this case, white and black.
#!/bin/bash
START=255,0,0
END=0,0,255
STEPS=9
./colorinterpolate -steps $STEPS -start $START -end $END -rootname mycolorlab | \
tee etc/mycolors.lab.conf
./colorinterpolate -steps $STEPS -start $START -end $END -calcspace hsv -rootname mycolorhsv | \
tee etc/mycolors.hsv.conf
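For intuition about what interpolating in HSV means, here is a minimal Python sketch using the standard colorsys module. It is not the colorinterpolate implementation: the direction in which the hue is swept and the rounding are my assumptions, so the colors may not match etc/mycolors.hsv.conf.

```python
import colorsys


def interpolate_hsv(start, end, steps):
    """Return steps + 1 RGB triples from start to end, interpolated linearly in HSV."""
    h0, s0, v0 = colorsys.rgb_to_hsv(*(c / 255 for c in start))
    h1, s1, v1 = colorsys.rgb_to_hsv(*(c / 255 for c in end))
    ramp = []
    for i in range(steps + 1):
        t = i / steps
        rgb = colorsys.hsv_to_rgb(
            h0 + (h1 - h0) * t,
            s0 + (s1 - s0) * t,
            v0 + (v1 - v0) * t,
        )
        ramp.append(tuple(round(c * 255) for c in rgb))
    return ramp


for idx, rgb in enumerate(interpolate_hsv((255, 0, 0), (0, 0, 255), 9)):
    print(f"hsvsketch{idx:03d} = {rgb[0]},{rgb[1]},{rgb[2]}")
```

In this sketch the hue runs from red to blue through green, so the ramp passes through fully saturated intermediate hues rather than the purples of the Lab ramp. Interpolating in Lab would additionally require converting RGB to Lab and back, which is omitted here.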
<colors>
<<include mycolors.lab.conf>>
<<include mycolors.hsv.conf>>
mycolorlablist = mycolorlab(\\d+)
mycolorhsvlist = mycolorhsv(\\d+)
</colors>
karyotype = data/karyotype/karyotype.human.txt
As before, define a width parameter that can be recalled later using conf(width). The heatmap tracks are narrower than the histogram tracks.
width = 0.05
<plots>
type = heatmap
color = spectral-9-div
stroke_thickness = 1 # adding a narrow white stroke to heatmaps
stroke_color = white # makes them look nicer, if the bins are big enough
<plot>
file = ../1/random.1.txt
r = 0.90
r1 = eval(sprintf("%fr",conf(.,r)+conf(width)))
r0 = eval(sprintf("%fr",conf(.,r)))
</plot>
<plot>
file = ../1/random.2.txt
r = 0.85
r1 = eval(sprintf("%fr",conf(.,r)+conf(width)))
r0 = eval(sprintf("%fr",conf(.,r)))
Override the value of color from the parent block to use the reverse palette (the difference won't be obvious in the image since the data is random and symmetric about the mean).
color = spectral-9-div-rev
</plot>
<plot>
file = ../1/random.3.txt
r = 0.80
r1 = eval(sprintf("%fr",conf(.,r)+conf(width)))
r0 = eval(sprintf("%fr",conf(.,r)))
custom color list — use a comma-separated list of any previously defined colors
color = black,dred,red,blue,dblue,black
</plot>
<plot>
file = ../1/random.4.txt
r = 0.75
r1 = eval(sprintf("%fr",conf(.,r)+conf(width)))
r0 = eval(sprintf("%fr",conf(.,r)))
color = mycolorlablist
</plot>
<plot>
file = ../1/random.4.txt
r = 0.70
r1 = eval(sprintf("%fr",conf(.,r)+conf(width)))
r0 = eval(sprintf("%fr",conf(.,r)))
color = mycolorhsvlist
</plot>
</plots>