After reading wombat's entry in our experimentbank.org, I thought it would be an interesting diversion to colour base pairs and turn a sequence into a GIF thumbview. Anything for an excuse to Perl one more time. It was clear that I was going to see something, something more than coloured blocks. The human eye looks for patterns, and a few emerged: bands and stripes. None of this analysis is especially serious: the visualization depends on casting the 1D sequence into a 2D object, and because the second dimension is arbitrary in size, patterns can appear as easily as they disappear. It turns out to be a colourful foray into data exploration. In what other ways can the sequence be visually annotated? I picked the GC content, its running average, the mode base pair, and 2- and 4-lagging sums, a variant on the lag plot. Have a good time looking at the coloured dots. Maybe you'll see something?
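To make the 1D-to-2D step and the running GC average concrete, here is a minimal sketch. It is not the colortiles code: the helper names `wrap_sequence` and `running_gc`, the row width, and the window size are all assumptions chosen for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: cut a 1D sequence into rows of $width bases.
# The width is arbitrary, which is why patterns come and go with it.
sub wrap_sequence {
    my ($seq, $width) = @_;
    return unpack("(A$width)*", $seq);
}

# Hypothetical helper: GC fraction in a sliding window of $window bases,
# one value per full window position.
sub running_gc {
    my ($seq, $window) = @_;
    my @bases = split //, uc $seq;
    return () if $window < 1 || $window > @bases;
    my $gc = 0;
    my @avg;
    for my $i (0 .. $#bases) {
        $gc++ if $bases[$i] =~ /[GC]/;
        # Drop the base that just left the window.
        $gc-- if $i >= $window && $bases[$i - $window] =~ /[GC]/;
        push @avg, $gc / $window if $i >= $window - 1;
    }
    return @avg;
}

my $seq = "ATGCGCATTAGC";
print "$_\n" for wrap_sequence($seq, 4);
print join(" ", map { sprintf "%.2f", $_ } running_gc($seq, 4)), "\n";
```

Changing the second argument of `wrap_sequence` re-tiles the same sequence, which is an easy way to check whether a band or stripe survives a different row width.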
The code is written in Perl. You will need the following modules
> tar xvfz colortiles.tgz
# now edit colortiles.conf
# try with random 20kb sequence (see result)
> /path/to/your/perl colortiles -random 20000 -gc 0.41
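For a sense of what a random sequence with a target GC content might look like, here is a small sketch. How colortiles itself generates the `-random` sequence is not described here, so the model below (each base drawn independently, G or C with probability equal to the GC fraction) and the helper names `random_sequence` and `gc_fraction` are assumptions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: draw $length bases independently, each G or C with
# probability $gc and A or T otherwise (an assumed model, not necessarily
# what colortiles does).
sub random_sequence {
    my ($length, $gc) = @_;
    my $seq = '';
    for (1 .. $length) {
        my @pair = rand() < $gc ? qw(G C) : qw(A T);
        $seq .= $pair[ int(rand(2)) ];
    }
    return $seq;
}

# Fraction of G and C bases in a sequence.
sub gc_fraction {
    my ($seq) = @_;
    return 0 unless length($seq);
    my $gc = ($seq =~ tr/GCgc//);
    return $gc / length($seq);
}

my $seq = random_sequence(20000, 0.41);
printf "length=%d GC=%.3f\n", length($seq), gc_fraction($seq);
```

The measured GC fraction will wobble around the requested value from run to run, since each base is drawn at random.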
C. elegans chromosome I, offset = 1e6 bp (download)
C. elegans cosmid C05D11, GC = 34% (download)
Random sequence, GC = 34% (download)