Genome Sequence Centre - Web Pages

BC Cancer Agency

GENOME COVERAGE SIMULATION
This script simulates how a genome is covered during mapping or sequencing. In both cases, elements much smaller than the genome are used to successively cover parts of the genome until (ideally) every area of the genome has been sampled.

The simulation proceeds roughly as follows. The cover element length is chosen, along with the element packing and element cover values. The packing value determines the length of the genome. The element cover value determines how many cover elements are sampled. Elements are placed randomly on the genome, without regard to the placement of previous elements. If a non-zero standard deviation is chosen, each element length is drawn from a normal distribution; otherwise the length is constant. After all elements are placed, statistics are computed for contigs, gaps and coverage efficiency.
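The placement step described above can be sketched in a few lines of Python. This is a minimal illustration, not the script behind this page: the function name `simulate_coverage`, its parameter names, the fixed seed, and the choice to keep every element fully inside the genome are all assumptions made for the example.

```python
import random

def simulate_coverage(mean_len=50, sd_len=0.0, packing=100, cover=1.0, seed=0):
    """Return a list holding the coverage depth of each genome element."""
    rng = random.Random(seed)
    genome_len = int(mean_len * packing)                    # length = element length * packing
    n_elements = int(round(cover * genome_len / mean_len))  # total element length ~ cover genomes
    depth = [0] * genome_len
    for _ in range(n_elements):
        # Constant length when sd is 0, otherwise normally distributed (clamped to >= 1).
        if sd_len == 0:
            length = mean_len
        else:
            length = max(1, int(round(rng.gauss(mean_len, sd_len))))
        length = min(length, genome_len)
        # Uniform random start, independent of all earlier placements.
        # Assumption: elements never run past the end of the genome.
        start = rng.randint(0, genome_len - length)
        for pos in range(start, start + length):
            depth[pos] += 1
    return depth

depth = simulate_coverage()
covered = sum(1 for d in depth if d > 0)
print(f"{100.0 * covered / len(depth):.1f} % of the genome covered")
```

The per-position depth list is enough to derive every statistic reported in the simulation output, since contigs and gaps are just runs of non-zero and zero depth.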

Cover element length
A cover element represents a small section of the genome, either a clone or a read, placed at a random position on the genome. The cover element length serves as the basic unit of sampling. The larger the length, the finer the sampling.
    average
    standard deviation (enter 0 for constant)
Element packing
The element packing determines the length of the genome: length = (element length)*(element packing). Put another way, if the unit of length is the cover element, the genome is (element packing) units long. Pick a number large enough that the genome is much larger than the element length ( > 1000 ). Currently, genomes cannot be larger than 1,000,000.
    packing
Element cover
The coverage determines how many elements are chosen. For a given cover, N, the number of elements is chosen so that their total length equals N times the genome length.
    coverage
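As a quick check of how the three parameters relate, the following sketch computes the genome length and the number of cover elements. The function name `derived_sizes` is hypothetical, and rounding the element count to the nearest integer is an assumption.

```python
def derived_sizes(element_length, packing, cover):
    """Return (genome length, number of cover elements) for the given parameters."""
    genome_length = element_length * packing                    # in genome elements
    n_elements = round(cover * genome_length / element_length)  # total length ~ cover genomes
    return genome_length, n_elements

# Element length 50, packing 100, coverage 1.
print(derived_sizes(50, 100, 1))  # (5000, 100)
```

These are the same values as the 5000 genome elements and 100 cover elements listed under the simulation parameters in the sample output on this page.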
Size of anchor element in 1x gap-close shotgun
After the cover elements are picked, a 1x shotgun is performed with anchor elements of this size in an attempt to close the remaining gaps.
    size
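The page does not spell out when an anchor counts as closing a gap. The sketch below assumes one plausible rule: an anchor closes a gap when it covers every uncovered position of that gap. The names `find_gaps` and `gap_close`, the uniform placement, and the returned tuple are all illustrative assumptions rather than the script's actual logic.

```python
import random

def find_gaps(depth):
    """Return half-open (start, end) intervals where coverage depth is zero."""
    gaps, start = [], None
    for pos, d in enumerate(depth):
        if d == 0 and start is None:
            start = pos
        elif d > 0 and start is not None:
            gaps.append((start, pos))
            start = None
    if start is not None:
        gaps.append((start, len(depth)))
    return gaps

def gap_close(depth, anchor_size=50, cover=1.0, seed=1):
    """Return (gaps closed, total gaps, anchors that closed a gap, total anchors)."""
    rng = random.Random(seed)
    gaps = find_gaps(depth)
    anchor_size = min(anchor_size, len(depth))
    n_anchors = round(cover * len(depth) / anchor_size)  # 1x: anchors total one genome length
    closed, closing_anchors = set(), 0
    for _ in range(n_anchors):
        a_start = rng.randint(0, len(depth) - anchor_size)
        a_end = a_start + anchor_size
        # Assumed rule: the anchor must contain the whole gap.
        spanned = [g for g in gaps if a_start <= g[0] and g[1] <= a_end]
        if spanned:
            closing_anchors += 1
            closed.update(spanned)
    return len(closed), len(gaps), closing_anchors, n_anchors
```

Under this rule, a gap longer than the anchor can never be closed by a single anchor, so the anchor size bounds the gap sizes that the shotgun can resolve.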

SIMULATION OUTPUT

Representative genome diagram. Uncovered areas are shown in light grey. Coverage is coded by colour, with darker colours representing more overlapping coverage.


Coverage Statistics
Simulation Parameters
    50 average cover element size (0 standard deviation)
    5000 genome elements (100 packing parameter)
    100 cover elements chosen (1 desired coverage)
Actual Coverage
    67.5% of the genome covered
Contig Number
    39 contigs
    avg size 1.73 cover elements (86 genome elements)
Coverage Efficiency
    1627 (33%) no coverage
    2131 (43%) 1-cover (63% 1-cover efficiency)
    912 (18%) 2-cover
    330 (7%) k-cover (k>2)
Gap Statistics
    39 gaps
    avg size 0.83 cover elements (41 genome elements)
    gap size distribution
        < 1 elements: 24 gaps
        < 2 elements: 11 gaps
        < 3 elements: 4 gaps
Gap-close Shotgun Elements
    size 50 genome elements
    100 elements
    1x cover
    efficiency: 13.6% shotgun elements closed gaps
    gaps covered: 15 (38.5% gaps)
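The contig, gap and efficiency figures above can all be derived from a per-position coverage array. The sketch below shows one way to do so; the function name `coverage_stats` and its dictionary keys are hypothetical. The definition of 1-cover efficiency as (1-covered positions) / (covered positions) is inferred from the sample numbers (2131 of 3373 covered positions is about 63%) and is not stated by the page.

```python
from itertools import groupby

def coverage_stats(depth, element_length):
    """Summarise a list of per-position coverage depths."""
    n = len(depth)
    covered = sum(1 for d in depth if d > 0)
    one_cover = sum(1 for d in depth if d == 1)
    # Collapse the genome into runs of covered (True) and uncovered (False) positions.
    runs = [(is_cov, len(list(grp))) for is_cov, grp in groupby(depth, key=lambda d: d > 0)]
    contigs = [length for is_cov, length in runs if is_cov]
    gaps = [length for is_cov, length in runs if not is_cov]

    def avg_in_elements(lengths):
        # Average run length expressed in units of the cover element.
        return sum(lengths) / len(lengths) / element_length if lengths else 0.0

    return {
        "covered_pct": 100.0 * covered / n,
        "one_cover_efficiency_pct": 100.0 * one_cover / covered if covered else 0.0,
        "contigs": len(contigs),
        "avg_contig_elements": avg_in_elements(contigs),
        "gaps": len(gaps),
        "avg_gap_elements": avg_in_elements(gaps),
    }

# Tiny hand-checkable genome: contigs of length 3 and 1, gaps of length 1, 2 and 1.
print(coverage_stats([0, 1, 1, 2, 0, 0, 1, 0], element_length=2))
```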

Contig length distribution
Distribution of the length of contiguous covered areas.
    0 -    25 
   25 -    50 
   50 -    75 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   75 -   100 XXXXXXXXXXXXXX
  100 -   125 XXXXXXXXXX
  125 -   150 XXXXXX
  150 -   175 XXXXXXXX
  175 -   200 XX
Genome coverage
Distribution of the coverage of genome elements (basepairs). The coverage of an element is defined as the number of cover elements overlapping with the genome element.
 0 -  1   1627  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 1 -  2   2131  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 2 -  3    912  XXXXXXXXXXXXXXXXX
 3 -  4    284  XXXXX
 4 -  5     47  
 5 -  6      1  
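The bar lengths in these text histograms appear to be scaled so that the largest bin gets 40 characters, with the other bins truncated to whole characters. The sketch below reproduces that scaling; the function name `text_histogram` and the exact column layout are assumptions, and the scaling rule is inferred from the bars shown on this page.

```python
def text_histogram(counts, lo=0, width=1, max_bar=40):
    """Return one text line per bin, with bars scaled to the largest count."""
    peak = max(counts) if counts and max(counts) > 0 else 1
    lines = []
    for i, c in enumerate(counts):
        bar = "X" * (max_bar * c // peak)  # integer division truncates partial characters
        left = lo + i * width
        right = lo + (i + 1) * width
        lines.append(f"{left:>2} - {right:>2} {c:>6}  {bar}")
    return lines

# Counts taken from the genome coverage histogram above.
for line in text_histogram([1627, 2131, 912, 284, 47, 1]):
    print(line)
```

With these counts the bars come out 30, 40, 17, 5, 0 and 0 characters long, which matches the histogram above.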
Contig membership
Distribution of the number of cover elements in contigs. As coverage increases, the number of contigs first rises and then falls at very high coverage, while the number of cover elements per contig increases.
    0 -     1      0  
    1 -     2     15  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    2 -     3      9  XXXXXXXXXXXXXXXXXXXXXXXX
    3 -     4      4  XXXXXXXXXX
    4 -     5      5  XXXXXXXXXXXXX
    5 -     6      3  XXXXXXXX
    6 -     7      3  XXXXXXXX
Gap distribution
Distribution of gap sizes. Gaps are regions of the genome not covered by any cover element.
    0 -    10      6  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   10 -    20      7  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   20 -    30      5  XXXXXXXXXXXXXXXXXXXXXXXXXXXX
   30 -    40      3  XXXXXXXXXXXXXXXXX
   40 -    50      3  XXXXXXXXXXXXXXXXX
   50 -    60      6  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   60 -    70      3  XXXXXXXXXXXXXXXXX
   70 -    80      2  XXXXXXXXXXX
   80 -    90      0  
   90 -   100      0  
  100 -   110      0  
  110 -   120      2  XXXXXXXXXXX
  120 -   130      2  XXXXXXXXXXX
Gap spanning
Distribution of the gap sizes closed by the anchor shotgun. This histogram bins only cases in which the anchor spanned a single gap.
    0 -    41     15  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX



CGI script modified on 14:04:30 23-04-2009 by GSC Webmaster