GENOME COVERAGE SIMULATION
This script simulates how a genome is covered during mapping or sequencing. In both cases, elements much smaller than the genome are used to successively cover parts of it until (ideally) every area of the genome has been sampled.
The simulation proceeds roughly as follows. The cover element length is chosen, along with the packing and element cover values. The packing value determines the length of the genome. The element cover value determines how many cover elements are sampled. Elements are placed randomly on the genome, without regard to the placement of previous elements. If a non-zero standard deviation is chosen, element lengths are drawn from a normal distribution. After all elements are placed, contig, gap and coverage-efficiency statistics are computed.
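The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the script's own code. The function name and arguments are hypothetical. It assumes that the genome length is the packing value times the element length and that the number of elements is the cover value times the packing value, which matches the parameters reported in the output (100 x 50 = 5000 and 1 x 100 = 100). It also assumes elements lie entirely within a linear genome; how the original script treats the genome ends is not stated.

```python
import random

def simulate_coverage(element_len=50, packing=100, cover=1, sd=0.0, seed=None):
    """Place cover elements at random and return (depth, elements).

    depth[i] is the number of cover elements overlapping genome position i;
    elements is a list of half-open (start, end) intervals.
    """
    rng = random.Random(seed)
    genome_len = element_len * packing      # assumption: packing sets genome length
    n_elements = round(cover * packing)     # assumption: cover sets element count
    depth = [0] * genome_len
    elements = []
    for _ in range(n_elements):
        # Lengths are normally distributed only when sd is non-zero.
        if sd == 0:
            length = element_len
        else:
            length = max(1, round(rng.gauss(element_len, sd)))
        length = min(length, genome_len)
        # Each start position is drawn independently of earlier placements.
        start = rng.randrange(genome_len - length + 1)
        elements.append((start, start + length))
        for pos in range(start, start + length):
            depth[pos] += 1
    return depth, elements

depth, elements = simulate_coverage(seed=1)
covered = sum(1 for d in depth if d > 0)
print(f"{100 * covered / len(depth):.1f}% of the genome covered")
```

Because placement is random, the covered fraction changes from run to run unless the seed is fixed.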
SIMULATION OUTPUT
Representative genome diagram. Uncovered areas are shown in light grey. Coverage is coded by colour, with darker colours representing more overlapping coverage.
Coverage Statistics
Simulation Parameters
    50 average cover element size (0 standard deviation)
    5000 genome elements (100 packing parameter)
    100 cover elements chosen (1 desired coverage)
Actual Coverage
    67.5% of the genome covered
Contig Number
    39 contigs
    avg size 1.73 cover elements (86 genome elements)
Coverage Efficiency
    1627 (33%) no coverage
    2131 (43%) 1-cover (63% 1-cover efficiency)
    912 (18%) 2-cover
    330 (7%) k-cover (k>2)
Gap Statistics
    39 gaps
    avg size 0.83 cover elements (41 genome elements)
    gap size distribution
        < 1 elements: 24 gaps
        < 2 elements: 11 gaps
        < 3 elements: 4 gaps
Gap-close Shotgun Elements
    size 50 genome elements
    100 elements
    1 x cover
    efficiency: 13.6% shotgun elements closed gaps
    gaps covered: 15 (38.5% gaps)
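The percentages in the Coverage Efficiency entry can be reproduced from the raw counts. In the sketch below, the reading of "1-cover efficiency" as the share of covered genome elements that have coverage exactly one is an inference from the numbers, not something the script states. The variable names are illustrative.

```python
# Genome-element counts by coverage depth, copied from the statistics above.
counts = {"none": 1627, "1-cover": 2131, "2-cover": 912, "k-cover": 330}

genome_len = sum(counts.values())           # total genome elements
covered = genome_len - counts["none"]       # elements with depth >= 1

actual_coverage = 100 * covered / genome_len
# Assumed definition: fraction of covered elements hit by exactly one element.
one_cover_eff = 100 * counts["1-cover"] / covered

print(genome_len, round(actual_coverage, 1), round(one_cover_eff))
# prints: 5000 67.5 63
```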
Contig length distribution
Distribution of the length of contiguous covered areas.
0 - 25
25 - 50
50 - 75 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
75 - 100 XXXXXXXXXXXXXX
100 - 125 XXXXXXXXXX
125 - 150 XXXXXX
150 - 175 XXXXXXXX
175 - 200 XX
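Contig lengths such as those binned above can be derived from a per-position coverage array by taking runs of covered positions. The following is a minimal sketch under that assumption. The function name `runs` and the list-of-depths input are hypothetical and not taken from the script.

```python
from itertools import groupby

def runs(depth):
    """Split a coverage-depth array into contig lengths and gap lengths."""
    contigs, gaps = [], []
    # groupby yields maximal runs of positions that are all covered or all uncovered.
    for is_covered, group in groupby(depth, key=lambda d: d > 0):
        length = sum(1 for _ in group)
        (contigs if is_covered else gaps).append(length)
    return contigs, gaps

contigs, gaps = runs([0, 1, 2, 1, 0, 0, 3, 3, 0])
print(contigs, gaps)
# prints: [3, 2] [1, 2, 1]
```

The same pass yields the gap lengths, so one function can feed both the contig and the gap histograms.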
Genome coverage
Distribution of the coverage of genome elements (basepairs). The coverage of an element is defined as the number of cover elements overlapping with the genome element.
0 - 1 1627 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
1 - 2 2131 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
2 - 3 912 XXXXXXXXXXXXXXXXX
3 - 4 284 XXXXX
4 - 5 47
5 - 6 1
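For comparison, when elements are short relative to the genome and placed independently, the coverage of a genome element is approximately Poisson-distributed with mean equal to the desired coverage. The sketch below computes the expected counts under that approximation. It is a textbook idealisation offered for orientation, not something the script reports, and a single random run is expected to deviate from it.

```python
import math

def poisson_expected(genome_len, cover, max_depth):
    """Expected number of genome elements at each depth 0..max_depth."""
    return [
        genome_len * math.exp(-cover) * cover**k / math.factorial(k)
        for k in range(max_depth + 1)
    ]

# 5000 genome elements at 1x desired coverage, as in the run above.
for k, n in enumerate(poisson_expected(5000, 1.0, 5)):
    print(f"{k} - {k + 1}  {n:7.1f}")
```

At 1x coverage the approximation gives about 1839 elements each at depths 0 and 1 and about 920 at depth 2, against 1627, 2131 and 912 in this run.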
Contig membership
Distribution of the number of cover elements in contigs. As coverage increases, the number of contigs first rises and then falls at very high coverage. Contig membership increases with coverage.
0 - 1 0
1 - 2 15 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
2 - 3 9 XXXXXXXXXXXXXXXXXXXXXXXX
3 - 4 4 XXXXXXXXXX
4 - 5 5 XXXXXXXXXXXXX
5 - 6 3 XXXXXXXX
6 - 7 3 XXXXXXXX
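Contig membership can be counted by sorting the cover elements by start position and merging those that overlap or touch. This is a minimal sketch, assuming elements are given as half-open (start, end) intervals. The function name is hypothetical, and treating exactly adjacent elements as one contig is an assumption chosen to agree with the definition of a contig as a contiguous covered area.

```python
def contig_membership(elements):
    """Return the number of cover elements in each contig, left to right."""
    members = []
    contig_end = None
    for start, end in sorted(elements):
        if contig_end is None or start > contig_end:
            # An uncovered position precedes this element: start a new contig.
            members.append(1)
            contig_end = end
        else:
            # Element overlaps or abuts the current contig: extend it.
            members[-1] += 1
            contig_end = max(contig_end, end)
    return members

print(contig_membership([(0, 50), (30, 80), (200, 250), (250, 300), (400, 450)]))
# prints: [2, 2, 1]
```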
Gap distribution
Distribution of the gaps - regions of the genome not covered by the cover elements.
0 - 10 6 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
10 - 20 7 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
20 - 30 5 XXXXXXXXXXXXXXXXXXXXXXXXXXXX
30 - 40 3 XXXXXXXXXXXXXXXXX
40 - 50 3 XXXXXXXXXXXXXXXXX
50 - 60 6 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
60 - 70 3 XXXXXXXXXXXXXXXXX
70 - 80 2 XXXXXXXXXXX
80 - 90 0
90 - 100 0
100 - 110 0
110 - 120 2 XXXXXXXXXXX
120 - 130 2 XXXXXXXXXXX
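The gap size distribution reported under Gap Statistics (24, 11 and 4 gaps) appears to be this histogram re-binned in units of one cover element (50 genome elements). The sketch below performs that re-binning on the counts above. The variable names are illustrative, and the interpretation of the "< n elements" labels as consecutive, non-cumulative bins is inferred from the totals.

```python
# Gap counts per 10-genome-element bin (0-10, 10-20, ..., 120-130), from the histogram above.
gap_counts = [6, 7, 5, 3, 3, 6, 3, 2, 0, 0, 0, 2, 2]
bin_width = 10      # genome elements per histogram bin
element_len = 50    # genome elements per cover element

per_element = {}
for i, count in enumerate(gap_counts):
    # A gap in this bin is shorter than `upper` cover elements.
    upper = (i * bin_width) // element_len + 1
    per_element[upper] = per_element.get(upper, 0) + count

print(per_element)
# prints: {1: 24, 2: 11, 3: 4}
```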
Gap spanning
Distribution of the gap sizes closed by the anchor shotgun. This histogram bins only cases in which the anchor spanned a single gap.
0 - 41 15 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
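One way to identify such cases is to test each anchor element against every gap and keep the anchors that span exactly one. The sketch below is an assumption about how gap closure might be decided, not the script's own logic. It treats a gap as closed when an anchor extends past both of its ends, with gaps and anchors given as half-open (start, end) intervals. The function name and the sample data are hypothetical.

```python
def spanned_gaps(gaps, anchors):
    """Return the gaps closed by anchors that each span exactly one gap."""
    closed = set()
    for a_start, a_end in anchors:
        # Assumed criterion: the anchor overlaps the contigs on both sides of the gap.
        hit = [g for g in gaps if a_start < g[0] and g[1] < a_end]
        if len(hit) == 1:
            closed.add(hit[0])
    return closed

gaps = [(10, 20), (25, 30), (100, 180)]
anchors = [(5, 22), (24, 31), (90, 140), (0, 40)]
closed = spanned_gaps(gaps, anchors)
print(sorted(closed))
# prints: [(10, 20), (25, 30)]
```

In this example the anchor (0, 40) is ignored because it spans two gaps, and the gap (100, 180) stays open because no anchor reaches across it.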
CGI script modified on 14:04:30 23-04-2009 by GSC Webmaster (server mkweb02)