The distinctive Perl camel is (c) O'Reilly
Perl Workshop Home Page
Home of the Bioinformatics Perl Workshop

course 1.0.1.8

Level: beginner
1.0.1.8.4
Introduction to hashes: keys, values, exists; introduction to sorting and shuffling.
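The sequences script below never calls exists or shuffles a list, although both appear in this description. The following is a minimal illustration, not part of the original course code: the hash contents are made up, and shuffle comes from the core List::Util module. It shows that exists tests whether a key is in the hash at all, which a plain truth test on the value cannot do when a count is 0.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# illustrative counts only; "ggtt" is present but was seen 0 times
my %count = (aatt => 3, ggtt => 0);

for my $seq (qw(aatt ggtt cccc)) {
  # exists asks whether the key is in the hash, whatever its value
  my $state = exists $count{$seq} ? "present" : "absent";
  # a truth test cannot tell a 0 count from a missing key
  my $truth = $count{$seq} ? "true" : "false";
  print "$seq: $state, value is $truth\n";
}

# shuffle returns the same elements in a random order
my @bases    = qw(a t g c);
my @shuffled = shuffle(@bases);
print "shuffled: @shuffled\n";
```

Running it reports ggtt as present with a false value and cccc as absent, which is why exists is the right test for "have I seen this key before?".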

legend

course code

cat.course.level.sessions.session

e.g. 1.0.1.8

categories

0 | introduction and orientation

1 | perl fundamentals

2 | shell and prompt tools

3 | web development

4 | CPAN Modules

5 | Ruby

levels

0 | all

1 | beginner

2 | intermediate

3 | advanced
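The cat.course.level.sessions.session scheme in the legend is itself a small hash exercise. A minimal sketch, assuming every field is a plain integer separated by dots: the helper decode_course_code is hypothetical (not part of the workshop site), and the label tables simply copy the legend. A hash slice assigns each number to its field name in one statement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# field names, in order, from the course code legend
my @fields = qw(cat course level sessions session);

# category and level labels copied from the legend
my %category = (0 => "introduction and orientation", 1 => "perl fundamentals",
                2 => "shell and prompt tools",       3 => "web development",
                4 => "CPAN Modules",                 5 => "Ruby");
my %level = (0 => "all", 1 => "beginner", 2 => "intermediate", 3 => "advanced");

# hypothetical helper: split a code such as 1.0.1.8.4 into named fields
sub decode_course_code {
  my ($code) = @_;
  my %part;
  # hash slice: each dot-separated number goes to the matching field name;
  # fields missing from a shorter code (e.g. 1.0.1.8) are set to undef
  @part{@fields} = split /\./, $code;
  return %part;
}

my %part = decode_course_code("1.0.1.8.4");
print "category: $category{$part{cat}}\n";
print "level:    $level{$part{level}}\n";
print "session:  $part{session} of $part{sessions}\n";
```

For 1.0.1.8.4 this decodes to perl fundamentals, beginner, session 4 of 8, matching this lecture's place in the course.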

Code
Hashes and Sorting
Martin Krzywinski
#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;

# append a newline to every print
$\ = "\n";

# this is the list from which we will draw the random base pair - add more if you like :)
my @bp = qw(a t g c);

# explicitly initialize the list of sequences
my @sequences = ();
for (1..1000) {
  # set the sequence to an empty string
  my $seq = "";
  for (1..4) {
    # add a random base pair; rand(@bp) returns a number in [0,4),
    # which is truncated to an integer array index
    $seq = qq{$seq$bp[rand(@bp)]};
  }
  push @sequences, $seq;
}

# this will be the hash that will store the count of each sequence
my %sequence_count = ();

# iterate through the list of sequences, and for each sequence increment
# its count, stored in the hash, by one (a missing key starts as undef,
# which ++ treats as 0)
for my $seq (@sequences) {
  $sequence_count{$seq}++;
}

for my $seq (keys %sequence_count) {
  print qq{sequence $seq seen $sequence_count{$seq} times};
}

# report sequences that contain a run of three identical bases
for my $seq (keys %sequence_count) {
  if ($seq =~ /aaa|ccc|ggg|ttt/) {
    print qq{3-base homopolymer sequence $seq seen $sequence_count{$seq} times};
  }
}

# method 1 - iterate across sequences, split each sequence into list of characters
my %bp_count = ();
for my $seq (@sequences) {
  for my $bp (split //, $seq) {
    $bp_count{$bp}++;
  }
}
print Dumper(\%bp_count);

# method 2 - iterate across distinct sequences, weighting each base by the sequence's count
%bp_count = ();
for my $seq (keys %sequence_count) {
  for my $bp (split //, $seq) {
    $bp_count{$bp} += $sequence_count{$seq};
  }
}
print Dumper(\%bp_count);

# average number of times each distinct sequence was seen
my $sum = 0;
for my $count (values %sequence_count) {
  $sum += $count;
}
print "average sequence count is ", $sum / scalar(keys %sequence_count);

# list sequences from most to least frequent
for my $seq (sort { $sequence_count{$b} <=> $sequence_count{$a} } keys %sequence_count) {
  print qq{sequence $seq seen $sequence_count{$seq} times};
}
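The final loop sorts by count alone, so sequences with equal counts come out in hash order, which can differ from run to run. One way to make the listing repeatable, sketched here with made-up counts rather than the script's random data, is to fall back to an alphabetical comparison with or when the numeric comparison returns 0:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# illustrative counts only; the script above generates its own at random
my %sequence_count = (acgt => 5, ttga => 7, aaag => 5, cgcg => 2);

# most frequent first; <=> returns 0 on a tie, so "or" moves on to
# an alphabetical comparison of the sequences themselves
my @ranked = sort {
  $sequence_count{$b} <=> $sequence_count{$a}
    or
  $a cmp $b
} keys %sequence_count;

print "sequence $_ seen $sequence_count{$_} times\n" for @ranked;
```

This always prints ttga first, then aaag and acgt (both seen 5 times, in alphabetical order), then cgcg.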

4 | Hashes and Sorting | 1.0.1.8.4

1.0.1.8.4.p1 | Hashes and Sorting | Martin Krzywinski | ppt
1.0.1.8.4.c1 | sequences | Martin Krzywinski | code
1.0.1.8.4.c2 | sort | Martin Krzywinski | code
1.0.1.8.4.a1 | Hashes and Sorting | Martin Krzywinski | pdf
1.0.1.8.4.a2 | Hashes and Sorting | Martin Krzywinski | pdf
1.0.1.8.4.s1 | Hashes and Sorting | Martin Krzywinski | slides