2024 π Daylatest newsbuy art
Safe, fallen down this way, I want to be just what I am.Cocteau Twinssafe at lastmore quotes
very clickable
data + munging

The Perl Journal

Volumes 1–6 (1996–2002)

Code tarballs available for issues 1–21.

I reformatted the CD-ROM contents. Some things may still be a little wonky — oh, why hello there <FONT> tag. Syntax highlighting is iffy. Please report any glaring issues.

The Perl Journal
#6
Summer 1997
vol 2
num 2
Perl-fect Sundials
Build your own sundials, accurate to the minute.
Just the FAQs: Sorting and Hashes
TPJ's first ever tutorial teaches you the intricacies of sorting.
CGI: Rating Web Page Tastefulness
NPH scripts and creating Web robots with LWP.
The AutoLoader
How to load parts of Perl modules on demand.
Perl News
What's new in the Perl community.
Perl/Tk: Another Wild Widget Tour
Balloons, file selectors, pulldown and popup menus...
1st Annual Obfuscated Perl Contest
I suppose you thought this was obfuscated?
Frossie Economou (1997) Just the FAQs: Sorting and Hashes. The Perl Journal, vol 2(2), issue #6, Summer 1997.

Just the FAQs: Sorting and Hashes

TPJ's first ever tutorial teaches you the intricacies of sorting.

Frossie Economou


Comparators

"Shall I compare thee to a summer's day? Thou art more lovely and more temperate..."
                    William Shakespeare, Sonnet XVIII

"'I'd like to know if I could compare you to a summer's day. Because well, June 12th was quite nice, and...'"
                   Terry Pratchett, Wyrd Sisters

Perl has two types of comparators, so called because they compare something to something else. Half of them compare one number to another, and the other half compare strings. Don't mix them! The numeric comparators are made of symbols, and the string comparators are made of letters:

    Number           String          Meaning
	
    ==                  eq           equal
    !=                  ne           not equal
    <                   lt           less than
    >                   gt           greater than
    <=                  le           less or equal
    >=                  ge           greater or equal
    <=>                 cmp          compare

Unless your eyesight is so bad that you picked the Perl Journal because it was shelved next to Playboy, you've seen them all before, with the possible exception of <=> and cmp.

Here's how those two work. Suppose you were interviewing a politician. Instead of saying:

Q. Will you raise taxes? ($newtax > $oldtax)
A. No.

Q. So you will lower taxes? ($newtax < $oldtax)
A. No.

Q. Ah, you'll keep them the same? ($newtax == $oldtax)

you could have saved breath by asking

Q.Will you raise/lower/maintain taxes? ($newtax <=> $oldtax)

The expression $foo <=> $bar returns -1, 0, or 1 depending on whether $foo is smaller, equal to, or greater than $bar.

*> Common pitfall: using <=> when you mean cmp!

Sorting

Comparators are often used to help sort a lit of numbers or strings. Perl's sort() function expects its first parameter to be something that behaves just like <=> and cmp:

sort   SUBROUTINE_OR_BLOCK    LIST

sort() calls the subroutine (or invokes the block) repeatedly with two values that are always stored in the special variables $a and $b. The subroutine's job is to answer the question "Which is higher, or are they the same?" and expects an answer of 1 if $a is higher, -1 if $b is higher, or 0 if they're equal.

The simplest way to sort

So the simplest possible sort, ascending numerical order, is

sort {$a <=> $b} 4,2,5

which returns the list 2,4,5.

You can get descending order by swapping $a and $b:

sort {$b <=> $a} 4,2,5

yielding 5,4,2. It's the same with strings - suppose you want to sort a list of Formula One(Like Indy Racing, only better.) drivers alphabetically:

@unsorted = qw/Schumacher Hill Alesi Villeneuve Coulthard/;

@sorted = sort {$a cmp $b} @unsorted;

Unless you've forgotten your ABCs, the result shouldn't surprise you: "Alesi", "Coulthard", "Hill", "Schumacher", "Villeneuve".

We could have written that even more succinctly:

@sorted = sort @unsorted;

since in the absence of a subroutine, sort() defaults to string comparison.

*> Common Pitfall: Forgetting that cmp uses ASCII order, which ranks lowercase letters higher than uppercase letters. If you want true alphabetical order, use {lc($a) cmp lc($b)} to sort your list.(Unfortunately there aren't any F1 drivers called van Something or d'Other this season so you’ll have to visualize your own example.)

Tinkering with the sort

Let's rewrite the above example using a named subroutine:

   @sorted = sort honestly @unsorted;

   sub honestly {
       $a cmp $b;
   }

If you can't be honest, be creative. Suppose you want to sort the list in cmp's ASCII order, but want Damon Hill to always come first, because let's face it, he's the best.(For those following the 1997 season - it's the car's fault, okay?) Now, sort() doesn't care what the subroutine does, so long as it returns -1, 0, or 1. So what we'll do is replace honestly() with a subroutine that checks if "Hill" is one of its arguments, and fudges the result if so.

   @sorted = sort fraudulently @unsorted;

   sub fraudulently { 
       return -1 if $a eq 'Hill'; 
       return  1 if $b eq 'Hill'; 
       $a cmp $b; 
   }

which produces the order "Hill", "Alesi", "Coulthard", "Schumacher", "Villeneuve".

*> Common Pitfall: Thinking all you need is the if $a case. Try the above without the if $b line and see what happens!

Sorting associative arrays (hashes)

You can't. Go home.

Seriously, think about what you want to do. Hashes simply map a set of scalars (the keys) to another set of scalars (the values); they have no beginning and no end, so they can't be sorted. So you can't sort a hash - but you can certainly sort its keys and values.

In keeping with our theme,(I promise to pick something else for next issue!) let's make a hash of some Formula One teams (keys) and the companies that provide their car engines (values). In Formula One parlance, these combinations of racing team and engine are called constructors, and every season there is a constructor championship in parallel with the driver championship. For example, the 1996 season was won by Williams-Renault because the most points were won by cars built by the Williams team using the Renault engine.

   %constructors = ( 	
       "Williams"  => "Renault", 	
       "McLaren"   => "Mercedes", 
       	"Benetton" => "Renault", 
       	"Ferrari"  => "Ferrari", 
       	"Arrows"   => "Yamaha", 
   	);

If you haven't used hashes much, you might be wondering whether it makes any difference what's the key and what's the value. Keys must be unique; that's why it would have been dead silly to use the engines as the keys and the teams as the values. You can't have two racing teams of the same name any more than you can have two NFL teams called "Dallas Cowboys." But two football teams could play in the same stadium, just as two racing teams can and do buy engines from the same manufacturer. As you can see, both the Williams and the Benetton teams buy their engines from Renault.

*> Common Pitfall: Unintentionally indexing hashes by non-unique keys. This results in data being lost.

Sorting a hash by key. Perl, being the helpful creature that it is, provides you with a function called keys() that returns a list of hash keys. They can then be sorted with the techniques we discussed above.

So, to produce a listing of constructors in team order:

@teams = sort {$a cmp $b} keys %constructors;

foreach $team (@teams) { 
    print $team,"-",$constructors{$team},"\n";
}

This displays:

Arrows-Yamaha 
Benetton-Renault 
Ferrari-Ferrari 
McLaren-Mercedes 
Williams-Renault

Don't be afraid to take advantage of the shortcuts Perl offers. You already know that:

  • sort() defaults to a string comparison
  • keys() returns a list

So we can fold the above sort() statement into the foreach loop:

foreach $team (sort keys %constructors) { 
    print $team,"-",$constructors{$team},"\n"; }

It's not complicated at all - in fact it's much easier to verbalize ("for each $team in the list of sorted constructor keys") than the first, more verbose, example: ("Let @teams be the list sorted by string comparison of constructor keys; for each $team in @teams").

Sorting a hash by value. keys() has a brother, values(), that returns a list of all the hash values. Now of course you can sort the values just as you sorted the keys, but will that help you at all in sorting your hash? No, because the hash is indexed by key, not by value, so you still need to arrange the keys first. But you can organize those any way you want: by value, by size, or by phase of the moon if that's your pleasure.

So whereas before we sorted the keys by key (that is, the teams by team name) with

@teams = sort {$a cmp $b} keys %constructors;

we now want to sort the keys by their corresponding values (that is, the teams by engine).

@teams = sort 
             { $constructors{$a} cmp $constructors{$b} } 
             keys %constructors;

which, followed again with

foreach $team (@teams) { 
    print $team,"-",$constructors{$team},"\n";
}

prints

Ferrari-Ferrari 
McLaren-Mercedes 
Williams-Renault 
Benetton-Renault 
Arrows-Yamaha

which is exactly what we wanted: A list of constructors sorted by engine.

Sorting a hash by key and value. If you made it to this point, you probably know enough to fulfill all your sorting needs. To demonstrate, here's a more sophisticated example that can be easily decomposed using what we know so far.

Suppose you wanted to sort the hash (constructors) by value (engine) as above but, knowing that the value may not always be unique, want to sort by key (team) in the case of identical engines. In other words, we want a primary sort by engine and a secondary sort by team name.

Here are the four steps:

  • We first need to get the keys (teams) sorted in the order we want:

@teams = sort myway keys %constructors;

  • We can sort them using any subroutine we like:

   sub myway { 
       # are the engines different? 
       if ($constructors{$a} ne $constructors{$b}) {

           # engines different - compare engines 
           $constructors{$a} cmp $constructors{$b}; 
       } else { 

           # engines same - compare teams 
           $a cmp $b; 
       } 
   }

  • We display our teams using the using the sorted keys:

foreach $team (@teams) { 
    print $team,"-",$constructors{$team},"\n";
}

  • We glow with satisfaction when the output arrives as desired:

Ferrari-Ferrari 
McLaren-Mercedes 
Benetton-Renault 
Williams-Renault 
Arrows-Yamaha

Sorting and efficiency

Once you've figured out how to do something, the next step is often figuring out how to do it fast. If you have a complicated sorting operation, consider doing as much work as possible before the sort, rather than embedding the work in the sort subroutine that gets called over and over and over. This occasionally requires some lateral thinking, but that's where all the fun is anyway, right?

In our first example we had these drivers:

@unsorted = qw/Schumacher Hill Alesi Villeneuve Coulthard/;

and we were trying to fudge the order so that Hill came first:

   sub fraudulently { 
       return -1 if $a eq 'Hill'; 
       return  1 if $b eq 'Hill'; 
       $a cmp $b; 
   }

which puts the computational burden of cheating on the sort() subroutine. But is there a more efficient way to make sure Hill stays on top? Well, since the null string ("") is always going to be first alphabetically, we could create another array that contains every driver's name - except for Hill, who'll have a null string in his place.

 
# copy the array
@idx = @unsorted;   

# replace Hill with the null string in the array
foreach (@idx) { $_ = "" if $_ eq "Hill" }

We've now created the array @idx with 5 elements whose values are "Schumacher", "", "Alesi", "Villeneuve", "Coulthard". So what we want is to

  • sort this array (@idx)
  • order the original array (@sorted) in the same order

We can do this all in one statement, shown below.

@sorted = @unsorted[ 
                    sort { $idx[$a] cmp $idx[$b] } 
                         0 .. $#idx 
                   ];

Done! Missed it? Here's the slow motion replay:

0 .. $#idx is 0,1,2,3,4 because that's how many elements we have. But remember

$idx[0] is "Schumacher" 
$idx[1] is "" 
$idx[2] is "Alesi" 
$idx[3] is "Villeneuve" 
$idx[4] is "Coulthard"

So our sort expression,

                     sort { $idx[$a] cmp $idx[$b] } 
                          0 .. $#idx
takes 0,1,2,3,4 and returns whatever order is specified by an alphabetical sorting of "Schumacher", "", "Alesi", "Villeneuve", "Coulthard" - which makes "" end on top.

But this isn't quite what we want - who's this "" bloke ? We want the original unsorted list in the same order, which we can obtain via an array slice:

@unsorted[1,2,4,0,3]. 
Got it? 
@sorted = @unsorted[1,2,4,0,3] 
i.e. 
@sorted = @unsorted[ 
                   sort { $idx[$a] cmp $idx[$b] }
                   0,1,2,3,4 
                   ]

i.e.

@sorted = @unsorted[ 
                    sort { $idx[$a] cmp $idx[$b] } 
                    0 .. $#idx 
                   ];

End of replay.

Although that line might look complicated on first inspection, all it says is,"Give me the elements of @unsorted in the order specified by comparing their indexed counterparts." A similar example can be found, not coincidentally, in the Perl FAQ. For further reading:

  • The description for sort() in Programming Perl, or the online perlfunc documentation.
  • "How do I sort an array by (anything)" entry of the Perl FAQ.
  • The "Advanced Sorting" section in Chapter 15 of Learning Perl.
  • For up-to-date information on the current Formula One season check out https://www.monaco.mc/f1/.


Frossie Economou used to think Formula One was boring and programming in Fortran wasn't.
Martin Krzywinski | contact | Canada's Michael Smith Genome Sciences CentreBC Cancer Research CenterBC CancerPHSA
Google whack “vicissitudinal corporealization”
{ 10.9.234.151 }