This tutorial will take you headlong into programming Perl. Make sure that you have a recent perl installed (5.10 or newer) and that you were able to complete the "Hello World" example in Getting Started.

As you work through this tutorial, take the time to read the perldoc links (you can also use the perldoc program to look up documentation from your console.) Several concepts are introduced here and may take some effort to grasp. Some readers may prefer to plow through the tutorial and then go back to make sense of the details.

You should actually type the examples rather than using copy+paste. This will help you to learn and remember the feel of the Perl syntax. If you make a typo, perl will probably throw an error or warning — learning to understand perl's error messages and use them to find your typos is part of the process.

Solving a Problem

Let's say that your office has decided to keep track of how many bugs each developer fixes and each team is going to award prizes to the developer with the most bugs fixed per week. On your first day at work, your manager Sam gives you this input file and asks you to write a program to determine who wins the USB rocket launcher on your team (team 42).

  Team 7
  John: 19
  Sue: 20
  Pam: 35
  Team 42
  Jeff: 12
  Sam: 3
  Phil: 26
  Jill: 10
  Team 9
  Bill: 19
  John: 7
  Linda: 15

So, let's get started. Like most modern Perl programs, ours starts with a shebang line and three lines to enable modern features, warnings, and strictures.

  1 #!/usr/bin/env perl
  2 
  3 use 5.010000;
  4 use warnings;
  5 use strict;

Then, we need to open the input file and read through it until we find the section for our team.

  7 my $team_number = 42;
  8 my $filename = 'input.txt';
  9 
 10 open(my $fh, '<', $filename) or die "cannot open '$filename' $!";
 11 
 12 my $found;
 13 while(<$fh>) {
 14   if(m/^Team (\d+)$/) {
 15     next if($1 != $team_number);
 16     $found = 1;
 17     last;
 18   }
 19 }
 20 die "cannot find 'Team $team_number'" unless($found);

This handful of code introduces several details about perl. Let's step through each part and see what is happening. On lines 7 and 8, we're only declaring some variables.

  7 my $team_number = 42;
  8 my $filename = 'input.txt';

Note that the variables have a $ sigil in front of them. We'll see more about sigils later, but perldata explains them in detail. The $ sigil is for scalar variables, which are being used here to hold simple one-element values (here, notice that the first one is a number and the second is a string.)

These variables are also being declared with the my keyword. That means that they will be lexical variables — so they can't be seen outside of their scope. Currently, our lexical scope is the entire file, but we'll see more about scope later.

 10 open(my $fh, '<', $filename) or die "cannot open '$filename' $!";

The call to open() on line 10 will populate the new variable (my $fh) with a filehandle for reading the file $filename. Note that the '<' argument means that we want to read the file rather than writing it. If open() returns false, something went wrong (perl sets a detailed message about this in the global $! variable) and we want to stop the program with a useful error message.
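To see the failure path in isolation, here is a minimal sketch. The file name 'no-such-file.txt' is made up for this example and is assumed not to exist, so open() should return false and leave the reason in $!.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical file name, assumed not to exist in the current directory.
my $missing = 'no-such-file.txt';

# open() returns false on failure and records the reason in $!.
my $opened = open(my $fh, '<', $missing);
my $reason = "$!";   # copy $! right away, later calls may overwrite it

say "cannot open '$missing': $reason" unless($opened);
```

The program above uses 'or die' rather than a separate test because it cannot do anything useful without the input file.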

 12 my $found;
 13 while(<$fh>) {
 14   if(m/^Team (\d+)$/) {
 15     next if($1 != $team_number);
 16     $found = 1;
 17     last;
 18   }
 19 }
 20 die "cannot find 'Team $team_number'" unless($found);

In this loop, we cue up the input to the section with our team number. The <> operator around $fh is just a shorthand for readline($fh). Because <$fh> returns undef (which is false) at end-of-file, the while(<$fh>) loop runs until we hit the end of the file.

Note that we are not using a loop variable here. When a readline such as <$fh> is the only thing in the condition of a while loop, perl assigns each line to $_ and tests whether the result is defined (this is a special case for a few operators like readline, not something while(EXPR) does in general.) The special topic variable $_ (pronounced "it") means "you know, the thing we've been discussing". Perl lets you use this conversational shorthand, so you do not have to mention $the_thing on every line in a short loop. So, there's a lot of shorthand here versus:

 13 while(defined($_ = readline($fh))) {

Caution: saying "it" too many times in a paragraph is as confusing in code as it is in prose.
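The three spellings of the loop can be compared side by side. This is only a sketch: count_lines() is a helper invented for the example, and the sample text is read through an in-memory filehandle (open on a reference to a string) so that no real file is needed.

```perl
use 5.010;
use strict;
use warnings;

# Sample input kept in a string instead of a file.
my $text = "Team 7\nJohn: 19\nTeam 42\n";

# count_lines() is a made-up helper that reads $text in the requested style.
sub count_lines {
  my ($style) = @_;
  open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
  my $count = 0;
  if($style eq 'implicit') {
    while(<$fh>) { $count++ }                                # shorthand
  }
  elsif($style eq 'explicit') {
    while(defined($_ = readline($fh))) { $count++ }          # spelled out
  }
  else {
    while(defined(my $line = readline($fh))) { $count++ }    # named variable
  }
  return($count);
}

for my $style (qw(implicit explicit named)) {
  say "$style: ", count_lines($style);
}
```

All three styles should report the same number of lines. The outer loop uses a named variable ($style) on purpose, because the first two styles overwrite $_.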

Now, let's examine the next line.

 14   if(m/^Team (\d+)$/) {

Note that the m// operator is being called without a $variable =~ in front of it. This means it assumes you want to match against $_, or "the current thing". So, this really means:

 14   if($_ =~ m/^Team (\d+)$/) {

The m/.../ is a regular expression match, which is a powerful part of Perl's text-processing ability. This regular expression also has a capture in it (everything in the parentheses) so we can easily pull the team number off of the end. The \d is regexp-speak for "a digit", and the + is a "one-or-more" quantifier. So, we're capturing "one-or-more digits" found after the literal string "Team ", which must be at the front of the line (because of the ^ anchor.) This regexp is also anchored to the end of the line ($), so it will only match lines that start with "Team " and end in some digits. (The $ anchor also matches just before a trailing newline, which is why this works even though we have not removed the newline from $_.)

After a successful match on a regular expression with a capture, the captured string will be found in the $1 numbered variable. (If we had multiple captures, those would be in $2, $3, etc.)
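A quick way to get a feel for the anchors and the capture is to run the same pattern against a few strings. In this sketch, team_of() is a hypothetical helper and the sample lines are made up.

```perl
use 5.010;
use strict;
use warnings;

# team_of() is a made-up helper: it returns the captured team number,
# or undef when the line is not a team header.
sub team_of {
  my ($line) = @_;
  return($line =~ m/^Team (\d+)$/ ? $1 : undef);
}

for my $line ("Team 42\n", 'Team 7', 'Team forty-two', 'My Team 42', 'Jeff: 12') {
  my $team = team_of($line);
  chomp(my $shown = $line);   # strip the newline for display only
  say "'$shown' => ", defined($team) ? "team $team" : "no match";
}
```

Only the first two samples should match: the third has no digits, the fourth does not start with "Team ", and the last is a score line.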

 15     next if($1 != $team_number);

In line 15, we're jumping to the next loop iteration with the next keyword, but only if the captured digits are numerically not equal to $team_number. Otherwise, we keep going:

 16     $found = 1;
 17     last;
 18   }
 19 }
 20 die "cannot find 'Team $team_number'" unless($found);

So, we've come to a line with the string "Team " at the front, followed by a number equal to $team_number. Thus, we set our $found flag to true, and use the last keyword to prematurely stop the while() loop.

But what happens if we never found a line "Team 42" in the file? We would reach the end-of-file state on $fh, which means readline() (a.k.a. <>) returns false, and the while() loop ends. In that case, we never set $found to a true value, so the code will die() with a helpful error message.

The unless keyword which qualifies our die() is simply the opposite of if (or the same as if(not $found) if you want to see it that way.) Notice that this code uses if both in the trailing statement modifier form on line 15 and in the leading compound statement form on line 14. The part between the matching braces ('{' and '}') in a compound statement is known as a block.

Note that $found is False because it is undefined — because we never assigned anything to it. There are a few rules to what perl considers True or False, but note that the string "False" is a True value in Perl.
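The rules are short enough to check by experiment. The following sketch uses a made-up helper, is_true(), to show how perl treats a handful of sample values in a boolean test.

```perl
use 5.010;
use strict;
use warnings;

# is_true() is a hypothetical helper that reports the result of a
# boolean test on its argument.
sub is_true {
  my ($value) = @_;
  return($value ? 'true' : 'false');
}

# The number 0 and the string '0' both display as '0' below.
my @samples = (undef, 0, '0', '', 1, -1, '0.0', '00', ' ', 'False');
for my $value (@samples) {
  my $label = defined($value) ? "'$value'" : 'undef';
  say "$label is ", is_true($value);
}
```

Only undef, 0, '0', and the empty string come out false here. Strings such as '0.0' and '00' are true, even though they are numerically zero.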

But wait a second... do we really need the $found variable to check whether we fell through the end of the while() loop? Actually, we don't. This brings us to the Perl motto "There's more than one way to do it."

 12 while(<$fh>) {
 13   chomp;
 14   last if($_ eq "Team $team_number");
 15 }
 16 die "cannot find 'Team $team_number'" if(eof $fh);

In the above code, chomp removes the newline from $_ and we're using stringwise equality instead of a regular expression. After the loop, we just use the eof function to find out if $fh got to an end-of-file state before we ended the loop with last.

 14   last if($_ eq "Team $team_number");

Also note that we didn't need to do anything special to go back and forth between numbers and strings in any of the code above. The variable $team_number contains an integer 42, but we can stringwise match it against the pair of digits '4' and '2' or numerically compare the captured digits '4' and '2' to our number 42 as in the previous version:

 15     next if($1 != $team_number);

This is because perl will automatically coerce between strings and numbers depending on how you are using the variable. Perl usually does what you want with numbers, but there are some gotchas (especially if you are not using the warnings pragma, which tells you when you try to do number-ish things with a non-numeric string.)
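Here is a small sketch of that coercion, including one of the gotchas. The variable names are made up, and the warning is captured with a local $SIG{__WARN__} handler only so that the example can print it itself.

```perl
use 5.010;
use strict;
use warnings;

my $number = 42;
my $digits = '42';
my $padded = '042';

say "numerically equal"  if($padded == $number);   # both are 42 as numbers
say "different strings"  if($padded ne $number);   # '042' is not '42'
say "sum: ", $digits + 8;                           # 50
say "joined: ", $number . '!';                      # 42!

# With warnings enabled, using a non-numeric string as a number is reported.
my $warning = '';
{
  local $SIG{__WARN__} = sub { $warning = shift };
  my $word = 'three';
  my $oops = $word + 1;   # 'three' counts as 0 here
  say "three + 1 = $oops";
}
print "perl warned: $warning";
```

The == and != operators compare as numbers while eq and ne compare as strings, so the operator you choose decides which coercion happens.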

I like this new version of the code with eof() instead of the $found flag. Less code is easier to maintain and eliminating a variable means having one less thing to think about. Besides, I just needed an excuse to mention regular expressions... So on with the programming!

  1 #!/usr/bin/env perl
  2 
  3 use 5.010000;
  4 use warnings;
  5 use strict;
  6 
  7 my $team_number = 42;
  8 my $filename = 'input.txt';
  9 
 10 open(my $fh, '<', $filename) or die "cannot open '$filename' $!";
 11 
 12 while(<$fh>) {
 13   chomp;
 14   last if($_ eq "Team $team_number");
 15 }
 16 die "cannot find 'Team $team_number'" if(eof $fh);
 17 
 18 my $max_bugs = 0;
 19 my $max_name;
 20 while(my $line = <$fh>) {
 21   chomp($line);
 22   last if($line =~ m/^Team \d+$/);
 23   my ($name, $bugs) = split(/: +/, $line, 2);
 24   die "malformed '$line'" unless(defined $bugs);
 25   if($bugs > $max_bugs) {
 26     $max_bugs = $bugs;
 27     $max_name = $name;
 28   }
 29 }
 30 
 31 say "$max_name wins the USB rocket launcher!";

Now we've found the name of the winner by looping through the rest of the file until the end or until we find a line with something that looks like the start of another team's section. We then have to parse out each person's name and how many bugs they fixed, while keeping track of who fixed the most bugs as we go.

But this looks a little different than our earlier cue-up loop. For starters, we're using the explicit variable my $line instead of $_.

 20 while(my $line = <$fh>) {
 21   chomp($line);
 22   last if($line =~ m/^Team \d+$/);

So, we have to say chomp($line) instead of just chomp, and use the =~ to apply the m// match operation. Also, we did not capture the digits in this regular expression because we don't need them for anything here.

Onward! We're now at the meat of the program, where each programmer's name and how many bugs they have are on the input line.

 23   my ($name, $bugs) = split(/: +/, $line, 2);
 24   die "malformed '$line'" unless(defined $bugs);

The split function takes a regular expression, the string to split, and an optional count. So, we're splitting $line on "a colon, plus one-or-more spaces", but "only into two parts". If we're wrong about the input or get fed some garbage, the split will miss and the second variable $bugs would be undefined (everything lands in $name instead.)

Let's assume that it is possible for $bugs to be zero in the report. Thus, we can't just say 'unless($bugs)' to check the split because '0' is false.
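The behavior of the limit and of the defined() check can be sketched with a few made-up input lines. parse_score() is a hypothetical wrapper around the same split call.

```perl
use 5.010;
use strict;
use warnings;

# parse_score() is a made-up wrapper around the split used in the program.
sub parse_score {
  my ($line) = @_;
  my ($name, $bugs) = split(/: +/, $line, 2);
  return($name, $bugs);
}

for my $line ('Phil: 26', 'Sam: 0', 'Phil 26', 'Note: see: below') {
  my ($name, $bugs) = parse_score($line);
  if(defined $bugs) {
    say "name='$name' bugs='$bugs'";
  }
  else {
    say "malformed '$line'";
  }
}
```

'Sam: 0' passes the defined() test even though '0' is false, 'Phil 26' fails because there is nothing to split on, and the limit of 2 keeps 'see: below' together as one piece.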

So now we've got a name and corresponding number of bugs...

 25   if($bugs > $max_bugs) {
 26     $max_bugs = $bugs;
 27     $max_name = $name;
 28   }
 29 }

Here, we're comparing the value of $bugs to the current value of $max_bugs (which we initialized to 0.) If the current value of $bugs is bigger, that becomes the new value for $max_bugs and we remember the current $name as $max_name.

Finally, print out the name of the winner:

 31 say "$max_name wins the USB rocket launcher!";

Here we see string interpolation, where the value of the variable $max_name is embedded in the string being output. This is one of the major benefits of Perl having sigils on variable names — we don't have to do anything special to make a variable become part of a string.

Exercises

1.  Modify the program to use a regular expression with captures instead of split to parse the "$name: $bugs" line.

2.  What happens if everybody's bugs-fixed count is zero? Modify the program to output "Nobody fixed any bugs!" when this happens.

Adding a Feature

With our current program, we can determine who won the contest for this week. Now Sam also wants the output to include a list of how many bugs each person on your team fixed, sorted alphabetically by name.

Since the input is not sorted, our program needs to remember each person's name and score, then do the sorting itself. By using a hash, we can add three lines of code to accomplish this:

  1 #!/usr/bin/env perl
  2 
  3 use 5.010000;
  4 use warnings;
  5 use strict;
  6 
  7 my $team_number = 42;
  8 my $filename = 'input.txt';
  9 
 10 open(my $fh, '<', $filename) or die "cannot open '$filename' $!";
 11 
 12 while(<$fh>) {
 13   chomp;
 14   last if($_ eq "Team $team_number");
 15 }
 16 die "cannot find 'Team $team_number'" if(eof $fh);
 17 
 18 my %bugs;
 19 my $max_bugs = 0;
 20 my $max_name;
 21 while(my $line = <$fh>) {
 22   chomp($line);
 23   last if($line =~ m/^Team \d+$/);
 24   my ($name, $bugs) = split(/: +/, $line, 2);
 25   die "malformed '$line'" unless(defined $bugs);
 26   $bugs{$name} = $bugs;
 27   if($bugs > $max_bugs) {
 28     $max_bugs = $bugs;
 29     $max_name = $name;
 30   }
 31 }
 32 
 33 say "$_: $bugs{$_}" for(sort keys %bugs);
 34 say "$max_name wins the USB rocket launcher!";

The first addition is the declaration of the %bugs hash. A hash (also known as an "associative array") allows us to store a mapping between keys and values. It's like a bunch of named variables gathered into one. When we want to store or retrieve a value from a hash, we just use the key.

 18 my %bugs;

Notice that this variable has the % sigil. This is how Perl distinguishes between hashes and scalar variables. So, %bugs and $bugs are completely different variables.

 26   $bugs{$name} = $bugs;

Here, we're using $name as the key to store the scalar $bugs as a value in the hash %bugs. We use the sigil $ on a hash whenever we're addressing a single value by its key.
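The two sigils can be seen together in a short sketch. The values are taken from the sample input, and the exists function (not otherwise used in this tutorial) checks whether a key is present.

```perl
use 5.010;
use strict;
use warnings;

my %bugs;            # the whole hash uses the % sigil
$bugs{Jeff} = 12;    # a single value in it uses the $ sigil and {key}
$bugs{Phil} = 26;

my $bugs = 3;        # an unrelated scalar that happens to share the name
$bugs{Sam} = $bugs;

say "Phil fixed $bugs{Phil} bugs";
say "Jill has no entry yet" unless(exists $bugs{Jill});
say "entries: ", scalar(keys %bugs);
```

Perl tells $bugs and $bugs{...} apart by the braces that follow the name, which is why the scalar and the hash can coexist.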

Finally, we print out the sorted list of names and number of bugs they fixed.

 33 say "$_: $bugs{$_}" for(sort keys %bugs);

That's all! This code uses for in its statement modifier form, so we're looping through the output of the sort function. The input to sort is the list of keys from the %bugs hash. The $_ variable gets set by for to each of the sorted keys in turn. So, say will output a line "key: value" for each sorted key ($_ is the key and therefore $bugs{$_} is the value.)

The sort function defaults to sorting in string order, which is alphabetical for names like ours (though uppercase letters sort before lowercase ones.) We'll see later how to sort by different criteria.

The keys of a hash are not in any particular order. Thus, sort(keys(%some_hash)) is a fairly common idiom. But order does not always matter, so you can often process a hash with for(keys %hash). See also the values function and the each function for other ways to loop through a hash.
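As a rough illustration of the other two functions, this sketch totals the scores with values and walks the key, value pairs with each. The data is team 42 from the sample input.

```perl
use 5.010;
use strict;
use warnings;

my %bugs = (Jeff => 12, Sam => 3, Phil => 26, Jill => 10);

# values returns just the values, which is enough for a total.
my $total = 0;
$total += $_ for(values %bugs);
say "total fixed: $total";   # 51

# each returns one (key, value) pair per call until the hash is exhausted.
while(my ($name, $count) = each %bugs) {
  say "$name: $count";       # the order of these lines is not predictable
}
```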

Exercises

1.  What if there is a tie for most bugs fixed? How would this change the comparison and storing of $max_name?

Here is a hint: Just as we can use a hash to take the place of several variables, we can use arrays to keep track of a list of values. If we were going to have first and second prizes, we might have an array like this:

  my @prizes = ('USB rocket launcher', 'paperweight');

An array's values are retrieved by index, so first prize is $prizes[0]. You can also loop over arrays (e.g. 'for(@prizes)') and use functions like push(), pop(), and reverse() on them.

But the question is "what do you do about a tie?", not "what about second place?" (hey, I said this was a hint, not an answer.)

Another Feature

After running this bug-fixing game for a few weeks, your manager begins to realize that Phil has a large collection of USB rocket launchers on his desk. Perhaps he's creating bugs just so he can fix them and win the game? So, this week Sam gives you a different input file which also shows how many bugs each person created. He wants you to modify your program to use this new information, and add a list of how many bugs each person created (sorted numerically in descending order) to the output. Also, the winner will now be determined by who has the greatest net number of closed bugs.

So, now the input file will look like this:

  Team 7
  John: 19, 9
  Sue: 20, 19
  Pam: 35, 8
  Team 42
  Jeff: 12, 7
  Sam: 3, 0
  Phil: 26, 23
  Jill: 10, 6
  Team 9
  Bill: 19, 0
  John: 7, 5
  Linda: 15, 12

With what you've learned so far, you might decide that what we need is another hash. We can just change the split to get another value, and store how many bugs each person created (keyed by their name) in this new hash. Then calculate the net bugs per person when comparing to $max_bugs.

  1 #!/usr/bin/env perl
  2 
  3 use 5.010000;
  4 use warnings;
  5 use strict;
  6 
  7 my $team_number = 42;
  8 my $filename = 'input2.txt';
  9 
 10 open(my $fh, '<', $filename) or die "cannot open '$filename' $!";
 11 
 12 while(<$fh>) {
 13   chomp;
 14   last if($_ eq "Team $team_number");
 15 }
 16 die "cannot find 'Team $team_number'" if(eof $fh);
 17 
 18 my %bugs;
 19 my %created;
 20 my $max_bugs = 0;
 21 my $max_name;
 22 while(my $line = <$fh>) {
 23   chomp($line);
 24   last if($line =~ m/^Team \d+$/);
 25   my ($name, $fixed, $created) = split(/[:,] +/, $line, 3);
 26   die "malformed '$line'" unless(defined $created);
 27   $bugs{$name} = $fixed;
 28   $created{$name} = $created;
 29   my $net = $fixed - $created;
 30   if($net > $max_bugs) {
 31     $max_bugs = $net;
 32     $max_name = $name;
 33   }
 34 }
 35 
 36 say "Bugs created:";
 37 say "  $_: $created{$_}" for(
 38   sort({$created{$b} <=> $created{$a}} keys %created));
 39 
 40 say "Bugs fixed:";
 41 say "  $_: $bugs{$_}" for(sort keys %bugs);
 42 
 43 say "$max_name wins the USB rocket launcher!";

Now we're declaring one more hash:

 18 my %bugs;
 19 my %created;

And we have to change the split to get an extra value.

 25   my ($name, $fixed, $created) = split(/[:,] +/, $line, 3);
 26   die "malformed '$line'" unless(defined $created);

The [:,] in this regular expression is known as a character class, which means any of the given characters might be matched in that position. So, this regexp splits on either ':' or ',', followed by one or more spaces.
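A short sketch of the character class at work, using one line from the new input file and one deliberately malformed variant (the variant is made up for this example):

```perl
use 5.010;
use strict;
use warnings;

# A well-formed line splits at both the colon and the comma.
my @fields = split(/[:,] +/, 'Phil: 26, 23', 3);
say join('|', @fields);          # Phil|26|23

# Without a space after the separators, the pattern never matches,
# so everything stays in the first field.
my @short = split(/[:,] +/, 'Phil:26,23', 3);
say "fields: ", scalar(@short);  # fields: 1
```

In the second case the third variable in the program's list assignment would be undefined, which is exactly what the 'malformed' check looks for.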

Then of course we need to store the new value:

 27   $bugs{$name} = $fixed;
 28   $created{$name} = $created;

And here is where we're calculating the net number of fixed bugs.

 29   my $net = $fixed - $created;
 30   if($net > $max_bugs) {
 31     $max_bugs = $net;
 32     $max_name = $name;
 33   }
 34 }

Now we need to add the "Bugs created" section to our output.

 36 say "Bugs created:";
 37 say "  $_: $created{$_}" for(
 38   sort({$created{$b} <=> $created{$a}} keys %created));

The sort here is being called with a block, which sort expects to return a negative number, zero, or a positive number to tell it how the two values should be ordered. The <=> operator conveniently does exactly this when comparing two numbers (it returns -1, 0, or 1.) The sort function sets $a and $b to each pair of its input values when calling your sort block. Thus, we're retrieving the numeric values from the %created hash and comparing them in opposite order to get a descending sort. If we wanted to sort from smallest to largest, we would just swap $a and $b.
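The effect of swapping $a and $b, and the reason a numeric sort block is needed at all, can be sketched as follows. The data is the "bugs created" column for team 42.

```perl
use 5.010;
use strict;
use warnings;

my %created = (Jeff => 7, Sam => 0, Phil => 23, Jill => 6);

my @descending = sort({$created{$b} <=> $created{$a}} keys %created);
my @ascending  = sort({$created{$a} <=> $created{$b}} keys %created);
say "descending: @descending";   # Phil Jeff Jill Sam
say "ascending:  @ascending";    # Sam Jill Jeff Phil

# Without a block, sort compares strings, so '23' lands before '6'.
my @as_strings = sort(values %created);
my @as_numbers = sort({$a <=> $b} values %created);
say "as strings: @as_strings";   # 0 23 6 7
say "as numbers: @as_numbers";   # 0 6 7 23
```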

Exercises

1.  Find two ways to make the results numerically descending while having the $a term left of the $b term in the sort block.

The Next Level

With everything you've learned so far (including how to find the documentation for the Perl builtin functions), you can already accomplish almost anything with Perl. You know how to declare variables, read input files, print output, loop over data, and use conditional statements like if to control program flow. And technically, that's more than enough to write any program — as long as you're not concerned with how long the code gets.

Imagine how difficult it would be to write this simple program without using hashes (that is: imagine that you could only use scalar variables.) You would have to find some way to store all of the different names and scores to be able to refer back to them during the sorting. It's possible, but you would have to write a lot of low-level code to fiddle with data storage when a simple hash to store and fetch by key is so much easier.

So, hashes give us a convenient shorthand to what would otherwise be a lot of repeated code to store and retrieve values. This reduces the complexity of our program by tucking the details away into a tidy, convenient feature of the language. These next three concepts are similarly powerful in terms of taming the complexity of your program.

Variable Scope

We've been using the my keyword to declare our variables. This means that the variables are only visible from within the enclosing file or block. Recall that a block is the part between the matching braces ('{' and '}'.)

So, for example, the $name variable inside our loop block is not visible in any code after the closing brace. Because we start with use strict, perl will tell us when we try to use a variable name which has not been declared in our current scope:

  $ perl -E 'use strict; {my $name = "Joe";} say $name'
  Global symbol "$name" requires explicit package name at -e line 1.
  Execution of -e aborted due to compilation errors.

Note that in 'while(my $line = <$fh>) {...', the variable $line is only going to be active inside of the while loop because in a compound statement, any my variables declared in EXPR are scoped to the attached block.

It is good programming practice to confine your variables to the smallest scope possible. So, we did not declare $name, $fixed, and $created at the top of the file because we only needed them inside of the while block. But, we did need to declare variables like the %bugs hash outside of the while block so that we could get to them later.
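The following sketch shows two variables with the same name living in different scopes. The names and values are made up for the example.

```perl
use 5.010;
use strict;
use warnings;

my $name = 'outer';         # file scope
my @seen;                   # declared outside the loop so it survives it

for my $n (1, 2) {
  my $name = "inner $n";    # a new variable that hides the outer $name
  push(@seen, $name);
}                           # the inner $name is gone after this brace

say "inside the loop: @seen";
say "after the loop:  $name";
```

The outer $name is untouched by the loop, and @seen plays the same role that %bugs plays in our program: it has to be declared outside the block to be usable afterwards.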

Exercises

1.  Examine our finished program (above) and determine the scope of each variable. Which variables are file-scoped? How could we change the program to reduce the number of broadly scoped variables?

References

We've used hashes and talked about arrays earlier. But, these are both limited in that their values can only be scalars. So you can have @array = (1,2,3) or %hash = (a => 1, b => 2, c => 3), but what about an array of arrays? This is where references come in.

A reference is also a scalar value, but it refers to another variable which can be a scalar, hash, array, or subroutine. A reference to any variable can be obtained by placing a backslash in front of its sigil:

  my @array = (1, 2, 3);
  my $array_ref = \@array;
  say $array_ref->[0];

When we access a value through a reference, the -> arrow operator dereferences the scalar. We can also create a reference to a hash:

  my %hash = (a => 1, b => 2, c => 3);
  my $hash_ref = \%hash;
  say $hash_ref->{a};

But we are not limited to referencing named variables. We can create references to anonymous arrays and hashes by using [] for arrays and {} for hashes.

  my $array_ref = [1, 2, 3];
  say $array_ref->[0];
  my $hash_ref = {a => 1, b => 2, c => 3};
  say $hash_ref->{a};

And this means that we can have arbitrarily deep data structures by putting references in our references and so on.

  my @arrays = ([1, 2, 3], [4, 5, 6]);
  my $array_ref = $arrays[1];
  say $array_ref->[1];
  say $arrays[1][1];
  say join(',', @$_) for(@arrays);
  say $_ for(@{$arrays[0]});

Note that @$_ above is dereferencing the array reference in scalar $_. In the last case: @{$arrays[0]}, the braces surround the retrieval of $arrays[0] to clarify that the @ sigil is dereferencing that value. For more about references and how to use them, see the perldoc for perlref, perllol, perldsc, and perlreftut.

The trick to working with references is to know how to access the thing they reference. For a single hash or array element, you need the -> operator and {$key} or [$index], but you can get or set the reference's contents as a list by prepending the appropriate sigil (% or @.)
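Both kinds of access are shown in the sketch below, along with one consequence that is easy to miss: a reference points at the original data, while dereferencing into a new array makes a copy. The variable names are made up, and the numbers come from the sample input.

```perl
use 5.010;
use strict;
use warnings;

my $stats = [10, 6];                          # reference to an anonymous array
my $team  = {Jill => $stats, Sam => [3, 0]};  # hash of array references

say "Jill fixed $team->{Jill}->[0]";
say "Sam created $team->{Sam}[1]";      # the arrow between subscripts is optional

$stats->[0]++;                          # change the array through one reference
say "Jill now shows $team->{Jill}[0]";  # 11, both refer to the same array

my @copy = @$stats;                     # the @ sigil gets the contents as a list
$copy[0] = 0;                           # changing the copy leaves the original alone
say "original still $stats->[0]";       # 11

say "names: ", join(', ', sort(keys %$team));   # % sigil for the whole hash
```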

Exercises

1.  What is ref(\%somehash)? How would you use the ref function to write code which accepts either hash refs or array refs?

Subroutines

A subroutine is the fundamental reusable unit of behavior. A subroutine takes a list of arguments, and returns a list of values. This provides an abstraction where you don't need to think about what happens inside the subroutine, only about its inputs and outputs.

Recall that we needed more widely scoped variables to keep track of $max_bugs and $max_name, but that we also needed to change this to be able to handle a tie. We can write a subroutine which bundles all of that into one piece:

 29 sub get_winners {
 30   my (%data) = @_;
 31 
 32   my $max = 0;
 33   my @win;
 34   for my $key (keys %data) {
 35     if($data{$key} > $max) {
 36       @win = ($key);
 37       $max = $data{$key};
 38     }
 39     elsif($data{$key} == $max) {
 40       push(@win, $key);
 41     }
 42   }
 43   return(@win);
 44 }

This would be called as my @winners = get_winners(%net_bugs), where %net_bugs is a hash of name => net-fixed-bugs pairs. The function then returns a list of winners (which will be just one if there is no tie for first.)

Perl passes inputs to a subroutine as the list @_. In this case, we're taking advantage of Perl's relation between hashes and lists, which is that a list with an even number of elements can be assigned to a hash: the list is taken as a series of key, value pairs. This is the same as when you initialize a hash with the () list constructor (e.g. my %hash = (a => 1, b => 2).)
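To make the flattening visible, here is a small sketch. count_pairs() is a made-up subroutine, and %net_bugs holds the net scores for team 42 from the second input file.

```perl
use 5.010;
use strict;
use warnings;

# count_pairs() is a hypothetical sub that reports what it received.
sub count_pairs {
  my $arg_count = @_;    # number of elements in the flattened list
  my (%data) = @_;       # rebuild a hash from the key, value pairs
  return($arg_count, scalar(keys %data));
}

my %net_bugs = (Jeff => 5, Sam => 3, Phil => 3, Jill => 4);
my ($args, $keys) = count_pairs(%net_bugs);
say "$args arguments became $keys keys";   # 8 arguments became 4 keys

# When a key is repeated in the list, the later value wins.
my %merged = (%net_bugs, Sam => 9);
say "Sam: $merged{Sam}";                   # Sam: 9
```

Because the hash is flattened into a list, the subroutine works on its own copy and cannot change the caller's hash.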

You might have already figured out how the rest of this code works from the exercise earlier. When we find a new maximum value, we reset the @win list to contain only that name.

 35     if($data{$key} > $max) {
 36       @win = ($key);
 37       $max = $data{$key};
 38     }

If we find a value which ties for the current maximum, we add the corresponding name to the @win list.

 39     elsif($data{$key} == $max) {
 40       push(@win, $key);
 41     }

Notice that all of this subroutine's variables are limited to the scope of its block. The inputs all come via @_ and the return() provides all of the outputs. So, this subroutine has no side-effects, which is a good thing. There is no restriction in Perl against modifying variables at a distance, but avoiding side-effects in your subroutines makes them more reusable and predictable.

There is one more thing we can do to tidy-up our program using subroutines and references. We can get rid of the two hashes %bugs and %created and put things into a form which will more easily be fed into the new get_winners() subroutine. So, we want something that reads our input file into a data structure. This subroutine will take a filehandle and the team number as inputs.

 46 sub get_team_stats {
 47   my ($fh, $number) = @_;
 48 
 49   my @data;
 50 
 51   my $found;
 52   while(my $line = <$fh>) {
 53     chomp($line);
 54     if($line =~ m/^Team (\d+)$/) {
 55       last if($found);
 56       $found = 1 if($1 == $number);
 57       next;
 58     }
 59     next unless($found);
 60     my ($name, @stats) = split(/[:,] +/, $line);
 61     die "malformed $line" unless(@stats == 2);
 62     push(@data, $name, \@stats);
 63   }
 64   die "nothing for team $number" unless(@data);
 65   return(@data);
 66 }

We've changed the split here to just break into as many pieces as it can. Then, we're checking that the number of elements in @stats is 2.

 60     my ($name, @stats) = split(/[:,] +/, $line);
 61     die "malformed $line" unless(@stats == 2);

Now we're going to push a pair of e.g. "Jill", [10,6] onto @data. When we return @data, that list of pairs will easily turn into a hash, but this preserves the input ordering through the return.

 62     push(@data, $name, \@stats);

When we call this as my %bugs = get_team_stats($fh, $team_number), the pairs returned turn into keys and values of %bugs. Note that the values are two-element array references ("bugs fixed", "bugs created".)
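The difference between keeping the returned pairs as a list and assigning them to a hash can be sketched like this. The @data list below is written out by hand to stand in for what get_team_stats would return for team 42, and grep (a builtin not otherwise used in this tutorial) picks out the even indices.

```perl
use 5.010;
use strict;
use warnings;

# Hand-written stand-in for the name, [fixed, created] pairs.
my @data = (Jeff => [12, 7], Sam => [3, 0], Phil => [26, 23], Jill => [10, 6]);

# As an array, the input order survives: names sit at the even indices.
my @names = @data[grep({$_ % 2 == 0} 0..$#data)];
say "input order: @names";

# As a hash, lookup by name is easy, but the ordering is no longer available.
my %bugs = @data;
say "Phil fixed $bugs{Phil}[0] and created $bugs{Phil}[1]";
```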

Exercises

1.  Write a subroutine 'winners' which takes either a hash reference or array reference as input and returns a list of which keys or indices (respectively) have values tied for the maximum value. It would be called as my @win_keys = winners(\%hash) or my @win_idx = winners(\@array). So, given a reference to the array '(1,7,8,5,8,8)', it should return the list '(2,4,5)', and given a reference to the hash '(a => 2, b => 1, c => 2)', it should return the list '("a","c")'.

2.  What side-effect does the new get_team_stats subroutine have? What should you assume (or not assume) about the state of a filehandle after passing it to a subroutine?

Putting it Together

We're now ready to redo our program in terms of these two new subroutines. But, recall that get_team_stats returns "name,value" pairs where the value is an array reference. So, the number of bugs fixed by Phil is $bugs{Phil}[0], and $bugs{Phil}[1] is the number of bugs created by Phil. Thus, Phil's net is $bugs{Phil}[0] - $bugs{Phil}[1].

Recall that the winner is calculated by net scoring. So to feed get_winners, we could create a hash from our new data structure like so:

  my %net;
  $net{$_} = $bugs{$_}[0] - $bugs{$_}[1] for(keys %bugs);

That is, calculating the net score for each person's name ($_). But we could also use the map function to calculate and assign in one step.

  my %net = map({$_ => $bugs{$_}[0] - $bugs{$_}[1]} keys %bugs);

The map function performs list transformations. By generating a list of pairs (each name $_, and the corresponding net score), this creates the desired hash in one step. In reality, we do not need a hash named net, just something to pass into the get_winners function.
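As a minimal sketch of both uses of map, the block below first produces one output value per input and then the pairs for the net scores. The %bugs hash is filled in by hand with the team 42 data rather than read from the file.

```perl
use 5.010;
use strict;
use warnings;

my %bugs = (Jeff => [12, 7], Sam => [3, 0], Phil => [26, 23], Jill => [10, 6]);

# One output value for each input value.
my @doubled = map({$_ * 2} 1, 2, 3);
say "@doubled";   # 2 4 6

# Two output values (a key and a value) for each input name.
my %net = map({$_ => $bugs{$_}[0] - $bugs{$_}[1]} keys %bugs);
say "$_: $net{$_}" for(sort keys %net);
```

With these numbers Jeff has the best net score (5), even though Phil fixed the most bugs.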

Therefore:

  1 #!/usr/bin/env perl
  2 
  3 use 5.010000;
  4 use warnings;
  5 use strict;
  6 
  7 my $team_number = 42;
  8 my $filename = 'input2.txt';
  9 
 10 open(my $fh, '<', $filename) or die "cannot open '$filename' $!";
 11 
 12 my %bugs = get_team_stats($fh, $team_number);
 13 
 14 say "Bugs created:";
 15 say "  $_: $bugs{$_}[1]" for(
 16   sort({$bugs{$b}[1] <=> $bugs{$a}[1]} keys %bugs));
 17 
 18 say "Bugs fixed:";
 19 say "  $_: $bugs{$_}[0]" for(sort keys %bugs);
 20 
 21 my @winners = get_winners(
 22   map({$_ => $bugs{$_}[0] - $bugs{$_}[1]} keys %bugs)
 23 );
 24 
 25 say "$_ gets a USB rocket launcher!" for(@winners);
 26 
 27 exit;
 28 
 29 sub get_winners {
 30   my (%data) = @_;
 31 
 32   my $max = 0;
 33   my @win;
 34   for my $key (keys %data) {
 35     if($data{$key} > $max) {
 36       @win = ($key);
 37       $max = $data{$key};
 38     }
 39     elsif($data{$key} == $max) {
 40       push(@win, $key);
 41     }
 42   }
 43   return(@win);
 44 }
 45 
 46 sub get_team_stats {
 47   my ($fh, $number) = @_;
 48 
 49   my @data;
 50 
 51   my $found;
 52   while(my $line = <$fh>) {
 53     chomp($line);
 54     if($line =~ m/^Team (\d+)$/) {
 55       last if($found);
 56       $found = 1 if($1 == $number);
 57       next;
 58     }
 59     next unless($found);
 60     my ($name, @stats) = split(/[:,] +/, $line);
 61     die "malformed $line" unless(@stats == 2);
 62     push(@data, $name, \@stats);
 63   }
 64   die "nothing for team $number" unless(@data);
 65   return(@data);
 66 }

This is somewhat longer than our previous program, but mostly because we fixed the issue about handling a tie. Each subroutine also adds a few lines of code vs the in-line version. But the main body of the program now clearly reads as:

  1. open the file
  2. read the data for our team
  3. output the number of bugs created in descending order
  4. output the number of bugs fixed in alphabetical order
  5. calculate and output the winners

Exercises

1.  Why do we want the team number to be passed into the get_team_stats subroutine? That is, why shouldn't we just use the outer lexical variable $team_number directly from inside the subroutine?

2.  Do we need the @winners variable? Change the final for loop to take the output of get_winners().


Further Reading

Visit the perlintro documentation for a more thorough introduction to Perl and links to additional documentation. By now, you should be familiar with the perldoc documentation. If you have questions, perlfaq may have answers for them. For review, the docs named in this tutorial were perldata, perlref, perllol, perldsc, and perlreftut.

Finally, you may want to browse the perlfunc documentation to learn what builtin functions are available. The functions used in this tutorial, in order of appearance, were open, die, readline, chomp, eof, split, defined, say, sort, keys, values, each, push, pop, reverse, join, ref, map, and exit.

Exercises

You should now be prepared to solve almost any problem in Perl. Here are a few exercises which build on what we've done here.

Reporting

Write a subroutine which takes the %bugs data structure as input and prints the report for the team (including bugs created, bugs fixed, and winners.)

Multiple Teams

Change the program to read the entire input into one data structure (such as $team_data{$number} = \%bugs.) Output a report for each team, including the total net bugs fixed by each team.

Sending Mail

Change the program to send its output via e-mail. Hint: you might need the Net::SMTP_auth module from the CPAN for this if your mail provider requires you to authenticate to send mail.