Wednesday, October 3

perl's each() and subtle retention of unwanted state

There's probably a "traps for the unwary" section somewhere out there that already describes this.

Anyway, by way of figuring out a bug earlier today, I discovered that you really shouldn't do this in a method:

  sub handle {
    my ($self, $option) = @_;
    my $output;

    # entry_map() returns a hashref

    while ( my ($pattern, $dispatch) = each %{ $self->entry_map } ) {
      if ($option =~ $pattern) {
        $output .= $dispatch->($self, $option);
        last;
      }
    }

    return $output;
  }

If you already know why you shouldn't do this, all I can say is that you were paying more attention to perlfunc than I was. It turns out that each keeps track of its place inside the hash itself: every hash carries a single iterator, and every call to each on that hash advances it, regardless of where in your program the call is made. If you're iterating over a local hash and you break out of your loop early, this probably won't bite you, because the hash and its iterator go away together. In a context like the one above - where entry_map() returns a hash reference - you are going to wind up retaining a subtle little piece of state between method calls (and even potentially propagating it to other methods that use the same hash), because the reference always refers to the same anonymous hash.
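To make that concrete, here is a minimal sketch of the same failure outside of any object. The names $shared_map and lookup() are made up for the illustration; $shared_map just stands in for whatever entry_map() hands back. Only one pattern matches, so the alternating result does not depend on the order the hash happens to iterate in.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hashref that entry_map() returns.
my $shared_map = {
    '^list' => sub { "list" },
    '^show' => sub { "show" },
    '^help' => sub { "help" },
};

sub lookup {
    my ($option) = @_;
    my $found;

    while ( my ($pattern, $dispatch) = each %$shared_map ) {
        if ($option =~ $pattern) {
            $found = $dispatch->();
            last;    # leaves the hash's iterator parked where it is
        }
    }

    return $found;
}

# The same lookup, four times in a row, against the same hash.
my @results = map { defined( lookup('show all') ) ? 'hit' : 'miss' } 1 .. 4;
print "@results\n";    # hit miss hit miss
```

Every second call picks up where the previous one left off, runs off the end of the hash without finding anything, and in doing so resets the iterator for the call after it.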

In my case this led to particularly weird behavior under a FastCGI wrapper script. Since I was using the same object to process multiple queries, I would see a first request succeed (after hitting a pattern match), a second fail (but finish iterating over the hash), and a third succeed (by starting a fresh iteration and hitting a pattern match again). And so on.

Since I hadn't the faintest idea what was going on (the above parenthetical explanation being obvious only in retrospect), I thought that I might get some mileage out of using Data::Dumper to look at the contents of my object on each request and compare the ones that worked to the ones that didn't. Reasonable, right? Except that every time I did this, the bug disappeared. Why? Because apparently you reset the iterator for a given hash by running each all the way to the end of it, or by calling keys or values on it. Since Data::Dumper pulls out all the values it reasonably can from a data structure, it had the unexpected side-effect of resetting the iterator for my entry_map hash.
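The reset is easy to see in isolation. This is just a small sketch with a throwaway hash (%h and the $k variables are invented for the example); it relies on the documented side effect of keys, and on the fact that an unmodified hash iterates in the same order each time within one run.

```perl
use strict;
use warnings;

my %h = ( a => 1, b => 2, c => 3 );

my ($k1) = each %h;    # first key, in whatever order this hash has
my ($k2) = each %h;    # second key: the iterator has moved on

keys %h;               # documented side effect: resets the iterator

my ($k3) = each %h;    # back to the first key again

print "same key after reset\n" if $k1 eq $k3;
print "iterator had advanced\n" if $k1 ne $k2;

keys %h;               # leave the hash tidy for anyone else
```

The same trick, a bare keys on the hash just before the while loop, would presumably also have cured the original method without copying the hash.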

This fixed things:

  my %map = %{ $self->entry_map };

  while ( my ($pattern, $dispatch) = each %map ) {
    if ($option =~ $pattern) {
      $output .= $dispatch->($self, $option);
      last;
    }
  }

...although it might be better to dispense with each and the explicit loop altogether:

  sub handle {
    my ($self, $option) = @_;

    # Take the first matching pattern (in hash order, which is arbitrary):
    my $map = $self->entry_map;
    my ($pattern) = grep { $option =~ $_ } keys %{ $map };

    return unless defined $pattern;
    return $map->{$pattern}->($self, $option);
  }
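One possible refinement, offered as a sketch rather than as what I actually deployed: grep tests every key even after it has found a match, and "first" means first in hash order, which is arbitrary when more than one pattern matches. List::Util's first stops at the first hit, and sorting the keys makes the winner predictable. The $map below and the non-method handle() are hypothetical stand-ins so the example runs on its own.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical stand-in for the hashref that entry_map() returns.
# Two of these patterns match 'show all' on purpose.
my $map = {
    '^show'     => sub { "show: $_[0]" },
    '^show all' => sub { "show all: $_[0]" },
    '^help'     => sub { "help: $_[0]" },
};

sub handle {
    my ($option) = @_;

    # first() short-circuits; sort makes "first" mean the same thing every run.
    my $pattern = first { $option =~ $_ } sort keys %$map;

    return unless defined $pattern;
    return $map->{$pattern}->($option);
}

print handle('show all'), "\n";    # show: show all
```

Plain string sort puts '^show' ahead of '^show all', so the shorter pattern wins here; if the more specific pattern should win, a different sort (longest first, say) would be the thing to change.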

Now you know how I spent my afternoon.