Perl is quite useful in the world of computational biology, so I expected good things from Perl for Exploring DNA. A previous book on the subject, Beginning Perl for Bioinformatics, hasn't been updated since 2001. BioPerl is the largest project on the Comprehensive Perl Archive Network (both in terms of features and disk size), and a guide to using Perl effectively would be extremely useful.
Perl for Exploring DNA is not that book, however. It doesn't even mention BioPerl until the last chapter, and then only to say that it won't be discussing it. Instead, this book is about learning just enough Perl to write some toy programs, and it certainly misses out on the features I'd want to pass on to scientists who are trying to leverage the power of a dynamic language to wrestle with large data analysis problems.
The authors' understanding of Perl, or at least their ability to explain it in a way that won't confuse the reader who goes further with Perl, is seriously lacking. They fall into the trap of many novice explanations: they present concepts in a way that gives other beginners a poor foundation for further work. Often, the language they use to explain certain features uses terms that conflict with Perl jargon, which is bound to cause problems later.
As with most books of this sort, it seems that the Perl is really just a crude translation of similar C language programs. The examples may be in Perl, but they lack the power of Perl. For example, on page 84, they want to count the number of times different nucleotides show up. They commit a cardinal sin of programming by repeating the same code with very slight changes from line to line:
$numA = ( $sequence =~ tr/A/A/ );
$numC = ( $sequence =~ tr/C/C/ );
$numG = ( $sequence =~ tr/G/G/ );
$numT = ( $sequence =~ tr/T/T/ );
$probOfA = $numA / $sequenceLength;
$probOfC = $numC / $sequenceLength;
$probOfG = $numG / $sequenceLength;
$probOfT = $numT / $sequenceLength;
They have to do it this way because they shove the discussion of hashes, probably the most useful feature of Perl for the beginner, toward the end of the book. Since most of the useful stuff shows up at the end, most of the examples in the book do things the hard way. Additionally, many examples use features that the authors have not previously explained.
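For comparison, here is a minimal sketch of how the page 84 count might look with a hash. This is my own illustration, not code from the book; the sample sequence and the names %count and %prob are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical input; any string of A, C, G and T would do.
my $sequence = 'AACGTTTGCA';

# Tally every character in a single pass, keyed by nucleotide.
my %count;
$count{$_}++ for split //, $sequence;

# Convert each tally into a relative frequency.
my $length = length $sequence;
my %prob = map { $_ => $count{$_} / $length } keys %count;

# Report the count and frequency for each nucleotide seen.
printf "%s: %d (%.2f)\n", $_, $count{$_}, $prob{$_} for sort keys %count;
```

One hash replaces eight scalar variables, and the same code copes with ambiguity codes such as N without adding another pair of lines.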
Most of the examples in the book perform various operations on a string that represents DNA, but the operations don't have any real science behind them. This is a consequence of not being able to solve meaty problems with the limited amount of Perl they discuss. There are many asides that digress from the current topic and take away from the book's focus.
Most disturbing, however, is a lack of wisdom in using Perl. The authors say very little about CPAN, the Comprehensive Perl Archive Network, which is Perl's killer feature. If there's something you need to do, CPAN probably has a module for it. When you're a scientist just trying to get work done, CPAN should be the first thing you learn about. I expected this to be a real-world, task-oriented book. It's not. Furthermore, there are many great resources in the Perl community, but on page 20 the authors essentially say "Don't use them" out of some fear that scientists won't be able to hold their own against a "technical" community. After CPAN, the next best advice for the work-a-day Perler is to make effective use of the resources that are out there.
The concept of the book is a worthy one, but its execution, at least in this edition, is severely lacking. There are several other good books for the Perl beginner (although without the DNA focus), and I'm sure that any scientist worth his salt will know how to take the general concepts and apply them to specific tasks. Other books, such as Data Munging with Perl, can help with advanced string manipulation tasks.