on 18 July 2006
This book is mainly concerned with scripting as a 'glue' between applications: processing various input and output formats. The book is divided into 5 main categories of data handling: plain text, regular expressions, XML, binary data and SQL. There is a final chapter on various miscellaneous topics. Most of the examples are given in Python. Some of the code is demonstrated in Java, although, disappointingly for a book published in 2005, none of the Java 5.0 features are leveraged. However, if nothing else, it demonstrates why Java is not anyone's first choice for such activities.
If you've read any of the O'Reilly cookbook series, you will know what to expect, although the chapters are more cohesive and less episodic. Beginning programmers will get the most out of this book, although intermediate programmers should find at least some material here that's new to them.
The XML chapter is a pretty good introduction to the use and advantages/disadvantages of SAX and DOM, and XSLT is also described, although the discussion is not so clear. Those without experience with databases will welcome the chapter on SQL. The discussion on dealing with plain text files in chapter 1 was a highlight for me, a subject not often covered in much depth in cookbooks; if, like me, you still regularly need to convert between various plain text formats, this chapter will help formalise approaches that you may already be carrying out in a less than rigorous fashion.
Additionally, the paragraphs on floating point arithmetic were intriguing but all too brief. The chapter on dealing with binary is fairly good, although rather dry. Peter Seibel's discussion of binary data in the context of writing a Shoutcast server in Practical Common Lisp shows that the subject can be dealt with in a more compelling fashion. That said, for the most part, author Greg Wilson is a genial companion; the writing style is chatty, but doesn't overdo it.
Overall, if you own any cookbook-style books, there is little here that you don't already know. Even for a beginner, it's hard to see how anyone who decides they need this book hasn't already been exposed to some of the material here. In particular, does anyone really need yet another introduction to regular expressions? The treatment here isn't bad; it's just that this material is already covered in many introductory programming books (especially those that cover scripting languages like Perl and Python). As this takes up nearly 20% of the book, and there are fewer than 200 pages, it's a bit of a waste. Personally, I would have preferred more discussion of the less well-treated subjects, some of which are too sparsely described, but this would have detracted from the book's main aim.
This would be suitable for a beginner Pythonista who, for some reason, didn't want the bulk of the likes of the Python Cookbook. Otherwise, if you feel that some Pragmatic Programmers books can be rather lightweight and somewhat overpriced, this will not change your mind.
on 4 September 2005
Probably a good book for the beginner in data crunching, but I think it lacks more hardcore data crunching/reporting examples and it could do with some corrections.
To me it seems really stupid to advocate Test Driven Development when - in at least 2 places in the book - there are assertions that were never checked, revealing that neither the author nor the proofreaders actually verified that the results were correct. This makes it a terrible book for beginners in areas like regular expressions and SQL. Regular expressions are hard enough, given their multiple implementations, without also having to cope with misleading guidance from an introductory source.
I nearly didn't get this book for the simple reason that I felt my library already covered most of the topics. I've got books on Python, XML, Regular Expressions, MySQL and so on - so why would I need this?
In the end I decided that I might as well get it (after all, work was paying!), and that was definitely the right decision. The strength of the book is not so much to introduce technology like XML as to point out when to use which data type, and how to work with them.
The author clearly has a lot of hard-earned experience in handling data, and that's where the practical examples can help. Unlike in most texts, the examples aren't "perfect" ones designed to showcase features of a language, but reflect real-world issues like inconsistent formatting.
It'll definitely be my first port of call when I next get asked to crunch some data.