on 27 April 1999
I've had the pleasure of listening to Dorian speak at seminars and even sharing a few brief words with him in person. When he mentioned to me last year that he was working on this book, I had no idea how thorough and complete it would be. In fact, I remember wondering to myself how anyone could get their arms around this difficult, yet important, aspect of data mining. I'm in awe! Anyone in the trenches will immediately understand the value of this book. Those just getting started in data mining will probably have no idea how much simpler their job just became. My only criticism of this book is that its title obscures the fact that it contains a wealth of general data mining information, useful well beyond the data preparation phase. To understand why and how certain data preparation techniques work is to go a long way towards appreciating subtleties throughout the rest of the data mining process. Thanks, Dorian!
on 21 May 1999
This book is simply great: it provides best practices for the essence of data mining, data preparation.
Any analyst knows that 80% of data mining time is spent on data preparation; nevertheless, most authors focus on the remaining 20%, techniques and tools, where the value is more "visible".
The truth is that this value is unavailable unless you get the data prepared, and that's the difficult job.
Dorian's book allows you to understand this obscure process through a great analytical approach. Strongly recommended.
on 21 May 2006
An enjoyable book that addresses the topic of preparing for data mining and predictive analytics projects. It's a little light on examples but compensates for that by covering certain subjects really well. It stimulates thought by making good use of tone and clarity, without laboring too much on unnecessary topics.
This book will appeal to those already involved in data mining as well as those about to start new projects. It will definitely help people plan well and reassess what has already been done, and it may even prompt a few surprising reflections.
In my view, some of the material is inconclusive, which leaves the reader to fill in the gaps; perhaps that is a good thing. One often needs to be quite creative to derive usefulness from sparsity, and it's books like this that help drive the imagination.
For those new to the topic, I recommend first reading a couple of other data mining books, e.g. Data Mining: Practical Machine Learning Tools and Techniques by Ian H. Witten & Eibe Frank, or Mastering Data Mining by Michael J. A. Berry & Gordon S. Linoff, just to get a feel for things.
Dorian Pyle has written this book for practicing data analysts who need a toolbox of techniques to get data ready for exploration and modeling. It is intended to fill "...that gap in the process between identifying data and building models." Because many data preparation techniques anticipate how data will be analyzed, the book also discusses a variety of data modeling tools and strategies.
The book's twelve chapters can be organized into three groups. The first three discuss data exploration as the larger context in which data mining is conducted. The author reminds us that finding interesting and useful problems that data analysis can solve is at least as important as knowing how to solve them. Chapters 4 through 8 present common data problems and offer solutions. Processes include assembling data from archives and other sources, selectively removing variables, replacing missing observations, and normalizing distributions.
The final four chapters are more specialized. Chapter 9 discusses data problems that appear in time series data and other types of series data. Chapter 10 describes issues that may remain in data sets after problems with individual variables have been corrected. Chapter 11 describes why and how to conduct a "data survey" to learn the high-level features of a data set and prepare for more detailed analysis. The last chapter closes the book with modeling and analysis techniques, which is where most data mining books begin.
This book is an excellent resource for practicing data miners. Its coverage of data preparation is thorough; it connects well to other aspects of data mining; and it emphasizes the overall purpose of making decisions with data. It provides an adequate statistical foundation and has a practical focus throughout. It is full of tips, tactics and techniques. Nicely done, Mr. Pyle.