Customer Reviews

3 Reviews
5 star:    (2)
4 star:    (1)
3 star:    (0)
2 star:    (0)
1 star:    (0)

4 of 4 people found the following review helpful
4.0 out of 5 stars Two Paths to Prediction, 28 Feb 2013
John M. Ford "johnDC" (near DC, MD USA)
This review is from: Data Mining: Practical Machine Learning Tools and Techniques (The Morgan Kaufmann Series in Data Management Systems) (Kindle Edition)
This is a good text on machine learning techniques from both the statistics and the machine learning perspectives. The authors note that these fields have developed in parallel, with many researchers and practitioners working in each but few familiar with the full range of techniques in both. Some procedures, such as tree induction and nearest neighbor clustering techniques, have been developed independently in both fields. For the most part, however, statistics has focused on hypothesis testing, while machine learning has tried to optimize search through the space of possible hypotheses. This book presents techniques from both traditions.

The organizational structure of the book supports its use as either a comprehensive text or a modular reference. The first section's five chapters introduce the foundations of data mining. In addition to concepts and definitions, there are simple example data sets and accessible descriptions of how both raw data and final analyses are used in this field. A particularly well-written fifth chapter discusses how to evaluate data mining models. It covers the rationale for holdout samples, the use of cross-validation procedures, and how to avoid over-fitting models. Machine learning texts frequently lack depth on this topic, while statistics texts often fail to communicate the consequences of poorly fitted models. This integration of perspectives is a good one.

Chapters in the second section build on this foundation. Chapter 6 describes how to use ten different techniques to detect and describe patterns in large data sets. This section also describes how to prepare data for data mining, how to combine and transform variables to increase model accuracy, and how to improve prediction by combining different model types using bagging, boosting, and other aggregation techniques. A final chapter outlines directions of current and future research expanding our toolbox of techniques. The eight chapters of the third and final section are a detailed tutorial covering the Weka workbench of machine learning algorithms and data transformation tools.

This book has several communication strengths. The scope is broad for an introductory text. The Further Readings collections at each chapter's end are reasonably brief and point to current and in-depth sources. The text itself contains numerous example analyses and follows the useful strategy of analyzing the same data with several techniques. Its review of algorithms and formulas focuses on explaining how they work rather than on deriving them from general principles. A key strength is the book's close integration with Weka. This ensures that readers can step through analysis procedures, experiment with variations from default paths, and compare the performance of different formulations of the same research problems.

I recommend the book for readers introducing themselves to machine learning. It will take some of your time to learn the techniques and practice using them in Weka, but it will be time well spent. Don't skip the Weka section!

3 of 3 people found the following review helpful
5.0 out of 5 stars Experts in their fields, 29 Jun 2013
This book is written by the software architects and developers of the WEKA machine learning tool. The book is large and comprehensive. The authors introduce the correct terminology for each concept, and they follow through on every concept they mention, explaining its purpose and nuances in detail. The first 400 pages are dedicated to data mining and machine learning theory in an academic context. The examples use simple tabular data that is easy to understand and never runs longer than one page, so as not to overcomplicate the learning experience (KISS: keep it simple, stupid). There is, however, some maths that is over my head (the polynomials on page 231 and Cartesian products on page 266). The remaining 200 pages focus on using the WEKA multi-platform tool, which the authors personally developed and have made open source. The hands-on section using WEKA also includes exercises that are suitable for a classroom environment. The authors of this book are not chancers looking to make some quick money; they are clearly experts and are very, very good at what they do. Highly recommended.

2 of 2 people found the following review helpful
5.0 out of 5 stars A great classic updated, 8 July 2011
mboaj (Milan, Italy)
I have been using Witten's book for my Data Mining course since its first edition. The first edition was probably too limited (few topics, not discussed in depth). The second edition and this third one have greatly improved on the original. My students seem to prefer it to Han's textbook.
