Data Mining: Practical Machine Learning Tools and Techniques (The Morgan Kaufmann Series in Data Management Systems) [Paperback]

I. H. Witten, Eibe Frank
3.7 out of 5 stars (3 customer reviews)

Formats

Amazon Price:
  • Kindle Edition: 37.07
  • Paperback: --
There is a newer edition of this item:
Data Mining: Practical Machine Learning Tools and Techniques (The Morgan Kaufmann Series in Data Management Systems), 4.8 out of 5 stars (4)
30.09
In stock.

Book Description

13 July 2005 0120884070 978-0120884070 2nd Revised edition
As with any burgeoning technology that enjoys commercial attention, the use of data mining is surrounded by a great deal of hype. Exaggerated reports tell of secrets that can be uncovered by setting algorithms loose on oceans of data. But there is no magic in machine learning, no hidden power, no alchemy. Instead there is an identifiable body of practical techniques that can extract useful information from raw data. This book describes these techniques and shows how they work. The book is a major revision of the first edition that appeared in 1999. While the basic core remains the same, it has been updated to reflect the changes that have taken place over the intervening five years, and it now has nearly twice as many references. Highlights of the new edition include thirty new technique sections; an enhanced Weka machine learning workbench, which now features an interactive interface; comprehensive information on neural networks; a new section on Bayesian networks; algorithmic methods at the heart of successful data mining, including tried-and-true techniques as well as leading-edge methods; performance improvement techniques that work by transforming the input or output; and downloadable Weka, a collection of machine learning algorithms for data mining tasks, including tools for data pre-processing, classification, regression, clustering, association rules, and visualization, in a new, interactive interface; plus much more.


Product details

  • Paperback: 560 pages
  • Publisher: Morgan Kaufmann Publishers In; 2nd Revised edition (13 July 2005)
  • Language: English
  • ISBN-10: 0120884070
  • ISBN-13: 978-0120884070
  • Product Dimensions: 2.9 x 18.8 x 23.1 cm
  • Amazon Bestsellers Rank: 377,279 in Books

Product Description

Review

"This book presents this new discipline in a very accessible form: both as a text to train the next generation of practitioners and researchers, and to inform lifelong learners like myself. Witten and Frank have a passion for simple and elegant solutions. They approach each topic with this mindset, grounding all concepts in concrete examples, and urging the reader to consider the simple techniques first, and then progress to the more sophisticated ones if the simple ones prove inadequate. If you have data that you want to analyze and understand, this book and the associated Weka toolkit are an excellent way to start." - From the foreword by Jim Gray, Microsoft Research

"It covers cutting-edge data mining technology that forward-looking organizations use to successfully tackle problems that are complex, highly dimensional, chaotic, non-stationary (changing over time), or plagued by. The writing style is well-rounded and engaging without subjectivity, hyperbole, or ambiguity. I consider this book a classic already!" - Dr. Tilmann Bruckhaus, StickyMinds.com

About the Author

Ian H. Witten is a professor of computer science at the University of Waikato in New Zealand. He directs the New Zealand Digital Library research project. His research interests include information retrieval, machine learning, text compression, and programming by demonstration. He received an MA in Mathematics from Cambridge University, England; an MSc in Computer Science from the University of Calgary, Canada; and a PhD in Electrical Engineering from Essex University, England. He is a fellow of the ACM and of the Royal Society of New Zealand. He has published widely on digital libraries, machine learning, text compression, hypertext, speech synthesis and signal processing, and computer typography. He has written several books, the latest being Managing Gigabytes (1999) and Data Mining (2000), both from Morgan Kaufmann.

Eibe Frank lives in New Zealand with his Samoan spouse and two lovely boys, but originally hails from Germany, where he received his first degree in computer science from the University of Karlsruhe. He moved to New Zealand to pursue his Ph.D. in machine learning under the supervision of Ian H. Witten, and joined the Department of Computer Science at the University of Waikato as a lecturer on completion of his studies. He is now an associate professor at the same institution. As an early adopter of the Java programming language, he laid the groundwork for the Weka software described in this book. He has contributed a number of publications on machine learning and data mining to the literature and has refereed for many conferences and journals in these areas.

Inside This Book
First Sentence
Human in vitro fertilization involves collecting several eggs from a woman's ovaries, which, after fertilization with partner or donor sperm, produce several embryos.


Customer Reviews

3.7 out of 5 stars
Most Helpful Customer Reviews
1 of 1 people found the following review helpful
4.0 out of 5 stars pretty thorough introduction 12 July 2010
By pp_fin
Format:Paperback
The book explains the basics of machine learning in a way that is quite easy to understand. It does not go deep into the maths involved, but deep enough to allow understanding of the algorithms explained.

Very useful, especially if you plan to use the Weka data mining tool, as pretty much everything available in Weka is explained in this book, to the degree that you can choose suitable algorithms and tune them correctly.
1 of 2 people found the following review helpful
Format:Paperback
I've tried three times to read this book.

But it is so badly written I can't get through it! It doesn't flow well and it seems to jump into things with little or no context in many places.

At times I even had to refer to other books, such as 'Speech and Language Processing: an Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition' by Jurafsky, to get a proper explanation and walkthrough of the theories.

I'll be looking for another book in this area - don't buy this one.
3 of 6 people found the following review helpful
5.0 out of 5 stars Revised and ready to lead you down a good path 21 May 2006
Format:Paperback
Having read the first edition, I feel the authors earn the extra rating because they've managed to improve on their work and on their practical WEKA resource offering. Without a doubt, an essential read for people who are both new and experienced in the fields of data mining, descriptive & predictive analytics or state & behavioural modelling.

The volume of material on the market today is still quite limited, and in the gap between the first and second editions of this book, quite a lot has actually changed in the field. In my view, book content has only marginally progressed with the times, perhaps in favour of attempting to attract and activate new members, practitioners and commercially oriented researchers to the fore of data mining. It's a bold step to evolve material as the field evolves; those breaking new ground in this area should be more visible and offered greater support.

I believe that there is room in the market now for some revised materials covering anonymised commercial implementations of Advanced Data Mining & AI Concepts. A small community of authors could plug this gap really well.
Most Helpful Customer Reviews on Amazon.com
Amazon.com: 3.9 out of 5 stars (37 reviews)
56 of 58 people found the following review helpful
5.0 out of 5 stars Lucid 21 Mar 2006
By Developer - Published on Amazon.com
Format:Paperback
I'm surprisingly pleased with this book. I've been reading up on the topic and associated algorithms in other books for some time; I'm a software developer but don't have a statistics background, and so felt a lot of the texts were too focused on the math and the theory while being thin on content when it came to "rubber hitting the road", or even using clear, simple examples and straightforward notation.

This book is so well-written that it communicates the concepts clearly, lucidly and in an organized fashion. The section that introduces Bayesian probability was drop-dead simple to follow. Quite frankly, having read a few other treatments on it, I can now say that everything else I read before this was overly complicated. Brevity is the soul of wit, no?

To the reviewer who criticized the authors' use of words to describe equations: this is what the authors intended to do. Would you fault them for writing in English if you wanted Greek? Not everyone who can benefit from applied data mining has the requisite background to understand the nitty-gritty mathematics, nor should they have to, if they just want to understand the behavior and practical applications of the technology.
35 of 36 people found the following review helpful
4.0 out of 5 stars Very readable book on Data Mining and ML 9 Oct 2005
By K. Greene - Published on Amazon.com
Format:Paperback|Verified Purchase
This book is very easy to read and understand. Unlike Hastie's Statistical Learning book, it is not geared towards those with an expert level knowledge of statistics, and instead takes time to explain functions and formulas for the person with a decent but not extraordinary understanding of statistical/math concepts. For example, their description of a Gaussian was the clearest I've seen. On the other hand, if your math/statistics background is considerable, you may find this book somewhat simplistic or tedious.

The book has a good coverage of techniques and algorithms, although I was somewhat disappointed that they do not mention Influence Diagrams, considering the amount of coverage of both decision trees and Bayesian techniques. Their discussion of Combining Multiple Models, however, is well done, and is not covered to this extent in most books I've seen. I also like how they broke out the discussion of input and output (knowledge representation) into their own chapters.

Addendum 10/30: After reading a good hunk of this book I still agree with most of what I said earlier, but I do think the authors could have gone into graphical models a lot more. At the end of the discussion on Bayesian networks, Markov networks and other graphical models are mentioned very briefly, and the author says they are very big in ML right now, but he doesn't say why they didn't describe them further. It might have something to do with the organization of the book. Graphical models almost need a chapter of their own, but the book's chapters discuss all techniques together, with varying levels of detail.
41 of 46 people found the following review helpful
4.0 out of 5 stars Very helpful 26 April 2006
By Dr. Lee D. Carlson - Published on Amazon.com
Format:Paperback|Verified Purchase
The major virtue of this book is the emphasis on practical applications and bread-and-butter techniques for accomplishing tasks that one could expect in a business environment. That is not to say that these techniques could not be used in a scientific research environment. They indeed could be, and in fact may be even easier to implement due to the long time scales that are available in research environments for processing information. In the business world, however, data mining has proven to be an activity that gives a substantial competitive edge, and so many businesses are seeking even more sophisticated methods of data mining and Web mining. Data mining could easily be considered a branch of artificial intelligence (AI), due to its emphasis on learning patterns and performing classification, and the learning and classification tools it uses were discovered by individuals who would describe themselves as being researchers in artificial intelligence. But many, and it is fair to include the authors of this book, do not want to view data mining as part of artificial intelligence, since the latter stirs up discussions on the origin of intelligence, autonomous robots, and conscious machines, to paraphrase a line from chapter 8 of this book. The authors make it a point to emphasize that data mining, or "machine learning," is concerned with the algorithms for the inference of structure from data and the validation of that structure.

Along with its practical emphasis, the book includes discussions of some very interesting developments that are not usually included in books or monographs on data mining. One of these concerns the current research in `programming by demonstration.' This research is targeted towards the "ordinary" computer user who does not possess any programming knowledge but still wants to automate predictable tasks. The only thing required from the user is knowledge of how to do the task in the usual way. As an example, the authors briefly discuss the `Familiar' system, which extracts information from user applications to make predictions and then generates explanations for the user about its predictions. Even more interesting is that it learns the tasks that are specialized for each individual user. It learns from the unique style of each user and their interaction history. One of the most interesting and powerful claims of programming by demonstration is that it is domain-independent, considering the current intense interest in reasoning patterns or algorithms that can process information arising from multiple domains. In this regard a successful system would then be able to learn how to play chess from a user along with perhaps composing music. Again, the ability of a machine to reason in many domains is a step towards what many in the artificial intelligence community have called a `universal' learning machine. But the authors do not hold to this view, and in fact they open the discussion in the chapter on the Weka workbench with a statement to the effect that there is no single learning algorithm that will work with all data mining problems. The "universal learner," they say, is an "idealistic fantasy."

Another interesting discussion included in the book is that of `co-training', a methodology that arises in the context of `semi-supervised learning.' In this learning scheme the input contains both unlabeled and labeled data. Co-training relies on the classification task having two different and independent perspectives. Assuming there are a few labeled examples, a different model is learned for each perspective, and the models are then used separately to label the unlabeled examples. Each model contributes both negative and positive examples to the pool of labeled examples. The procedure is then repeated, training both models on the enlarged pool of labeled examples, until the unlabeled pool is empty. The authors point out some evidence indicating that if a (naive) Bayesian learner is used throughout this procedure, it outperforms a learner that develops a single model from the labeled data. The intuition behind this is that exploiting the independence of the two perspectives reduces the likelihood of an incorrect labeling. References are given for readers who want to investigate this approach in more detail, along with briefer discussions of its generalizations, such as co-EM, which involves probabilistic labeling of unlabeled data in one perspective, and of how to use support vector machines in place of the naive Bayesian learner.
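The co-training loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the book or from Weka: the names `NaiveBayes`, `project`, `pick`, and `co_train` are hypothetical, the classes are assumed to be 0/1 with both present in the initial labeled set, the two "perspectives" are given as lists of feature column indices, and in each round each model is assumed to contribute the one example it most confidently labels positive and the one it most confidently labels negative.

```python
import math
from collections import defaultdict


class NaiveBayes:
    """Minimal categorical naive Bayes with Laplace smoothing (hypothetical helper)."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_n = {c: labels.count(c) for c in self.classes}
        self.prior = {c: n / len(labels) for c, n in self.class_n.items()}
        # counts[c][i][v]: how often feature i took value v among class-c examples
        self.counts = {c: defaultdict(lambda: defaultdict(int)) for c in self.classes}
        self.values = defaultdict(set)  # distinct values seen for each feature
        for row, c in zip(rows, labels):
            for i, v in enumerate(row):
                self.counts[c][i][v] += 1
                self.values[i].add(v)
        return self

    def proba(self, row):
        """Return {class: posterior probability} for a single example."""
        logp = {}
        for c in self.classes:
            lp = math.log(self.prior[c])
            for i, v in enumerate(row):
                k = len(self.values[i]) + 1  # one extra slot for unseen values
                lp += math.log((self.counts[c][i][v] + 1) / (self.class_n[c] + k))
            logp[c] = lp
        top = max(logp.values())
        z = sum(math.exp(x - top) for x in logp.values())
        return {c: math.exp(x - top) / z for c, x in logp.items()}


def project(row, view):
    """Keep only the features (column indices) that belong to one perspective."""
    return [row[i] for i in view]


def pick(scores):
    """(index, label) pairs for the most confident positive and negative example."""
    pos = max(range(len(scores)), key=scores.__getitem__)
    neg = min(range(len(scores)), key=scores.__getitem__)
    if pos == neg:  # one example left, or all scores tied
        return [(pos, 1 if scores[pos] >= 0.5 else 0)]
    return [(pos, 1), (neg, 0)]


def co_train(labeled, labels, unlabeled, view_a, view_b):
    """Co-training sketch: alternate perspectives until the unlabeled pool is empty."""
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    while pool:
        for view in (view_a, view_b):
            if not pool:
                break
            model = NaiveBayes().fit([project(r, view) for r in labeled], labels)
            scores = [model.proba(project(r, view))[1] for r in pool]
            # Pop higher indices first so the remaining index stays valid.
            for idx, lab in sorted(pick(scores), reverse=True):
                labeled.append(pool.pop(idx))
                labels.append(lab)
    models = [NaiveBayes().fit([project(r, v) for r in labeled], labels)
              for v in (view_a, view_b)]
    return models, labels


# Toy data: features 0-1 form one perspective, features 2-3 the other.
models, labels = co_train(
    labeled=[[1, 1, 1, 1], [0, 0, 0, 0]], labels=[1, 0],
    unlabeled=[[1, 1, 1, 0], [0, 0, 0, 1], [1, 0, 1, 1], [0, 1, 0, 0]],
    view_a=[0, 1], view_b=[2, 3])
print(labels)  # [1, 0, 0, 1, 0, 1]
```

In the toy run, the examples labeled by the first perspective's model become training data for the second perspective's model in the next step, which is the mutual-teaching idea the review describes; the final two models are retrained on the fully labeled pool.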

For the practitioner, the most useful discussion in the book concerns the evaluation of the different methods for data mining. What makes one approach to data mining better than another, and is there then a ranking of the different approaches? Can one in fact make judgments on the reliability or performance of data mining algorithms using solely the training or test data? If one had a general methodology for ranking data mining algorithms according to their performance, this would be a major advance, since it would allow a classification scheme for machine learning where one could speak of one machine being `more intelligent' than another. Unfortunately, however, this is difficult, and even said to be impossible according to some researchers. There are results in the research literature, going by the name of `no free lunch' theorems, which seem to indicate that one cannot distinguish machine learning algorithms based solely on the way they deal with training or test data. The authors do not discuss these results in this book, but it is certainly apparent that they are aware of the difficult issues involved in the prediction of performance for data mining algorithms.
15 of 16 people found the following review helpful
5.0 out of 5 stars Incredibly practical introduction 30 Oct 2006
By David Donohue - Published on Amazon.com
Format:Paperback|Verified Purchase
This book is perfect if you are trying to get your hands around what data mining and machine learning are. Most of the books I have read on this subject want to start with equations and get more complex from there, with little practicality. This book makes extensive use of examples and introduces the mathematical basis for algorithms where needed. The authors make the point that simpler algorithms often work best for solving machine learning problems. Similarly, I would argue, simpler books work best for understanding highly complex fields. I very highly recommend this book.
21 of 24 people found the following review helpful
5.0 out of 5 stars Great Book in Every Way 1 Nov 2005
By R. Williams - Published on Amazon.com
Format:Paperback|Verified Purchase
The first edition of this book was good, but this is a huge improvement. The writing is really great, very clear, even when it heads into deeper waters. The explanation, for instance, of the various algorithms for accomplishing attribute discretization is very clear, even as the equations start to get very long and complicated.

It's pretty incredible that this book is so readable; kudos to the authors for that. Most importantly, though, it gives you a very good sense of what you need to know as you work through the many data mining options. The authors' assertion that DM is not a magic box is good, and it is clearly a dictum they follow themselves throughout the book: DM doesn't mean that you just plug in a black box and it starts to lay eggs. Generating rules, building trees and knowing how to pick attributes to build the tree from are all critical topics that get excellent treatment.