- Format: Kindle Edition
- File Size: 21696 KB
- Print Length: 736 pages
- Simultaneous Device Usage: Up to 2 simultaneous devices, per publisher limits
- Publisher: Pearson; 1 edition (29 Aug. 2013)
- Sold by: Amazon Media EU S.à r.l.
- Language: English
- ASIN: B00IZ0GAFG
- Text-to-Speech: Not enabled
- Word Wise: Not Enabled
- Average Customer Review: 3 customer reviews
- Amazon Bestsellers Rank: #651,814 Paid in Kindle Store
Print List Price: £60.99 (save £11.99, 20%)
Introduction to Data Mining: Pearson New International Edition [Print Replica] Kindle Edition
Length: 736 pages | Format: Print Replica
Top Customer Reviews
Covers all the major parts of Data Mining, for understanding Clustering, Classification and Association Rule Mining.
That said, I would file it under machine learning more than data mining.
Most Helpful Customer Reviews on Amazon.com (beta)
This book covers a wide range of areas, such as data preparation and understanding, classification, anomaly detection, association analysis and clustering. Although the book puts a strong emphasis on the last two, nearly all standard data mining techniques are at least briefly discussed. However, the book has only a few pages on kernel methods, for example. That is to be expected, as kernel methods are more suited to machine learning (by which I mean making predictions) than to data mining (by which I mean looking for descriptions).
Therefore, this book is:
* able to explain data mining without thousands of equations
* a good way to start with data mining
* covering nearly all standard data mining techniques
* focused on association analysis and clustering
and it is not:
* a good book for kernel methods and other advanced techniques
* written from a statistical or a database perspective
My comment: if you are in the data mining field and not coming from mathematics or databases, then you really should buy this book.
The extensive problem sets are well suited for the student. These often expand on concepts in the narrative, and are worth tackling.
The central theme in the book is how to classify data, or find associations or clusters within it.
Cluster analysis gets two chapters that are superbly done. These summarise decades of research into methods of grouping data into clusters. This is usually hard to do, because an element of subjectivity can creep into the results. If your data is scattered in some n-dimensional space, then clusters might exist. But how to find them? The chapters show that the number of clusters, and the constituents of each, can depend on which method you adopt and on various initial conditions, such as the seed values for the clusters if you choose a prototype-based method like K-means.
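To make the point about seed values concrete, here is a minimal sketch of K-means on one-dimensional data. It is not taken from the book: the function name `kmeans_1d`, the data points, and the starting centroids are all made up for illustration. The same nine points split into two different pairs of clusters depending only on where the two initial centroids are placed.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain K-means on 1-D data, starting from the given centroids.

    Illustrative helper (not from the book). Returns the final clusters
    as a list of lists, one per centroid.
    """
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return clusters


# Made-up data: three tight groups, but we ask for only two clusters.
points = [0, 1, 2, 10, 11, 12, 20, 21, 22]

low_seeds = kmeans_1d(points, centroids=[0, 1])
high_seeds = kmeans_1d(points, centroids=[21, 22])

print(low_seeds)   # middle group ends up with the high group
print(high_seeds)  # middle group ends up with the low group
```

Both runs converge, yet they disagree about where the middle group belongs. That is the subjectivity the reviewer describes: the answer depends on the starting conditions as well as on the data.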
The descriptions of the cluster algorithms are succinct. Why is this useful? Because it helps you easily understand the operations of the algorithms, without drowning you in low level detail. Plus, by presenting a meta-level comparison between the algorithms, you can develop insight into rolling your own methods, specific to your data.
Part of my research involves finding new ways to make clusters, and the text was very useful in explaining the existing ideas.
Speaking somewhat loosely, the goal of data mining is to find interesting patterns in massive amounts of data or the classification of such patterns. This entails of course that one have a notion of what is "interesting" and one of the main problems in data mining is to find suitable `interestingness measures'. And since one is typically dealing with large amounts of data, one must use various statistical sampling and preprocessing techniques to massage the data and obtain a `representative' sample of the original data. In addition, one must be able to handle data that is `anomalous', i.e. data that has characteristics that are markedly different from most of the other data, or that has attributes that are unusual if compared with typical values for those attributes. These issues and techniques are discussed in detail in the first three chapters of the book, where the authors outline some of the bread-and-butter topics needed for effective manipulation of data.
The real substance and power of data mining comes from its role in classification and in discovering interesting patterns in huge data sets. The authors, in chapters 4 to 7, discuss various powerful techniques for data classification and association analysis. Association analysis in particular has been used quite extensively in recent years, due to the use of market basket transactions in on-line purchasing and the goal of marketers to learn the purchasing behavior of their customers. Association analysis uncovers relationships in the marketing data in the form of `association rules'. For disjoint itemsets X and Y, an association rule is a logical implication expression between these itemsets that has a certain `strength' that is measured by its `support' and `confidence.' The support measures how often a rule is applicable to a given data set, while the confidence measures how frequently the items in Y appear in transactions that contain X. The support reflects the ability of the rule to be not due to chance alone, while the confidence measures the reliability of the rule inference. The collection of all association rules that can be formed from a data set is too large to be practical and so strategies must be developed to prune the number of rules. The authors discuss in detail various methods for dealing with this computational drawback, such as `frequent itemset generation' and `rule generation.'
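As a rough illustration of the two measures, the sketch below computes support and confidence for a single rule over a tiny, made-up market basket data set. The function names `support` and `confidence` and the five transactions are assumptions for this example, not code or data supplied with the book. Support is taken as the fraction of transactions containing every item in X and Y, and confidence as the fraction of transactions containing X that also contain Y.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """Of the transactions containing X, the fraction that also contain Y."""
    x, y = frozenset(x), frozenset(y)
    count_x = sum(1 for t in transactions if x <= t)
    if count_x == 0:
        raise ValueError("X never occurs, so confidence is undefined")
    count_xy = sum(1 for t in transactions if (x | y) <= t)
    return count_xy / count_x


# Made-up market basket transactions, for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]

# Rule under test: {milk, diapers} -> {beer}
rule_support = support({"milk", "diapers", "beer"}, transactions)
rule_confidence = confidence({"milk", "diapers"}, {"beer"}, transactions)

print(f"support    = {rule_support:.2f}")
print(f"confidence = {rule_confidence:.2f}")
```

Two of the five baskets contain all three items, so the support is 2/5. Three baskets contain milk and diapers, and two of those also contain beer, so the confidence is 2/3. Enumerating every possible rule this way quickly becomes impractical, which is why pruning strategies matter.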
The detection of anomalies consists of the identification of `outliers', which as the name implies are data objects that lie "far away" from the other data objects. It remains of course to quantify what it means to be "far away" and for this reason this branch of data mining, as the authors point out, is sometimes called `deviation detection' or `exception mining'. The omission of outliers is sometimes justified, since they are merely artifacts that only serve to alter the statistics of a particular data set. However, sometimes their presence signals important information, if not a major scientific discovery. Data mining therefore must contain tools that detect anomalies intelligently and efficiently. The authors discuss anomaly detection in fair detail, emphasizing the statistical techniques that are available to do it. They classify the techniques for anomaly detection as being `unsupervised', `supervised', and `semi-supervised'. As the name implies, supervised anomaly detection requires the existence of a training set with both anomalous and "normal" data with each class being labeled as such. When these labels are unavailable, one has to perform unsupervised anomaly detection, and for this approach to work the anomalies must be distinct from one another. If the normal data is labeled but the anomalies are not, one must do semi-supervised anomaly detection. The only weakness in the authors' discussion is that they do not include real-world case studies that illustrate the different techniques, such as clustering and density methods.
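One simple way to quantify "far away" without any labels is to score each object by its distance to its k-th nearest neighbour. The sketch below assumes that approach; the function name `knn_outlier_scores` and the data values are invented for illustration and are not the authors' code. It also shows why unsupervised detection needs anomalies to be distinct from one another: two anomalies sitting next to each other make each other look normal when k = 1.

```python
def knn_outlier_scores(points, k=2):
    """Score each 1-D point by the distance to its k-th nearest neighbour.

    Illustrative helper (not from the book). A larger score means the
    point lies farther from the rest of the data.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Made-up data: five "normal" readings near 1.0 and one far-away value.
data = [1.0, 1.1, 0.9, 1.2, 1.05, 8.0]
scores = knn_outlier_scores(data, k=2)
most_anomalous = data[scores.index(max(scores))]
print(most_anomalous)  # the isolated value gets the highest score

# Add a second anomaly right next to the first one and use k = 1:
# each anomaly is now the other's nearest neighbour, so both score low.
paired = data + [8.05]
scores_k1 = knn_outlier_scores(paired, k=1)
print(scores_k1[5])  # score of 8.0 is now as small as a normal point's
```

With k = 1 the value 8.0 scores lower than the ordinary reading 0.9, so it would be missed. Raising k to 2 recovers it, at the cost of having to choose k sensibly.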
Clear, easy to read text that covers many topics of data classification and mining (as described by the several other reviewers).
Don't expect to be able to actually perform any of the data mining techniques discussed in the book from (or while) reading it. There is no software that comes along with it, nor does the book champion a specific package that you can use while reading through the chapters. It kind of reminded me of the old Wendy's commercial "Where's the beef?"
If you're interested in getting a general feel for what data mining has to offer, this is a decent first read. If you want to actually do any of those things, you will need to seek out other sources.
I personally found books associated with specific software packages much more useful. Depending on your background, you may be better off skipping straight to them.
Look for similar items by category
- Books > Computing & Internet > Computer Science > Information Systems
- Books > Computing & Internet > Databases > Data Storage & Management > Data Mining
- Kindle Store > Kindle eBooks > Computing > Databases
- Kindle Store > Kindle eBooks > Computing > Networking > System Administration > Storage & Retrieval