This semester I am taking a class on statistical learning theory in which we prove bounds on various learning algorithms, and I came to realize that I did not know all of the methods we were proving bounds on. To bring myself up to speed, I picked up this book. Having had only minimal exposure to the algorithms that underlie machine learning, I found this introduction very useful. It starts with a concise but by no means terse review of basic statistics, which lays the foundation for the rest of the book. If you struggle to get through this review, or if it is new material to you, this may not be the book for you. I should say that the author does not shy away from using equations, but he does not use them gratuitously either. He also does a reasonable job not only of explaining the steps that may not be intuitive but also of giving some motivation for what the equations actually mean.
After reading this book, I can honestly say that I have a much deeper understanding of many of the algorithms it discusses. I found the exposition on principal component analysis (PCA) very enlightening (I have come across PCA in my work and had never before found an explanation I could understand), and the whole chapter on dimensionality reduction fascinating. The chapters on clustering and kernel methods were also good. I also thought the structure of the chapters was well designed: each chapter roughly corresponds to a single method, and it first shows how the algorithm can be used for classification before moving on to the more general case of regression.
This book does have some drawbacks, though. For instance, some chapters contain so many careless typos that you wonder whether anyone proofread them. Even more infuriating, I am fairly certain that I came across at least one misprinted equation. After finding just one wrong equation, you start to question the correctness of every equation you do not fully understand. I must also admit that I still only vaguely understand how a multilayer perceptron works, even though it is a major focus of the book. Finally, the chapter on Bayesian estimation was hard to follow.
All in all, I think this book is well worth the price, and if you devote the time needed to read it, you will learn a lot.