The field now collected beneath the standard of 'Machine Learning' is so vast, and draws from so many different schools of thought, and hence mindsets, notations and assumptions, that it is hard to take your bearings. Even knowing what exists, and how it relates to everything else that exists, is difficult. The old-school statisticians speak one language, the machine learners another, and the Bayesian chaps yet a third, so although there are many unifying ideas, they are hard to identify. The primary strength of this book is that it lets the reader see those connections by providing a unifying framework and notation all the way from basic distributions, through standard statistical models, to machine-learning black boxes, and out to applied algorithms. Many sections end with current academic references, as well as current practical uses thereof. I have wanted such a text for a very long time, and am thrilled to have found it.
Beyond that, the book's approach to the maths hits the sweet spot between the thicket of lemma-lemma-theorem-proof found in 'academic' books and the hand-wavy elisions found in 'practitioners'' books. That is, important proofs are stated and fully worked, within the context of a softer discussion of the concepts presented. Finally, having the source code for all the figures in the book lets you dive in and really understand by doing. Having this code as a gold standard off which to base your own software is fantastic.
I have read the other main books in this area (PGM, ESL, PRML, etc.) and think this is the broadest, most thorough and most unified presentation available. It can serve as the foundation for understanding this field.