Data Analysis with Open Source Tools [Paperback]

Philipp K. Janert
4.0 out of 5 stars  See all reviews (2 customer reviews)
RRP: £30.99
Price: £20.14 & this item Delivered FREE in the UK with Super Saver Delivery. See details and conditions
You Save: £10.85 (35%)
In stock.
Dispatched from and sold by Amazon.co.uk. Gift-wrap available.

Formats

Amazon Price New from Used from
Kindle Edition £16.67  
Paperback £20.14  

Frequently Bought Together

Customers buy this book with Hadoop: The Definitive Guide £29.65

Data Analysis with Open Source Tools + Hadoop: The Definitive Guide
Price For Both: £49.79


Product details

  • Paperback: 540 pages
  • Publisher: O'Reilly Media; 1 edition (25 Nov 2010)
  • Language English
  • ISBN-10: 0596802358
  • ISBN-13: 978-0596802356
  • Product Dimensions: 23.4 x 17.8 x 3.6 cm
  • Average Customer Review: 4.0 out of 5 stars  See all reviews (2 customer reviews)
  • Amazon Bestsellers Rank: 122,745 in Books (See Top 100 in Books)


Product Description

Book Description

A hands-on guide for programmers and data scientists

Product Description

Collecting data is relatively easy, but turning raw information into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.

Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you.

  • Use graphics to describe data with one, two, or dozens of variables
  • Develop conceptual models using back-of-the-envelope calculations, as well as scaling and probability arguments
  • Mine data with computationally intensive methods such as simulation and clustering
  • Make your conclusions understandable through reports, dashboards, and other metrics programs
  • Understand financial calculations, including the time-value of money
  • Use dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situations
  • Become familiar with different open source programming environments for data analysis

"Finally, a concise reference for understanding how to conquer piles of data." --Austin King, Senior Web Developer, Mozilla

"An indispensable text for aspiring data scientists." --Michael E. Driscoll, CEO/Founder, Dataspora



Customer Reviews

5 star: 0
4 star: 2
3 star: 0
2 star: 0
1 star: 0
Most Helpful Customer Reviews
3 of 3 people found the following review helpful
Mixed opinion 17 Jan 2012
Format:Paperback|Amazon Verified Purchase
I have to agree with a lot of the US reviews: the book lacks focus.

The author wants to make the point that it is important to understand the math behind real-world problems, but I was disappointed by his attempts to convey mathematical principles. Formulas may work for some people, but to me the book failed to point out why they are necessary, or how I can add value with them in the analyses I do. In this regard, the author overpays his dues to his academic background. I can see how the author studied physics and addresses people with a similar frame of mind. But for these people, the book will be too trivial.

The major disappointment for me was that the book failed to live up to the expectations set by the "with Open Source Tools" in its title. I would have expected a range of cool tools to work with; instead it's GNU and R, and there is not a single end-to-end case of getting the data, figuring out the issue and then presenting it in a graph. Sometimes the style is too conversational, sometimes it is too strict and abstract. There are few moments when the two extremes touch. Other parts of the book, where the author shares his academic insights, felt awkward. The statement "You will never understand what mathematics is if you see it only as something you use to obtain certain results" will definitely find its way into my "Dictionary of Received Ideas".

Still, after all this negative criticism, I am giving it an average 4 stars. Why? There were some conversational parts that are helpful. This happens especially when the author highlights pitfalls and real-world applications of distribution laws, and when he shows and interprets graphical analysis (although he doesn't point out how it's done). I can put these ideas to use, and they are valuable, because they show the true expertise of the author and can serve as a guideline for people learning advanced statistical analysis. And I want to give credit to the broad scope of the book. I prefer this to textbooks that focus on one aspect only. Although the book is often too abstract, I appreciate the approach of covering many topics in 10-20 page essays.
3 of 3 people found the following review helpful
Very good 12 Dec 2011
By heltz
Format:Paperback
If you have no idea what statistical analysis or data analysis is and you'd like to know how to do it, this is your book.
There are not many formulas, so if you want a book that gives you the mathematical basis, this is not for you.
This book presents the different methods and the tools you can use to apply them, with some specific examples.
It is an applied book for getting started quickly.
It is structured so that you can pick the chapters you are interested in and come back later for other parts.

Definitely a good purchase.
Most Helpful Customer Reviews on Amazon.com (beta)
Amazon.com:  25 reviews
131 of 145 people found the following review helpful
It falls short of initial expectations 7 Feb 2011
By J. Felipe Ortega Soto - Published on Amazon.com
Format:Paperback
This book is aimed at offering a practical, hands-on introduction to data analysis for pragmatic readers without strong scientific or statistical background. Some basic programming experience is required. The author provides many personal (and sometimes useful) comments about different tools and procedures in data analysis.

However, a careful reading reveals many problems, especially an obscure presentation of key concepts. In my opinion, the target audience for this book would be people without previous contact with data analysis. Hence the importance of presenting its core elements correctly. Otherwise, it's useless for them.

In particular:

- Few pages are actually dedicated to present open source tools supporting the different graphs and techniques included in the book. From the title, I expected a more complete tour through available open source tools for data analysis.

- No clues about how to obtain most of the graphs and results presented in the book. No related data sets are available for download, either. A book like this is useless if we cannot learn how to replicate all the examples.

- The formula of the variance for a sample is just wrong. One must divide by n-1 and not n; see "Applied Statistics and Probability for Engineers" (Montgomery and Runger 2006).

- The author presents one of the most obscure explanations of the median I've ever come across. Resorting to an RFC (RFC 2330) to explain such a simple concept is really awkward.

- In chapter 3 and Appendix B, natural logarithms (base e) are presented in the text, while graphs plot powers of 10. Definitely, not the right way to transmit correct concepts and methods.

- I concur with a previous review in that "Workshop" sections just present an ultra-short overview of some open source tools. A quick search in your favourite engine will display much more informative introductions (even quick start guides).

- Today, effective data analysis heavily depends on using the best possible implementation. While I might find it educational to learn some of these implementations, in a real situation it is much better to rely on precise implementations of algorithms already available (e.g. libraries in GNU R).
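
The n versus n-1 point raised in the list above can be checked numerically. The following is a minimal sketch, not taken from the book or the review: the data values and the helper `variance_by_hand` are made up for illustration. It compares the hand-written formula with Python's standard `statistics` module, where `pvariance` divides by n (population variance) and `variance` divides by n-1 (sample variance).

```python
import statistics


def variance_by_hand(values, ddof):
    """Sum of squared deviations from the mean, divided by (n - ddof).

    ddof=0 divides by n (population variance);
    ddof=1 divides by n - 1 (sample variance).
    This helper is hypothetical and exists only for illustration.
    """
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


# Made-up illustrative data, not from the book.
data = [2, 4, 4, 4, 5, 5, 7, 9]

divide_by_n = variance_by_hand(data, ddof=0)
divide_by_n_minus_1 = variance_by_hand(data, ddof=1)

# Compare against the standard library's two variance functions.
print("divide by n:    ", divide_by_n, statistics.pvariance(data))
print("divide by n - 1:", divide_by_n_minus_1, statistics.variance(data))
```

For this data the sum of squared deviations is 32, so the two results are 32/8 = 4 and 32/7, roughly 4.57. The gap shrinks as n grows, which is why the choice matters most for small samples.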

All in all, I still recommend "R in a Nutshell" for a gentle introduction to data analysis with an open source tool (GNU R). It also has some inaccuracies and typos, but at least it's much more informative and clear. Besides, it does include an R package with all datasets and examples, ready to be installed and explored.
24 of 24 people found the following review helpful
Full of insight, light on details 17 April 2011
By Code Monkey - Published on Amazon.com
Format:Paperback
This book covers such a wide range of topics that it necessarily skims over all of them, but it always hits all the major points that an introductory survey should. Each chapter has a straightforward tone, strikes the right balance between developing mathematical rigor and developing an intuitive understanding of data, and undeniably passes on the lessons of hard-earned, real-world experience. But a reader who is actually working on a real data problem will almost certainly come to the realization that the understanding gained is somewhat superficial - that it's going to take a lot more heavy reading (probably of books, papers, and software tools recommended in this book) to get any real work done!

The single biggest problem with this book is its misleading title. This book is not going to teach you how to use open source software to analyze data. There is only minimal information about how one would actually use the software tools being discussed. What you get is a brief commentary about what the author thinks each software package is good for. It's the same story as with the mathematical details: you will not find them here, but this book will give you an excellent idea of what to look for. So in the end it does leave you feeling just a little bit cheated, even though all the advice you got seems extremely well informed.

What this book does astonishingly well is communicate an attitude to data analysis that most textbooks (and nearly all the college courses I took) seem to miss. Nearly every chapter is a stream of stunningly insightful observations on how to approach data, without the mathematical detail that overwhelms most practicing programmers. I would recommend it to any reader who understands that truly useful insights are hard to come by, but detailed algorithms and formulae are easily found in the Internet Age. I wish the book were a few hundred pages shorter and that it corrected a few sloppy mistakes (like confusing revenue and profit), but I'm certainly glad I read it.
34 of 36 people found the following review helpful
Good, not great. Prerequisites and chapter organization issues. 27 Jan 2011
By Peter Alfheim - Published on Amazon.com
Format:Paperback
The book is very good for intermediate-to-advanced data analysts. Beginners beware: there are some important prerequisites that are not obvious before you buy it, and there are some organization problems.

First, the prerequisites. "I strongly recommend that you make it a habit to avoid all statistical language"..."Once we start talking about standard deviations, the clarity is gone." These are two sentences in the same passage from the Preface. The rest of that passage is similar. However, even the first chapters make heavy use of statistical language. Moreover, they assume that you already know statistics to the level of density estimation, noise, splines, and regression. Page 21 even features a footnote about the Fourier transform and Fourier convolution theorem. Clearly this book is not for the statistically shy or for the mathematically shy in general, no matter what the Preface suggests. You also need to know Python and R.

Second, the chapter organization problems. There's a mismatch between the first part of each chapter, which introduces concepts and techniques, and the Workshop part of the same chapter, which uses software. I was expecting the Workshop to illustrate the implementation of the same concepts and techniques. It's not really so. The Workshop introduces Python and R facilities at a different (lower) speed than the rest of the chapter. One could even wonder why the Workshop is in the same chapter. I'd rather that each chapter consisted of a few detailed case studies that first introduce concepts and techniques and then illustrate them with software libraries.