Agile Data Science: Building Data Analytics Applications with Hadoop [Paperback]

Russell Jurney
2.0 out of 5 stars  See all reviews (1 customer review)
RRP: £25.99
Price: £24.79 & FREE Delivery in the UK. Details
You Save: £1.20 (5%)
Amazon Price by format:
Kindle Edition: £15.14
Paperback: £24.79
Book Description

Publication date: 28 Oct 2013 | ISBN-10: 1449326269 | ISBN-13: 978-1449326265 | Edition: 1

Mining big data requires a deep investment in people and time. How can you be sure you’re building the right models? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications with Hadoop.

Using lightweight tools such as Python, Apache Pig, and the D3.js library, your team will create an agile environment for exploring data, starting with an example application to mine your own email inboxes. You’ll learn an iterative approach that enables you to quickly change the kind of analysis you’re doing, depending on what the data is telling you. All example code in this book is available as working Heroku apps.

  • Create analytics applications by using the agile big data development methodology
  • Build value from your data in a series of agile sprints, using the data-value stack
  • Gain insight by using several data structures to extract multiple features from a single dataset
  • Visualize data with charts, and expose different aspects through interactive reports
  • Use historical data to predict the future, and translate predictions into action
  • Get feedback from users after each sprint to keep your project on track

Frequently Bought Together

Agile Data Science: Building Data Analytics Applications with Hadoop + Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython + Doing Data Science: Straight Talk from the Frontline
Price For All Three: £65.92

Product details

  • Paperback: 178 pages
  • Publisher: O'Reilly Media; 1 edition (28 Oct 2013)
  • Language: English
  • ISBN-10: 1449326269
  • ISBN-13: 978-1449326265
  • Product Dimensions: 23.2 x 18.1 x 1 cm
  • Average Customer Review: 2.0 out of 5 stars  See all reviews (1 customer review)
  • Amazon Bestsellers Rank: 366,521 in Books (See Top 100 in Books)

Product Description

Book Description

Building Data Analytics Applications with Hadoop

About the Author

Russell Jurney cut his data teeth in casino gaming, building web apps to analyze the performance of slot machines in the US and Mexico. After dabbling in entrepreneurship, interactive media, and journalism, he moved to Silicon Valley to build analytics applications at scale at Ning and LinkedIn. He lives on the ocean in Pacifica, California with his wife Kate and two fuzzy dogs.

Customer Reviews

Most Helpful Customer Reviews
1 of 1 people found the following review helpful
2.0 out of 5 stars erm short and expensive 8 Mar 2014
Format:Paperback|Verified Purchase
Please note that this is a practical reference, with little on the bigger picture of big data. The 178 pages listed include the index, and probably 3/4 of the 158 pages of real book are code listings and screenshots, so being generous that's 40 pages of proper scribblings. I suspect the author wrote this within a day or two. I read the entire book in one sitting, in just a couple of hours.

My reason for the poor review is not reflective of the content, just the disappointment in that it's only really a booklet and that you're paying 50p per page.
Most Helpful Customer Reviews on (beta) 3.8 out of 5 stars  6 reviews
8 of 8 people found the following review helpful
5.0 out of 5 stars Absolute required reading for all new data scientists 3 Jan 2014
By chad - Published on
Format:Paperback|Verified Purchase
I was once told by a chief data scientist that they would rather teach a mathematician programming than a programmer math (to be a data scientist). After being a data scientist for some time now I would have to respectfully disagree. 85% of data science is plumbing and I wouldn't hire a physicist to be a plumber. Indeed modern data scientists really do need to be full-stack developers trapped in an academic's body.
Jurney nails it! He offers tools and methodologies adapted to common data science workflows and their associated pitfalls wherein we spend 85% of our time plumbing and 15% of our time integrating some off-the-shelf algorithm to find deep insight.
So, for new data scientists or 3rd-4th year grad students who have balanced their Twitter API hack with NSF grant deadlines, this is ABSOLUTELY REQUIRED READING.
7 of 7 people found the following review helpful
4.0 out of 5 stars Chapter 3 alone is worth the price of the entire book. 5 Feb 2014
By Carsten Jørgensen - Published on
Book review - Agile Data Science by Russell Jurney, O'Reilly Media

The subtitle of this book, "Building Data Analytics Applications with Hadoop", says more about the book than the actual title, "Agile Data Science". However, the subtitle will probably fool most people. Before reading this book I believed that Hadoop was the distributed file system HDFS. If you are looking for a book about building applications on top of HDFS, then this book IS NOT for you. It turns out that Hadoop is much more than just HDFS.

Do not buy this book to learn about agile software development methodologies. There are some rather strange comments about personal and private space requirements for creative workers, as well as statements such as "Easy access to large-format printing is a requirement for the agile environment." The discussion about agile methods for working with data science is interesting. The basic question is whether it is possible to bridge agile methods and data science, since science by its nature does not consist of a predefined set of tasks. It seems to me that the tools and software used in chapter 3 are called agile and hence the process is called agile. In part II of the book the application built in chapter 3 is refined in a number of steps that the author calls iterative. But again, that does not make the process agile. I am not saying that the author is wrong, but the point about the agile method, and how process and tools interact to make the development agile, is not entirely clear to me.

This is NOT a book about the inner workings of Hadoop. Please refer to "Hadoop: The Definitive Guide" by Tom White (O'Reilly Media) for a thorough introduction to Hadoop. Instead, the book takes a very practical approach and shows us how to build agile applications using various Hadoop components such as Pig, MapReduce, and the Avro serialization framework. In addition, you will see how to move data into the popular NoSQL database MongoDB and how to use ElasticSearch to search the data. Finally, all the collected data is accessed through a lightweight web application built with Python and Flask, with visual enhancements made in Bootstrap and D3.

Agile Data Science covers a lot of material and uses lots of different software and tools. If you want to run the examples in the book you have two options: 1) a user-contributed Linux Vagrant image is available with most of the required software, or 2) you can follow the instructions given in the book and the accompanying GitHub project and install the software yourself. In either case you have to pay close attention to software versions. All of the examples work, but it does require some effort to get them running, and if you feel uncomfortable using a terminal and command line you might have a hard time playing with the examples.

Being able to work in an agile way with data science is quite important but I do not feel that the attempt made by the author convinced me that the suggested framework will work in a practical setting.
The main value of this book is definitely chapter 3, where Jurney shows us how to go from zero to a working data science application. The application is literally built from the ground up, starting with data collection, moving on to storing the data, and ending with a web front-end. This chapter alone is worth the price of the entire book.

Part II of the book contains interesting material about data visualizations and prediction models. For many readers, some prior knowledge of Naive Bayes and the Natural Language Toolkit would most likely be useful to fully understand the implications of the predictions made about what makes an email likely to receive a response.

I review for the O'Reilly Reader Review Program and I want to be transparent about my reviews, so you should know that I received a free copy of this ebook in exchange for my review.
3 of 3 people found the following review helpful
1.0 out of 5 stars He sent this with NON-WORKING CODE 9 Jun 2014
By Sean Franks - Published on
Format:Kindle Edition|Verified Purchase
The story is nice, but the code that forms the basis of the entire project behind the book DOESN'T COMPILE. The author has - as of today (June 9, 2014) - removed all of the GitHub references to the project.

I'm halfway through the book, have been practicing Agile development techniques for several years, and I am not quite sure what in particular makes this book about Data Science 'Agile' based.

One thing that he does nicely is explain the Pig code he uses, but I can't use those programs because the Python programs that gather the data that feed Pig will not compile, even after I debugged his code for several hours. (Example: the author made reference to an RFC inline in the Python code that would have NEVER compiled. NEVER. Line 11 from call to email utilities)
1 of 1 people found the following review helpful
5.0 out of 5 stars This book plunges right into the meat of data science 28 Feb 2014
By gatorgirl - Published on
Format:Paperback|Verified Purchase
I really like the introduction - it gives a solid and good overview of how a data shop functions, and the different types of organizational roles.

After that, the book moves pretty fast. The sections are brief, but thorough. The author is forward thinking enough to put his examples on GitHub, so that was really helpful.

Overall, I really recommend this book. There was one section I had a lot of trouble with, and it was mostly versioning errors with Hadoop and Mongo. Other than that, everything's been pretty straightforward.
3.0 out of 5 stars Could be useful for data scientists who need a familiarity with deployment environments. 17 Jun 2014
By K. Luangkesorn - Published on
One of the problems with data science is that any description of what is encountered takes on the appearance of a mythical unicorn: no one person could possibly have all of the skills required. And it gets worse when you add to the standard set of statistics, domain knowledge, and programming the ability to deploy the application into a high-speed environment. This book is not going to make a data scientist an expert in running a data center, but it is useful for giving someone who has the rest of the skills an understanding of the environment their work will be deployed into.

One of the conflicts between the data scientist/analyst and information technology groups is that while the data scientist gives the data owned by the organization its value, IT is charged with storing the data and providing access to it. And in the high-velocity, high-volume environment of big data, not understanding how the architecture works can lead to the data scientist creating valid solutions that cannot be applied in the actual day-to-day working environment. That is where this book comes in. The book has associated virtual machines in a software repository, so that the data scientist who does not know anything about the infrastructure and software stack that the data and the analysis ride on can see how everything fits together.

The book title is misleading. This is not a book about data analytics. This is a book for data analysts so they know how their analytical application is deployed and applied to day-to-day use in enterprise environments. For that reason it is useful.

Disclaimer: I received a free electronic copy of this book as part of the O'Reilly Press Blogger program.