Hadoop in Action
 
 

Hadoop in Action [Paperback]

Chuck Lam
5.0 out of 5 stars (1 customer review)
RRP: £31.99
Price: £25.02
You Save: £6.97 (22%)

Frequently Bought Together

Hadoop in Action + Hadoop: The Definitive Guide + HBase: The Definitive Guide
Price For All Three: £74.81



Product details

  • Paperback: 325 pages
  • Publisher: Manning Publications; 1 edition (22 Dec 2010)
  • Language: English
  • ISBN-10: 1935182196
  • ISBN-13: 978-1935182191
  • Product Dimensions: 23.6 x 18.9 x 1.7 cm
  • Average Customer Review: 5.0 out of 5 stars (1 customer review)
  • Amazon Bestsellers Rank: 52,077 in Books


Product Description


Hadoop in Action teaches readers how to use Hadoop and write MapReduce programs. The intended readers are programmers, architects, and project managers who have to process large amounts of data offline. Hadoop in Action will lead the reader from obtaining a copy of Hadoop to setting it up in a cluster and writing data analytic programs.

The book begins by making the basic idea of Hadoop and MapReduce easier to grasp by applying the default Hadoop installation to a few easy-to-follow tasks, such as analyzing changes in word frequency across a body of documents. The book continues through the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components, use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action.

Hadoop in Action will explain how to use Hadoop and present design patterns and practices of programming MapReduce. MapReduce is a complex idea both conceptually and in its implementation, and Hadoop users are challenged to learn all the knobs and levers for running Hadoop. This book takes you beyond the mechanics of running Hadoop, teaching you to write meaningful programs in a MapReduce framework.

This book assumes the reader will have a basic familiarity with Java, as most code examples will be written in Java. Familiarity with basic statistical concepts (e.g. histogram, correlation) will help the reader appreciate the more advanced data processing examples.

About the Author

Chuck Lam is a Senior Engineer at RockYou!. Chuck received his B.S. from San Jose State University and his Ph.D. in Electrical Engineering from Stanford University, where his thesis topic was computational.




Customer Reviews

4 star: 0, 3 star: 0, 2 star: 0, 1 star: 0
Most Helpful Customer Reviews
2 of 2 people found the following review helpful
Format:Paperback
I have some other books on Hadoop and this is the best by far. It clearly explains the core concepts with excellent examples and also looks at subprojects such as Pig and Hive. I would highly recommend this book to anyone wanting to learn Hadoop.
Most Helpful Customer Reviews on Amazon.com (beta)
Amazon.com:  4 reviews
13 of 14 people found the following review helpful
Hadoop book for normal people 15 Dec 2010
By Patrick Faith - Published on Amazon.com
Format:Paperback|Amazon Verified Purchase
I really love this book; it is made for normal people just trying to get something done. The streaming coverage is pretty good, and it's the best book for Python-type people I've seen. There is a lot of configuration information, which is very practical. I can't really review the Java examples, but I did like the very practical examples on simple combiners. I think this book, in combination with the newer version of the "Definitive Guide" (make sure to get the recent one), really makes a solid statement on the Hadoop front. I think both books are mandatory for anyone doing anything serious in Hadoop.
6 of 6 people found the following review helpful
Good introduction to Hadoop ecosystem 24 Mar 2012
By Erik Gfesser - Published on Amazon.com
Format:Paperback|Amazon Verified Purchase
After checking out reviews of what O'Reilly and Apress had to offer with regard to Hadoop, I ended up purchasing this book based on positive reviews, my past positive experiences with the Manning "In Action" series of texts in general, such as "Spring in Action" and "Java Persistence with Hibernate", formerly "Hibernate in Action" (see my reviews), and the fact that this book was the most recently published on the subject. In short, this text is well organized and covers Hadoop well, but potential readers should be aware that about one-third of what Lam has to offer here is ancillary to Hadoop rather than about Hadoop itself. Including the larger ecosystem within which Hadoop sits makes sense to me, and I do not think this aspect of the book detracts from what the author provides in any way.

The author provides a good introduction to Hadoop in the first three chapters, which includes a discussion on differences between Hadoop and traditional technologies in this space, such as relational databases, as well as a tour of Hadoop building blocks, working with files in the Hadoop Distributed File System (HDFS), and the anatomy of a MapReduce program. The next three chapters contain the bulk of the text, which focuses on writing MapReduce programs, and includes segments on chaining MapReduce jobs, joining data from different sources, creating a Bloom filter, and monitoring, debugging, and tuning.

The next two chapters offer a short cookbook in which the author presents five different general MapReduce techniques (Lam admits that specialized MapReduce techniques can be found rather easily by Googling, and that he does not intend this cookbook to be comprehensive in any way), as well as a chapter on managing Hadoop, followed by four chapters on running Hadoop in the cloud, brief introductions on programming with Pig (a Hadoop extension that provides a language called Pig Latin) and using Hive (a package built on top of Hadoop that provides a SQL-like language called HiveQL), and a chapter that discusses four Hadoop case studies from the New York Times, China Mobile, StumbleUpon, and IBM (the case study from IBM takes up about 50% of the discussion, and the case study from the New York Times is less than a page).

Be aware that at the time of this review, this book was published over a year ago. One of the common complaints I read about what O'Reilly and Apress have to offer in this space is that their counterparts to this book cover older versions of Hadoop. In chapter 4, Lam mentions that "one of the main design goals driving toward Hadoop's major 1.0 release is a stable and extensible MapReduce API. As of this writing, version 0.20 is the latest release and is considered a bridge between the older API (that we use throughout this book) and this upcoming stable API. The 0.20 release supports the future API while maintaining backward-compatibility with the old one by marking it deprecated."

"Future releases after 0.20 will stop supporting the older API. As of this writing, we don't recommend jumping into the new API yet for a couple reasons: (1) Many of Hadoop's own library classes in 0.20 aren't written under the new API yet. You won't be able to use those classes if your MapReduce code uses the new API in 0.20. (2) Many still consider the most production-ready and stable version of Hadoop as of this writing to be 0.18.3. Some users are warming up to version 0.20, but we suggest you wait a little longer before going full production with it." The author follows up by writing that "by the time you read this the situation may be different. In this section we cover the changes the new API presents. Fortunately, almost all the changes affect only the basic MapReduce template. We rewrite the template under the new API to enable you to use it in the future."

Exactly two weeks ago today, Hadoop 1.0.1 was released after 6 years of development. Between the version that this book covers and this most recent version, several intermediate versions were released, which provide bug fixes, improvements, optimizations, and new features, as well as support for some of the offerings in the Hadoop ecosystem. More timely information on open source technologies that enjoy wide community support is always going to be more readily available on the internet, especially via blog posts, but in my opinion this fact does not detract from the value of this text, which still serves as a good introduction to the Hadoop ecosystem, especially for those more comfortable starting out with a published text. Just be aware that you will be quickly referring to other materials after you make your way through this text.

The portions that I especially appreciated about what Lam has to offer include his presentations in chapter 5 on reduce-side joining and creating a Bloom filter, the cookbook that he provides in chapter 7 that includes segments on passing job-specific parameters to tasks, probing for task-specific information, partitioning into multiple output files, inputting from and outputting to a database, and keeping all output in sorted order, as well as chapters 9, 10, and 11, which discuss the larger Hadoop ecosystem, especially the introduction to Pig Latin. Recommended to anyone looking for an introduction to the Hadoop ecosystem of technologies who understands that published texts such as this one cannot contain information about the latest releases.
9 of 11 people found the following review helpful
Excellent Intro to Hadoop 8 Jan 2011
By Kenneth DeLong - Published on Amazon.com
Format:Paperback
This book is extremely well-written and clear, as well as being very pragmatic and useful. You can really understand how to set up and use Hadoop. I've read many other articles on Hadoop and MapReduce, but after reading this book I thought "why couldn't those articles have explained it that clearly?"