Cassandra: The Definitive Guide Paperback – 2 Dec 2010
About the Author
Eben Hewitt is Director of Application Architecture at a publicly traded company where he is responsible for the design of their mission-critical, global-scale web, mobile and SOA integration projects. He has written several programming books, including Java SOA Cookbook (O'Reilly).
Top Customer Reviews
With this being said, a warning: to get the most out of this title, the reader must have a good grasp of both Java (all the code is in Java!) and relational databases. Yes, relational databases, because throughout the whole book the author constantly presents challenges and shows how they could be solved with an RDBMS (if at all) and with Cassandra.
I like the approach of the author. He doesn’t want the reader to switch whatever database he’s using to Cassandra. There is no need to drive a semi truck to go buy cigarettes. No, the author rather wants us to know what Cassandra is and what it can offer so that we can make an informed decision. The question thus is what would you do if you had this durability, this scalability and these blazing fast writes?
In these 300 pages all the aspects of the life cycle of a Cassandra cluster are covered: installation, configuration, monitoring and how to keep it healthy. Code is not missing but, back to the original problem, it refers to an outdated API or, worse, to the CLI, which is now close to being completely deprecated. This means that to replicate what the author does, you often have to go searching in the CLI wiki.
A nice book, no doubt. While the project has evolved significantly since 2010, it still provides valuable information to anyone new to Cassandra.
As usual, you can find more reviews on my personal blog: http://books.lostinmalloc.com Feel free to pass by and share your thoughts!
Most Helpful Customer Reviews on Amazon.com (beta)
Now that the book's out and I've had a chance to read it once through, I have to say that it does not meet my expectations. The author is clearly very interested in his subject and also very anxious to share insights not only into Cassandra but into modern non-relational databases in general (to the extent of including a 25-page appendix "The Nonrelational Landscape" at the end of the book). He does a pretty good job of explaining how Cassandra works at the level of distributed storage including scaling as well as availability and consistency. And though I haven't gone through the steps, he seems to give pretty good instructions for installing, configuring and monitoring a Cassandra cluster.
What he doesn't cover nearly as well as I was hoping (and would have expected from an O'Reilly book) is data modeling in Cassandra and the actual APIs for putting data into the database and getting data out (i.e. querying). It's not that he doesn't cover these subjects at all. In fact he devotes two chapters to data modeling (Chapter 3 The Cassandra Data Model and Chapter 4 Sample Application) and two to APIs (Chapter 7 Reading and Writing Data and Chapter 8 Clients), and these chapters contain a lot of useful information. The problem is that the information I really want is either mixed in with other, for me, less important information and/or is too limited or even not present at all.
Here are some things that I would have expected to be presented in reasonably full, coherent form in a "definitive guide" to Cassandra:
Column families, supercolumns and columns - what are they for, how do you use them effectively? Especially supercolumns, which, in conjunction with the intrinsically sparse data representation, allow you to blur the distinction between structure and data and store data in "wide" format and even as out-and-out row-specific lists. He touches on matters of this sort, including in the design patterns at the end of his Data Modeling chapter, but doesn't integrate them into a coherent account of how to use the Cassandra data representation model.
Lack of joins - what are the alternatives? He addresses this issue too, but mostly says, denormalize your tables and design for common queries - or even more bluntly, precompute the results of your common queries and put them into your database. This may be a good approach in some situations, but leaves a lot of questions like, when do you precompute your query results, where and how, what triggers the computation, and how do you handle data changes that invalidate previously precomputed query results (one of the problems that normalization and joins were originally designed to solve). Also, I believe he does not say very much about implementing joins and other complex queries on the client side. Does Cassandra have properties that determine more vs. less efficient ways of doing this? How important is planning for locality in your column family organization? And supercolumns for maintaining lists/sets so that you don't have to assemble them at query time?
Primary API - what is it? As the author explains, Cassandra doesn't have a query language, so he can't offer a chapter on the Cassandra equivalent of, say, SQL for relational databases. But Cassandra does have an API that lets you put data in and get data out, if not also other things like creating and deleting column families, supercolumns and columns. I was really expecting a chapter (or appendix or whatever) listing out the complete set of API requests and responses, either in some language-neutral format or in terms of the "native" Cassandra language, i.e. Java, ideally with additional information on "bindings" for other client-side languages like PHP, Python and so on. Again the information is sort of there, but not pulled together.
Higher-level wrappers - what are they about? The author talks about Thrift and Avro as (at least somewhat) high-level languages for communicating with Cassandra, but doesn't lay out in any coherent way what those languages are. These tools may be very familiar to some, but I'm sure not to all. He does provide enough information - especially in the form of external links - to make it possible to start exploring these tools, but I would have expected the book to give a pretty good idea of what they're about without having to go off and read other material.
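To make the "lack of joins" point above concrete, here is a minimal Python sketch of the denormalize-for-your-queries approach the reviewer describes. It is not taken from the book and does not use any Cassandra API: the two dictionaries are hypothetical in-memory stand-ins for column families, and the names `hotels_by_id`, `hotels_by_city`, `add_hotel`, `rename_hotel` and `hotels_in` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two column families (not a real API).
hotels_by_id = {}                    # row key: hotel id -> {column name: value}
hotels_by_city = defaultdict(dict)   # row key: city -> {hotel id: hotel name}


def add_hotel(hotel_id, name, city):
    """Write the hotel once per query pattern instead of joining at read time."""
    hotels_by_id[hotel_id] = {"name": name, "city": city}
    hotels_by_city[city][hotel_id] = name


def rename_hotel(hotel_id, new_name):
    """A data change has to be propagated to every denormalized copy."""
    row = hotels_by_id[hotel_id]
    row["name"] = new_name
    hotels_by_city[row["city"]][hotel_id] = new_name


def hotels_in(city):
    """Answer 'hotels in a city' with a single row lookup and no join."""
    return sorted(hotels_by_city.get(city, {}).values())
```

The `rename_hotel` function is where the reviewer's open question shows up: every precomputed view has to be updated by the application whenever the underlying data changes, which is the bookkeeping that normalization and joins would otherwise avoid.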
While I am, overall, dissatisfied with the book, I found it both an interesting read and an engaging introduction to the world of Cassandra. It also undeniably offers a wealth of information, even if it's not exactly the information a person may be looking for. For this reason I'm rating it 3 stars.
The book was written against version 0.7b2 of Cassandra. That beta status alone should be a warning of the perils of premature publication. None of the code examples work (or indeed compile) with the current API (0.7b5). Downloading the latest code from the author's spartan support site offers little gain. The zip ball contains a readme file noting that the code did work once and suggesting the reader fix it themselves.
There is a consistent pattern of requiring the reader to understand terms which are first defined several chapters later: slices, for example, or setting up the Cassandra JMX interface, which is required for data loading in chapter 4 but first described in chapter 8.
Annoying, especially as there is solid information here and it's not badly written. Had the O'Reilly editors been more proactive, ignored the me-first commercial pressures, delayed publication until the API stabilized and sorted out the structural problems in the writing, this could have been a solid read.
#1) The edition I have covers cassandra-0.7, which is already obsolete (as of 4 March 2013 we are on 1.2).
The preferred way of accessing the store may be CQL3 now.
#2) As an application developer, the biggest concern I had was around solving my problem, i.e. data modeling. I do not want to delve too much into how to create a cluster and all. The example model of a hotel reservation is too simplistic. You are better off reading Jay Patel's eBay tech blogs or DataStax's metric collection sample on the subject. They do a much better job of explaining the Cassandra data model.
Also, any effort to introduce Cassandra data modeling in terms of "equivalent RDBMS terms" is fraught with danger, as Cassandra is actually a big map. The book falls short of my data modeling expectations.
#3) Apart from storage, many people would be looking to run analytics on top of Cassandra. It would have been great to explain in detail how to run Hadoop/Pig on top of the latest Cassandra.
#4) I do not/ cannot comment on how this book is for clustering and administration - because that is not my interest - please check other reviews for that.
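The "big map" remark in point #2 can be illustrated with a small Python sketch. This is a simplified mental model, not Cassandra's implementation or API: it assumes a store shaped as column family -> row key -> column name -> (value, timestamp), with the newer timestamp winning on conflicting writes and with range reads ("slices") over sorted column names. The names `store`, `insert` and `get_slice` are hypothetical.

```python
# Simplified mental model only: nested maps, not real Cassandra code.
# store[column_family][row_key][column_name] = (value, timestamp)
store = {}


def insert(column_family, row_key, column, value, timestamp):
    """Set one column; rows are sparse, so each row holds only what was written."""
    row = store.setdefault(column_family, {}).setdefault(row_key, {})
    current = row.get(column)
    # Assumed conflict rule for this sketch: the write with the newer timestamp wins.
    if current is None or timestamp > current[1]:
        row[column] = (value, timestamp)


def get_slice(column_family, row_key, start, finish):
    """Return (name, value) pairs for columns whose names fall in [start, finish]."""
    row = store.get(column_family, {}).get(row_key, {})
    return [(name, row[name][0]) for name in sorted(row) if start <= name <= finish]
```

Seen this way, there are no fixed table columns to map onto RDBMS terms: two rows in the same column family can hold entirely different column names, which is why relational analogies only go so far.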
The idea that we invest in books because they stand the test of time does not apply here. You cannot pull this book from the shelf two years down the line to check some fact or jog your memory. O'Reilly sucks big time. These kinds of books are nothing but an effort to ride the latest wave of technology.
Given that the only real way to learn a system is to code to it, this presents a real challenge. The current book will give you an overview and feel for Cassandra but will not by itself allow you to start using it.
The initial chapters on downloading and installing the product get the reader started using Cassandra immediately. Then the bumps in the road appear. The current version of Cassandra requires a semicolon to end each statement in the command line interface (CLI) client, and that's missing in the book. This is noted in the errata on the O'Reilly site as "unconfirmed". If you're coming from a MySQL/Oracle background it's something you might try; otherwise, it's frustrating.
A similar issue crops up two chapters later when the user is told to "start jconsole" to load a YAML schema file. Granted jconsole is not a core Cassandra component, but for the non-Java programmer, this probably entails another trip to Google searching for direction.
This book would be well-served by having an active web site backing it to keep pace with the changes in Cassandra. For now, it's an interesting read, but not very satisfying.
Disclaimer, I was provided access by O'Reilly Publishing to an electronic copy of this book for purposes of review.