on 19 December 2013
This is, to the best of my knowledge, the first and only book on the topic at the time of writing. While it doesn't provide much insight into the design philosophy of Kafka (Why did LinkedIn write this tool? What are its fundamental assumptions? etc.), it does add value beyond the material available elsewhere (including the online documentation and videos about deployment).
Ch1 introduces Kafka, its essential characteristics (a persistent, high-throughput, distributed pub-sub messaging system) and hints at some use cases. This section in particular is incredibly short and would benefit from expansion. In Ch2 and Ch3 the author covers setup and deployment. What is missing here is a troubleshooting guide: as long as everything goes normally there is no problem, but what I would expect from such a book is hints on how to fix things. Then, in Ch4, the Kafka internals are described; I found this chapter well written. Chapters 5 and 6 cover how to write Kafka producers and consumers, respectively. Again, they do a nice job of describing what needs to be done but lack the troubleshooting part. In Ch7 the author describes integration scenarios with Storm (way too short!) and Hadoop. Ch8 then describes the tooling around Kafka (administration, debugging, replication, etc.).
Overall, the book would benefit from a running example implemented end-to-end, with (much) more detail on what to do when things go south. A proper reference section would also help. Finally, the book would benefit from more proofreading and technical editing.
on 31 December 2013
When I received this book, the first thing that struck me was its length. It is a very short book, weighing in at a mere sixty-nine pages. Despite its brevity, it covers quite a wide range of topics. Some of these are very useful for newcomers, such as how to actually install Kafka and its design fundamentals. This well-grounded approach to learning about Kafka continues throughout most of the book, making it excellent for someone looking to learn more about Kafka and perhaps wanting to play around with it.
This book is definitely aimed at those beginning with Kafka. It opens by discussing why Kafka is needed and some of the problems it solves. This is a good way to start, as it shows the reader what Kafka is aimed at and lets them quickly determine whether it fits their needs. In keeping with the beginner-centric approach, it even explains how to install Kafka in several different modes, which will help anyone who wants to get a working environment up and running very quickly. Whenever the book gives an example, which it does frequently, it states which type of cluster to set up, as well as details such as the replication factor and number of partitions for the topics created as part of the example. Thanks to this hands-on approach, a reader who follows the examples faithfully will, by the end of the book, have a good idea of how to set up and manage the various parts of a Kafka cluster in most of the common configurations I have come across. Some more advanced options are covered too, such as running several brokers on one machine that share a single ZooKeeper instance.
Despite the clear focus on beginners, I believe this book is still useful to those with some Kafka experience. At the time of writing I have been using Kafka 0.7 for around six months. The book has a section that concisely covers the differences between Kafka 0.7 and 0.8, how to set up clusters for both versions and some of the other tools such as mirroring, and it briefly mentions how to migrate from 0.7 to 0.8. The book as a whole is mostly focused on version 0.8 but is still useful for those wishing to experiment with 0.7. I think this is probably the best approach that could have been taken: given publishing timescales and how close version 0.8 is to a final release, it is probably what most newcomers should use, and the brief comparison will help existing users decide whether they need to switch. The discussion of migration unfortunately lacks concrete examples, but since this is a beginner-focused book and only existing users would need this information, I can see why they were left out. On the other hand, I think this attempt to keep the scope narrow backfires near the end of the book, when it turns to integration with Storm and Hadoop. It only vaguely explains what these technologies are, outlines the general approach and names the classes to use. Given the potential usefulness of these sections, they sorely miss the easy-to-follow examples found throughout the rest of the book, and I feel this puts them outside the book's scope.
My favourite thing about this book is that it clearly defines its audience early on and sticks to it (right up to its last few pages). It is an excellent book for beginners, full of simple examples, diagrams and screenshots showing the expected output. There is also a clear progression: it starts with setup and command-line tools and progresses to more advanced custom programs. All in all, it is a good introductory book that covers topics often left out of introductory texts, reinforces earlier points and keeps the reader well grounded by explaining the reasoning behind Kafka's existence and design.
What I disliked most about this book was the final few pages. The blurb leads the reader to expect to come away knowing how to integrate Kafka with Storm and Hadoop, but the book only vaguely covers the approach to take and some classes to read up on, which makes the last few pages feel very rushed. I have worked on a job that mirrored Kafka data to Hadoop, and while I recognised that the class names and approach were correct, it is a reasonably complicated task. Given what seems to be the book's target audience, I believe these pages do not really fit with the rest of the contents.
This book could definitely be improved by expanding the final chapter to include more beginner-friendly content. Also, at the time of writing, the final example in the section on writing consumers is very poor. It was clearly meant to show how to build a multi-threaded consumer, but the executor service, once instantiated, is never used, and all the partitions are consumed from the control thread. As a result, the example appears to work but does not actually demonstrate multi-threaded consumption. It can be fixed relatively easily, but given that consuming multiple partitions in parallel is probably one of the more common use cases, it is disappointing that this error made it this far. The example also does not seem to have been properly thought out in terms of multithreading: even if it is changed to use many threads, the executor is shut down immediately after it is initialised, so the code needs refactoring. I have reported this error to the publishers and they are currently reviewing my suggested corrections, but it is worth pointing out that this example is incorrect, to spare readers frustration when they try to apply what they have learned to real use cases.
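For readers who want to see roughly what the corrected pattern looks like, here is a minimal Java sketch of one-thread-per-stream consumption. It is not the book's code and it does not use the Kafka client API: the in-memory lists are hypothetical stand-ins for the per-partition streams a real consumer would obtain, and the names `MultiThreadedConsumerSketch` and `consumeInParallel` are made up for illustration. It only shows the two fixes described above: each stream is submitted to the executor instead of being iterated on the control thread, and `shutdown()` is called only after every task has been submitted, followed by `awaitTermination()` so the caller waits for the workers to finish.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: each List<String> is a HYPOTHETICAL stand-in for one partition's
// message stream; no Kafka classes are used here.
class MultiThreadedConsumerSketch {

    // Consumes every stream on a pool thread and returns the total number of
    // messages processed. The names of the threads that did the work are
    // recorded in workerNames so the caller can check the control thread was not used.
    static int consumeInParallel(List<List<String>> streams, Set<String> workerNames) {
        // One thread per stream (at least one, since a pool size of 0 is illegal).
        ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, streams.size()));
        AtomicInteger processed = new AtomicInteger();

        for (List<String> stream : streams) {
            // Hand the stream to the pool instead of iterating it on the calling thread.
            executor.submit(() -> {
                workerNames.add(Thread.currentThread().getName());
                for (String message : stream) {
                    // Placeholder for real message handling.
                    processed.incrementAndGet();
                }
            });
        }

        // Only now stop accepting new work: shutdown() lets already-submitted
        // tasks run to completion, and awaitTermination() waits for them.
        executor.shutdown();
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        List<List<String>> streams = Arrays.asList(
                Arrays.asList("p0-m0", "p0-m1"),
                Arrays.asList("p1-m0"),
                Arrays.asList("p2-m0", "p2-m1", "p2-m2"));
        Set<String> workers = ConcurrentHashMap.newKeySet();
        int total = consumeInParallel(streams, workers);
        System.out.println("processed " + total + " messages on " + workers);
    }
}
```

Running `main` should report all six messages as processed on pool threads rather than on the main thread. In a real consumer, each task would typically block on its stream waiting for new messages, so the shutdown and timeout logic would have to be tied to the application's own stop signal rather than a fixed ten-second wait; that part is deliberately left out of this sketch.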
Buy this book: If you are new to or interested in Kafka, or find the documentation a bit daunting, this is a good book to have, and given its focus on Kafka 0.8 it should stay relevant for a good while. It would also be a good book for anyone who needs to get up to speed on Kafka quickly.
Do not buy this book: If you are only interested in the chapter on integration with Storm and Hadoop (it is too brief and high-level to be of much use), or if you already know your way around Kafka.