Apache Solr High Performance Paperback – 25 Mar 2014
About the Author
Surendra Mohan, who has served a few top-notch software organizations in varied roles, is currently a freelance software consultant. He has been working on various cutting-edge technologies such as Drupal and Moodle for more than nine years. He also delivers technical talks at various community events such as Drupal meetups and Drupal camps. To know more about him, his write-ups, his technical blogs, and much more, visit http://www.surendramohan.info/. He has also authored the book Administrating Solr, Packt Publishing, and has reviewed other technical books such as Drupal 7 Multi Sites Configuration and Drupal Search Engine Optimization, Packt Publishing, as well as titles on Drupal commerce and ElasticSearch, Drupal-related video tutorials, a title on Opsview, and many more.
Top Customer Reviews
* "updation" (found throughout the book) - didn't anyone proofread the English?
* explanations are either non-existent or poorly written. Things like setting up a Solr Cloud are a joke.
* this is a book that deals with high performance, but it fails to properly address its goal. Just one example: Uwe Schindler notes that documentCache is a legacy setting that is no longer needed. Instead, as much memory as possible should be given to the OS, which has the best protocols for caching drive reads. The book makes no mention of this and instead praises the documentCache. Why would I ever buy a book that gives less or worse information than what's available for free online? Isn't the whole point of it to centralize and filter information, so that I, as a busy professional, have a proper reference to go to?
* the whole book feels like a sloppy money maker for Packt. Avoid!
Most Helpful Customer Reviews on Amazon.com (beta)
The books that I read in order of preference were:
Mastering Apache Solr: A practical guide to get to grips with Apache Solr
Scaling Big Data with Hadoop and Solr (Community Experience Distilled)
Apache Solr High Performance
This book contains what I consider to be one of the most bizarre inclusions: Chapter 1 shows how to install SOLR as a service on Windows. Why you would ever want to do that is beyond me. I can only assume that the author felt this inclusion would make the topic more accessible to those on the Windows platform. I certainly hope nobody intends to run a serious SOLR installation on the Windows platform. This is a recipe for frustration. If, however, you insist on running on Windows, this is the book for you.
Chapter 2 provides excellent coverage of scoring and the various ways to leverage and control scoring within SOLR.
Chapter 3 is about performance optimization. This chapter is good, but really just talks about the features that make SOLR performant. It provides detail on how to configure and use those features, but it does not address what I have found in my experience to be the crux of the problem of SOLR performance, system tuning.
Chapter 4 is labelled Additional Performance Optimization, but has little to do with performance and more to do with advanced SOLR features. In fact, use of most of these features can severely impact performance. This chapter covers similar document queries, sorting by function, homophones, stopwords and word filtering.
Chapter 5 is Troubleshooting. This chapter covers some very common scenarios that will inevitably happen in a production SOLR environment. If nothing else, the list of scenarios can be very useful in creating your disaster recovery plan. That said, the list is not really complete, and the coverage is light. For example, one of the sections basically just covers the use of optimize, as though that were a troubleshooting technique and not a regular best practice. The section on garbage collection is a good start, but really any book on SOLR should have a chapter dedicated to GC.
Chapter 6 is poorly named, "Performance Optimization with ZooKeeper". It should have just been named "ZooKeeper". ZooKeeper does nothing to improve the performance of SOLR, though it is a requisite stepping stone to the world of SOLR Cloud. This chapter also does not discuss in detail making ZooKeeper performant. Rather this chapter is a ZooKeeper tutorial, and a pretty good one.
And the book abruptly ends. No further coverage of SOLR Cloud. No coverage of high performance indexing or data ingestion. No discussion of tuning SOLR for large installations, scaling search, balancing memory allocations, and so on. There is a smattering of information throughout about performance-optimized schemas, but it is neither collected in one place nor concise. I feel like the author hit his deadline and submitted what he had.
This book was a good start, but falls short of being more than an introduction to SOLR and certainly does not live up to its name. This book provided very little in the way of code samples. Most of the topics were covered at a very high level without significant detail. For these reasons I rate it 3 stars.
There are six chapters in this rather short book, and some of them did not fully apply to my use case. For example, there is a chapter about installing Solr, which thankfully is only a few pages long and covers only Windows. The rest, however, is applicable to all operating systems. The second chapter was the most valuable for me, covering the entire topic of boosting your search. You will learn a lot about how Solr (or Lucene) works behind the scenes, and most likely the knowledge you gain here will have an impact on the way your application works. I ended up refactoring my query creation, and indeed it enhanced the user experience by far. The accuracy was much better (effectively reducing the need for users to run multiple queries).
As I did not use a Solr cloud, only the caching aspects of the third chapter, about performance optimization, were interesting to me. For the same reason, chapter 6 about ZooKeeper (for when you use multiple Solr servers) did not apply to me. The additional performance tricks of the fourth chapter and the troubleshooting hints of the fifth chapter, on the other hand, were quite interesting, even if I did not need all of them (yet?).
If you are using Solr and you are at a point where you have things working but there are still question marks on many areas for you, this book is a valuable read to gain some hands-on knowledge to improve your projects. I suggest you get a copy before your project goes live as it will save you a bit of rewriting code after go live.
My favourite part of the book is the chapter on Troubleshooting. The problems covered in that chapter are bound to happen to everyone dealing with Solr, especially when you are setting up and hosting it yourself. For someone relatively new to Solr this is a great pool of knowledge, knowledge you would otherwise only get through tedious trial and error. I really felt I could benefit from the author’s experience here!
That said, considering this is a book on performance, I missed chapters on fields, schema and index design, and a chapter on benchmarking your performance improvements. This book is really a collection of best practices, and in that sense it's great and a good read for every Solr dev who wants to go beyond the basic setup.