To be honest and straightforward, I expected more from a book with a title like Raspberry Pi Super Cluster. The author, Andrew K. Dennis, has a very clear vision of the subject (as in his previous book, Raspberry Pi Home Automation with Arduino, which I liked a lot). He has done his best to deliver an exhaustive set-up while staying concise, but it seems to me that this is clearly the wrong format for a book on the given topic.
Now that I have this book at hand, I finally got the chance to answer many of the questions I had about clustering and how it can be applied to a set of Raspberry Pis. The first impression is that it is very well structured and gradual. Let's see: the first two chapters are short introductions to parallel computing (background history and contemporary systems) and the initial set-up, respectively. They're short and to the point. And that's the way it should be - it is presumed that if you're going parallel, then you're a somewhat advanced tinkerer already. Actually, the second chapter is pretty abundant in details on how to install the operating system and the required software and tools. I skimmed through it, because I already had my two Raspberry Pi units pretty well equipped with what was needed.
The next chapter is the first encounter with parallel software, in the form of MPICH - one of the oldest and most widely adopted implementations of MPI (Message Passing Interface), which is designed for applications written in C, C++ or Fortran. In this chapter we also come to one tricky part: setting up the second (and equally the third, fourth, and so on) Raspberry Pi unit. It is tricky because it continues the set-up started in the second chapter and must be followed strictly, especially the part with the RSA key exchange. If you get just one thing wrong, you may have to start all over (as I did). The good news is that the procedure is short and not as obscure as one might imagine for a set-up concerning security matters. Once the berries are prepared correctly, the only things you'll need to care about from then on are the parallel frameworks and your applications.
In chapter four, after we've calculated the number Pi with a small MPICH application written in C, we finally arrive at one of the most popular representatives of the modern trends in parallel software - Apache Hadoop. Its installation is quick, but the configuration is a bit detailed, especially when you take into account that most of the things have to be done at least twice. Here I met the biggest downside of the book: the lack of any troubleshooting for the situations when you get stuck. Although I followed every step verbatim, there were errors logged on the console for which there was no help around. Fortunately, the messages are somewhat self-explanatory, so with a little deduction one can get to the next step fairly easily. Another disadvantage, if I may call it that, is Hadoop's version. I don't know when the book was written, but when it says "Download the latest version", on the project's site you get version 2.2.0, while the book uses version 1.2.1. This wouldn't be much of a problem if Hadoop's architecture hadn't changed significantly between the two. So if you prefer the latest version, the instructions in the chapter are of no use to you. If the author had a good reason to stick to the older branch of the software, this reason remains obscure to the reader. There are a few lesser inaccuracies, like a wrong documentation URL and incomplete scp commands, and a sense that the text was a bit too rushed.
Now that the framework for parallel computing is set up, it is time to test it with an application. Since Hadoop is written in Java (as is typical for an Apache project), its main target implementation language is Java. The test application counts words in an input file and is not particularly interesting, but it gives a simple and comprehensible introduction to the MapReduce concept. More interesting is the Monte Carlo approach described in the sixth chapter. The good thing is that it is compared side by side with an analogous C program for MPI. This is actually the culmination and the essence of the book. For further investigation of the concepts and of ways to apply parallelism in practice, help is available online; many resources are given in the appendix.
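For readers who haven't met these ideas before, here is a minimal sketch of how a Monte Carlo estimate of Pi splits into a map step and a reduce step. It is my own illustration in plain Python, not the book's Java or C code, and it uses neither Hadoop nor MPI. The names `map_task`, `reduce_task` and `estimate_pi`, as well as the worker and sample counts, are assumptions made up for this example.

```python
import random
from functools import reduce


def map_task(seed, samples):
    """Map step: throw `samples` random points into the unit square and
    count how many fall inside the quarter circle of radius 1."""
    rng = random.Random(seed)  # each (hypothetical) worker gets its own seed
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits, samples


def reduce_task(a, b):
    """Reduce step: merge two partial (hits, samples) results."""
    return a[0] + b[0], a[1] + b[1]


def estimate_pi(workers=4, samples_per_worker=50_000):
    # Here the map tasks run one after another in a single process;
    # on a cluster each one would be handed to a different node.
    partials = [map_task(seed, samples_per_worker) for seed in range(workers)]
    hits, total = reduce(reduce_task, partials)
    # Quarter-circle area / square area = pi / 4
    return 4.0 * hits / total


if __name__ == "__main__":
    print(estimate_pi())
```

The point of the split is that the map tasks are completely independent of each other, so they can be spread across as many berries as you have, and only the small partial counts need to travel over the network to be summed.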
The last chapter is quite handy in general and goes beyond the scope of the book. The instructions for booting the Raspberry Pi with an external USB HDD as auxiliary data storage seem very useful. The building of a LEGO case for the cluster and the suggestions for alternative energy sources are interesting takes on the Raspberry Pi in their own right.
All in all, setting up a cluster from Raspberry Pi units is shown to be not as complex as expected. The correct set of steps just has to be followed, and followed strictly at times. Even if it doesn't give you a hint for a particular project, this book at least puts you in a firm starting position on the road to parallelism.
5 people found this helpful.
I've seen several YouTube videos showing clusters of Raspberry Pis and, as I have a few Raspberry Pis, thought I would have a go at building my own cluster. Looking for information on how to do it, I came across this book, which provides a great introduction to parallel computing and also very clear, concise instructions for setting up a cluster of 2 Raspberry Pis.
I love reading the back story to the things I learn, and the first chapter provides a really good history of parallel/distributed computing. Setting up the Raspberry Pis is covered in the second chapter and is done well, without going into the fine detail of setting up a Pi that is covered (and repeated!) in so many other places. The third chapter covers the process of setting up MPI on the Raspberry Pis, which enables them to be connected in a parallel computing environment. This is then taken forward into chapter 4, where we set up Hadoop and MapReduce. Hadoop enables distributed applications to be written, and MapReduce is intended to enable systems to process large datasets. In setting up MPI, Hadoop and MapReduce, simple applications are written, but the book then brings this all together by writing an application to calculate pi using Hadoop, and then the same in MPI, to compare the two technologies. Finally, the book provides some very useful information on how to take things further.
Overall, I thoroughly enjoyed reading and using this book to take me into an area of computing I've never delved into. I thought the book was concise and easy to read, and the examples were clear and easy to follow. I'll certainly be keen to read Andrew K. Dennis' future books.
3 people found this helpful.