Gives a first look at an important subject
on 11 April 2013
In "Big Data", Mayer-Schönberger and Cukier discuss the shift in our society towards the ability to generate, store and analyze considerably larger amounts of data than before. There has been a trend towards more data for decades (even centuries, I suppose), but recent technological advances have given rise to a visible qualitative shift in the way in which we manipulate data. Statistics used to focus more on getting the most out of small data sets, whereas in recent decades there has been rising interest in extracting information from large, unruly sets of data (often labeled "machine learning" or "data mining"). The information extracted in such cases is often vaguer, but as the authors argue, it can nonetheless, thanks to sheer size and available computing power, lead to essential insights.
Most of Mayer-Schönberger and Cukier's book consists of discussions of examples where an innovative use of a large, unwieldy data set yields substantial insights or added value. The examples are diverse, ranging from air-ticket price prediction to constructing ocean navigation maps or predicting exploding sewer lids. They make it quite obvious that the usefulness of big data is not a hypothetical future possibility: the data are with us now, are already a part of our society, and will only increase in importance in the future. These facts make the book relevant: Big data is a rising trend, and the more people become conscious of this, the more we'll be able to harness its potential.
The book is not flawless, however. There were two main points which I found problematic:
1. The authors divide their discussion into basically seven chapters on the benefits of big data, two on the dangers of big data, and finally a summing up. The first seven positive chapters are very positive indeed, highly extolling the applications of big data, while the two negative ones are very negative, somewhat dramatizing the dangers (using Robert McNamara's "body count" obsession from the Vietnam War as an example of how not to use data). This all-or-nothing view felt somewhat schizophrenic to me. I realize that this is meant as a pop science book, but I would have preferred a more academic, objective tone of analysis. As it stands, the authors come across as somewhat uncritical of the practical limitations of big data. For example, big data offers the possibility of detecting subtle associations which otherwise might have gone unnoticed, but it also comes with the danger of false positives: when enough variables are examined, some of them will appear associated by chance alone. This means that problems cannot necessarily just be solved by "throwing more data at them". The authors do not reflect critically on such problems.
2. At several points throughout the book, the authors write that one of the enabling factors of the usefulness of big data is a shift from causation to correlation. Many machine learning techniques (indeed, the majority of statistical techniques) only yield associations (correlations, in the words of the authors), not causation. The authors invite us simply to accept that we should not concern ourselves overly with causation, as correlations suffice. This is misleading. Noncausal analysis suffices when we wish to predict something. Here, machine learning techniques work well. Causal analysis is necessary when we wish to understand the possible effects of interventions, for example when we give cancer patients chemotherapy. Here, we desire to understand the causal effect of the therapy. The traditional way to obtain this is through comparatively small and expensive randomized experiments. Identifying correlations in observational data, which is what most of the examples in the book are about, simply does not suffice. In boldly claiming that we should shift our attention from causation to correlation, the authors overplay their hand: Our interest in causality shows precisely that big data has its limitations, and these limitations should not be handwaved away.
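Both concerns can be illustrated with a small simulation. The following is a minimal Python sketch on invented data, not anything taken from the book; the function names, sample sizes, threshold and seeds are my own assumptions. The first function counts how many pure-noise features look "associated" with a pure-noise target. The second compares an observational correlation driven by a hidden common cause with the correlation seen when the same "treatment" is assigned at random, in a toy world where the treatment has no effect at all.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def spurious_hits(n_samples=50, n_features=1000, threshold=0.3, seed=0):
    """Point 1: count noise features whose |r| with a noise target exceeds threshold.

    Every feature and the target are independent random numbers, so each
    "hit" is a false positive by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson(feature, target)) > threshold:
            hits += 1
    return hits


def confounded_vs_randomized(n=5000, seed=1):
    """Point 2: return (observational r, randomized r) for a treatment with no effect.

    A hidden factor drives both the observed treatment and the outcome.
    The outcome never depends on the treatment itself.
    """
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(n)]
    outcome = [h + rng.gauss(0, 1) for h in hidden]
    # Observational data: treatment level follows the hidden factor.
    observed_treatment = [h + rng.gauss(0, 1) for h in hidden]
    # Randomized experiment: treatment level ignores the hidden factor.
    randomized_treatment = [rng.gauss(0, 1) for _ in range(n)]
    return (
        pearson(observed_treatment, outcome),
        pearson(randomized_treatment, outcome),
    )


if __name__ == "__main__":
    print("noise features passing the threshold:", spurious_hits())
    r_obs, r_rand = confounded_vs_randomized()
    print("observational r:", round(r_obs, 3))
    print("randomized r:", round(r_rand, 3))
```

In this toy setup the observational correlation should come out near 0.5 even though the treatment does nothing, while the randomized one should sit close to zero. The observational correlation is still perfectly usable for prediction; it just says nothing about what would happen if we intervened.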
In spite of these concerns, however, the authors should ultimately be commended for writing one of the first layman's books about one of the most important technological trends in our society. The book is not perfect, but it is nonetheless filled with great examples of how big data can be used to solve otherwise very difficult problems, and it discusses many of the benefits and drawbacks of big data (the drawbacks being, for example, privacy issues and society reacting to "predicted" actions instead of actual actions). If you are interested in an overview of how the increasing generation and analysis of data is influencing and will continue to influence society, then this is a good buy.