on 11 April 2013
In "Big Data", Mayer-Schönberger and Cukier discuss the shift in our society towards the ability to generate, store and analyze considerably larger amounts of data than before. There has been a trend towards more data for decades (even centuries, I suppose), but recent technological advances have given rise to a visible qualitative shift in the way in which we manipulate data. Statistics used to focus more on getting the most out of small amounts of data, whereas in recent decades there has been rising interest in extracting information from large, unruly data sets (often labeled "machine learning" or "data mining"). The information extracted in such cases is often vaguer, but, as the authors argue, it can nonetheless lead to essential insights thanks to sheer size and available computing power.
Most of Mayer-Schönberger and Cukier's book consists of discussions of examples where an innovative use of a large, unwieldy data set yields significant insight or added value. The examples are diverse, ranging from air-ticket price prediction to constructing ocean navigation maps or predicting exploding sewer lids. They make it quite obvious that the usefulness of big data is not a hypothetical future possibility; the data are with us now, are already a part of our society, and will only increase in importance in the future. These facts make the book relevant: big data is a rising trend, and the more people become conscious of this, the more we'll be able to harness its potential.
The book is not flawless, however. There were two main points which I found problematic:
1. The authors divide their discussion into basically seven chapters on the benefits of big data, two on its dangers, and finally a summing up. The seven positive chapters are very positive indeed, highly extolling the applications of big data, while the two negative ones are very negative, somewhat dramatizing the dangers (using Robert McNamara's "body count" obsession from the Vietnam War as an example of how not to use data). This all-or-nothing view felt somewhat schizophrenic to me. I realize that this is meant as a pop science book, but I would have preferred a more academic, objective tone of analysis. As it stands, the authors come across as somewhat uncritical of the practical limitations of big data. For example, big data yields the possibility of detecting subtle associations which might otherwise have gone unnoticed, but it also comes with the danger of false positives: search through enough variables and some will appear associated purely by chance. This means that problems cannot necessarily be solved just by "throwing more data at them". The authors do not reflect critically on such problems.
2. At several points throughout the book, the authors write that one of the factors enabling the usefulness of big data is a shift from causation to correlation. Many machine learning techniques (indeed, the majority of statistical techniques) only yield associations (correlations, in the words of the authors), not causation. The authors invite us simply to accept that we should not concern ourselves overly with causation, as correlations suffice. This is misleading. Noncausal analysis suffices when we wish to predict something, and here machine learning techniques work well. Causal analysis is necessary when we wish to understand the possible effects of interventions, for example when we give cancer patients chemotherapy: here we want to know the causal effect of the therapy. The traditional way to obtain this is through comparatively small and expensive randomized experiments. Identifying correlations in observational data, which is what most of the examples in the book are about, simply does not suffice. In boldly claiming that we should shift our attention from causation to correlation, the authors overplay their hand: our interest in causality precisely shows that big data has its limitations, and these limitations should not be handwaved away.
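The false-positive danger raised in the first point can be made concrete with a minimal Python sketch (my own illustration, not something from the book). It correlates 1,000 pure-noise "features" with a pure-noise outcome and counts how many clear a conventional significance cutoff. The sample size, the number of features, the cutoff of 0.197 (roughly the two-sided p < 0.05 threshold for 100 observations) and the seed are all arbitrary assumptions.

```python
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def count_spurious_hits(n_samples: int = 100, n_features: int = 1000,
                        r_cutoff: float = 0.197, seed: int = 42) -> int:
    """Count noise features whose |correlation| with a noise outcome exceeds r_cutoff."""
    rng = random.Random(seed)
    # The outcome is pure noise: nothing can genuinely predict it.
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_features):
        # Each candidate feature is also pure noise, independent of the outcome.
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson(feature, outcome)) > r_cutoff:
            hits += 1
    return hits


if __name__ == "__main__":
    print(count_spurious_hits())
```

With no real signal anywhere, around 5% of the features still look "significant", and screening more features produces more such hits. That is why throwing more variables at a problem does not by itself guarantee more truth.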
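The second point, that correlations in observational data cannot stand in for a randomized experiment when judging an intervention, can be illustrated with a toy simulation. In this Python sketch (an illustration under made-up assumptions, not anything from the book), a hidden factor z drives both the "treatment" x and the outcome y, while x itself has no effect on y at all. The coefficients, noise levels, sample size and seed are arbitrary.

```python
import random


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def compare_slopes(n: int = 10_000, seed: int = 1) -> tuple[float, float]:
    """Return (observational slope, randomized slope) for the toy model."""
    rng = random.Random(seed)
    obs_x, obs_y, rct_x, rct_y = [], [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                     # hidden factor
        obs_x.append(z + rng.gauss(0, 1))       # observational: x tracks z
        obs_y.append(2 * z + rng.gauss(0, 1))   # y depends on z only, never on x
    for _ in range(n):
        z = rng.gauss(0, 1)
        rct_x.append(rng.gauss(0, 1))           # randomized: x assigned independently of z
        rct_y.append(2 * z + rng.gauss(0, 1))
    return ols_slope(obs_x, obs_y), ols_slope(rct_x, rct_y)


if __name__ == "__main__":
    print(compare_slopes())
```

The observational slope comes out near 1, suggesting that x matters; once x is assigned at random, the slope falls to roughly 0, which is the true causal effect in this setup. The correlation is perfectly good for prediction, but it says nothing reliable about what happens when we intervene.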
In spite of these concerns, however, the authors should ultimately be commended for writing one of the first layman's books about one of the most important technological trends in our society. The book is not perfect, but it is nonetheless filled with great examples of how big data can be used to solve otherwise very difficult problems, and it discusses many of the benefits and drawbacks of big data (the drawbacks including, for example, privacy issues and society reacting to "predicted" actions instead of actual ones). If you are interested in an overview of how the increasing generation and analysis of data is influencing, and will continue to influence, society, then this is a good buy.
on 14 July 2014
This book takes a typical "management primer" approach - it is narrative-based, as deep as an oil slick and ultimately as intellectually nutritious as cotton candy. There is a grinding inconsistency between the approach of the book and the message it is trying to impart. A book about the benefits of using data to make decisions needs to show, not tell, and this book contains virtually no data or quantitative analysis.
One problem with the narrative approach is that sooner or later any given reader encounters a story they have some familiarity with and realises how it has been simplified and spun to suit the purposes of the book. That moment came for me with the story of Steve Jobs' management of his terminal cancer, taken from Isaacson's authorised biography, pp. 550-551. But any rounded account of this story also has to engage with the very different impression given by pp. 452-456 of the same book. You need to take an 'N = all' approach to your sources, guys!
I really parted company with this simplistic narrative at the account of 'The-Numbers.com', which uses big data to predict income from movie proposals (pp. 144-145). I challenge any movie fan to read this section and not think: "ha! that explains a lot about Hollywood's output over the last decade!". But the book never even acknowledges that there might be any problem with this approach. For the rest of the book, I was expecting the authors to return to this piece of low-hanging fruit, but they never did. What a missed opportunity to introduce the problem of "causal pollution" of big data sets. As soon as big data gets used extensively to drive decisions, feedback from those decisions begins to pollute the data set, reducing its predictive value and constricting the solution space. When a movie is predicted to be a flop, it never gets made, so the prediction never gets tested. Meanwhile the same old movies are made again and again with the same stars, whose ageing makes their performances increasingly ridiculous. So the whole industry is locked in a spiral of decline, having poisoned the well of creativity. We can probably live with the suicide-by-data of Hollywood, but the same processes are behind many of the problems with the world financial system.
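The "causal pollution" loop described above can be sketched in a few lines of Python. Everything here is hypothetical: two kinds of proposal ("familiar" and "novel"), invented true hit rates, an invented starting record, and a studio that only makes the kind of film its data currently rates higher. It is a toy model of the feedback problem, not how The-Numbers.com or any studio actually works.

```python
import random


def simulate_greenlighting(rounds: int = 1000, seed: int = 7) -> dict[str, dict[str, float]]:
    """Toy feedback loop: only greenlit films ever produce outcome data."""
    rng = random.Random(seed)
    # Hypothetical ground truth that the studio cannot observe directly.
    true_hit_rate = {"familiar": 0.4, "novel": 0.6}
    # Hypothetical starting record per kind of proposal: [hits, films released].
    record = {"familiar": [1, 2], "novel": [0, 2]}

    def predicted(kind: str) -> float:
        hits, films = record[kind]
        return hits / films

    for _ in range(rounds):
        # Greenlight only the kind of film the data currently favours.
        kind = max(record, key=predicted)
        hit = rng.random() < true_hit_rate[kind]
        # Outcomes are recorded only for films that actually get made.
        record[kind][0] += int(hit)
        record[kind][1] += 1

    return {k: {"predicted": predicted(k), "films": record[k][1], "true": true_hit_rate[k]}
            for k in record}


if __name__ == "__main__":
    for kind, stats in simulate_greenlighting().items():
        print(kind, stats)
```

Because novel proposals start with an unlucky record and are never made, their estimated hit rate stays at zero no matter how many rounds run, even though in this made-up world they would succeed more often. The data set only ever learns about the decisions it already endorsed.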
This book was given to me as a gift, but I've learned to steer clear of any book that has notes and bibliography which consist mainly of media articles and journalistic interviews rather than academic research.
According to Viktor Mayer-Schönberger and Kenneth Cukier, "There is no rigorous definition of big data. Initially the idea was that the volume of information had grown so large that the quantity being examined no longer fit into the memory that computers use for processing, so engineers needed to revamp the tools they used for analyzing it all... One way to think about the issue today -- and the way we do in the book -- is this: big data refers to things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value, in ways that change markets, organizations, the relationship between citizens and governments, and more." Much more.
Mayer-Schönberger and Cukier identify and examine several "shifts" in the way information is analyzed that transform how we understand and organize society. Understanding these shifts helps us to understand the nature and extent of big data's possibilities as well as its limitations. First, far more data can be processed and evaluated. Second, looking at vastly more data reduces our preoccupation with exactitude. Moreover, "these two shifts lead to a third change, which we explain in Chapter Four: a move away from the age-old search for causality." They devote a separate chapter to each of these shifts, then turn their own and their readers' attention to a term, indeed a process, that helps frame the changes: datafication, a concept they discuss in Chapter Five.
Then in Chapters Six and Seven, they explain how big data changes the nature of business, markets, and society as what they characterize as a multi-dimensional "treasure hunt" continues to extract insights from data and unleash dormant value by a shift from causation to correlation. That is to say, big data "marks an important step in humankind's quest to quantify and understand the world" in ways and to an extent once thought impossible.
These are among the dozens of passages that caught my eye, listed here to suggest the scope of Mayer-Schönberger and Cukier's coverage.
o Letting the data speak (Pages 6-12)
o More, messy, good enough (12-18)
o More trumps better (39-49)
o Illusions and illuminations (61-68)
o Quantifying the world, and, When words become data (79-86)
o The "option value" of data, and, The reuse of data (102-107)
o The value of open data (116-118)
o The big-data value chain (126-134)
o The demise of the expert (139-145)
o Paralyzing piracy (152-157)
o The dictatorship of data, and, The dark side of big data (163-170)
o Governing the data barons (182-184)
o When data speaks, and, Even bigger data (189-197)
On Page 197, Mayer-Schönberger and Cukier observe, "What we are able to collect and process will always be just a tiny fraction of the information that exists in the world. It can only be a simulacrum of reality, like the shadows on the wall of Plato's cave. Because we can never have perfect information, our predictions are inherently fallible. That doesn't mean they're wrong, only that they are always incomplete. It doesn't negate the insights that big data offers, but it puts big data in its place -- as a tool that doesn't offer ultimate answers, just good-enough ones to help us now until better methods and hence better answers come along. It also suggests that we must use this tool with a generous degree of humility... and humanity."
I realize that no brief commentary such as mine can do full justice to the material that Viktor Mayer-Schönberger and Kenneth Cukier provide in this volume, but I hope that I have at least suggested why I think so highly of it. Also, I hope that those who read this commentary will be better prepared to determine whether or not they wish to read the book and, in that event, will have at least some idea of how to leverage Big Data applications and capabilities to transform how they live, work, and think.
on 7 June 2013
I had heard about "Big Data" from a friend who attended the Hay Literary Festival. I don't normally read business books, but I was intrigued and read it. I was not disappointed at all. "Big Data" is not a typical business book (although the authors do talk about the business implications quite a bit); rather, I felt it is more of a science book, explaining a very different approach to understanding the world we live in. I found it absolutely fascinating, and when I told a friend about it, he said he had read a review of it in the "New Scientist" recently, so I think I wasn't wrong at all. Highly recommended: full of original ideas, and the stories are great.
on 19 June 2013
There are information technologists still unsure whether the term 'big data' describes a phenomenon of real significance that will bring major changes to the way societies interpret the world, or a buzzword that's being talked up as the IT industry's latest 'next big thing'. The authors of this book at times embrace both positions, mixing discursive appraisal of big data's potential to impact our lives for the better with real-life examples of the phenomenon in action. They do not go too deeply into the inner mechanics of big data, or of the analytical management tools (such as Hadoop) needed to explore its potential in full; but 'Big Data' does a fairly good job of explaining its broader dynamics for the lay reader.
Arguably big data has come about due to two important developments in IT: the emergence of affordable, highly extensible storage systems capable of containing massive data sets drawn from a variety of sources, allied to ever-faster high-performance compute resources that can examine them in a more timely fashion. Innovative analytics are a key factor: for the first time we are able to interrogate much 'bigger' data much faster, so that value patterns or characteristics can be discerned and exploited - often in real-time - to create new services, applications, or decision support aids.
Yet to take full advantage of the big data opportunity, Cukier and Mayer-Schönberger say, scientists must also abandon some centuries-old practices intrinsic to a basic scientific approach, and have their basic understanding of how to make decisions and comprehend reality challenged: "Society will need to shed some of its obsession for causality in exchange for simple correlations," they say - i.e., "not knowing why but only what".
'Big Data' is big on case studies, ranging from how big data has been used to analyse fluctuating airline ticket prices to secure the best deals, to providing mechanisms that enable astronomy to make better sense of the hundreds of terabytes that new radio telescopes will soon be spewing out. The diverse range of these examples shows that the big data effect - 'datafication' - is already affecting our lives, from steering Amazon customers to 'personalised' purchasing options, to helping doctors make smarter diagnostic decisions for premature babies.
This book also explains some of the risks of this brave new 'datafied' world, such as the pitfalls of using big data to 'predict' potential behavioural characteristics in individuals or groups, and thereby legitimise pre-emptive remedial controls. If they are ready to adopt a 'messier' approach to large-scale data management, as the authors of this book characterise it, big data advocates should not lose sight of the fact that we are operating on a view of reality governed by the limitations of the data-gathering mechanisms at our disposal.
on 10 December 2013
I agree with other reviewers that the topic is superficially treated, and certainly for anyone who works in the field or has an academic interest in Big Data the book will fall short. However, for the uninitiated like me this is a great conceptual introduction to the subject, and if read right after Who Owns the Future? by Jaron Lanier, the two books come together to form a very interesting and thought-provoking package dealing with the future, the role which large companies such as Google, Facebook and Amazon play, and how these companies have profited from data given by the public, freely or otherwise. It is certainly worth reading both in tandem.
on 11 March 2013
*A full executive summary of this book is available at newbooksinbrief dot com.
The main argument: Statistical information, or data, has long been recognized to be a potentially rich and valuable source of knowledge. Until recently, however, our ability to render phenomena and events in a quantified format, store this information, and analyze it has been severely limited. With the rise of the digital age, though, these limitations are quickly being eroded. To begin with, digital devices that record our movements and communications, and digital sensors that record the behavior of inanimate objects and systems, have become widespread and are proliferating wildly. What's more, storing this information on computer servers is getting cheaper and cheaper, allowing us to keep much more of it than ever before. Finally, increasingly sophisticated computer algorithms are allowing us to analyze this information more deeply than ever, and are revealing interesting (and often counter-intuitive) relationships that could never have been detected before. The increasing datafication of the world, and the insights that this is bringing us, may be thought of as one grand phenomenon, and it has a name: Big Data.
The insights that are emerging out of big data are spread out over many areas, and are already impacting several aspects of society. To begin with, big data is helping established businesses to run more efficiently and safely. For example, big data is being used to streamline assembly lines and also to catch quality control problems in the factory. But the benefits of big data go well beyond the factory. For example, the courier company UPS has used big data to help it map out more efficient trucking routes. The resulting improvements have allowed UPS to shave 30 million miles and 3 million gallons of fuel per year from its routes (loc. 1352). The more efficient trucking routes have also led to fewer traffic accidents. Meanwhile, car companies are beginning to use data from sensors in automobiles to understand which parts are causing problems, and also to understand where and why accidents are happening, so that they can be reduced.
In addition to helping established businesses, big data is also creating new business opportunities that were never possible before. For example, the business prodigy Oren Etzioni used big data to set up a business called Farecast that predicts the cost of airfare tickets. After Farecast was bought by Microsoft for $110 million, Etzioni used big data again to set up a related business that predicts the cost of all manner of consumer goods. His very profitable business, Decide.com, saves consumers on average $100 per product (loc. 1867).
Outside the business world, big data is also being used by governments to help reduce costs and make society safer. For example, in 2009 Google was able to apply big data to search terms to help identify how the H1N1 virus was spreading through communities in real time. This method of tracking disease pandemics holds great promise for allowing public health organizations to know when pandemics are beginning and to keep better track of how they are unfolding, so that they can better contain them. In addition, big data is being used to help identify where potentially dangerous infrastructure problems are occurring, and to identify trouble spots for fire hazards, so that they can be addressed.
Big data also has significant potential uses in health care. Indeed, our increasing ability to monitor and record everything from our vital signs to the health of our systems to our individual genomes promises to inaugurate an age of personalized medicine that will allow doctors to more easily diagnose our ailments and tailor treatments to our individual bodies.
While big data may already be bringing us impressive benefits, Viktor Mayer-Schönberger and Kenneth Cukier argue that the bulk of the benefits are yet to come. Indeed, for the authors, businesses and governments are only just now waking up to the incredible potential of Big Data. And as they direct more attention to recording and analyzing data streams, the potential uses of the information will only multiply.
On the negative side, big data also carries substantial potential dangers. Most notably, as more and more information about us is recorded, kept and used, our privacy is increasingly threatened. For the authors, a good deal of oversight will be needed in order to ensure that the potential abuses of big data are curbed.
The book is well written and represents a fine overview of the present and future of big data. Also, the authors do well to raise important big-picture issues related to the phenomenon, though the potential impacts of big data (both positive and negative) are occasionally overblown. All in all, the book is a good introduction to an important and interesting topic.
on 25 June 2016
The term "Big Data" is constantly being thrown around today by businesses and the technology world. Leveraging big data to gain competitive advantages is an organisational panacea. Given current compute power and storage capabilities, we are now able to truly leverage big data in ways one could previously only dream of.
This book, however, lays out an important theme: big data is about knowing what, not why. Said differently, it is more about correlation than causation, and making that mental shift is at the core of leveraging big data. Organisations that can combine mathematics and statistics with programming and network science will be at the forefront of big data literacy.
Said best by the authors: "… when we say that humans see the world through causalities, we’re referring to two fundamental ways humans explain and understand the world: through quick, illusory causality; and via slow, methodical causal experiments. Big data will transform the roles of both."
Three key takeaways from the book
1. Some staggering "Big Data" statistics at the time this book was written:
○ About seven billion shares change hands every day on U.S. equity markets, of which around two-thirds is traded by computer algorithms based on mathematical models that crunch mountains of data to predict gains while trying to reduce risk.
○ Google processes more than 24 petabytes of data per day, a volume that is thousands of times the quantity of all printed material in the U.S. Library of Congress.
○ Facebook, a company that didn’t exist a decade ago, gets more than 10 million new photos uploaded every hour. Facebook members click a “like” button or leave a comment nearly three billion times per day, creating a digital trail that the company can mine to learn about users’ preferences.
○ The 800 million monthly users of Google’s YouTube service upload over an hour of video every second.
○ The number of messages on Twitter grows at around 200 percent a year and by 2012 had exceeded 400 million tweets a day.
○ More than 300 exabytes of stored data existed in 2007. To understand what this means in slightly more human terms, think of it like this. A full-length feature film in digital form can be compressed into a one-gigabyte file. An exabyte is one billion gigabytes. In short, it’s a lot. Interestingly, in 2007 only about 7 percent of the data was analog (paper, books, photographic prints, and so on).
2. The amount of stored information grows four times faster than the world economy, while the processing power of computers grows nine times faster.
3. Big data’s ascendancy represents three shifts in the way we analyze information that transform how we understand and organize society:
i. We can analyze far more data.
ii. We can loosen up our desire for exactitude.
iii. We can move away from the age-old search for causality. Instead we can discover patterns and correlations in the data that offer us novel and invaluable insights. The correlations may not tell us precisely why something is happening, but they alert us that it is happening. Fundamentally, big data is about 'what', not 'why'.
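The 2007 storage figure in the first takeaway can be put in more human terms with a few lines of Python arithmetic. This sketch assumes decimal SI units (1 EB = 10^18 bytes, 1 GB = 10^9 bytes), which matches the "one billion gigabytes" definition above, the book's rough one-gigabyte-per-film rule of thumb, and that the 7 percent analog share applies to the 300-exabyte total.

```python
# Decimal (SI) units, matching "an exabyte is one billion gigabytes".
GB = 10**9    # bytes in a gigabyte
EB = 10**18   # bytes in an exabyte

stored_2007_bytes = 300 * EB   # "more than 300 exabytes" in 2007 (taken as a lower bound)
film_bytes = 1 * GB            # one compressed feature film, per the rule of thumb

gigabytes_per_exabyte = EB // GB
films_equivalent = stored_2007_bytes // film_bytes

analog_share = 0.07            # "only about 7 percent of the data was analog"
analog_exabytes = 300 * analog_share

if __name__ == "__main__":
    print(f"{gigabytes_per_exabyte:,} GB per EB")
    print(f"{films_equivalent:,} one-gigabyte films")
    print(f"about {analog_exabytes:.0f} EB analog")
```

On these assumptions, 300 exabytes is the equivalent of 300 billion feature films, and the analog share comes to roughly 21 exabytes.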
on 4 October 2015
Very interesting book. It is one of the books that I would recommend as a reference book, as it contains lots of examples and quotations about individuals who woke up to the reality of Big Data and how it could be used for good and, perhaps more troublingly, for profiling innocent people according to their names, culture, religion, political views, etc.
I would also recommend this book to anyone interested in studying, or curious about, "the concept of machine learning and what role big data can play." Sometimes you may wonder how Cortana knows when it is time to leave for work or home, or how it predicts what the traffic will be like while you are on your way. If you do wonder about this, then you must read this book.
The authors' acknowledgement of the role of "algorithmists" in Big Data is also plausible. Imagine the day the nutters become part of the law society. I think this would inject some honesty into "how most lawyers handle the cases they are working on."
You can skip this paragraph: if you have ever wondered how Neural Network proponents will ever succeed in teaching a basic times table to an algorithm that takes two input numbers, such as 8 times 7, then after reading this book you will see that Big Data may help. Note that when we are young and attending elementary school, most of us learn the times table by memorising it. As we grow, we simply identify a strategy where we, for example, think that the 7 times table goes up by 7 and the 8 times table goes up by 8, so there is no need to memorise. In this instance, Big Data can be used to bridge the gap between the Neural Network approach and those, like me, who very much believe that we should focus on mimicking how our neocortex works and complement it with algorithms that make our machines perform better than our neocortex. In this paradigm, Big Data would play the role of memory and experience, while we would still be able to create strategies that can be serialised into and de-serialised from the Big Data repository.
The authors also discuss privacy and the challenges Big Data faces. I think the question to ask is: if we agreed to use the cloud, have we not sleepwalked into sharing our data with those who are there to analyse it? Should only the machine have access to our private data, or also those who own these smart machines? Would the combination of Big Data and intelligent machines bring about an all-knowing being that can not only know our past but also predict our future activities? And imagine what impact this would have on currency and stock traders. Don't even think about politics here, as that gets even scarier.
If you have ever watched the movie "Her" and related to the poor man who falls for an OS that knows him very well, then think about the consequences of intelligent machines powered by Big Data! That is another reason to read this book.
However, we should never fear exploring what we are capable of doing for the good of this world and its inhabitants, but we should also be prepared to ensure that the all-knowing thing we are in the process of creating is not a dictator but something that lives and functions within a democratic system.
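The reviewer's "memory plus strategy" idea can be sketched, very loosely, in Python. This is a toy illustration of my own, not anything from the book or from neural-network practice: a repeated-addition function stands in for the "strategy", a precomputed lookup table stands in for "memory", and JSON serialisation stands in for the hypothetical Big Data repository. All function names are illustrative.

```python
import json


def times_by_strategy(a: int, b: int) -> int:
    """The 'strategy': the a-times table goes up by a, so add a to itself b times."""
    total = 0
    for _ in range(b):
        total += a
    return total


def build_memory(max_factor: int = 12) -> dict[str, int]:
    """'Memorise' the table up to max_factor by applying the strategy once per entry."""
    return {f"{a}x{b}": times_by_strategy(a, b)
            for a in range(1, max_factor + 1)
            for b in range(1, max_factor + 1)}


def recall_or_compute(memory: dict[str, int], a: int, b: int) -> int:
    """Look the answer up in memory; fall back to the strategy and remember the result."""
    key = f"{a}x{b}"
    if key not in memory:
        memory[key] = times_by_strategy(a, b)
    return memory[key]


# Round-trip the memory through JSON, standing in for a data repository.
stored = json.dumps(build_memory())
memory = json.loads(stored)

if __name__ == "__main__":
    print(recall_or_compute(memory, 8, 7))   # answered from memory
    print(recall_or_compute(memory, 15, 3))  # outside the table, so computed by the strategy
```

The design choice mirrors the paragraph: familiar cases are answered from stored experience, while the strategy handles anything the store has never seen and then adds it to memory.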
"For in much wisdom is much grief,
And he who increases knowledge increases sorrow." -- Ecclesiastes 1:18 (NKJV)
I believe that your reaction to this book depends totally on whether you already crunch and reuse all available data. If you do, it's old news. If you don't, you may feel that the floor has been tilted a bit in favor of those who have and know how to use the data. In the latter case, first appreciating that data-driven learning can be more valuable than theory-testing learning can be quite an eye-opener. You may not agree. Sometimes you should and sometimes you shouldn't.
If you aren't a data jock, the book has accessible examples and anecdotes that you will probably understand just fine. If you are a data jock, the content may seem, well, "Elementary, my dear Dr. Watson."
I thought the most interesting parts related to how privacy might be protected against unanticipated invasions by those who are incautious in plowing ahead without considering who will get hurt and how.
The case for not needing to know cause-and-effect is greatly overstated here. One of the best potential uses of data-driven analysis is identifying which combinations of changes may work best with one another in a new business model, an improved strategy, or an upgraded business process. If you don't understand cause-and-effect in seeking to make such improvements, you'll make a big mess most of the time. That's also true in other complex environments, such as many medical ones. The benefits of one such cause-and-effect-based improvement will usually run rings around the kind of incremental enhancements, described in this book, that come from acting on merely data-driven conclusions.
I've graded the book for its value to someone who is new to the subject.