The Age of Big Data

Kenneth Cukier is the data editor for The Economist.

A revolution in how society uses information is shaking up business, politics and everyday life.


The fruits of the information society are easy to see, but the information itself is not. It is invisible, encoded in electronic pulses. And while the hardware of the information age gets replaced over time, the data itself tends to stick around, new information layered atop the old bits and bytes like sedimentary rock.

Now that all that information is building up, and new data is being generated all the time, something special is starting to happen. As the cost of processing power and computer memory plummets while performance improves, society can do things with all that information that it simply couldn't do before. Call it the age of big data.

For instance, high-end cars can tell that an engine part is likely to break down before it actually does, based on the pattern of temperature or vibration it produces, a technique known as predictive maintenance. The idea is that a part does not fail all at once; it deteriorates over time until it eventually breaks. By monitoring the part continuously, the car can spot trouble before it becomes serious.
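As a rough illustration of that idea, the short Python sketch below flags a part when its recent sensor readings drift well outside a healthy baseline. The vibration values, window sizes and threshold are invented for the example, not taken from any car maker's actual system.

```python
from statistics import mean, stdev

def drift_score(baseline, recent):
    """How many baseline standard deviations the recent average has drifted."""
    return abs(mean(recent) - mean(baseline)) / stdev(baseline)

def needs_service(readings, baseline_n=50, window=10, threshold=3.0):
    """Flag a part whose latest readings drift far outside its healthy range."""
    baseline, latest = readings[:baseline_n], readings[-window:]
    return drift_score(baseline, latest) > threshold

if __name__ == "__main__":
    # Simulated vibration amplitudes: a stable baseline, then a slow upward creep.
    healthy = [1.0 + 0.05 * (i % 3) for i in range(90)]
    failing = healthy[:50] + [1.0 + 0.02 * i for i in range(40)]
    print(needs_service(healthy))   # False: readings stay near the baseline
    print(needs_service(failing))   # True: the slow drift is caught before failure
```

The point is not the particular statistic but the habit: keep measuring, compare against what "normal" looks like, and act on the deviation before the breakdown.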

The same approach may eventually apply to other areas of life, such as healthcare, where the aggregation of electronic medical records enables physicians and statisticians to detect illnesses earlier or ensure better treatment. Meanwhile, the firehose of tweets that floods the world daily is being used, with some success, to predict Hollywood box-office sales and even where the New York stock market will close each day.

All these examples pose the question: what is big data? The term came from the sciences, to describe situations where the size of the data outstrips the tools to store or process it. So when researchers have to run analyses on huge data sets, they bring the algorithms to the data, not the data to the analysis, as was typically the case.

As these huge data sets have invaded the corporate world, there is another way of thinking about big data: there are things one can see with a lot of data that one simply can't with a smaller amount. For example, training computers to translate languages used to be a hard task. But once researchers stopped fiddling with the algorithms (the "brains" of a system) and instead focused on the data (the "brawn"), the quality of translations improved significantly. Rather than teach a computer all the rules of syntax, the geeks let it loose on a math problem: what is the statistical probability of a word in one language being a suitable replacement for a word in another?
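To make that math problem concrete, here is a toy Python sketch that estimates how likely a foreign word is to stand in for an English word purely by counting how often the two appear in the same sentence pairs. The three-sentence "corpus" and the crude counting model are invented for illustration; real translation systems use vastly more data and far more refined statistics.

```python
from collections import Counter, defaultdict

def cooccurrence_probs(parallel_corpus):
    """Estimate P(foreign word | English word) from sentence-level co-occurrence."""
    pair_counts = defaultdict(Counter)
    for english, foreign in parallel_corpus:
        for e in english.lower().split():
            for f in foreign.lower().split():
                pair_counts[e][f] += 1
    # Normalise the counts for each English word into probabilities.
    return {e: {f: c / sum(fc.values()) for f, c in fc.items()}
            for e, fc in pair_counts.items()}

if __name__ == "__main__":
    # A made-up English-French corpus, purely for illustration.
    corpus = [
        ("the house", "la maison"),
        ("a house", "une maison"),
        ("the car", "la voiture"),
    ]
    probs = cooccurrence_probs(corpus)
    # "maison" appears alongside "house" in both of its sentences, so it wins.
    print(max(probs["house"], key=probs["house"].get))   # maison
```

With only three sentences the guesses are fragile; with billions of sentence pairs, the same counting logic starts to look like fluency.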

This is a case of being able to learn something from a large corpus of data that one simply cannot know from examining a smaller amount. The data whispers insights to those with the wisdom to listen. The promise of big data is akin to the idea underpinning nanotechnology: that at very small scales the ordinary rules of physics no longer apply and materials take on new properties. So too with big data: size itself is an advantage.

Society has always existed in an information-constrained universe. But today the amount of data far outstrips our ability to comprehend it. For instance, the Sloan Digital Sky Survey collected more data in its first few weeks in 2000 than had been amassed in the entire history of astronomy. Its successor, the Large Synoptic Survey Telescope, due to start in 2016, will acquire that amount of data every five days. And decoding the human genome took ten years before it was completed in 2003; today it can be done in about a week.

As the amount of data in the world grows, the only certainty is that there will need to be more qualified people to make sense of it. That should be good news as we stop and salute our machine overlords.
