Data Economy

A brief history of big data, the Noam Chomsky way

The latest news from the fast-evolving world of the Data Economy:

If you are familiar with Noam Chomsky, the pioneering linguist whose theory of recursion seeks the universal in all human languages, you probably also know that he often has not-so-nice things to say about the U.S. government. He has also spent his career finding the universal in the abuse of power by whoever controls the globe in any age. The political activist and self-professed anarcho-syndicalist (whatever that is) has made a particular habit of bashing the U.S. government. Would he do the same to the surveillance state when given the opportunity? Not really. Or not exactly. At MIT's recent Engaging Data 2013 Conference, Chomsky shared the bill with Washington Post "big brother" reporter Bart Gellman.

"Big data is a step forward," Chomsky said, according to Computerworld. "But our problems are not lack of access to data, but understanding them. [Big data] is very useful if I want to find out something without going to the library, but I have to understand it, and that's the problem."

Hmmm, not much agitation there...but wait! Chomsky went on to say that big data isn't really a new problem anyway. "We can be confident that any system of power—whether it's the state, Google, or whatever—is going to use the best available technology to control, to dominate, and to maximize their power," Chomsky is quoted as saying. "And they'll want to do it in secret."

Now that's sounding more like Chomsky.

Gellman, who was on the receiving end of some of Edward Snowden's leaks, politely begged to differ. "I think it's important to know how far you can get into these discussions without assuming bad faith, that these institutions are deliberately working against the interests of citizens... The principal way in which I agree with Chomsky is that the place where the opposition comes from is that they don't want you to know about things they feel they need to do on your behalf. Because they're afraid you might not agree and want to call it off."

Well, we guess that's the difference between an anarcho-syndicalist and a Pulitzer Prize-winning, "have to remain impartial" journalist.

Facebook on mining your privacy
Facebook doesn't have the greatest reputation in the world for putting your privacy above and beyond its business. Still, we'll give Ken Rudin, head of the social network's data analytics team, the benefit of the doubt as he dishes to InformationWeek on his "five keys" to big data success.

Anyway, it's more or less the usual big data guru routine:

1. It's the analysis, not the raw data, that counts.
2. A picture is worth a thousand words.
3. Make a big data portal. (We're not sure whether Facebook plans to dominate cloud services someday.)
4. Use a hybrid organizational model. (We're asleep already, so let's move on.)
5. Train employees. (Hmmm, we guess it's this kind of advice that gets Ken the big bucks.)

"If the team doesn't know what it might do differently, I'm going to choose a different problem," Rudin told InformationWeek. And when we hear the usual guru-speak, we move on to other happenings in the data economy.

Yahoo to Big Brother: Take that!
Yahoo has joined the ranks of the big tech companies getting a clue that leaving users in the dark about the government's rifling through their Internet wanderings isn't A-OK. Yahoo announced this past week it will encrypt the flow of info through its data centers to make it tougher for that pesky government to get a peek at your invisible trail.

Brain drain
Oh, here we go again. More reports from the front where the human brain will be replaced by machines. Speaking at a recent symposium on cognitive computing at the IBM Almaden Research Center, famed venture capitalist Vinod Khosla was quoted by EE Times as saying, "Data science will do more for medicine in the next 10 years than biological science."

Meanwhile, Palm Computing founder Jeff Hawkins talked up his latest invention: Grok. To track large datasets, it borrows a technique the neocortex uses: creating so-called sparse distributed representations (SDRs). "We don't know how to characterize it mathematically, but I'd argue this is a basic building block of cognitive computing," Hawkins said. Well, you keep arguing, Jeff, and come back to us when you do find a way to characterize it.

It's been a long time since those pioneers in artificial intelligence from the era of World War II thought they had cracked the neuron. Many decades followed in which computer scientists realized, man, that's a harder nut to crack than we imagined. Good luck, Jeff! We think your proof of recreating the brain is pretty sparsely distributed, but go ahead and keep representing the machine versus the good old human being.

But wait!

Has someone heard our tired complaints? "Wouldn't it be incredible if we could create minds as powerful as our own but more reliable, less in need of coffee, and incapable of making mistakes? It's a tempting proposition. But it's also a red herring, leading the tech conversation astray," writes Christian Madsbjerg, cofounder of the business consultancy ReD, in VentureBeat. He dismisses all of the excitement over the motherboard as brain. Oh wait, his article was merely a response to another VentureBeat article on why big data will replace the human brain. Talk about brain drain. The important thing about big data is making oodles of money, anyway.

The next big data IPO

Sumo Logic is aiming to go public in 2015, following Splunk and Tableau Software into the market in response to the big appetite from investors for any company with the words "big data" or "data analytics" in an S-1 filing. Its chief executive recently spoke to Reuters with the kind of IPO road show talk you're not allowed to use when you're actually in an IPO period but can use liberally before that period begins.

CEO Vance Loiselle told Reuters that revenue will grow tenfold this year. Sumo projects 300 percent revenue growth next year, though, true to the private-company mold, it reveals no actual revenue figures. "(An IPO) is definitely something we will consider in the next couple of years. 2015 is probably a realistic timeline," Loiselle said. And no company in the tech sector has ever gotten that timeline wrong before, right?

Splunk is up nearly 150 percent this year (and jumped another 23 percent this past Friday after its latest earnings beat). Tableau is up a "mere" 25 percent. That's not even keeping up with the S&P 500, which makes it a veritable "dog" of the big data stock world.

Your car running on big data

Industry consultants love throwing big numbers around about the projected growth of splashy industries. And oh boy, do they love doing that with big data: terabytes just don't cut it when exabytes and infinibytes (OK, we made that up) are available.

In its new study, Emerging Technologies: Big Data in the Connected Car, IHS Automotive forecasts that "there will be 152 million actively connected cars on global roads by 2020, and $14.5 billion of value from the OEM connected car landscape from a variety of Big Data assets found in the connected car." Those assets are diagnostics, location, user experience (UX)/feature tracking, and adaptive driver assistance systems (ADAS)/autonomy.

IHS Automotive estimates "conservatively" that more than 480 terabytes of data will be collected from the OEM connected car landscape in 2013 through millions of small data transmissions sent through more than 26 million connected cars. "A combination of increased connected car sales and a growing scale of information coming from connected cars will result in the collection of some 11.1 petabytes of connected car data by 2020."

But why stop there? The study says that "about 30 terabytes of data would be collected each day from the 152 million connected cars on the road in 2020, or about 350 megabytes per second, compared to about 15 megabytes per second in 2013."
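For readers who want to check the math behind all those "abouts," the snippet below converts the three IHS figures into common units. This is our own back-of-the-envelope arithmetic, not IHS's methodology. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and a 365-day year.

```python
# Back-of-the-envelope check of the IHS Automotive figures quoted above.
# Assumptions (ours, not IHS's): decimal SI units and a 365-day year.
MB = 10**6   # bytes in a megabyte
TB = 10**12  # bytes in a terabyte
PB = 10**15  # bytes in a petabyte
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365
CARS_2020 = 152_000_000  # connected cars forecast for 2020


def per_second_to_per_year(bytes_per_second):
    """Scale a steady data rate (bytes/s) up to bytes per year."""
    return bytes_per_second * SECONDS_PER_DAY * DAYS_PER_YEAR


# 2020: "about 30 terabytes of data ... each day"
rate_2020 = 30 * TB / SECONDS_PER_DAY     # bytes per second
yearly_2020 = 30 * TB * DAYS_PER_YEAR     # bytes per year
per_car_per_day = 30 * TB / CARS_2020     # bytes per car per day

# 2013: "about 15 megabytes per second"
yearly_2013 = per_second_to_per_year(15 * MB)

print(f"2020 rate:    {rate_2020 / MB:.0f} MB/s")            # 347 MB/s
print(f"2020 total:   {yearly_2020 / PB:.2f} PB/year")       # 10.95 PB/year
print(f"2020 per car: {per_car_per_day / 1000:.0f} KB/day")  # 197 KB/day
print(f"2013 total:   {yearly_2013 / TB:.0f} TB/year")       # 473 TB/year
```

The results hang together reasonably well. Thirty terabytes a day is roughly 347 MB per second, which rounds to the study's "about 350." A year of it comes to roughly 11 petabytes, in line with the 11.1-petabyte figure if that number refers to 2020 alone rather than a cumulative total. A steady 15 MB per second over 2013 gives about 473 terabytes, in the same ballpark as the "more than 480 terabytes" IHS cites. Spread across 152 million cars, the 2020 total is only about 197 kilobytes per car per day.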

We just love conservative estimates, and the word "about" thrown right before a 30-terabytes-a-day prediction for 2020. Personally, we just wish we knew how to replace the fuse in our car that makes the dashboard AC/DC charger work.

Turkey day data
Ask the man on the street and he'll tell you that all of this big data buzz is utterly unintelligible and likely no better than selling the Brooklyn Bridge or snake oil. That's why companies like Tableau Software are working hard to bring big data down to the level of the masses. So consider this proof that big data can be useful to you: Tableau and Allrecipes.com teamed up to create an interactive Thanksgiving map of the United States. It shows what we like to eat and where we like to eat it, based on 78 million online recipe views.

Allrecipes.com can look across the millions and millions of data points coming from its recipe readers and use them to help with marketing and advertising, as well as with getting the right content to the right audiences ... oh damn, that sounds like just the kind of big data mumbo jumbo that puts the average American to sleep. Ignore it! And if you've lacked an appetite for big data until now, we still think you may gobble this up.

—By Eric Rosenbaum, CNBC.com