It's Google vs. Amazon to create the biggest database in history

Four years after its proposed creation and six years after the Flash Crash of May 2010, the long-delayed Consolidated Audit Trail is finally showing signs of life.

Here's the amazing thing: If a flash crash happened today, we wouldn't know much more about what caused it than we did when the original Flash Crash occurred. There has been some progress. But there's no real way to quickly examine the trail of the most important market information, like canceled bids and offers, or who executed the trades.

The CAT is supposed to be the ultimate unraveler of the mysteries of the stock market: a vast database that will enable regulators to look at who has been trading what in the sub-second trading world that exists today. And not just trades that take place: every bid and offer that is put into the market, regardless of whether the trade is executed or not.

And so the SEC spoke: The trading community will develop a massive database system that will enable regulators to get at all the important information, and — eventually — be able to analyze it. Fast. Find out what happened for sure. No blaming the Flash Crash on Waddell & Reed, or some knucklehead in London who may or may not be trying to spoof the markets. Real data. Real info.

We're still waiting. But on Wednesday, the SEC is set to publish ground rules on how to build the system. The ground rules will mandate what data has to be collected and within what time frames.

To figure this out, the industry is going outside to partner with the two biggest database providers: Google and Amazon.

There are three final bidders to build the CAT: 1) FINRA, the market regulator, is teaming with Amazon Web Services; 2) FIS is teaming with Google; and 3) Thesys Technologies (an affiliate of high-frequency firm Tradeworx) is offering to build the database on its own (it has brought in Rosenblatt as an advisor).

The winner will likely not be chosen until the third or fourth quarter, but the bidding has set up a very interesting battle between Google and Amazon. The winner will have bragging rights for having built the biggest database in history, potentially attaining supremacy in cloud computing and expanding its reach into the lucrative financial services industry.

The whole project, while an exciting technological feat, has big problems.

1) Can it be done? This would be the largest database ever assembled. We are talking about 50 billion to 100 billion pieces of information a day. In a seminar I moderated for the Security Traders Association of New York last month at Google headquarters in New York, Google officials exhibited their usual can-do attitude, insisting that it could be done.

2) Who's going to pay for this? Building the database is an immense and potentially ruinously expensive undertaking, and it's not clear who is going to pay the potentially several billion dollars it may take to create it. The short answer is simple: It will be paid for by those who use it, that is, the broker-dealer community like Goldman Sachs, Merrill Lynch, Morgan Stanley, etc. But it's also likely they will find some way to pass these costs on to consumers. There is already grumbling from the financial community.

3) How to protect privacy? Each trade will be accompanied by Personally Identifiable Information — a code revealing exactly who is making the trade. This is immensely valuable information; we are talking about someone having access not only to every trade Warren Buffett ever makes, but also every bid and offer he or his firm makes. How will the security of this information be protected?

4) Who owns the data, and who has access to it? It's not just creating the database, it's what you do with it once it is created. You could potentially create a computer program that monitors the system for anomalous (fraudulent or illegal) behavior, or you could query it directly on individual events. All this is still in the future: The SEC is expected to issue rules initially just about the creation of the database. But rules on how to collect the data, how to query it, and who would be allowed to query it are very much on the trading community's mind.

Next, the rules will be put out for a 180-day comment period, after which the SEC will decide whether to approve them. The SEC will likely choose who will build the database in the third or fourth quarter.

If all this sounds endless, well, it is. The industry needs to bring trading into the 21st century, but the regulatory structure is so oppressive that progress is glacial. It has taken years just to figure out a simple method of running a pilot to trade small-cap stocks in increments other than a penny, just to see if trading in nickels will stimulate more trading.

There's a crying need to develop computer simulation models that enable markets to test new trading methodologies without years — decades even — of proposals.

Now do you see why the industry needs the help of a Google or an Amazon, or both?

Correction: This story was revised to correct that the Flash Crash was in 2010.

  • Bob Pisani

    A CNBC reporter since 1990, Bob Pisani covers Wall Street from the floor of the New York Stock Exchange.
