Bright lights, big data: What the market may tell us about a correction

Traders on the floor of the New York Stock Exchange.
Brendan McDermid | Reuters

Digging into some big data—everything from Google searches to the tone of news stories—might be able to tell us whether this market is headed for a correction.

Proponents of "wisdom of the crowds" contend that such data may contain within it some key insights about specific variables of interest. So can we find any evidence of this wisdom in big data?

The short answer is yes. Investment professionals should seriously consider a new factor, namely big-data-based sentiment, as an input into investment decision making.

Unlike most traditional indicators, which are derived from government data and surveys, the growing volumes of unstructured data carry a new kind of intellectual and emotional content that could be a powerful addition to commonly used factors such as value and momentum. Let's consider why this might be the case.

Consider Figure 1, which plots the S&P 500 index from January 2006 through October 31, 2014 (the blue series) against the frequency of the phrase "market correction" reported by Google Trends over the same period (the red series).

It is readily apparent that recent use of the phrase "market correction" is at an all-time high. The most recent peak is significantly higher than the peaks that preceded the market collapse of 2008, when the S&P 500 dropped by roughly 60 percent from its peak in October 2007 to its trough in March 2009. It would not be unreasonable to interpret the pattern as indicating a heightened sense of nervousness about the market.

Thought viruses and slowdowns

In a New York Times economic commentary on October 18, 2014, Robert Shiller asked whether the abrupt decline of 6 percent that week signaled the beginning of a bear market. Shiller used the metaphor of a "thought virus" to describe the market, arguing that the stock market is driven by popular narratives, some of which can spread by contagion.

Shiller's thought virus is similar in many ways to a meme, which is defined as "an idea, behavior, or style that spreads from person to person within a culture." Some of these are pernicious, like the "debt ceiling crisis."

This virus took hold between May 2 and June 15, 2011, when the S&P 500 dropped by over 7 percent on fears that Congress would not raise the debt ceiling in time and the U.S. government would default on its debt. The virus "died" when a deal to raise the ceiling was reached on July 31, 2011.

Interestingly, however, the market had one of its worst days on August 4, after the debt ceiling had been resolved, driven by a different virus that had been quiescent: fears of a global economic slowdown, as expressed in this August 4 headline from CNN Money:

"The Dow tumbled 512 points—its ninth deepest point drop ever—as fear about the global economy spooked investors."

Does this sound similar to what we are hearing today? The market continued to grind down until October 3, after which it rallied as the "global slowdown" virus abated. Arguably, this type of meme never dies out but remains dormant, resurfacing periodically when conditions are hospitable.

These memes include fears about recession, inflation, interest rates (hikes or cuts), political risk and, these days, terrorist acts. The question is whether these memes can be identified and measured accurately enough over time to provide useful ex ante sentiment indicators that presage socio-economic activity.

Now zoom in on roughly the last 90 days of Figure 1 and add some news data. Figure 2, covering Monday, August 4, 2014, through October 31, 2014, shows the S&P 500 along with the daily average positive and negative sentiment computed from a collection of 13,269 business news stories that appeared on Reuters over that period.

The sentiment measures were calculated using the General Inquirer (GI) methodology in which every word is classified as belonging to one or more predefined categories such as positive, negative, strong, and weak. A simple proportion is calculated for each category based on a count of words in the category relative to the total words in the sample.
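As a rough illustration of the proportion described above, the following Python sketch counts the share of words in a text that fall into each category. The tiny word lists and the `category_proportions` function are hypothetical stand-ins chosen for this example; the actual General Inquirer dictionaries are far larger and are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical, tiny category lexicons for illustration only.
# The real General Inquirer dictionaries contain thousands of entries.
LEXICON = {
    "positive": {"gain", "rally", "strong", "growth"},
    "negative": {"fear", "drop", "weak", "slowdown"},
}


def category_proportions(text, lexicon=LEXICON):
    """Return, per category, the count of matching words divided by total words."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {category: 0.0 for category in lexicon}
    counts = Counter()
    for word in words:
        # A word may belong to more than one category, so check them all.
        for category, vocab in lexicon.items():
            if word in vocab:
                counts[category] += 1
    return {category: counts[category] / total for category in lexicon}


sample = "Stocks rally on strong growth, but fear of a slowdown lingers."
print(category_proportions(sample))
```

In this made-up sentence, three of the eleven words fall in the positive list and two in the negative list, so the two proportions come out to 3/11 and 2/11.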

The GI methodology has been shown to be useful in some studies as a simple first-order approach to sentiment extraction. In our calculation, we normalize the individual sentiment percentages to yield a type of Z-score of positive and negative sentiment for each day.
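The exact normalization is not spelled out here, so the following Python sketch is only illustrative and shows two possible readings: a conventional z-score, and a ratio to the period mean. The ratio version is purely an assumption, included because it produces a series centered on 1; the function names and the sample figures are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Conventional z-score: (value - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def ratio_to_mean(values):
    """Assumed alternative: each value divided by the period mean (centers on 1)."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("mean is zero; ratio is undefined")
    return [v / mu for v in values]


# Hypothetical daily positive-word proportions (made-up numbers).
daily_positive = [0.030, 0.034, 0.026, 0.030]
print(z_scores(daily_positive))
print(ratio_to_mean(daily_positive))
```

Under the first reading the normalized series averages zero; under the second it averages one. Which of these, if either, matches the calculation behind Figure 2 cannot be determined from the text alone.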

Pessimistic viruses?

The chart is broken into three periods separated by the vertical black bars:

1. Between August 4 and September 18, 2014, a steadily rising market flattens out. During this time, the S&P 500 rose by roughly 5 percent and the positive and negative sentiment scores both oscillated around 1.

2. A sharp drop of roughly 6 percent in the S&P 500 between September 18 and October 15, 2014, in response to fears of a global slowdown. Positive sentiment dropped dramatically, by 60 percent from peak to trough, and its average fell by over 30 percent to 0.66. Negative sentiment, meanwhile, drifted slightly higher, with three pronounced peaks of increasing magnitude at weekly intervals: September 27, October 4 and October 11 (sentiment tends to peak on Saturdays, which is not altogether surprising).

3. A sharp reversal of over 8 percent in the S&P 500 between October 15 and October 31, as the market made new highs. Interestingly, average negative sentiment has stayed above 1 during the rebound, while positive sentiment has remained at 0.6, significantly lower than its pre-"global slowdown" level.

It is tempting to conjecture that the market is currently in a heightened state of anxiety. What might explain the major drop in positive sentiment?

One interpretation could be that, while there are always doomsayers, it bodes ill for the market when the countervailing optimists drop out of the discussion. A contrary interpretation, however, could be that low positive sentiment coupled with a healthy level of negative sentiment indicates that a key segment of the investment community has already taken money off the table.

In this latter narrative, the "sidelined" money can be put back into the market to support a rise in the near term. (This could, for example, explain the recent market rise despite the low positive sentiment.) A variant of this view would hold that low negative sentiment coupled with high positive sentiment portends a heightened state of risk, in which investors are fully invested and bad news can spark a rapid correction.

The possibility of multiple theories suggests that we must dig deeper and more methodically into the data to try to address such questions and identify characteristic viruses, and to measure which ones are getting stronger or weaker.

As of this writing, for example, concerns about Europe have resurfaced, including narratives about stress test failures at certain European banks and/or the need for aggressive quantitative easing. Could this environment of depressed positive sentiment be a fertile one in which a pessimistic virus can take hold, particularly if a contingent of the investment community were to become nervous and take risk off the table?

Regardless of how the future pans out, professional investors cannot ignore the emergence of a new and critical class of "factor" in their investment models, namely big-data-based sentiment, and the growing capability of market participants to extract and measure it reliably.

This new type of factor, which is very different from indices such as the Michigan consumer sentiment index, may well join the ranks of key determinants of investment performance, alongside traditional factors such as value and momentum that have been used for decades on Wall Street.

Commentary by Vasant Dhar, professor at NYU Stern School of Business and editor-in-chief of Big Data Journal.