Efficient market theory assumes that all market prices incorporate all available information at the same time. In reality, different market segments focus on different news flows, depending on the nature of the traded securities and their research capacity. Such specialization makes it plausible that lagged correlations arise between securities prices, even though their specifics may change over time. Indeed, there is empirical evidence of lagged correlation between the price trends of different U.S. stocks. Such lagged correlation can be identified and tested with a neural network. Academic research finds that the price trends of some stocks have been predictable out-of-sample based on information about the price trends of others.
Ben Moews and Gbenga Ibikunle (2020), “Predictive intraday correlations in stable and volatile market environments: Evidence from deep learning.”
The below are quotes from the paper. Headings, cursive text, and text in brackets have been added.
This post ties in with this site’s summary article on quantitative methods for macro information efficiency.
Evidence for lagged correlation in U.S. stock prices
“We investigate intraday predictability with differing time intervals, and…find evidence of the presence of time-delayed correlations in S&P 500 stocks in both stable and volatile markets, and of the viability of using deep learning for trend predictions in large numbers of inter-correlated time series.”
“Specifically, by using trend approximations as features…[lagged correlations] can be used to realise above-average accuracies in predicting price trend changes without the inclusion of data from the target stock as an input, delivering evidence against the random walk hypothesis and most forms of the efficient market hypothesis in stable market environments.”
“Our experiments outperform predefined baselines for strict statistical key performance indices, which includes accuracies for different prediction horizons. Predictions of one stock’s trend changes based on other stocks’ price trend gradients in the preceding time step show an improved accuracy for larger time intervals, with average and maximum accuracies of 56.02% and 63.95%, respectively, for one-day predictions.”
“This…makes the prediction of price changes based on historical data an attractive use case for trend forecasting involving inter-correlated time series in stock markets as an example of real-world complex systems.”
A methodology for detecting lagged correlation
“We obtain two sets of high-frequency transaction data from the Thomson Reuters Tick History database.”
“Feature engineering describes the manual selection and, if necessary, transformation of given datasets into new data that better represent the features needed for a chosen task. For this paper…simple linear regressions are used as way to approximate the trends over given time intervals. By running a linear regression over each time series and time interval separately, and by taking the first derivative of the resulting equation, the trend gradients for single stocks and time intervals are obtained.”
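The trend-gradient feature described above can be sketched in a few lines: fit a simple linear regression y = a + b·t over one time interval and take the slope b (the first derivative of the fitted line) as the trend strength. The function and variable names below are illustrative, not taken from the paper.

```python
def trend_gradient(prices):
    """Slope of an OLS line fitted to a price series over one interval."""
    n = len(prices)
    t = range(n)                      # time index 0..n-1
    mean_t = (n - 1) / 2.0
    mean_p = sum(prices) / n
    # Ordinary-least-squares slope: cov(t, p) / var(t).
    cov = sum((ti - mean_t) * (p - mean_p) for ti, p in zip(t, prices))
    var = sum((ti - mean_t) ** 2 for ti in t)
    return cov / var                  # first derivative of a + b*t is b

# Example: a steadily rising price series has a positive gradient.
print(trend_gradient([100.0, 100.5, 101.0, 101.5]))  # 0.5
```

Running this per stock and per time interval yields one gradient per (stock, interval) pair, which is the feature matrix the paper's network consumes.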
“A simple linear regression is performed on each time step for each stock’s price series, after which the first derivative of the resulting regression is computed as a trend strength indicator. For each target stock, all stocks’ gradients for the preceding time step except for information about the target stock are then used as input features for a neural network with five hidden layers and two output nodes for a binary classification task. The model is then, with separate experiments for each stock, trained to predict the upwards or downwards change of the target stock’s trend gradient in the next time step.”
“[The figure below] depicts a schematic overview of the experimental setup. For the number of stocks that are made usable during the data cleansing and pre-processing, gradients of the price trends for each separate stock are computed in the feature engineering step…The gradients for one time step are then used as inputs to a feed-forward artificial neural network that is fully connected for adjacent layers to predict whether the gradient of the left-out nth stock changes upwards or downwards with regard to its gradient in the preceding time step.”
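The setup above can be sketched as follows, assuming scikit-learn is available. The layer width, iteration budget, and synthetic data are illustrative assumptions, not the paper's actual configuration; the point is the structure: other stocks' gradients at time t as inputs, a feed-forward network with five hidden layers, and a binary up/down label for the target stock's next-step gradient.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the engineered features: trend gradients of
# 49 other stocks at time t (the target stock's own gradient is omitted).
X = rng.normal(size=(500, 49))
# Binary label: does the target stock's gradient move up (1) or down (0)
# in the next time step? Here a made-up linear rule generates the labels.
y = (X @ rng.normal(size=49) > 0).astype(int)

# Feed-forward network with five hidden layers, as in the paper's setup;
# a width of 64 units per layer is an assumption for this sketch.
model = MLPClassifier(hidden_layer_sizes=(64,) * 5, max_iter=300, random_state=0)
model.fit(X[:400], y[:400])
accuracy = model.score(X[400:], y[400:])
print(f"held-out accuracy: {accuracy:.2f}")
```

In the paper's design, one such model is trained per target stock, so the input dimension is always the number of usable stocks minus one.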
“The reason for omitting the target stock gradient for the preceding period in the input vector is that the described framework is designed to test for lagged correlations, which prohibits using the target stock’s own information. Predictions are meant to explore the predictability based solely on correlations with other members of the S&P 500 stocks[, a] major distinguishing aspect of this paper from related research on time series-based stock market prediction.”
“In order to identify general correlations, we reduce result variability via five-fold cross-validation…A model is then trained on all but one of the sets and tested on the remaining one, alternating which sets are used for training and testing. This way, multiple tests are carried out over all instances of the data, and the results for all models are then averaged. This is the ‘gold standard’ in machine learning, as it addresses the possibility of testing models on an unrepresentative subset. Cross-validation makes for more reliable and stable results.”
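The cross-validation scheme described above can be sketched in plain Python: partition the sample indices into k folds, hold each fold out once for testing, train on the rest, and average the per-fold scores. The `evaluate` callback is a placeholder for training and testing the neural network; all names are illustrative.

```python
def k_fold_scores(n_samples, k, evaluate):
    """Average score over k folds, each held out once as the test set."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    scores = []
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out
                     for i in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k

# Example with a dummy evaluator that just reports the test-fold share:
# with 100 samples and 5 folds, each test fold holds 20% of the data.
avg = k_fold_scores(100, 5, lambda train, test: len(test) / 100)
print(avg)  # 0.2
```

Averaging over the five folds is what the authors rely on to reduce the variability of a single train/test split.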
Are markets inefficient?
“Weak-form market efficiency can be upheld under the additional assumption that only a select number of market agents have access to, or choose to implement, techniques such as the one described in this work. In other words, if a small-enough amount of capital is involved with exploiting lagged correlations in stock markets, these correlations are still present in price data and the market remains efficient for the majority of market agents.”
“Given the observed reluctance of financial firms to deploy artificial intelligence approaches on a large scale, as well as the inherent difficulties of black-box models when having to produce research for market oversight agencies, this scenario does not seem unlikely and allows for our results to be compliant with much of the financial literature.”