Statistical learning and macro trading: the basics

The rise of data science and statistical programming has made statistical learning a key force in macro trading. Beyond standard price-based trading algorithms, statistical learning also supports the construction of quantamental systems, which make the vast array of fundamental and economic time series “tradable” through cleaning, reformatting, and logical adjustments. Fundamental economic developments are poised to play a growing role in the statistical trading and support models of market participants. Machine learning methods automate the process and are a basis for reliable backtesting and efficient implementation.

Data science and statistical programming

Data science is simply a set of methods to extract knowledge from data. The methods do not necessarily have to be complicated and the datasets do not necessarily have to be “big”. Indeed, in the macroeconomic space the information content of data is expanding rather slowly because the release frequency of official statistics is typically low (often just monthly and quarterly), the set of relevant countries and currency areas is limited, business cycles take many years to unfold, and there is little scope for generating experimental data.

The key technology behind the expansion of data science in trading is statistical programming. Statistical programming has made the use of data science convenient, cheap, and practical. The great convenience is that data transformations and estimations can be executed with a few simple lines of code. Costs are low because most tools and packages are open-source, with plenty of free help and problem solutions. And practicality means that statistical code can be directly embedded into information and trading systems.

The dominant solution for statistical programming is Python, an interpreted, object-oriented, high-level programming language. It owes much of its popularity to simple syntax; code is easily readable and memorable. Python’s conquest of the data science space was led by the rise of its flagship packages: NumPy (foundational package for scientific computing), SciPy (collection of packages for scientific computing), pandas (structures and functions to work with data), statsmodels (library of statistical models and tests), matplotlib (plots and other 2D data visualizations), Seaborn (high-level interface for statistical graphics), scikit-learn (tools for machine learning and predictive data analysis), PyCaret (low-code machine learning library with a time series module), TensorFlow (artificial intelligence library that uses data flow graphs to build models), and Keras (interface for neural networks). Moreover, unlike domain-specific programming languages such as R, Python is suitable not only for research and prototyping but also for building production systems.
The R project provides a programming language and work environment for statistical analysis. R is not just for programmers (view post here). Even with limited coding skills, R outclasses Excel spreadsheets and boosts information efficiency. Like Excel, the R environment is built around data structures, albeit far more flexible ones. Operations on data are simple and efficient, particularly for import, wrangling, and complex transformations. The tidyverse, a collection of packages that facilitate data science within R, is particularly useful for macro trading research (view post here). Moreover, R is a functional programming language. This means that functions can take other functions as arguments, making code succinct and readable. Specialized “functions of functions” map elaborate coding subroutines to data structures. Finally, R users have access to a repository of almost 15,000 packages of functions for all sorts of operations and analyses.
Julia is a fast, high-level dynamic programming language that was released in 2012. The intention was to combine the high-level functionality of MATLAB and R with the speed of C and the dynamism of Ruby. Julia is a general-purpose language but is particularly suitable for computational sciences and high-performance computing. It has by now attracted a sizeable community.

Statistical programming has changed many areas of financial markets trading:

Most importantly, statistical programming has enabled the implementation of statistical learning for the prediction of market returns and volatility.
Statistical programming allows building practical libraries for backtesting, portfolio construction, and risk monitoring, such as QF-Lib or pyfolio.
Statistical programming also facilitates many more advanced analytical tasks in asset management. For example, predictive power scores, which are based on decision tree models, are an alternative to correlation matrices for quick data exploration (view post here). Unlike correlation, this score also captures non-linear relations, categorical data, and asymmetric relations.
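To make the contrast with correlation concrete, here is a minimal Python sketch in the spirit of a predictive power score. It is a simplification under stated assumptions, not the scoring method of any particular package: a binned-mean predictor stands in for the decision tree, the naive benchmark is the training-set median, and the helper name `binned_predictive_score` is hypothetical. The score is one minus the ratio of the model's mean absolute error to that of the naive forecast on held-out data, floored at zero.

```python
import numpy as np


def binned_predictive_score(x, y, n_bins=20, train_frac=0.7, seed=0):
    """Hypothetical, simplified predictive power score of x for y.

    A binned-mean predictor stands in for a decision tree. The score is
    1 - MAE(model) / MAE(naive median forecast) on held-out data, floored at 0.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_train = int(train_frac * len(x))
    tr, te = idx[:n_train], idx[n_train:]
    # Quantile bin edges estimated on the training data only
    inner_edges = np.quantile(x[tr], np.linspace(0, 1, n_bins + 1))[1:-1]
    tr_bin = np.digitize(x[tr], inner_edges)
    te_bin = np.digitize(x[te], inner_edges)
    # Mean of y in each bin; empty bins fall back to the overall training mean
    overall = y[tr].mean()
    bin_means = np.array([y[tr][tr_bin == b].mean() if np.any(tr_bin == b) else overall
                          for b in range(n_bins)])
    model_mae = np.mean(np.abs(y[te] - bin_means[te_bin]))
    naive_mae = np.mean(np.abs(y[te] - np.median(y[tr])))
    return max(0.0, 1.0 - model_mae / naive_mae)


rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 2000)
y = x ** 2 + rng.normal(0, 0.05, 2000)   # non-linear and asymmetric relation

corr = np.corrcoef(x, y)[0, 1]
score_x_to_y = binned_predictive_score(x, y)
score_y_to_x = binned_predictive_score(y, x)
print(f"corr={corr:.2f}, score x->y={score_x_to_y:.2f}, score y->x={score_y_to_x:.2f}")
```

With a quadratic relation, correlation is close to zero while x scores highly as a predictor of y, and the reverse score stays near zero: the asymmetry that a symmetric correlation matrix cannot show.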

The macroeconomic data challenge

Efficient macro trading requires building information systems or trading strategies that rely on economic data. Economic data are statistics that, unlike market prices, directly inform on economic activity. For trading, macro information must be relatable to market prices, both conceptually and in terms of timing. This is a major problem. Solutions to this problem give rise to tradable economics, the technology for building systematic trading strategies based on economic data (view post here). Tradable economics must not only be grounded in an in-depth understanding of the economic data but also cope with many data deficiencies and inconveniences of format:

Short history: Many economic data series, particularly in emerging economies, have only 15-30 years of history, which does not allow assessing their impact across many business cycles. Many surveys and “alternative data series” are shorter still. Often this necessitates combining them with older discontinued series or substitutes that the market had followed in the more distant past.
Revisions: Most databases simply record economic time series in their most recently revised state. However, initial and intermediate releases of many economic indicators, such as GDP or business surveys, may have looked significantly different. This is because the information is being revised and adjustment factors are being updated in hindsight. This means that the information recorded for the past is not actually the information that was available in the past.
Time inconsistency: Unlike market data, time of reference and time of availability for economic data are not the same. The latest information for production in January may only be available in late March. The time of availability is typically not recorded in the databases of the main service providers.
Calendar effects: Many economic data series are strongly influenced by seasonal patterns, working day numbers, and school holiday schedules. While some series are calendar adjusted by the source, the adjustment is typically incomplete and not comparable across countries.
Distortions: Almost all economic data are at least temporarily distorted relative to the concept they promise to measure. For example, inflation data are often affected by one-off tax changes and administered price hikes. Production and balance sheet data often display disruptions due to strikes or natural disasters and sudden breaks due to changes in methodology. On some occasions statistics offices have even released plainly incorrect data for lack of care or under political pressure.
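One common remedy for short histories is to splice a discontinued series onto its successor. The sketch below assumes ratio splicing at the first overlapping period, which is only one of several possible pre-set rules; the helper `splice_ratio` and all numbers are made up for illustration.

```python
import pandas as pd


def splice_ratio(old, new):
    """Hypothetical helper: ratio-splice a discontinued series onto a newer one.

    The old series is rescaled to match the new series in the first overlapping
    period; wherever both series exist, the new series takes precedence.
    """
    overlap = old.index.intersection(new.index).sort_values()
    if overlap.empty:
        raise ValueError("series must overlap in at least one period")
    link = overlap[0]
    factor = new[link] / old[link]
    return new.combine_first(old * factor)


# Illustrative quarterly index levels (made-up numbers)
old = pd.Series([98.0, 100.0, 102.0, 104.0],
                index=pd.period_range("2000Q1", periods=4, freq="Q"))
new = pd.Series([52.0, 53.0, 54.0],
                index=pd.period_range("2000Q4", periods=3, freq="Q"))
spliced = splice_ratio(old, new)
print(spliced)
```

Ratio splicing preserves the growth rates of the old series, which is usually what matters for trading signals; a level-difference rule would be the natural alternative for series that can turn negative.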

Generally, data wrangling means the transformation of raw irregular data into a clean tidy data set. In many sciences, this simply requires reformatting and relabelling. For macroeconomics, data wrangling takes a lot more.

Common technical procedures include [1] splicing different series across time according to pre-set rules, [2] combining various revised versions of series into a “vintage matrix” and ultimately a single “point-in-time” series, and [3] assigning correct publication time stamps to the periodic updates of time series.
Additional standard statistical procedures for economic data include seasonal and standard calendar adjustment (view post here), special holiday pattern adjustment, outlier adjustment, and flexible filtering of volatile series. Seasonal adjustment is largely the domain of official software, and there are modules in R and Python that provide access to these programs. Beyond that, there are specialized packages in R and Python that assist with other types of adjustments.
Beyond that, machine learning methods can be used to replicate what statistical procedures would have been applied in the past. Unlike market price trends, macroeconomic trends or states are hard to track in real time. Conventional econometric models are immutable and not backtestable because they are built with hindsight and do not aim to replicate perceived economic trends of the past (even if their parameters are sequentially updated). Machine learning can remedy this. For example, a practical approach is “two-stage supervised learning” (view post here). The first stage scouts features; the second stage evaluates candidate models.
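The vintage matrix and point-in-time logic of procedures [2] and [3] can be sketched with pandas. All dates and values below are hypothetical, and the point-in-time series is defined here as the newest observation available on each release date, which is one of several possible conventions.

```python
import pandas as pd

# Hypothetical vintages: release date -> {reference month: value as published on that date}
vintages = {
    "2023-02-15": {"2023-01": 1.0},
    "2023-03-15": {"2023-01": 1.2, "2023-02": 0.8},
    "2023-04-14": {"2023-01": 1.1, "2023-02": 0.9, "2023-03": 0.5},
}

# Vintage matrix: one row per release date, one column per reference period
vm = pd.DataFrame.from_dict(vintages, orient="index").sort_index(axis=1)
vm.index = pd.to_datetime(vm.index)
vm.index.name = "release_date"

# Point-in-time series: the latest reference period as published on each release date
pit = vm.apply(lambda row: row.dropna().iloc[-1], axis=1)

# Daily information state: only values released on or before a given day are known
days = pd.date_range("2023-02-01", "2023-04-30", freq="D")
daily = pit.reindex(days).ffill()

# The usual database view keeps only the latest revised vintage
latest_revised = vm.iloc[-1]
print(vm, pit, latest_revised, sep="\n\n")
```

On 1 March only the January figure of 1.0 is known, whereas a database that stores only the latest vintage would report 1.1 for January. Backtests should use the `daily` information state, not `latest_revised`.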

Market data are typically easier to wrangle than economic data but can also suffer from deficiencies. The most common issues are missing or bad price data. Moreover, there is – so far – no generally available database for a broad range of generic financial returns (as opposed to mere price series). Nor are there widely used packages of functions that specifically wrangle financial return data across asset classes.

News and comments are major drivers of asset prices, probably more so than conventional price and economic data. Yet it is impossible for any financial professional to read and analyze the vast and growing flow of written information. This is becoming the domain of natural language processing, a technology that supports the quantitative evaluation of humans’ natural language (view post here). Natural language processing delivers textual information in a structured form that makes it usable for financial market analysis. A range of useful packages is now available for extracting and analyzing financial news and comments.

Altogether, statistical programming allows the construction of quantamental systems (view post here). A quantamental system combines customized high-quality databases and statistical language code in order to systematically investigate relations between market returns and plausible predictors. The term “quantamental” refers to a joint quantitative and fundamental approach to investing. The purpose of a quantamental system is to increase the information efficiency of investment managers, support the development of robust algorithmic trading strategies, and reduce the costs of quantitative research.

Statistical learning

Statistical learning refers to a set of tools or models that help extract insights from datasets. Understanding statistical learning is critical in modern financial markets, even for non-quants (view post here). This is because statistical learning illustrates and replicates how the experiences of investors in markets shape their future behavior. In financial markets, statistical learning can enhance information efficiency in many ways. It can also directly predict returns, market direction, or the impact of specific events. Methods range from simple regression to complex neural networks. Simplicity can deliver superior returns if it avoids “overfitting”, i.e. gearing models to recent experiences. Success must be validated based on “out-of-sample” test sets that played no role in the estimation of model parameters or choice of hyperparameters (model structure).
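The overfitting point can be illustrated with a toy NumPy example (simulated data from a linear relation, arbitrary polynomial degrees): a flexible model always fits the training sample at least as well as a simple one, but can do markedly worse on an out-of-sample test set.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)


def true_f(x):
    return 0.5 * x  # the simulated "true" relation is linear


x_train = np.linspace(0, 1, 20)
y_train = true_f(x_train) + rng.normal(0, 0.1, x_train.size)
x_test = rng.uniform(0, 1, 200)          # test data played no role in estimation
y_test = true_f(x_test) + rng.normal(0, 0.1, x_test.size)


def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))


results = {}
for degree in (1, 15):
    model = Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))

for degree, (in_s, out_s) in results.items():
    print(f"degree {degree:2d}: in-sample MSE {in_s:.4f}, out-of-sample MSE {out_s:.4f}")
```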

Linear regression remains the most popular tool for supervised learning in financial markets. It is appropriate if one can relate market returns to previously available information in a theoretically plausible functional form. In the macro trading space, mixed data sampling (MIDAS) regressions are a useful method for nowcasting economic trends and financial market variables, such as volatility (view post here). This type of regression allows combining time series of different frequencies and limits the number of parameters that need to be estimated.
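A minimal MIDAS sketch follows. It assumes an exponential Almon lag polynomial (a common but not the only weighting scheme), six monthly lags per quarterly observation, simulated data, and a crude grid search instead of nonlinear least squares; the helper names are hypothetical. Two weight parameters plus intercept and slope replace six free lag coefficients, which is how this type of regression limits the number of estimated parameters.

```python
import numpy as np


def exp_almon(theta1, theta2, n_lags):
    """Normalized exponential Almon lag weights for lags 0..n_lags-1 (0 = most recent)."""
    j = np.arange(n_lags)
    raw = np.exp(theta1 * j + theta2 * j ** 2)
    return raw / raw.sum()


def lag_matrix(x_monthly, n_quarters, n_lags):
    """Row t holds n_lags monthly values, most recent first, stepping 3 months per quarter."""
    return np.array([x_monthly[3 * t: 3 * t + n_lags][::-1] for t in range(n_quarters)])


def fit_midas(y, X_lags, theta1_grid, theta2_grid):
    """Grid search over the weight parameters; OLS for intercept and slope at each point."""
    best = None
    ones = np.ones(len(y))
    for t1 in theta1_grid:
        for t2 in theta2_grid:
            z = X_lags @ exp_almon(t1, t2, X_lags.shape[1])  # weighted monthly aggregate
            A = np.column_stack([ones, z])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            sse = float(np.sum((y - A @ beta) ** 2))
            if best is None or sse < best["sse"]:
                best = {"sse": sse, "theta": (t1, t2), "beta": beta}
    return best


rng = np.random.default_rng(0)
n_q, n_lags = 200, 6
x = rng.normal(size=3 * n_q + n_lags)      # simulated monthly indicator
X_lags = lag_matrix(x, n_q, n_lags)
y = 1.0 + 2.0 * X_lags @ exp_almon(0.0, -0.1, n_lags) + 0.1 * rng.normal(size=n_q)

fit = fit_midas(y, X_lags, np.linspace(-1, 1, 21), np.linspace(-0.5, 0.0, 11))
print(fit["theta"], fit["beta"])
```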

Structural vector autoregression (SVAR) is a practical model class if one wishes to capture several interconnected time series processes. It studies the evolution of a set of linearly related observable time series variables, such as economic data or asset prices. SVAR assumes that all variables depend in fixed proportion on past values of the set and new structural shocks. The method is useful for macro trading strategies (view post here) because it helps identify specific interpretable market and macro shocks (view post here). For example, SVAR can identify short-term policy, growth, or inflation expectation shocks. Once a shock is identified it can be used for trading in two ways. First, one can compare the type of shock implied by markets with the actual news flow and detect fundamental inconsistencies. Second, different types of shocks may entail different types of subsequent asset price dynamics and, hence, form a basis for systematic strategies.
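A bare-bones SVAR sketch on simulated data is below. It assumes a two-variable VAR(1) without a constant and recursive (Cholesky) identification, which is only one of several identification schemes and depends on the ordering of the variables; it is not the specific method of the posts referenced above.

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.1], [0.2, 0.3]])   # lag dynamics
B_true = np.array([[1.0, 0.0], [0.5, 0.8]])   # impact of structural shocks (lower triangular)

T = 20000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A_true @ y[t - 1] + B_true @ rng.normal(size=2)

# Step 1: reduced-form VAR(1) by OLS; rows satisfy y_t' = y_{t-1}' A'
Y, X = y[1:], y[:-1]
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
A_hat = coef.T
U = Y - X @ coef                               # reduced-form residuals

# Step 2: recursive identification, Sigma = B B' with B lower triangular
Sigma = U.T @ U / (len(U) - 2)
B_hat = np.linalg.cholesky(Sigma)

# Step 3: structural shocks e_t = B^{-1} u_t, interpretable by their ordering
shocks = np.linalg.solve(B_hat, U.T).T
print(np.round(A_hat, 2), np.round(B_hat, 2), sep="\n")
```

The recovered shock series is what a strategy would then compare with the news flow or use to condition subsequent positions.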

A particularly important application of statistical learning for investment research is dimension reduction. This refers to methods that condense the bulk of the information of a large set of macroeconomic time series into a smaller set that distills the most important information for investors. In macroeconomics, there are many related data series that have only limited incremental relevant information value. Cramming all of them into a prediction model undermines estimation stability and transparency. There are three types of statistical dimension reduction methods.

The first type of dimension reduction selects a subset of “best” explanatory variables by means of regularization, i.e. the shrinkage of coefficient values through a penalty on coefficient magnitudes in the optimization function that is applied for statistical fit. Penalty functions that are linear in the absolute values of individual coefficients (the L1 norm) can set some of them to exactly zero. Classic methods of this type are Lasso and Elastic Net, which combines the L1 penalty with a quadratic one (view post here).
The second type selects a small set of latent background factors of all explanatory variables and then uses these background factors for prediction. This is the basic idea behind static and dynamic factor models. Factor models are one key technology behind nowcasting in financial markets, a modern approach to monitoring current economic conditions in real-time (view post here). While nowcasting has mostly been used to predict forthcoming data reports, particularly GDP, the underlying factor models can produce a lot more useful information for the investment process, including latent trends, indications of significant changes in such trends, and estimates of the changing importance of various predictor data series (view post here).
The third type generates a small set of functions of the original explanatory variables that retain the explanatory power of the full set and then deploys these for forecasting. This method is called Sufficient Dimension Reduction and is more suitable for non-linear relations (view post here).
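The zero-setting property of an L1 penalty can be seen in a small coordinate-descent Lasso written in NumPy. This is a textbook sketch under stated assumptions (standardized predictors, centered target, a fixed penalty `lam`, simulated data), not a replacement for tested library implementations.

```python
import numpy as np


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly zero."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    Assumes standardized columns of X (mean 0, variance 1) and centered y.
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * beta[j]            # add back feature j's contribution
            rho = X[:, j] @ resid / n             # covariance with the partial residual
            beta[j] = soft_threshold(rho, lam)    # unit variance: no rescaling needed
            resid -= X[:, j] * beta[j]
    return beta


rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
true_beta = np.array([3.0, -2.0] + [0.0] * (p - 2))   # only two relevant predictors
y = X @ true_beta + rng.normal(size=n)
y = y - y.mean()

beta_lasso = lasso_cd(X, y, lam=0.3)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_lasso, 3), np.round(beta_ols, 3), sep="\n")
```

OLS gives small but non-zero weights to all eight irrelevant predictors, while the Lasso sets them to exactly zero at the cost of shrinking the relevant coefficients by roughly `lam`.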

Dimension reduction methods not only help condense the information in predictors of trading strategies but also support portfolio construction. In particular, they are suited for detecting latent factors of a broad set of asset prices (view post here). These factors can be used to improve estimates of the covariance structure of these prices and, by extension, to improve the construction of a well-diversified minimum variance portfolio (view post here).
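A minimal sketch of this use case, assuming a statistical factor model from principal components of simulated returns: the common covariance comes from the top components, specific risk is kept on the diagonal only, and the result feeds a fully invested global minimum-variance portfolio with short positions allowed. Function names and parameters are illustrative.

```python
import numpy as np


def pca_factor_cov(returns, n_factors):
    """Covariance = common part from the top principal components + diagonal specific risk."""
    sample_cov = np.cov(returns, rowvar=False)
    eigval, eigvec = np.linalg.eigh(sample_cov)           # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:n_factors]
    loadings = eigvec[:, top] * np.sqrt(eigval[top])      # asset exposures to latent factors
    common = loadings @ loadings.T
    specific = np.diag(np.diag(sample_cov - common))      # idiosyncratic variances only
    return common + specific, sample_cov


def min_variance_weights(cov):
    """Fully invested global minimum-variance weights, proportional to inv(cov) @ 1."""
    w = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return w / w.sum()


rng = np.random.default_rng(0)
T, N, K = 500, 10, 2
factors = rng.normal(0, 0.02, size=(T, K))
exposures = rng.normal(1.0, 0.5, size=(N, K))
returns = factors @ exposures.T + rng.normal(0, 0.01, size=(T, N))

cov_f, cov_s = pca_factor_cov(returns, n_factors=K)
w = min_variance_weights(cov_f)
print(np.round(w, 3))
```

Restricting off-diagonal covariances to the common factors removes much of the estimation noise of a full sample covariance matrix, which matters because minimum-variance weights depend on its inverse.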

Machine learning

Machine learning is based on statistical learning methods but partly automates the construction of forecast models through the study of data patterns, the selection of the best functional form for a given level of complexity, and the selection of the best level of complexity for out-of-sample forecasting. Machine learning can add efficiency to classical asset pricing models, such as factor models, and to macro trading rules, mainly because it is flexible, adaptable, and generalizes knowledge well (view post here). Beyond speed and convenience, machine learning methods are useful for macro trading research because they enable backtests that are based on methods rather than on specific factors. Backtests of specific factors are often unreliable because the factor choice itself is typically shaped by hindsight.

Machine learning is conventionally divided into three main fields: supervised learning, unsupervised learning, and reinforcement learning.

In supervised learning, one distinguishes input and output variables and uses an algorithm to learn which function maps the former to the latter. This principle underlies most statistical learning applications in financial markets. An example is the assessment of what the change in interest rate differential between two countries means for the dynamics of their exchange rate. Supervised learning can be divided into regression, where the output variable is a real number, and classification, where the output variable is a category, such as “policy easing” or “policy tightening” for central bank decisions.
Unsupervised learning only knows input data. Its goal is to model the underlying structure or distribution of the data in order to learn previously unknown patterns. Applications of unsupervised machine learning techniques include clustering (partitioning the data set according to similarity), anomaly detection, association mining, and dimension reduction. More specifically, unsupervised learning methods have been proposed to classify market regimes, i.e. persistent clusters of market conditions that affect the success of trading factors and strategies (view post here). An advanced method of unsupervised learning is the autoencoder, a neural network trained to reconstruct its own input through a compressed intermediate layer, whose primary purpose is to learn an informative and meaningful latent representation of the data.
Reinforcement learning is a specialized application of (deep) machine learning that interacts with the environment and seeks to improve on the way it performs a task so as to maximize its reward (view post here). The computer employs trial and error. The model designer defines the reward but gives no clues as to how to solve the problem. Reinforcement learning holds potential for trading systems because markets are highly complex and quickly changing dynamic systems. Conventional forecasting models have been notoriously inadequate. A self-adaptive approach that can learn quickly from the outcome of actions may be more suitable. Reinforcement learning can benefit trading strategies directly, by supporting trading rules, and indirectly, by supporting the estimation of trading-related indicators, such as real-time growth (view post here). There are specialized Python libraries supporting the use of reinforcement learning in finance, such as FinRL.
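To illustrate the clustering idea behind regime classification, here is plain k-means in NumPy on two hypothetical features (monthly return and volatility) of simulated calm and stressed periods. Farthest-point initialization is an assumption chosen to make the example deterministic; real regime work would require careful feature choice and scaling.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [X[0]]
    for _ in range(1, k):
        # Next initial centroid: the point farthest from all existing centroids
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign each observation to its nearest centroid, then recompute centroids
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


rng = np.random.default_rng(0)
calm = rng.normal([0.01, 0.10], 0.01, size=(100, 2))     # (monthly return, volatility)
stress = rng.normal([-0.05, 0.35], 0.01, size=(100, 2))
X = np.vstack([calm, stress])
labels, centroids = kmeans(X, k=2)
print(np.round(centroids, 3))
```

No labels are supplied: the algorithm discovers the two regimes from the structure of the input data alone, which is what distinguishes it from the supervised case.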

Artificial neural networks have become increasingly practical for (supervised and unsupervised) macro trading research. This popular machine learning method consists of layers of data-processing units, connections between them, and weights and biases that are estimated based on training data. In Python, neural networks can be implemented with TensorFlow or PyTorch. For example, neural networks can in principle be used to estimate the state of the market on a daily or higher frequency based on an appropriate feature space, i.e. data series that characterize the market (view post here). Beyond that, neural networks can be used to detect lagged correlations between different asset prices (view post here) or market price distortions (view post here). Also, long short-term memory (LSTM) networks, a type of recurrent neural network, regulate the memory of past experiences by using a gating mechanism that learns which information to keep, to pass on, and to forget.
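To show what layers, connections, weights, and biases estimated from training data mean mechanically, here is a one-hidden-layer network trained by full-batch gradient descent in plain NumPy on a toy target. This is a teaching sketch only: the layer size, learning rate, and number of epochs are arbitrary assumptions, and in practice one would use TensorFlow or PyTorch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(256, 1))
y = np.sin(X)                                   # toy non-linear target

n_hidden, lr, n_epochs = 16, 0.05, 3000
params = {
    "W1": rng.normal(0, 1.0, size=(1, n_hidden)), "b1": np.zeros(n_hidden),
    "W2": rng.normal(0, 0.1, size=(n_hidden, 1)), "b2": np.zeros(1),
}

losses = []
for epoch in range(n_epochs):
    # Forward pass: tanh hidden layer, linear output layer
    h = np.tanh(X @ params["W1"] + params["b1"])
    y_hat = h @ params["W2"] + params["b2"]
    err = y_hat - y
    losses.append(float(np.mean(err ** 2)))
    # Backward pass: gradients of the mean squared error
    g_out = 2 * err / len(X)
    g_h = (g_out @ params["W2"].T) * (1 - h ** 2)
    grads = {
        "W2": h.T @ g_out, "b2": g_out.sum(axis=0),
        "W1": X.T @ g_h, "b1": g_h.sum(axis=0),
    }
    # Gradient descent step on all weights and biases
    for name in params:
        params[name] -= lr * grads[name]

print(f"initial MSE {losses[0]:.3f}, final MSE {losses[-1]:.4f}")
```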

Backtesting with modern statistical tools

Backtesting refers to calculations of theoretical profits and losses that would have arisen from applying an algorithmic trading strategy in the past. Its purpose is to gauge how the strategy is likely to perform in the future. Statistical programming has made backtesting easy. However, its computational power and convenience can also be corrosive to the investment process due to its tendency to sniff out temporary patterns, while data samples for cross-validation are limited. Moreover, the business of algorithmic trading strategies, unfortunately, provides strong incentives for overfitting models and embellishing backtests (view post here). Similarly, academic researchers in the field of trading factors often feel compelled to resort to data mining in order to produce publishable ‘significant’ empirical findings (view post here).
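The core arithmetic of a backtest is simple; the discipline lies in timing. A minimal pandas sketch, assuming positions equal to the signal, a one-period implementation lag, no transaction costs, and an annualized Sharpe ratio as the summary metric (all assumptions; `backtest` is a hypothetical helper):

```python
import numpy as np
import pandas as pd


def backtest(signal, returns, periods_per_year=12):
    """Hypothetical helper: PnL of positions equal to the signal, traded one period later."""
    positions = signal.shift(1)                  # signal known at end of t sets position for t+1
    pnl = (positions * returns).dropna()
    sharpe = pnl.mean() / pnl.std() * np.sqrt(periods_per_year)   # annualized, before costs
    return pnl, sharpe


idx = pd.period_range("2020-01", periods=4, freq="M")
returns = pd.Series([0.01, -0.02, 0.03, 0.01], index=idx)
signal = pd.Series([1, -1, 1, 1], index=idx)
pnl, sharpe = backtest(signal, returns)
print(pnl, round(sharpe, 2))
```

The lag is the essential design choice: without `shift(1)`, the strategy would trade on information it could not yet have had.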

Good backtests require sound principles and integrity (view post here). Sound principles should include [1] formulating a logical economic theory upfront, [2] choosing sample data upfront, [3] keeping the model simple and intuitive, and [4] limiting try-outs when testing ideas. Realistic performance expectations of trading strategies should be based on a range of plausible versions of a strategy, not an optimized one. Bayesian inference works well for that approach, as it estimates both the performance parameters and their uncertainty. The most important principle of all is integrity: aiming to produce good research rather than good backtests and to communicate statistical findings honestly rather than selling them.

One of the greatest ills of classical market prediction models is exaggerated performance metrics that arise from choosing the model structure with hindsight. Even if backtests estimate model parameters sequentially and apply them strictly out of sample, the choice of hyperparameters is often made with full knowledge of the history of markets and economies. For example, the type of estimation, the functional form, and – most importantly – the set of considered features are often chosen with hindsight. This hindsight bias can be reduced by sequential hyperparameter tuning or ensemble methods.
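Sequential hyperparameter tuning can be sketched as a walk-forward loop: at each date, candidate ridge penalties are trained on the older part of the available history, the best one is picked on the most recent validation window, and only then is the next observation forecast. Ridge regression, the penalty grid, the window lengths, and the simulated data are assumptions for illustration; features at date t are assumed to be known before the target is realized.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge coefficients (no intercept)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


def sequential_forecasts(X, y, alphas, min_train=60, n_val=12):
    """For each date t: train candidates before the validation window, pick alpha on
    the validation window, refit on all data before t, then forecast y[t]."""
    preds, chosen = [], []
    for t in range(min_train + n_val, len(y)):
        X_tr, y_tr = X[: t - n_val], y[: t - n_val]          # [1] training
        X_va, y_va = X[t - n_val: t], y[t - n_val: t]        # [2] validation
        val_mse = [np.mean((X_va @ ridge_fit(X_tr, y_tr, a) - y_va) ** 2) for a in alphas]
        best = alphas[int(np.argmin(val_mse))]
        beta = ridge_fit(X[:t], y[:t], best)
        preds.append(X[t] @ beta)                            # [3] out-of-sample test point
        chosen.append(best)
    return np.array(preds), chosen


rng = np.random.default_rng(0)
T, p = 200, 5
X = rng.normal(size=(T, p))
y = X @ np.array([0.5, 0.0, 0.0, 0.0, 0.0]) + rng.normal(size=T)
alphas = [0.1, 1.0, 10.0, 100.0]
preds, chosen = sequential_forecasts(X, y, alphas)
oos_corr = np.corrcoef(preds, y[72:])[0, 1]
print(f"out-of-sample correlation {oos_corr:.2f}")
```

Because the penalty is re-chosen at every step from past data only, the resulting track record evaluates the learning method rather than a model structure picked with hindsight.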

A data-driven process for tuning hyperparameters can partly endogenize model choice. In its simplest form, it involves three steps: model training, model validation, and method testing. This process [1] estimates the parameters of a range of plausible candidate models, which differ by their hyperparameters, on a training data set, [2] chooses the best model according to some numerical criterion (such as accuracy or coefficient of determination) on a separate validation data set, and [3] evaluates the success of the learning method, i.e. the combination of parameter estimation and model selection, by its ability to predict the targets of a further unrelated test set.
An alternative is ensemble learning. Rather than choosing a single model, ensemble methods combine the decisions of multiple models to improve prediction performance. This combination is governed by a “meta-model”. For macro trading, this means that the influence of base models is endogenized and data-dependent and, hence, the overall learning method can be simulated based on the data alone, reducing the hindsight bias from model choice.
Ensemble learning is particularly useful for flexible models whose estimates vary a lot with the training set, because ensembles mitigate these models’ tendency to memorize noise. There are two types of ensemble learning methods:
- Heterogeneous ensemble learning methods train different types of models on the same data set. First, each model makes its prediction. Then a meta-model aggregates the predictions of the individual models. Preferably the different models should have different “skills” or strengths. Examples of this approach include the voting classifier, averaging ensembles, and stacking.
- Homogeneous ensemble learning methods use the same type of model but train it on different data. The methods include bootstrap aggregation (bagging), random forests, and popular boosting methods (AdaBoost and gradient boosting). Homogeneous ensemble methods have been shown to produce predictive power for credit spread forecasts (view post here), switches between risk parity strategies (paper here), stock returns (paper here), and equity reward-risk timing (view post here).
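A minimal bagging sketch, assuming a deliberately flexible base model (1-nearest-neighbour regression on one simulated feature) and equal-weight averaging as the aggregation rule; the helper names are hypothetical. Each member is trained on a bootstrap resample, and averaging damps the noise that any single member memorizes.

```python
import numpy as np


def knn1_predict(x_train, y_train, x_new):
    """1-nearest-neighbour regression: a flexible, high-variance base model."""
    nearest = np.abs(x_new[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[nearest]


def bagged_predict(x_train, y_train, x_new, n_models=100, seed=0):
    """Average the predictions of base models trained on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(x_train)
    preds = np.empty((n_models, len(x_new)))
    for m in range(n_models):
        idx = rng.integers(0, n, size=n)   # bootstrap sample drawn with replacement
        preds[m] = knn1_predict(x_train[idx], y_train[idx], x_new)
    return preds.mean(axis=0)              # equal weights: the simplest aggregation rule


rng = np.random.default_rng(1)
x_tr = rng.uniform(0, 1, 200)
y_tr = np.sin(2 * np.pi * x_tr) + rng.normal(0, 0.3, 200)
x_te = rng.uniform(0, 1, 500)
y_te = np.sin(2 * np.pi * x_te) + rng.normal(0, 0.3, 500)

single_mse = np.mean((knn1_predict(x_tr, y_tr, x_te) - y_te) ** 2)
bagged_mse = np.mean((bagged_predict(x_tr, y_tr, x_te) - y_te) ** 2)
print(f"single model MSE {single_mse:.3f}, bagged MSE {bagged_mse:.3f}")
```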

The evaluation of a trading strategy typically relies on statistical metrics. Alas, many measures are incomplete and can be outright misleading. An interesting concept is the discriminant ratio (‘D-ratio’), which measures an algorithm’s success in improving risk-adjusted returns versus a related buy-and-hold portfolio (view post here).