Empirical Asset Pricing via Machine Learning

DOI: https://doi.org/10.1093/rfs...
Article publication date: February 14, 2020
Post publication date: September 1, 2025

Can machine learning models improve traditional asset pricing strategies?

In Empirical Asset Pricing via Machine Learning, Shihao Gu, Bryan Kelly, and Dacheng Xiu test a variety of machine learning algorithms to predict stock returns and pit them against traditional linear models.

Using over 900 potential return predictors, they evaluate these methods on both cross-sectional stock selection and time-series market timing tasks and conclude:

Expanding a linear model to include 900+ predictors causes it to overfit disastrously (out-of-sample R² < 0), but applying penalisation or dimensionality reduction recovers modest predictability (~0.26% monthly R²).
Allowing for nonlinear interactions through tree-based and neural network models boosts stock-level predictive power to ~0.33–0.40% monthly R², a significant improvement given the noisy nature of returns.
Market timing with ML adds substantial value: a neural-network strategy achieved an out-of-sample Sharpe ratio of 0.77, versus 0.51 for a static S&P 500 position.
A long–short decile portfolio of stocks ranked by a neural network's predictions earned Sharpe ratios of 1.35 (value-weighted) and 2.45 (equal-weighted), more than double the 0.61 and 0.83 achieved by a linear model.
Despite using hundreds of features, all methods agree on the dominant predictors of stock returns: momentum, liquidity, and volatility.
ML's predictive edge is broad-based: neural networks achieved R² gains even for large-cap stocks (~0.5–0.7% monthly), not just in small-cap niches.
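To make the R² figures above concrete, here is a minimal sketch of an out-of-sample R² calculation. It assumes the benchmark is a forecast of zero (an uncentred denominator), a convention often used for individual stock returns; this summary does not say which benchmark the paper uses. The function name `r2_oos` and all numbers are illustrative, not taken from the paper.

```python
def r2_oos(realized, predicted):
    """Out-of-sample R^2 against a zero-forecast benchmark (an assumption).

    A value below zero means the model forecasts worse than
    simply predicting a return of zero for every observation.
    """
    sse = sum((r - p) ** 2 for r, p in zip(realized, predicted))
    sst = sum(r ** 2 for r in realized)  # uncentred: benchmark forecast is 0
    return 1.0 - sse / sst


# Hypothetical held-out monthly returns and two sets of forecasts.
realized = [0.02, -0.01, 0.03, -0.02]
modest = [0.002, -0.001, 0.003, -0.002]  # heavily shrunk, right direction
overfit = [-0.05, 0.04, -0.06, 0.05]     # large and wrong, as from an overfit model

print(r2_oos(realized, modest))   # positive
print(r2_oos(realized, overfit))  # negative
```

The overfit forecasts are penalised twice: they point the wrong way and they are large, which is roughly what happens when an unregularised linear model is fitted to hundreds of noisy predictors.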

The best-performing algorithms are nonlinear models (trees and neural nets) that capture complex predictor interactions missed by linear approaches.

Importantly, the algorithms consistently rediscover known factors like momentum, liquidity, and volatility and confirm the primacy of these traditional return drivers.

Despite the promising results, even the best ML models explain only a small fraction of return variance (monthly R² well below 1%), as most fluctuations are driven by unpredictable news.
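A small R² can still matter economically. One standard approximation, due to Campbell and Thompson (2008) and not stated in this summary, links the Sharpe ratio of a market-timing investor to the buy-and-hold Sharpe ratio and the predictive R²: SR*² = (SR² + R²) / (1 − R²), with all quantities measured at the same frequency. The sketch below applies it at monthly frequency; the function name and the R² values are hypothetical and are not the paper's figures.

```python
import math


def timing_sharpe(buy_hold_sharpe_annual, r2_monthly):
    """Annualised Sharpe ratio implied by the Campbell-Thompson approximation.

    Assumes the annual Sharpe ratio scales with sqrt(12) from the monthly one.
    """
    s = buy_hold_sharpe_annual / math.sqrt(12)        # monthly Sharpe ratio
    s_star_sq = (s ** 2 + r2_monthly) / (1.0 - r2_monthly)
    return math.sqrt(12) * math.sqrt(s_star_sq)


# 0.51 is the buy-and-hold Sharpe ratio quoted above; the R^2 values are hypothetical.
for r2 in (0.0, 0.005, 0.01):
    print(r2, round(timing_sharpe(0.51, r2), 3))
```

With an R² of zero the formula returns the buy-and-hold Sharpe ratio unchanged, and any positive R² raises it, which is why sub-1% monthly R² values are not automatically negligible.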

Moreover, high-turnover strategies could in practice face liquidity constraints and trading costs that erode their excess returns, especially when they involve small-cap stocks. The highest Sharpe ratio (2.45) came from an equal-weighted strategy that tilts heavily toward micro-cap stocks, which may be impossible to run at scale because of market impact.
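The sketch below illustrates the mechanics behind these caveats: how a long-short decile return differs under equal and value weighting, and how a proportional trading cost lowers the Sharpe ratio. It is a simplified illustration, not the paper's procedure; the function names, the flat cost model, and all numbers are assumptions.

```python
import math
import statistics


def long_short_return(preds, rets, caps=None):
    """One-period return: top predicted decile minus bottom predicted decile.

    caps=None gives equal weights; otherwise stocks are weighted by caps.
    """
    n = len(preds)
    k = max(1, n // 10)                       # stocks per extreme decile
    order = sorted(range(n), key=lambda i: preds[i])
    weights = caps if caps is not None else [1.0] * n

    def leg(idx):
        total = sum(weights[i] for i in idx)
        return sum(weights[i] * rets[i] for i in idx) / total

    return leg(order[-k:]) - leg(order[:k])


def annualized_sharpe(monthly_excess_returns):
    """Annualised Sharpe ratio from monthly excess returns (sqrt(12) scaling)."""
    mean = statistics.mean(monthly_excess_returns)
    sd = statistics.stdev(monthly_excess_returns)
    return math.sqrt(12) * mean / sd


def net_of_costs(monthly_returns, turnover, cost_per_unit):
    """Subtract a flat proportional cost (hypothetical cost model)."""
    return [r - turnover * cost_per_unit for r in monthly_returns]


# Hypothetical cross-section of 20 stocks; the last one is three times larger.
preds = [i / 100 for i in range(20)]
rets = [i * 0.001 for i in range(20)]
caps = [1.0] * 19 + [3.0]
print(long_short_return(preds, rets))        # equal-weighted
print(long_short_return(preds, rets, caps))  # value-weighted

# Hypothetical monthly strategy returns, before and after a 0.5% cost on 100% turnover.
gross = [0.02, 0.01, 0.03, 0.00]
net = net_of_costs(gross, turnover=1.0, cost_per_unit=0.005)
print(annualized_sharpe(gross), annualized_sharpe(net))
```

A flat cost lowers the mean return while leaving volatility unchanged, so the Sharpe ratio falls in proportion. In practice, costs for micro-cap stocks tend to rise with trade size, which a flat cost does not capture.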
