Can an algorithm reading corporate filings build a better investment index than professional managers?
Lennart Ante and Aman Saggu study AI stock classification and index construction in their paper "Quantifying a firm's AI engagement: Constructing objective, data-driven, AI stock indices using 10-K filings".
They apply natural language processing to annual 10-K filings from 3,395 NASDAQ-listed firms between 2010 and 2022, deriving binary and weighted AI engagement scores for each firm.
Their main conclusions include:
- The number of 10-K filings mentioning AI grew dramatically over the sample period, rising from 7-11 per year between 2010 and 2015 to 527 in 2022.
- Companies classified as AI stocks earned cumulative average abnormal returns of 17.25% in the three months following ChatGPT's launch in November 2022, compared with 11.59% for non-AI stocks.
- AI index weights are significant positive predictors of abnormal returns, and indices based on more recent disclosures show stronger predictive power than those placing greater weight on historical AI communications.
- The four NLP-based indices outperform existing AI-themed ETFs, delivering a mean daily return of 0.076% versus 0.056% without exhibiting higher volatility.
- The NLP indices achieved mean Sharpe and Sortino ratios of 0.039 and 0.037, compared with 0.029 and 0.028 for existing AI ETFs, indicating more efficient return generation per unit of risk.
- Index providers may be overcharging investors for products that an NLP-based approach can match or exceed at a fraction of the cost: among 14 AI-themed ETFs, expense ratios show no positive correlation with daily returns.
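For readers less familiar with the risk-adjusted measures quoted above, the following is a minimal sketch of how Sharpe and Sortino ratios can be computed from a series of daily returns. It assumes a zero risk-free rate and a zero target return, and it does not annualise; the summary does not state which conventions the authors used, so these choices and the sample returns are illustrative only.

```python
import math

def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation of excess returns."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(variance)

def sortino_ratio(returns, target=0.0):
    """Mean return over the target divided by downside deviation.

    Only returns below the target contribute to the risk term, so upside
    volatility is not penalised (unlike in the Sharpe ratio).
    """
    excess = [r - target for r in returns]
    mean = sum(excess) / len(excess)
    downside_var = sum(min(x, 0.0) ** 2 for x in excess) / len(excess)
    return mean / math.sqrt(downside_var)

# Hypothetical daily returns, not taken from the paper.
sample_returns = [0.01, -0.02, 0.03, 0.00, -0.01]
sharpe = sharpe_ratio(sample_returns)
sortino = sortino_ratio(sample_returns)
```

A higher value on either measure means more return was earned per unit of (total or downside) risk, which is the sense in which the NLP indices are described as more efficient.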
For index providers and ETF sponsors, these findings make a compelling case for replacing subjective asset-selection criteria with transparent, disclosure-based metrics.
As a limitation, the methodology relies on keyword frequency in 10-K filings, which cannot distinguish substantive AI integration from superficial or aspirational mentions.
The study is also confined to NASDAQ-listed U.S. companies, limiting generalisability to other markets and reporting regimes.
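The keyword-frequency limitation can be made concrete with a small sketch. The keyword list, firm names, filing excerpts, and proportional weighting rule below are all hypothetical and are not the paper's actual dictionary or index formula; the point is only to show how a binary and a weighted score might be derived from term counts, and why such counts cannot tell deployment apart from aspiration.

```python
import re

# Hypothetical keyword list; the paper's actual dictionary is not reproduced here.
AI_TERMS = ["artificial intelligence", "machine learning", "deep learning", "neural network"]
_PATTERN = re.compile("|".join(re.escape(t) for t in AI_TERMS), re.IGNORECASE)

def ai_mentions(text):
    """Count occurrences of any AI keyword in the filing text."""
    return len(_PATTERN.findall(text))

def binary_score(text):
    """1 if the filing mentions AI at least once, otherwise 0."""
    return 1 if ai_mentions(text) > 0 else 0

def weighted_scores(filings):
    """Assumed weighting: each firm's share of all AI mentions across firms."""
    counts = {firm: ai_mentions(text) for firm, text in filings.items()}
    total = sum(counts.values())
    return {firm: (c / total if total else 0.0) for firm, c in counts.items()}

# Hypothetical excerpts: FirmA describes real deployment, FirmB only an aspiration.
filings = {
    "FirmA": "We deploy machine learning models in production to price every loan.",
    "FirmB": "We may explore machine learning opportunities in the future.",
    "FirmC": "We manufacture industrial fasteners.",
}
weights = weighted_scores(filings)
```

Under this sketch FirmA and FirmB receive identical binary and weighted scores even though only one describes substantive AI use, which is the weakness noted above.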