
The Promise and Peril of Generative AI: Evidence from GPT as Sell-Side Analysts

Can AI-powered financial analysis be trusted?

Edward Xuejun Li, Min Shen, Zhiyuan Tu, and Dexin Zhou study this question in their paper "The Promise and Peril of Generative AI: Evidence from GPT as Sell-Side Analysts".

They analyse 7,114 earnings press releases from 1,000 firms, prompting GPT to identify key sentences, extract financial metrics, and forecast one-year-ahead earnings per share.

Their goal is to explore whether GPT produces reliable earnings forecasts and whether observable processing signals predict output quality. Their main conclusions include:

  • GPT's narrative attention is consistent and human-like: it prioritises readable, forward-looking, and negatively toned sentences, closely mirroring patterns of investor attention in the literature.
  • GPT's alignment with analyst-identified metrics improves when financial disclosures are more detailed and when firms have broader analyst coverage.
  • Forecast accuracy improves when GPT selects more readable and forward-looking sentences but deteriorates when it emphasises negatively toned language.
  • A one standard deviation increase in GPT-analyst metric alignment corresponds to a 16.8% lower forecast error, which underscores the significance of the model's quantitative reasoning skills.
  • GPT's self-assessed confidence scores carry informational value but are imperfectly calibrated: they do not systematically decline after the model's knowledge cutoff, when its information advantage disappears.
  • A composite diagnostic framework proposed in the paper, which aggregates narrative focus, metric alignment, and confidence, offers a practical tool for screening model outputs by expected forecast accuracy.
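The composite diagnostic described above could, in principle, be sketched as follows. This is a minimal illustration, not the authors' specification: the z-scoring, the equal weighting of the three signals, and all function names are assumptions. The idea is simply to standardise each processing signal across forecasts and keep the forecasts whose average standardised signal clears a threshold.

```python
from statistics import mean, stdev


def zscores(xs):
    """Standardise a list of signal values to mean 0, unit variance."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def composite_scores(narrative_focus, metric_alignment, confidence):
    """Hypothetical composite: equal-weight average of the three
    z-scored processing signals, one score per forecast."""
    return [
        mean(signals)
        for signals in zip(
            zscores(narrative_focus),
            zscores(metric_alignment),
            zscores(confidence),
        )
    ]


def screen(scores, threshold=0.0):
    """Return indices of forecasts whose composite clears the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy per-forecast signals for three forecasts (illustrative values only).
narrative = [0.9, 0.2, 0.5]
alignment = [0.8, 0.1, 0.6]
confidence = [0.7, 0.3, 0.5]

kept = screen(composite_scores(narrative, alignment, confidence))
print(kept)  # forecasts 0 and 2 pass the screen
```

Equal weighting is the simplest aggregation choice; the paper's actual framework may weight the signals differently, for example by their estimated relation to forecast error.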

These findings highlight that LLMs lower the cost of processing corporate disclosures but introduce a new challenge: evaluating the quality and reliability of AI-generated outputs.

Notably, GPT outperforms analysts in more than one-quarter of firm-quarters, suggesting meaningful potential when conditions are favourable.

Models, however, have limited awareness of their own informational constraints, so investors who cannot recognise when model reliability weakens risk making poorly informed decisions.

As a limitation, the study captures only one dimension of GPT's quantitative reasoning, and future refinements may reveal additional processing signals linked to accuracy.

Pre-cutoff performance may also benefit from what the authors call a "time-traveller's advantage," where GPT draws on information unavailable to other analysts at the same time.