2024-05-24

Financial Statement Analysis with Large Language Models

LLMs vs Traditional Quant Methods

Several commenters note the paper’s key result: GPT‑4 with chain-of-thought matches or only slightly outperforms a decades-old 3‑layer neural net using 59 hand-crafted variables, with overlapping confidence intervals.
Some see this as underwhelming and evidence that LLMs are not yet state of the art for prediction; others think it’s notable that a general model can approach specialized models without domain-specific training.
Practitioners emphasize that the benchmark is far from current proprietary methods: serious quant trading models have advanced substantially since the 1980s and are kept private.

Text Analysis, Sentiment, and Gaming the System

Commenters outline a historical arc:
- Diffing management statements quarter-to-quarter.
- Simple positive/negative word counts.
- More sophisticated sentiment models on earnings calls, news, and social media.
Each stage initially produced alpha, but executives gradually learned to game it and noisy data polluted the signal (e.g., hacked news accounts, name confusions).
Many expect Goodhart’s law to apply: if LLM-based analysis becomes common, firms will optimize wording and structure to score well with models, eroding any edge.
There is discussion of “poisoning” statements to mislead LLMs while staying factual, with mixed views on feasibility.
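
The first two stages in the arc above (quarter-to-quarter diffing and positive/negative word counts) can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the paper or the thread: the word lists, the function names `changed_sentences` and `net_sentiment`, and the sample statements are all hypothetical, and real work would use curated finance-specific lexicons and proper sentence splitting.

```python
import difflib
import re

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"growth", "strong", "improved", "record"}
NEGATIVE = {"decline", "weak", "impairment", "litigation"}


def split_sentences(text):
    """Naive sentence split on periods; good enough for a sketch."""
    return [s.strip() for s in text.split(".") if s.strip()]


def changed_sentences(previous, current):
    """Return sentences removed ('- ') or added ('+ ') between two statements."""
    diff = difflib.ndiff(split_sentences(previous), split_sentences(current))
    # ndiff also emits '  ' (unchanged) and '? ' (hint) lines; drop those.
    return [line for line in diff if line.startswith(("+ ", "- "))]


def net_sentiment(text):
    """(positive - negative) / (positive + negative) word counts, in [-1, 1]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


# Hypothetical management statements from two consecutive quarters.
q1 = "Revenue growth was strong. We expect stable margins."
q2 = "Revenue growth was strong. We face litigation and a decline in margins."

for line in changed_sentences(q1, q2):
    print(line)
print(net_sentiment(q1))  # 1.0
print(net_sentiment(q2))  # 0.0
```

A score this simple also shows why commenters expect gaming: rewording a sentence to avoid a listed negative word changes the score without changing the underlying facts.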

Capabilities, Limitations, and Risks of LLMs

Commenters raise concerns about weak arithmetic and hallucinations, and about the paper’s failure to discuss either issue.
Some argue an LLM can only remix existing strategies, may hallucinate plausible-sounding but flawed advice, and still requires expert oversight.
Others note that even “non-state-of-the-art” but general tools are useful for many users who can’t build custom models.
Several worry that over-reliance on LLMs could atrophy human expertise, yet also predict that obviously bad performance will push firms back to professionals.

Civic and Non-Wall-Street Uses

There is strong enthusiasm for using LLMs to summarize and interrogate complex documents such as municipal budgets, local financial statements, and medical reports.
Hopes include enabling citizens, regulators, or rating agencies to spot waste or corruption more easily.
Skeptics question whether the lack of oversight is really an information problem rather than one of apathy, weak institutions, and limited channels for acting even when problems are exposed.

Markets, Trading, and Ethics

Commenters debate whether finance and trading are largely zero-sum and ethically dubious, or a legitimate mechanism for price discovery, liquidity, and capital allocation.
There is consensus that profitable trading and analysis methods are not shared; anyone selling “magic strategies” is viewed with suspicion.
