2024-12-31

Show HN: Watch 3 AIs compete in real-time stock trading

Project setup & data

The system runs three LLMs (GPT‑4o, Gemini 1.5 Pro, Claude 3 Sonnet), each of which picks one stock per day.
News source: the latest ~50 market articles from the Alpaca News API. Trades execute through Alpaca at $5 per trade, using fractional shares where supported; only U.S. stocks are covered for now.
Only long buys are implemented so far (no shorting). Most positions are still open, so only unrealized P/L exists.
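
The fixed $5 stake and the unrealized P/L mentioned above reduce to simple arithmetic. The sketch below is illustrative only: the `Position` class, the function names, and the ticker "XYZ" are hypothetical and do not come from the project, and no Alpaca API calls are shown.

```python
from dataclasses import dataclass


@dataclass
class Position:
    symbol: str
    shares: float       # fractional quantity bought
    entry_price: float  # price paid per share


def open_position(symbol: str, price: float, budget: float = 5.0) -> Position:
    """Buy as many (fractional) shares as the fixed budget allows."""
    if price <= 0:
        raise ValueError("price must be positive")
    return Position(symbol, budget / price, price)


def unrealized_pl(pos: Position, last_price: float) -> float:
    """Paper gain or loss on a position that is still open."""
    return pos.shares * (last_price - pos.entry_price)


# Hypothetical example: $5 buys 0.125 shares of a $40 stock.
pos = open_position("XYZ", price=40.0)
print(pos.shares)                 # 0.125
print(unrealized_pl(pos, 44.0))   # 0.5
```

Because nothing has been sold, the 0.5 here is a paper gain only; it becomes realized P/L when the position is closed.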

Prompting & trading logic

Prompts include an explicit “market analyst” role, sector diversification, and a focus on “hidden gems” rather than mega‑caps.
Models must output structured JSON, justify a thesis, specify catalysts (earnings, FDA dates, launches, conferences), and give a precise holding period.
Holding periods are currently set once at purchase and not updated with new information; some commenters see updating them as a key next improvement.
Prompts bias toward buying because they explicitly ask for a stock to buy and a holding period; users notice that the resulting picks diverge from ad‑hoc ChatGPT answers.
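
The thread does not show the actual JSON schema, so the following is a minimal sketch of how a structured pick might be validated before trading on it. The field names (`ticker`, `thesis`, `catalysts`, `holding_days`) are assumptions chosen for illustration, and the exit date is computed in calendar days rather than trading days.

```python
import json
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Pick:
    ticker: str
    thesis: str
    catalysts: list
    holding_days: int

    def exit_date(self, bought: date) -> date:
        # Fixed at purchase time, mirroring the "set once" behaviour described above.
        return bought + timedelta(days=self.holding_days)


def parse_pick(raw: str) -> Pick:
    """Parse a model reply and reject it if any assumed field is missing or malformed."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in ("ticker", "thesis", "catalysts", "holding_days"):
        if key not in data:
            raise ValueError(f"missing field: {key}")
    days = data["holding_days"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(days, int) or isinstance(days, bool) or days <= 0:
        raise ValueError("holding_days must be a positive integer")
    if not isinstance(data["catalysts"], list) or not data["catalysts"]:
        raise ValueError("catalysts must be a non-empty list")
    return Pick(str(data["ticker"]), str(data["thesis"]), data["catalysts"], days)


# Hypothetical model reply.
raw = '{"ticker": "XYZ", "thesis": "Example thesis", "catalysts": ["earnings"], "holding_days": 14}'
pick = parse_pick(raw)
print(pick.exit_date(date(2024, 12, 31)))  # 2025-01-14
```

Rejecting malformed replies up front keeps a vague or incomplete answer from turning into an order.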

Benchmarks, controls & evaluation

Multiple commenters call for benchmarks: S&P 500 (e.g., VOO), leveraged ETFs (e.g., TQQQ), and random or “monkey” bots as controls.
Others argue you’d need many independent runs to estimate Sharpe ratios; one run of three bots is statistically weak.
There is debate about comparing the bots to hedge funds and quant shops, with conflicting claims about realistic Sharpe ratios and long‑term returns.
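
To make the "many independent runs" point concrete, here is a rough sketch of an annualized Sharpe ratio, a random "monkey" picker, and a simulation of bots with no edge at all. Everything in it is an assumption for illustration: the 252 trading days per year, the 1% daily volatility, the zero risk-free rate, and the function names are not taken from the project.

```python
import math
import random
import statistics


def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return over its sample standard deviation."""
    excess = [r - risk_free_daily for r in daily_returns]
    sd = statistics.stdev(excess)
    if sd == 0:
        raise ValueError("zero volatility: Sharpe ratio is undefined")
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)


def monkey_pick(universe, rng):
    """Control bot: choose a ticker uniformly at random."""
    return rng.choice(universe)


def simulate_no_edge_runs(n_runs, n_days, seed=0):
    """Sharpe ratios of bots whose daily returns are pure noise (true edge = 0)."""
    rng = random.Random(seed)
    return [
        sharpe_ratio([rng.gauss(0.0, 0.01) for _ in range(n_days)])
        for _ in range(n_runs)
    ]


runs = simulate_no_edge_runs(n_runs=1000, n_days=60)
print(min(runs), max(runs))  # the spread across runs, despite zero true edge
```

Over a short window, some zero-edge runs will show a strongly positive Sharpe ratio and others a strongly negative one, which is why a single run of three bots says little about skill.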

Skepticism, risks & limitations

Many expect daily forced trading to underperform due to fees, slippage, and lack of an edge, citing research that most day traders lose money.
Some see the experiment as unscientific entertainment; others still find it a valuable “real‑world eval.”
Concern that LLMs may hallucinate financial narratives (e.g., a fictitious “Phase 3 Bitcoin ETF trial”) and favor trendy themes like crypto/AI.
Discussion of alpha decay: any consistently winning strategy would lose its edge once widely copied.
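
The fees-and-slippage argument above can be illustrated with a back-of-the-envelope calculation. The cost per trade used here (10 basis points) is an assumed figure for illustration, not a number reported in the thread, and the function name is hypothetical.

```python
def annual_cost_drag(cost_per_trade: float, trades_per_year: int) -> float:
    """Fraction of capital lost to costs alone if every trade gives up `cost_per_trade`."""
    if not 0.0 <= cost_per_trade < 1.0:
        raise ValueError("cost_per_trade must be in [0, 1)")
    return 1.0 - (1.0 - cost_per_trade) ** trades_per_year


# Assumed: one trade per trading day, 0.1% lost per trade to spread and slippage.
drag = annual_cost_drag(cost_per_trade=0.001, trades_per_year=252)
print(f"{drag:.1%}")
```

Under these assumptions the drag is on the order of a fifth of capital per year, so a daily trader needs a meaningful edge just to break even.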

Technical & UX feedback

Users report UI quirks (scrolling issues) and repeated newsletter email bugs (bad verification URLs, rate limits, duplicate mailings).
Suggestions: show unrealized gains in headline stats, expose more of the analysis process, add a countdown to the next trade, and show fractional share amounts.
Some request open‑sourcing code and support for more or newer models (e.g., Gemini experimental, o1, Llama via LiteLLM).
