Show HN: Watch 3 AIs compete in real-time stock trading
Project setup & data
- The system runs three LLMs (GPT‑4o, Gemini 1.5 Pro, Claude 3 Sonnet); each picks one stock per day.
- News source: the latest ~50 market articles from the Alpaca News API. Trades go through Alpaca at $5 per trade, using fractional shares where supported; currently U.S. stocks only.
- Only long buys are implemented so far (no shorting). Most positions are still open, so the only results so far are unrealized P/L.
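The fixed $5 trade size and the unrealized P/L mentioned above come down to simple arithmetic, sketched here. This is a minimal illustration, not the project's code: the `Position` class, the function names, the ticker `XYZ`, and the prices are all hypothetical, and no Alpaca API call is made.

```python
from dataclasses import dataclass

TRADE_SIZE_USD = 5.00  # per-trade amount stated in the post


@dataclass
class Position:
    symbol: str
    entry_price: float
    shares: float  # may be fractional


def open_position(symbol: str, entry_price: float,
                  notional: float = TRADE_SIZE_USD) -> Position:
    """Size a long position so that it costs `notional` dollars."""
    if entry_price <= 0:
        raise ValueError("entry_price must be positive")
    return Position(symbol, entry_price, notional / entry_price)


def unrealized_pnl(pos: Position, last_price: float) -> float:
    """Paper gain or loss of a still-open long position, in dollars."""
    return pos.shares * (last_price - pos.entry_price)


# Hypothetical example: $5 buys 0.125 shares at $40.00.
pos = open_position("XYZ", entry_price=40.00)
print(pos.shares)                   # 0.125
print(unrealized_pnl(pos, 44.00))   # 0.5
```

Because every position is the same dollar size, each pick's percentage move translates directly into a comparable dollar P/L.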
Prompting & trading logic
- The prompt assigns an explicit “market analyst” role, asks for sector diversification, and steers the models toward “hidden gems” rather than mega‑caps.
- Models must output structured JSON, justify a thesis, specify catalysts (earnings, FDA dates, launches, conferences), and give a precise holding period.
- Holding periods are currently set once at purchase and not updated as new information arrives; some commenters see revising them as a key next improvement.
- The prompts bias the models toward buying because they explicitly ask for a stock to buy and a holding period; users notice that the picks diverge from what ChatGPT says when asked ad hoc.
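The structured-JSON requirement described above could be enforced with a small validator like the one below. This is a sketch under assumptions: the post does not give the schema, so the field names (`ticker`, `thesis`, `catalysts`, `holding_period_days`) and the function `parse_pick` are hypothetical.

```python
import json

# Hypothetical schema; the real field names are not given in the post.
REQUIRED_FIELDS = {
    "ticker": str,
    "thesis": str,
    "catalysts": list,
    "holding_period_days": int,
}


def parse_pick(raw: str) -> dict:
    """Parse a model reply and reject anything that is not a complete pick."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        value = data[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"field {name} must be of type {expected.__name__}")
    if not data["ticker"].strip():
        raise ValueError("ticker must not be empty")
    if not data["catalysts"]:
        raise ValueError("at least one catalyst is required")
    if data["holding_period_days"] <= 0:
        raise ValueError("holding_period_days must be positive")
    return data


# Hypothetical model reply.
reply = ('{"ticker": "XYZ", "thesis": "Earnings beat likely", '
         '"catalysts": ["Q3 earnings"], "holding_period_days": 14}')
pick = parse_pick(reply)
print(pick["ticker"], pick["holding_period_days"])
```

A check like this only confirms the shape of the answer. It cannot tell whether a named catalyst actually exists.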
Benchmarks, controls & evaluation
- Multiple commenters call for benchmarks: S&P 500 (e.g., VOO), leveraged ETFs (e.g., TQQQ), and random or “monkey” bots as controls.
- Others argue that estimating Sharpe ratios requires many independent runs; a single run of three bots is statistically weak.
- Commenters debate comparisons to hedge funds and quant shops, with conflicting claims about realistic Sharpe ratios and long‑term returns.
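To make the benchmark suggestions concrete, here is one way a random “monkey” control and a Sharpe estimate could be computed over many independent runs. It is a sketch, not anything from the project: the function names and the return data are hypothetical, the risk-free rate defaults to zero, and the 252-trading-day annualization is a common convention rather than something stated in the thread.

```python
import math
import random
import statistics

TRADING_DAYS = 252  # common annualization convention (assumption)


def annualized_sharpe(daily_returns, daily_risk_free=0.0):
    """Mean excess daily return over its sample std dev, scaled to a year."""
    if len(daily_returns) < 2:
        raise ValueError("need at least two daily returns")
    excess = [r - daily_risk_free for r in daily_returns]
    spread = statistics.stdev(excess)
    if spread == 0:
        raise ValueError("returns have zero variance")
    return statistics.mean(excess) / spread * math.sqrt(TRADING_DAYS)


def monkey_sharpes(returns_by_symbol, n_runs, seed=0):
    """Sharpe ratio of n_runs bots that each pick a random symbol every day."""
    if not returns_by_symbol:
        raise ValueError("empty universe")
    rng = random.Random(seed)
    symbols = sorted(returns_by_symbol)
    n_days = len(returns_by_symbol[symbols[0]])
    if any(len(returns_by_symbol[s]) != n_days for s in symbols):
        raise ValueError("all symbols need the same number of days")
    results = []
    for _ in range(n_runs):
        run = [returns_by_symbol[rng.choice(symbols)][day]
               for day in range(n_days)]
        results.append(annualized_sharpe(run))
    return results


# Hypothetical daily returns for a two-stock universe.
universe = {
    "AAA": [0.01, -0.02, 0.03, 0.00, 0.01],
    "BBB": [0.00, 0.01, -0.01, 0.02, -0.01],
}
baseline = monkey_sharpes(universe, n_runs=1000, seed=42)
print(statistics.mean(baseline), statistics.stdev(baseline))
```

Comparing a bot's single Sharpe figure against the spread of the monkey distribution shows how much of its result could be explained by chance alone.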
Skepticism, risks & limitations
- Many expect daily forced trading to underperform due to fees, slippage, and lack of an edge, citing research that most day traders lose money.
- Some see the experiment as unscientific entertainment; others still find it a valuable “real‑world eval.”
- Concern that LLMs may hallucinate financial narratives (e.g., a fictitious “Phase 3 Bitcoin ETF trial”) and favor trendy themes like crypto/AI.
- Discussion of alpha decay: any consistently winning strategy would lose its edge once widely copied.
Technical & UX feedback
- Users report UI quirks (scrolling issues) and recurring bugs in the newsletter emails (bad verification URLs, rate limits, duplicate mailings).
- Suggestions: show unrealized gains in the headline stats, expose more of the analysis process, add a countdown to the next trade, and show fractional share amounts.
- Some request open‑sourcing code and support for more or newer models (e.g., Gemini experimental, o1, Llama via LiteLLM).