Adversarial policies beat superhuman Go AIs (2023)
Human playstyles, intuition, and ratings
- Some compare the Go exploit to humans using bizarre, unpredictable play or obscure chess openings to push opponents out of preparation.
- Discussion on human strengths: memory, calculation, and especially intuition; top players differ in which they excel at.
- Several comments debate Elo/ladder systems:
  - One side enjoys 50% win rates and evenly matched games, valuing improvement and quality over sheer winning.
  - Others dislike systems where the win rate stabilizes, preferring formats (like tournaments or “playing down”) that reward visible progress.
- There’s tension between “winning is the fun part” vs “learning and good games are the fun part.”
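The Elo debate above rests on a basic property of the rating model: once ratings converge, the expected score against an equal-rated opponent is exactly 50%, so the ladder itself drives everyone toward even matches. A minimal sketch of the standard Elo formulas (function names are mine, not from the thread):

```python
def elo_expected(r_a, r_b):
    """Expected score of player A vs player B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32):
    """One-game rating update (score_a: 1.0 win, 0.5 draw, 0.0 loss)."""
    e_a = elo_expected(r_a, r_b)
    return (r_a + k * (score_a - e_a),
            r_b + k * ((1.0 - score_a) - (1.0 - e_a)))

# Equal ratings -> 50% expected score: the steady state the thread debates.
print(elo_expected(1500, 1500))  # 0.5
```

With K=32, winning one game against an equal opponent moves each player 16 points, so sustained >50% win rates are only possible while a player's rating lags their true strength.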
Nature of the Go adversarial attack
- The attack creates positions where superhuman Go AIs mis-evaluate life-and-death, especially long “dead-man walking” situations where a group is effectively dead but not yet captured.
- The adversary often plays slightly suboptimal moves to keep the AI “confused” instead of cashing in an obvious win, because exposing the true status might let the AI recover.
- Two main strategies are discussed: a “pass adversary” that exploits a particular formal ruleset, and a “cyclic adversary” that induces the AI to build a large ring-shaped group whose life-and-death status it then misjudges.
Go rules, ladders, and persistent weaknesses
- Part of the controversy centers on rulesets:
  - Some say the “pass” attack mainly abuses Tromp–Taylor scoring, which counts all stones left on the board with no dead-stone removal step; this is not how AIs are typically configured when playing humans.
  - Others argue the evaluation should match whatever rules the AI is configured for, regardless of human conventions.
- The more respected “cyclic” attack reveals genuine misreads of group status, not just rules quirks.
- Separate thread on ladders: early Go AIs and even modern NNs struggle with long, mechanical ladder sequences, prompting hard-coded ladder solvers.
- KataGo developers reportedly patched some cyclic flaws via extra training and larger networks, but expect other, harder-to-find flaws will always exist.
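The rules dispute above hinges on how Tromp–Taylor counts the board: every stone still on the board scores, plus empty regions touching only one colour, with no agreement phase to remove dead stones. A minimal sketch of that area counting (board encoding and function name are mine), showing why an effectively dead group still scores until it is physically captured:

```python
from collections import deque

def tromp_taylor_score(board):
    """Tromp-Taylor area count: every stone on the board scores for its
    colour, and each empty region bordered by exactly one colour scores
    for that colour. board: list of strings over 'B', 'W', '.'."""
    n = len(board)
    score = {"B": 0, "W": 0}
    seen = set()
    for r in range(n):
        for c in range(len(board[r])):
            cell = board[r][c]
            if cell in score:
                score[cell] += 1          # stones always count, dead or not
            elif (r, c) not in seen:
                # Flood-fill this empty region, noting bordering colours.
                region, borders, queue = 0, set(), deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    region += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < n and 0 <= nx < len(board[ny]):
                            if board[ny][nx] == '.':
                                if (ny, nx) not in seen:
                                    seen.add((ny, nx))
                                    queue.append((ny, nx))
                            else:
                                borders.add(board[ny][nx])
                # Region is territory only if it touches a single colour.
                if len(borders) == 1:
                    score[borders.pop()] += region
    return score

# A hopeless White stone still scores a point until actually captured:
print(tromp_taylor_score(["W.B", "BBB", "BBB"]))  # {'B': 7, 'W': 1}
```

Under these rules, passing before dead groups are removed freezes them on the board as live points, which is the loophole the “pass adversary” exploits.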
Broader AI reliability and “superhuman” claims
- Several commenters highlight that “superhuman” at a game does not mean robust or generally intelligent; narrow AIs can still have brittle, surprising failure modes.
- Some see the paper as important evidence that future powerful systems will also harbor unknown vulnerabilities; others call the conclusion empty or overgeneralized.
- A later defense paper is noted: defenses can stop known attacks but fail against newly trained adversaries, suggesting an ongoing arms race.
Parallels to chess engines and other games
- Chess examples (fortresses, locked pawn structures) show top engines assigning winning evaluations to positions humans can see are dead draws, underscoring that engines rely on search and heuristics rather than human-like reasoning about structural constraints.
- Past experience with anti-computer strategies in chess is cited as an analogy: you can target the evaluation function, but enough compute and better training typically overcome such tricks.
LLMs, hallucinations, and adversarial prompts
- Some draw analogies between Go adversarial policies and LLM “hallucinations” and jailbreaks.
- Debate over terminology:
  - One view treats hallucinations as attempts to extrapolate from data.
  - Another stresses they’re simply outputs violating constraints (e.g., fake cases, unsafe recipes) and not real “reasoning.”
- Adversarial attacks on LLMs are noted as an active research area, reinforcing the general theme: powerful models can be steered into failure regimes their creators didn’t anticipate.