Adversarial policies beat superhuman Go AIs (2023)

Human playstyles, intuition, and ratings

  • Some compare the Go exploit to humans using bizarre, unpredictable play or obscure chess openings to push opponents out of preparation.
  • Discussion on human strengths: memory, calculation, and especially intuition; top players differ in which they excel at.
  • Several comments debate Elo/ladder systems:
    • One side enjoys 50% win rates and evenly matched games, valuing improvement and quality over sheer winning.
    • Others dislike systems where win rate stabilizes, preferring formats (like tournaments or “playing down”) that reward visible progress.
    • There’s tension between “winning is the fun part” and “learning and good games are the fun part.”
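The rating-plateau complaint follows directly from how Elo-style ladders update: once your rating matches your true strength, your expected score against equal-rated opponents is 0.5, so wins and losses cancel out. A minimal sketch of the standard Elo formulas (function names are mine):

```python
def elo_expected(r_a, r_b):
    """Expected score for player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score, k=32.0):
    """A's new rating after one game (score: 1 win, 0.5 draw, 0 loss)."""
    return r_a + k * (score - elo_expected(r_a, r_b))
```

At equal ratings the expected score is exactly 0.5, and an upset win moves the underdog's rating more than an expected win moves the favorite's; a matchmaking system that pairs you with peers therefore drives your long-run win rate toward 50% by construction.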

Nature of the Go adversarial attack

  • The attack creates positions where superhuman Go AIs mis-evaluate life-and-death, especially long “dead-man walking” situations where a group is effectively dead but not yet captured.
  • The adversary often plays slightly suboptimal moves to keep the AI “confused” instead of cashing in an obvious win, because exposing the true status might let the AI recover.
  • There are two main strategies discussed: a “pass adversary” exploiting a particular formal ruleset, and a “cyclic adversary” that lures the victim into misjudging large ring-shaped groups.
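The training setup behind these adversaries can be illustrated with a toy sketch (hypothetical code, not the paper's): freeze the victim policy, fold it into the environment, and optimize the adversary solely to exploit that one opponent. Here the “victim” is a rock-paper-scissors bot with a strong-looking but exploitable heuristic, and “training” is exhaustive search over the adversary's tiny policy space:

```python
from itertools import product

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
COUNTER = {loser: winner for winner, loser in BEATS.items()}  # move beating key

def victim(adv_last):
    # "Strong" heuristic with a blind spot: counter the adversary's last
    # move, assuming the adversary will repeat it.
    return COUNTER[adv_last] if adv_last else "rock"

def exploit_rate(adversary_policy, rounds=300):
    # The frozen victim is part of the environment; we only measure how
    # often the adversary beats it.
    wins, adv_last = 0, None
    for _ in range(rounds):
        v = victim(adv_last)
        a = adversary_policy(adv_last)
        if BEATS[a] == v:
            wins += 1
        adv_last = a
    return wins / rounds

def train_adversary():
    # "Training" is exhaustive search over deterministic policies keyed on
    # the adversary's own previous move (the only state victim() reads).
    states = (None,) + MOVES
    best_table, best_rate = None, -1.0
    for assignment in product(MOVES, repeat=len(states)):
        table = dict(zip(states, assignment))
        rate = exploit_rate(lambda last, t=table: t[last])
        if rate > best_rate:
            best_table, best_rate = table, rate
    return best_table, best_rate
```

The search finds a cycling policy that wins every round. That is the point of the analogy: the adversary never needs to be strong in general, only to find one pattern its fixed opponent systematically mishandles.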

Go rules, ladders, and persistent weaknesses

  • Part of the controversy centers on rulesets:
    • Some say the “pass” attack mainly abuses a bare Tromp–Taylor ruleset, in which passing ends the game and dead stones are never removed, which is not how AIs typically play against humans.
    • Others argue the evaluation should match the rules the AI is configured for, regardless of human conventions.
  • The more respected “cyclic” attack reveals genuine misreads of group status, not just rules quirks.
  • Separate thread on ladders: early Go AIs and even modern NNs struggle with long, mechanical ladder sequences, prompting hard-coded ladder solvers.
  • KataGo developers reportedly patched some cyclic flaws via extra training and larger networks, but expect other, harder-to-find flaws will always exist.
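To see why the ruleset matters, here is a minimal sketch of Tromp–Taylor area scoring (assumed simplifications: rectangular board, no komi, no capture or legality logic). Because there is no dead-stone agreement phase, every stone left on the board counts as alive, and an empty region touching both colors scores for neither side; that is the quirk the pass adversary exploits.

```python
def tromp_taylor_score(board):
    """board: list of equal-length strings, '.' empty, 'b'/'w' stones.
    Returns (black_points, white_points) under area scoring."""
    rows, cols = len(board), len(board[0])

    def neighbors(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield r + dr, c + dc

    score = {"b": 0, "w": 0}
    seen = set()
    for r in range(rows):
        for c in range(cols):
            cell = board[r][c]
            if cell in score:
                score[cell] += 1  # every stone on the board counts as alive
            elif (r, c) not in seen:
                # Flood-fill this empty region, recording which colors it touches.
                region, touch, stack = [], set(), [(r, c)]
                seen.add((r, c))
                while stack:
                    rr, cc = stack.pop()
                    region.append((rr, cc))
                    for nr, nc in neighbors(rr, cc):
                        v = board[nr][nc]
                        if v == ".":
                            if (nr, nc) not in seen:
                                seen.add((nr, nc))
                                stack.append((nr, nc))
                        else:
                            touch.add(v)
                if len(touch) == 1:  # region reaches exactly one color
                    score[touch.pop()] += len(region)
    return score["b"], score["w"]
```

For example, a lone white stone sitting inside otherwise black territory still scores a point for White and turns the adjacent empty points neutral, even though a human (or a dead-stone removal phase) would call it dead: `tromp_taylor_score(["b.b", "bbb", "b.w"])` scores Black 7, White 1 rather than Black 9, White 0.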

Broader AI reliability and “superhuman” claims

  • Several commenters highlight that “superhuman” at a game does not mean robust or generally intelligent; narrow AIs can still have brittle, surprising failure modes.
  • Some see the paper as important evidence that future powerful systems will also harbor unknown vulnerabilities; others call the conclusion empty or overgeneralized.
  • A later defense paper is noted: defenses can stop known attacks but fail against newly trained adversaries, suggesting an ongoing arms race.

Parallels to chess engines and other games

  • Chess examples (fortresses, locked pawn structures) show top engines assigning winning evaluations to positions humans recognize as dead draws, underscoring that engines rely on search and heuristics rather than human-like reasoning about structural constraints.
  • Past experience with anti-computer strategies in chess is cited as an analogy: you can target the evaluation function, but enough compute and better training typically overcome such tricks.

LLMs, hallucinations, and adversarial prompts

  • Some draw analogies between Go adversarial policies and LLM “hallucinations” and jailbreaks.
  • Debate over terminology:
    • One view treats hallucinations as the model’s attempt to extrapolate beyond its training data.
    • Another stresses they’re simply outputs violating constraints (e.g., fake cases, unsafe recipes) and not real “reasoning.”
  • Adversarial attacks on LLMs are noted as an active research area, reinforcing the general theme: powerful models can be steered into failure regimes their creators didn’t anticipate.