CriticGPT: Finding GPT-4's mistakes with GPT-4

Model quality and “AI to fix AI”

  • Many see a trend where LLM quality problems are addressed by adding more AI (e.g., a critic model).
  • Some compare GPT‑4o with Claude 3.5 Sonnet: experiences differ, with some saying Claude is better at coding, others finding them similar.
  • Explanations floated: better training data, more interpretability work, or just proprietary differences that outsiders can’t really know.

Purpose of CriticGPT and RLHF pipeline

  • Commenters stress CriticGPT is mainly a tool to help human RLHF labelers write better critiques, not a user‑facing product.
  • Idea: detecting bugs is hard; judging whether a proposed bug report is valid is easier. CriticGPT turns “find a bug” into “is this critique correct?” for humans.
  • Better critiques → higher‑quality RLHF data → better models “at the source.”

Reliability, critics-on-critics, and evaluation

  • Skeptics ask how we know the critic isn’t just adding more errors; some see “critics all the way down” / oracle problem.
  • Supporters note that if a mistake can be caught by either human or critic, overall detection can improve even if each is imperfect.
  • OpenAI’s reported result (human+CriticGPT reviews preferred over unassisted human reviews ~60% of the time) is seen by some as a meaningful gain, by others as only a modest improvement over the 50% expected by chance.
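
The supporters' "either human or critic" argument can be made concrete with a small calculation. The sketch below is illustrative only: it assumes the two reviewers miss bugs independently, and the detection rates are hypothetical numbers, not figures from OpenAI or the thread. It also models only bugs caught, not the extra false alarms that skeptics worry a critic might add.

```python
def combined_detection_rate(p_human: float, p_critic: float) -> float:
    """Probability that at least one reviewer catches a bug.

    Assumes the human and the critic miss bugs independently,
    which is a simplification made for this illustration.
    """
    for p in (p_human, p_critic):
        if not 0.0 <= p <= 1.0:
            raise ValueError("detection rates must be in [0, 1]")
    # A bug slips through only if both reviewers miss it.
    return 1.0 - (1.0 - p_human) * (1.0 - p_critic)


# Hypothetical rates, chosen only to show the effect.
rate = combined_detection_rate(0.5, 0.4)
print(f"{rate:.2f}")  # 0.70
```

If the two reviewers tend to miss the same bugs (correlated errors), the real gain would be smaller than this independent-case figure.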

Hallucinations, truth, and terminology

  • Several argue “hallucination” is just LLMs doing what they’re trained to do: generate plausible text, not truth.
  • Others defend the term as a useful shorthand for confidently wrong, fabricated content, especially in coding assistance.
  • Debate over whether hallucinations are a distinct phenomenon, or just incorrect outputs from a probability model; some suggest alternative terms like “confabulation.”

LLMs as coding tools and review targets

  • Mixed experiences: some rely heavily on LLMs for structure and boilerplate, but distrust them for precise API details.
  • Many note LLM‑written code can be overly complex, brittle, or use nonexistent APIs, making strong tests and human review essential.
  • At workplaces, reactions range from banning AI‑generated code to heavily promoting enterprise LLM tools and “prompt engineering.”
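
One cheap guard against the "nonexistent APIs" problem mentioned above is to check that a name an LLM used actually resolves before trusting the code. The helper below is a minimal sketch, not something proposed in the thread; `api_exists` is a hypothetical name, and it only confirms that an attribute exists, not that it is called correctly.

```python
import importlib


def api_exists(module_name: str, attr_path: str) -> bool:
    """Return True if the dotted `attr_path` resolves inside `module_name`."""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        # The module itself cannot be imported.
        return False
    # Walk the dotted path one attribute at a time.
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


print(api_exists("json", "loads"))  # True
print(api_exists("json", "parse"))  # False: Python's json module has no `parse`
```

A check like this complements tests and human review rather than replacing them, since an existing function can still be misused.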

Labor, ethics, and safety

  • Discussion highlights low‑paid data labelers in developing countries versus better‑paid expert reviewers; cost and exploitation concerns are raised.
  • Some connect CriticGPT to broader alignment ideas like iterative amplification and recursive reward modeling. Skepticism persists about whether such stacks of AI reviewers fundamentally solve safety.