2024-06-27

CriticGPT: Finding GPT-4's mistakes with GPT-4

Model quality and “AI to fix AI”

Many commenters see a trend of addressing LLM quality problems by layering on more AI (e.g., a critic model).
Some compare GPT‑4o with Claude 3.5 Sonnet: experiences differ, with some saying Claude is better at coding, others finding them similar.
Explanations floated: better training data, more interpretability work, or just proprietary differences that outsiders can’t really know.

Purpose of CriticGPT and RLHF pipeline

Commenters stress CriticGPT is mainly a tool to help human RLHF labelers write better critiques, not a user‑facing product.
Idea: detecting bugs is hard; judging whether a proposed bug report is valid is easier. CriticGPT turns “find a bug” into “is this critique correct?” for humans.
Better critiques → higher‑quality RLHF data → better models “at the source.”

Reliability, critics-on-critics, and evaluation

Skeptics ask how we know the critic isn’t just adding more errors; some see “critics all the way down” / oracle problem.
Supporters note that if a mistake can be caught by either human or critic, overall detection can improve even if each is imperfect.
OpenAI’s reported result (critiques from human+CriticGPT teams preferred over unassisted human critiques ~60% of the time) is seen by some as a meaningful gain, by others as only a modest improvement over the 50% chance baseline.
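
The supporters' argument above, that two imperfect reviewers can together catch more than either alone, can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `combined_detection_rate` and the example rates are assumptions, not figures from the thread or from OpenAI, and it assumes the human and the critic miss bugs independently of each other.

```python
def combined_detection_rate(p_human: float, p_critic: float) -> float:
    """Chance that at least one reviewer catches a given bug.

    Assumes the two reviewers' misses are independent, which is a
    simplifying assumption and may not hold in practice.
    """
    for p in (p_human, p_critic):
        if not 0.0 <= p <= 1.0:
            raise ValueError("detection rates must be in [0, 1]")
    # A bug slips through only if both reviewers miss it.
    return 1.0 - (1.0 - p_human) * (1.0 - p_critic)


# Hypothetical rates chosen purely for illustration.
rate = combined_detection_rate(0.5, 0.6)
print(f"{rate:.2f}")  # prints 0.80
```

This covers only the detection side. It says nothing about false alarms, which is the skeptics' concern: a critic that also raises invalid critiques adds review cost that this calculation does not capture.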

Hallucinations, truth, and terminology

Several argue “hallucination” is just LLMs doing what they’re trained to do: generate plausible text, not truth.
Others defend the term as a useful shorthand for confidently wrong, fabricated content, especially in coding assistance.
Debate over whether hallucinations are a distinct phenomenon, or just incorrect outputs from a probability model; some suggest alternative terms like “confabulation.”

LLMs as coding tools and review targets

Mixed experiences: some rely heavily on LLMs for structure and boilerplate, but distrust them for precise API details.
Many note LLM‑written code can be overly complex, brittle, or use nonexistent APIs, making strong tests and human review essential.
At workplaces, reactions range from banning AI‑generated code to heavily promoting enterprise LLM tools and “prompt engineering.”
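
One cheap guard against the nonexistent-API problem mentioned above is to check that a name actually resolves before trusting generated code that calls it. This is a minimal sketch, not a method anyone in the thread proposed: the helper `api_exists` is a hypothetical name, and `dumps_pretty` is a deliberately invented attribute used to show the failing case.

```python
import importlib


def api_exists(module_name: str, attr_path: str) -> bool:
    """Return True if module_name imports and attr_path resolves on it."""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False
    # Walk dotted paths such as "path.join" one attribute at a time.
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


print(api_exists("json", "dumps"))         # prints True
print(api_exists("json", "dumps_pretty"))  # prints False (invented name)
```

A check like this only confirms that a name exists. It does not validate arguments, return types, or behavior, so it complements the tests and human review the commenters call for rather than replacing them.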

Labor, ethics, and safety

Discussion highlights low‑paid data labelers in developing countries versus better‑paid expert reviewers; cost and exploitation concerns are raised.
Some connect CriticGPT to broader alignment ideas like iterative amplification and recursive reward modeling, with ongoing skepticism about whether such stacks of AI reviewers fundamentally solve safety.
