CriticGPT: Finding GPT-4's mistakes with GPT-4
Model quality and “AI to fix AI”
- Many see a trend where LLM quality problems are addressed by adding more AI (e.g., a critic model).
- Some compare GPT‑4o with Claude 3.5 Sonnet: experiences differ, with some saying Claude is better at coding, others finding them similar.
- Explanations floated: better training data, more interpretability work, or just proprietary differences that outsiders can’t really know.
Purpose of CriticGPT and RLHF pipeline
- Commenters stress CriticGPT is mainly a tool to help human RLHF labelers write better critiques, not a user‑facing product.
- Idea: detecting bugs is hard; judging whether a proposed bug report is valid is easier. CriticGPT turns “find a bug” into “is this critique correct?” for humans.
- Better critiques → higher‑quality RLHF data → better models “at the source.”
Reliability, critics-on-critics, and evaluation
- Skeptics ask how we know the critic isn’t just adding more errors; some see “critics all the way down” / oracle problem.
- Supporters note that if a mistake can be caught by either human or critic, overall detection can improve even if each is imperfect.
- OpenAI’s reported result (critiques from human+CriticGPT preferred over unassisted human critiques ~60% of the time) is seen by some as a meaningful gain, by others as only a modest improvement over the 50% chance baseline.
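The supporters' argument and the "modest over random" reading can both be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the human and the critic miss bugs independently of each other, and the catch rates are hypothetical numbers, not figures from the thread or from OpenAI. The function names are made up for this example.

```python
def combined_detection_rate(p_human: float, p_critic: float) -> float:
    """Chance that at least one of two reviewers flags a given bug.

    Assumes the two reviewers miss bugs independently (an assumption,
    not something the discussion establishes).
    """
    for p in (p_human, p_critic):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # The bug slips through only if both reviewers miss it.
    return 1.0 - (1.0 - p_human) * (1.0 - p_critic)


def margin_over_chance(preference_rate: float) -> float:
    """Percentage points by which a pairwise preference rate beats a coin flip."""
    return (preference_rate - 0.5) * 100.0


if __name__ == "__main__":
    # Hypothetical catch rates chosen for illustration.
    human_only = 0.5
    critic_only = 0.4
    print("combined catch rate:", combined_detection_rate(human_only, critic_only))
    # The ~60% preference figure, expressed as points above a 50% baseline.
    print("points over chance:", margin_over_chance(0.60))
```

Under these assumptions the combined catch rate is higher than either reviewer's alone, which is the supporters' point. The sketch does not model false positives, so it says nothing about the skeptics' worry that a critic may also introduce spurious bug reports, and correlated misses would shrink the gain.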
Hallucinations, truth, and terminology
- Several argue “hallucination” is just LLMs doing what they’re trained to do: generate plausible text, not truth.
- Others defend the term as a useful shorthand for confidently wrong, fabricated content, especially in coding assistance.
- Debate over whether hallucinations are a distinct phenomenon, or just incorrect outputs from a probability model; some suggest alternative terms like “confabulation.”
LLMs as coding tools and review targets
- Mixed experiences: some rely heavily on LLMs for structure and boilerplate, but distrust them for precise API details.
- Many note LLM‑written code can be overly complex, brittle, or use nonexistent APIs, making strong tests and human review essential.
- In workplaces, reactions range from banning AI‑generated code to heavily promoting enterprise LLM tools and “prompt engineering.”
Labor, ethics, and safety
- Discussion highlights low‑paid data labelers in developing countries versus better‑paid expert reviewers; cost and exploitation concerns are raised.
- Some connect CriticGPT to broader alignment ideas like iterative amplification and recursive reward modeling, with ongoing skepticism about whether such stacks of AI reviewers fundamentally solve safety.