Structured Outputs in the API

Feature & Implementation

  • Structured Outputs enforces JSON-schema-shaped responses via constrained decoding: invalid next tokens are masked out during sampling.
  • Under the hood it’s similar to earlier grammar-based decoding (e.g., CFG/BNF approaches, Earley-style parsing) used in llama.cpp and libraries like Outlines/jsonformer.
  • The model can still “refuse”; in that case the response carries a separate refusal field instead of schema-shaped JSON, so the constraints cannot be used to force output past safety behavior.
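The masking step described in the first bullet can be illustrated with a toy greedy decoder. This is a sketch under loud assumptions, not OpenAI's implementation. The "grammar" here is a hypothetical set of complete allowed strings, whereas real systems compile the JSON Schema into a grammar or automaton over the tokenizer vocabulary. `fake_model` is a hypothetical stand-in for real logits, and it prefers schema-breaking tokens.

```python
import json
import math

# Hypothetical toy "grammar": the set of complete outputs the schema allows.
# Real systems compile a JSON Schema into a grammar/automaton instead.
ALLOWED = ['{"answer": "yes"}', '{"answer": "no"}']
EOS = "<eos>"
VOCAB = ['{"answer": ', '"yes"', '"no"', '"maybe"', "}", ",", EOS]


def is_valid_prefix(text: str) -> bool:
    """True if `text` can still be extended into some allowed output."""
    return any(s.startswith(text) for s in ALLOWED)


def mask_logits(prefix: str, logits: dict) -> dict:
    """Set the logit of every grammar-breaking next token to -inf."""
    masked = {}
    for tok, score in logits.items():
        if tok == EOS:
            ok = prefix in ALLOWED  # stopping is only legal on a complete output
        else:
            ok = is_valid_prefix(prefix + tok)
        masked[tok] = score if ok else -math.inf
    return masked


def fake_model(prefix: str) -> dict:
    """Hypothetical stand-in for an LLM; it prefers "maybe" and a stray comma."""
    prefs = {'{"answer": ': 1.0, '"maybe"': 3.0, ",": 2.5, '"no"': 2.0,
             '"yes"': 0.5, "}": 1.0, EOS: 0.1}
    return {tok: prefs[tok] for tok in VOCAB}


def constrained_greedy_decode(next_logits, max_steps: int = 50) -> str:
    """Greedy decoding with invalid next tokens masked before the argmax."""
    out = ""
    for _ in range(max_steps):
        masked = mask_logits(out, next_logits(out))
        tok = max(masked, key=masked.get)
        if masked[tok] == -math.inf:
            raise RuntimeError("no valid continuation")
        if tok == EOS:
            return out
        out += tok
    raise RuntimeError("step limit reached")


result = constrained_greedy_decode(fake_model)
print(result)              # {"answer": "no"}
print(json.loads(result))  # {'answer': 'no'}
```

Unconstrained, this toy model would emit `"maybe"` and a stray comma; with masking it can only pick tokens that keep the prefix completable, so the highest-scoring valid option wins at each step.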

Reliability, Accuracy, and Limitations

  • Many see this as fixing long‑standing JSON mode / function‑calling brittleness (missing commas, bad brackets, broken schemas).
  • Others warn that correct shape ≠ correct content. LLMs can still hallucinate structured but wrong data, risking over-trust and silent database errors.
  • Some note that constrained decoding can produce degenerate but schema‑valid outputs (e.g., repetition loops, nonsensical values); aligning the model to avoid these is seen as the hard part.
  • One paper (linked in thread) suggests JSON constraints may degrade reasoning performance in some tasks; several commenters say this matches their intuition.
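The "correct shape ≠ correct content" point can be made concrete: a structural check passes while the value is simply wrong. In this sketch the `SHAPE` spec, the trusted lookup table, and both helper functions are hypothetical; the example record uses 1852 (Ada Lovelace's death year) as a plausible-looking hallucinated birth year.

```python
import json

# Hypothetical "schema": required keys and their JSON-derived Python types.
SHAPE = {"name": str, "birth_year": int}

# Hypothetical trusted reference data, used to check content rather than shape.
TRUSTED_BIRTH_YEARS = {"Ada Lovelace": 1815}


def has_valid_shape(record: dict) -> bool:
    """Structural check only: exact key set and value types."""
    return set(record) == set(SHAPE) and all(
        isinstance(record[key], typ) for key, typ in SHAPE.items()
    )


def matches_reference(record: dict) -> bool:
    """Content check: compare the extracted value with a trusted source."""
    return TRUSTED_BIRTH_YEARS.get(record["name"]) == record["birth_year"]


# Schema-valid output carrying a plausible but wrong value.
llm_output = '{"name": "Ada Lovelace", "birth_year": 1852}'
record = json.loads(llm_output)
print(has_valid_shape(record))    # True
print(matches_reference(record))  # False
```

Constrained decoding only guarantees the first check; anything like the second one still has to happen downstream before the data reaches a database.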

API Design, Schemas, and strict

  • strict: true enables grammar-constrained decoding; strict: false uses the schema only as guidance, so conformance is not enforced.
  • Reasons given for strict: false: unsupported schema features, avoiding the latency of preprocessing a new schema into a grammar on its first use, and preferring fast, explicit failures over rare but slow near-infinite loops.
  • Only JSON Schema objects are allowed at the top level (no top-level arrays), which some find annoying but others defend for extensibility and uniformity.
  • Only a subset of JSON Schema is supported: long-tail features are missing, and field-level constraints are weaker than the regex/pattern support some OSS tools offer.
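The strict flag and the top-level-object rule show up directly in a request fragment. This sketch only builds the `response_format` value and sends nothing. The field layout (`type: "json_schema"`, `json_schema.name`, `json_schema.strict`, `json_schema.schema`) follows OpenAI's Structured Outputs documentation as we understand it and should be checked against the current API reference; the shopping-list schema and the `build_response_format` helper are hypothetical. Strict mode is also documented as requiring `additionalProperties: false` and every property listed in `required`, which the schema below follows.

```python
import json

# Hypothetical schema. The top level must be an object, so the list is
# wrapped in an "items" property instead of being a bare top-level array.
SHOPPING_LIST_SCHEMA = {
    "type": "object",
    "properties": {
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "quantity": {"type": "integer"},
                },
                "required": ["name", "quantity"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["items"],
    "additionalProperties": False,
}


def build_response_format(name: str, schema: dict, strict: bool = True) -> dict:
    """Hypothetical helper: build a json_schema response_format value."""
    if schema.get("type") != "object":
        raise ValueError("top-level schema must be an object; wrap arrays in a property")
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": strict, "schema": schema},
    }


response_format = build_response_format("shopping_list", SHOPPING_LIST_SCHEMA)
print(json.dumps(response_format, indent=2))
```

Wrapping the array in a named property is the usual workaround for the no-top-level-arrays rule, and it leaves room to add sibling fields later, which is the extensibility argument made in the thread.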

Comparison to Prior Work & Ecosystem Impact

  • Many point out that open-source and alternative providers (llama.cpp, Outlines, vLLM, BoundaryML, others) have offered structured / grammar-constrained outputs for a year or more.
  • Some criticize OpenAI for leveraging OSS ideas without open-sourcing or meaningfully funding counterparts; others reply that permissive licenses allow this and that free ChatGPT access is itself a contribution.
  • Some raise vendor lock‑in concerns; others reply that similar features already exist across multiple providers and that OpenAI’s interface will likely become a de facto standard.

Model, Pricing, and Behavior Changes

  • New gpt‑4o‑2024‑08‑06 is reported as ~50% cheaper on input, ~33% cheaper on output, and supports up to 16k output tokens (vs. 4k), plus cheaper image input.
  • Some fear quality regressions in newer, cheaper models (compared to earlier GPT‑4/Turbo and competitor models), while others report better performance in their own benchmarks.
  • Several note increased verbosity in recent models; one OpenAI employee explicitly says verbosity is tuned for user satisfaction, and the new 4o variant is claimed to be less verbose than its predecessor.
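Because input and output prices drop by different amounts, the overall saving depends on a workload's input/output mix. This sketch uses only the approximate relative cuts reported above (treating "~33%" as exactly one third); the spend figures are hypothetical and in arbitrary currency units.

```python
# Approximate relative price cuts reported for gpt-4o-2024-08-06 (from the thread).
INPUT_CUT = 0.50    # ~50% cheaper input tokens
OUTPUT_CUT = 1 / 3  # ~33% cheaper output tokens (approximation)


def blended_cost(old_input_spend: float, old_output_spend: float) -> float:
    """Estimate new spend from old spend, split by input vs. output tokens."""
    return old_input_spend * (1 - INPUT_CUT) + old_output_spend * (1 - OUTPUT_CUT)


# Hypothetical workload: 100 units previously spent on input, 50 on output.
old_total = 100 + 50
new_total = blended_cost(100, 50)
print(f"new spend: {new_total:.2f}")                # new spend: 83.33
print(f"savings: {1 - new_total / old_total:.1%}")  # savings: 44.4%
```

Input-heavy workloads (long prompts, short answers) approach the 50% figure, while output-heavy ones approach 33%.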