Structured Outputs in the API
Feature & Implementation
- Structured Outputs enforces JSON-schema-shaped responses via constrained decoding: invalid next tokens are masked out during sampling.
- Under the hood it’s similar to earlier grammar-based decoding (e.g., CFG/BNF approaches, Earley-style parsing) used in llama.cpp and libraries like Outlines/jsonformer.
- The model can still “refuse”; in that case a separate refusal flag is set, so JSON constraints cannot be used to bypass safety behavior.
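The masking idea in the bullets above can be illustrated with a toy sketch. Everything here is hypothetical: the "grammar" accepts only two literal strings, tokens are single characters, and random logits stand in for a model. It is not how OpenAI, llama.cpp, or Outlines implement the feature; it only shows why masked sampling cannot produce a malformed result.

```python
import math
import random

# Hypothetical toy grammar: the only valid outputs are these two strings.
TARGETS = ['{"ok":true}', '{"ok":false}']

def allowed_next_tokens(prefix: str) -> set[str]:
    """Return the single-character tokens that keep `prefix` valid."""
    return {t[len(prefix)] for t in TARGETS
            if t.startswith(prefix) and len(t) > len(prefix)}

def constrained_sample(logits: dict[str, float], prefix: str,
                       rng: random.Random) -> str:
    """Mask out tokens the grammar forbids, then sample from the rest."""
    allowed = allowed_next_tokens(prefix)
    masked = {tok: val for tok, val in logits.items() if tok in allowed}
    if not masked:
        raise ValueError("no valid continuation in the vocabulary")
    # Softmax over the surviving tokens only.
    top = max(masked.values())
    weights = [math.exp(val - top) for val in masked.values()]
    return rng.choices(list(masked), weights=weights, k=1)[0]

def generate(rng: random.Random) -> str:
    """Decode until the grammar allows no further token."""
    vocab = list('{}":okfalsetru x')  # includes tokens that are never valid
    out = ""
    while allowed_next_tokens(out):
        # Stand-in for a model forward pass: random scores per token.
        logits = {tok: rng.uniform(-1.0, 1.0) for tok in vocab}
        out += constrained_sample(logits, out, rng)
    return out

print(generate(random.Random(0)))
```

Whatever the random scores are, the output is always one of the two valid strings, which is the guarantee commenters describe; the scores only influence which valid string is chosen.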
Reliability, Accuracy, and Limitations
- Many see this as fixing long‑standing JSON mode / function‑calling brittleness (missing commas, bad brackets, broken schemas).
- Others warn that correct shape ≠ correct content. LLMs can still hallucinate structured but wrong data, risking over-trust and silent database errors.
- Some note that constrained decoding can cause degenerate but valid outputs (e.g., loops, nonsensical but schema‑valid values); model alignment to avoid this is seen as the hard part.
- One paper (linked in thread) suggests JSON constraints may degrade reasoning performance in some tasks; several commenters say this matches their intuition.
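To make the "correct shape ≠ correct content" warning concrete, here is a minimal sketch with invented data. The field names, the invoice IDs, and both helper functions are hypothetical; the point is only that a record can pass a structural check while every value in it is wrong.

```python
import json
from datetime import date

def shape_ok(record: dict) -> bool:
    """Structural check only: required keys with the expected types."""
    return (
        isinstance(record.get("invoice_id"), str)
        and isinstance(record.get("total"), (int, float))
        and isinstance(record.get("due_date"), str)
    )

def content_problems(record: dict, known_invoice_ids: set[str]) -> list[str]:
    """Semantic checks that a schema alone does not express."""
    problems = []
    if record["invoice_id"] not in known_invoice_ids:
        problems.append("unknown invoice_id")
    if record["total"] < 0:
        problems.append("negative total")
    try:
        date.fromisoformat(record["due_date"])
    except ValueError:
        problems.append("due_date is not a real calendar date")
    return problems

# A hypothetical model output: well-formed JSON with plausible-looking fields.
raw = '{"invoice_id": "INV-9999", "total": -12.5, "due_date": "2024-02-31"}'
record = json.loads(raw)

print(shape_ok(record))                        # structural check passes
print(content_problems(record, {"INV-0001"}))  # semantic checks do not
```

Running domain checks like these before writing to a database is one way to address the silent-error concern raised in the thread.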
API Design, Schemas, and strict
- strict: true triggers grammar-constrained decoding; strict: false uses the schema more loosely.
- Reasons given for strict: false: unsupported schema features, avoiding heavy first-call CFG preprocessing latency, and preferring fast, explicit failures over rare but slow infinite-ish loops.
- Only JSON Schema objects are allowed at the top level (no top-level arrays), which some find annoying but others defend for extensibility and uniformity.
- The supported schema subset is limited: complex or long-tail features are missing, and field typing is more restricted than the regex/pattern support offered by some OSS tools.
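The top-level-object rule is usually worked around by wrapping the array in an object. The sketch below builds such a payload as a plain dictionary without calling any API. The key names (type, json_schema, name, strict, schema) follow the response_format shape as I understand it from OpenAI's announcement and should be treated as an assumption to verify against the current documentation; wrap_array_schema and the "items" wrapper key are hypothetical helpers.

```python
import json

def wrap_array_schema(name: str, item_schema: dict) -> dict:
    """Wrap an array of items in an object so the top level is an object."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "strict": True,  # request grammar-constrained decoding
            "schema": {
                "type": "object",
                "properties": {
                    "items": {"type": "array", "items": item_schema},
                },
                "required": ["items"],
                "additionalProperties": False,
            },
        },
    }

# Hypothetical item schema using only basic, widely supported keywords.
item_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}},
    "required": ["title"],
    "additionalProperties": False,
}

payload = wrap_array_schema("todo_list", item_schema)
print(json.dumps(payload, indent=2))
```

The wrapper also leaves room to add sibling fields later without changing the response's top-level type, which is the extensibility argument made in the thread.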
Comparison to Prior Work & Ecosystem Impact
- Many point out that open-source and alternative providers (llama.cpp, Outlines, vLLM, BoundaryML, others) have offered structured / grammar-constrained outputs for a year+.
- Some criticize OpenAI for leveraging OSS ideas without open-sourcing or meaningfully funding counterparts; others reply that permissive licenses allow this and that free ChatGPT access is itself a contribution.
- Concern about vendor lock‑in is raised, but others note similar features already exist across multiple providers, and OpenAI’s interface will likely become a de facto standard.
Model, Pricing, and Behavior Changes
- New gpt-4o-2024-08-06 is reported as ~50% cheaper on input, ~33% cheaper on output, and supports up to 16k output tokens (vs. 4k), plus cheaper image input.
- Some fear quality regressions in newer, cheaper models (compared to earlier GPT‑4/Turbo and competitor models), while others report better performance in their own benchmarks.
- Several note increased verbosity in recent models; one OpenAI employee explicitly says verbosity is tuned for user satisfaction, but the new 4o variant is claimed to be less verbose than its predecessor.
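Because the reported cuts differ for input and output tokens, the overall saving depends on a workload's token mix. The sketch below works through that arithmetic. The percentage cuts are the ones reported in the thread; the prices in the example are hypothetical relative units, not actual list prices, and blended_saving is an illustrative helper.

```python
def blended_saving(input_tokens: int, output_tokens: int,
                   old_in_price: float, old_out_price: float,
                   in_cut: float = 0.50, out_cut: float = 1 / 3) -> float:
    """Fraction of total cost saved, given per-token price cuts."""
    old_cost = input_tokens * old_in_price + output_tokens * old_out_price
    new_cost = (input_tokens * old_in_price * (1 - in_cut)
                + output_tokens * old_out_price * (1 - out_cut))
    return 1 - new_cost / old_cost

# Hypothetical workload: equal token counts, output priced at 3x input.
saving = blended_saving(1000, 1000, old_in_price=1.0, old_out_price=3.0)
print(f"{saving:.1%}")
```

With these assumed numbers the saving works out to 37.5%, between the two reported cuts; an input-heavy workload moves toward 50% and an output-heavy one toward 33%.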