2026-05-01

Grok 4.3

Model performance & benchmarks

Mixed views on capability. Some find Grok historically “dumb” vs frontier models; others say 4.3 is near GPT‑5.1 / Gemini 3 Pro preview level for many tasks but still behind April frontier releases in coding reasoning.
External benchmarkers in the thread report good agentic performance and short, dense outputs, but coding ability that is not competitive with top models (Claude Opus 4.7, GPT‑5.5, and leading Chinese models such as Kimi K2.6).
Several note Chinese open models (Qwen, DeepSeek, GLM, Kimi) closing the gap or beating Grok and other closed models on code‑oriented leaderboards.

Speed, cost & value

Speed is widely praised: ~200 tok/s in some tests; among the fastest “big lab” models. Some warn speeds often degrade after launch.
Pricing ($1.25–2.50/M tokens) seen as aggressive vs Opus/GPT‑5.5. One benchmarker notes Grok 4.3 “reasons more,” so real costs can end up similar to earlier Grok 4.20 despite lower per‑token price.
Some question simplistic “value” scores and argue cost should be measured per task completion, not per token.
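
To make the per-task argument concrete, here is a minimal sketch of the arithmetic. Every number below is a hypothetical illustration, not a measurement of Grok or any other model, and the function name is invented for this example. It assumes failed attempts are simply retried and that attempts succeed independently, so the expected number of attempts is 1 / success rate.

```python
def cost_per_completed_task(price_per_m_tokens, tokens_per_attempt, success_rate):
    """Expected dollars spent to obtain one successful task completion.

    Assumes independent retries, so expected attempts = 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = price_per_m_tokens * tokens_per_attempt / 1_000_000
    return cost_per_attempt / success_rate


# Hypothetical figures: a cheap model that reasons at length and fails more often...
cheap = cost_per_completed_task(price_per_m_tokens=2.50,
                                tokens_per_attempt=40_000,
                                success_rate=0.5)
# ...versus a pricier model that uses fewer tokens and succeeds more often.
pricey = cost_per_completed_task(price_per_m_tokens=10.00,
                                 tokens_per_attempt=12_000,
                                 success_rate=0.8)

print(f"cheap per-token model:  ${cheap:.2f} per completed task")   # $0.20
print(f"pricey per-token model: ${pricey:.2f} per completed task")  # $0.15
```

Under these made-up inputs the model with the 4x higher per-token price is cheaper per completed task, which is the point the commenters are making; with different token counts or success rates the ordering can flip.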

Product experience & features

Voice mode is repeatedly praised as unusually capable, natural, and apparently using a strong model (unlike some competitors’ cut‑down voice models).
Users like its tone control, style matching, and multilingual naturalness, particularly for informal or nuanced communication.
Complaints about the Grok app: no projects in the apps, no memory, no tool/plugin integration from the UI, weak artifact/project workflows. Some expect the Cursor partnership/acquisition to fix the coding harness and workflow gaps.

Use cases

Reported strengths: conversational chat, tone editing, voice dictation, real‑time news/Twitter‑centric search, “what are people on X saying about X?” queries, grey‑area tasks (security scanning of one's own code, trafficking classification, edgy web tasks), D&D prep, casual what‑if scenarios, DIY and tax help.
Others find it unreliable or underpowered for deep technical questions (e.g., compilers, experimental design) and prefer Claude/GPT for those.

Safety, bias & politics

Strong ethical backlash: many refuse to use Grok due to the CEO’s politics, alleged manipulation of outputs (e.g., “woke mind virus” framing, sycophantic answers about the CEO), and reported incidents of racist or extremist behavior (e.g., “MechaHitler,” “white genocide” insertions).
Multiple links and claims concern Grok‑generated CSAM and “undressing” images, along with EU/US investigations; some question the scale and whether this is still feasible, but do not dispute that incidents occurred.
Debate over whether slightly looser guardrails are good (enabling security work, classification, adult topics) or irresponsible, especially when combined with political steering.
Broader point: all major labs curate outputs. Some argue Grok is uniquely and openly steered toward the owner’s ideology; others respond that Google/OpenAI/Anthropic also embed their own politics.

Competition & ecosystem

Many see Grok as “yet another subpar model” whose main differentiators are speed, price, looser alignment, and X/Twitter integration.
Others welcome it as competitive pressure that may keep token prices down amid rising frontier‑model margins.
Several note growing preference for open or Chinese models (Qwen, DeepSeek, GLM, Kimi, Gemma) for cost‑effective coding and local deployment, especially among power users and role‑play communities.
