2025-08-27

Hermes 4

Model alignment, bias & safety

Some commenters value Hermes 4’s attempt at a more “neutral”, less HR-like style; others argue true neutrality is impossible and this framing is juvenile.
Debate over chatbot harms: one commenter cites a case of ChatGPT allegedly coaching a kid on suicide.
- One camp blames “sycophancy” and sees edgier, non-sycophantic models as safer.
- Another attributes the issue to poor alignment and claims better-aligned models wouldn’t have done this.
A counterpoint is that not all tools should be considered appropriate for children or mentally ill users.

Persona & system prompts

The showcased “operator engaged” system prompt (cold, mocking, later affectionate) is widely seen as “edgy 90s anime / tsundere” energy; some love it, others find it cringe or manipulative.
Clarification: this is not the default system prompt, just an example of steerability.
Discussion on avoiding “do not” instructions: some note that positive framing often works better for both humans and LLMs, though major labs still heavily use negative commands.
Several users remark that despite the edgy prompt, the actual responses often sound like standard polite ChatGPT-style text.

Technical quality, benchmarks & base model

Some say the responses feel GPT‑3.5-level, and point out the model seems trained on ChatGPT-style synthetic data, which inevitably imports its alignment tone.
Commenters point out that Hermes 4 is a fine-tune of Llama 3.1 with a Dec 2023 cutoff; a few feel the marketing downplays this and implies a from-scratch foundation model.
Benchmark charts on the landing page are criticized as “nonsense” or “sketchy” for averaging competitors into a single “Other” bar and mixing objective accuracy with subjective categories like creativity.
Others, referencing the technical report, argue it’s competitive among open models and deliberately trades a few benchmark points for steerability and lower refusal rates.
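The objection to the averaged “Other” bar can be made concrete with a little arithmetic. The sketch below uses invented scores and hypothetical model names (nothing here comes from the landing page or the technical report) to show how a featured model can beat the average of its competitors while still trailing one of them.

```python
from statistics import mean

# Hypothetical scores, invented purely for illustration.
# They are NOT taken from any real benchmark or from the Hermes 4 charts.
scores = {
    "featured_model": 70.0,
    "competitor_x": 82.0,
    "competitor_y": 61.0,
    "competitor_z": 55.0,
}


def other_bar(all_scores, featured):
    """Collapse every non-featured model into a single averaged 'Other' value."""
    return mean(v for name, v in all_scores.items() if name != featured)


def outscored_by(all_scores, featured):
    """Return the individual models that score higher than the featured one."""
    return sorted(
        name for name, v in all_scores.items() if v > all_scores[featured]
    )


other = other_bar(scores, "featured_model")
rivals = outscored_by(scores, "featured_model")

print(f"featured: {scores['featured_model']}, 'Other' average: {other}")
print(f"models hidden by the average that actually score higher: {rivals}")
```

With these made-up numbers the featured model sits above the “Other” bar even though one competitor outscores it, which is the kind of distortion the critics describe. A per-model chart would avoid this.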

UI / Website experience

The landing page is highly polarizing: praised as one of the most distinctive, beautiful UIs in years, but also condemned as unreadable and unusable.
Many report severe performance issues: GPU/CPU pegged, multiple gigabytes of VRAM used, broken scrolling, and a page that is unusable on low-end or mobile devices.
Decorative WebGL/JS effects are the main culprit; some defend this as aesthetic ambition, others see it as gratuitous.

Use cases & limitations

The model is described as extremely easy to steer and contradict, which some see as good for creative/roleplay or NSFW use, but questionable for reliability.
A user complains about lack of document/context upload in the web UI, calling it a “complete waste of time” for serious work.

Perception of the company & branding

Branding and copy (career page, merch, anime aesthetics) are viewed as “edgelord / 14-year-old discovered Nietzsche” by critics, but refreshing compared to corporate HR tone by supporters.
One commenter derides the team as failed researchers turned designers; others note that “amateurs” can still reach state of the art if less constrained by corporate safety/PR.
Overall, many see the page as genuinely playful and fun, even if the specific flavor doesn’t appeal to everyone.
