FLUX1.1 [pro] – New SotA text-to-image model from Black Forest Labs

Overall impressions and SOTA claims

  • Many commenters find Flux (including 1.1 Pro and earlier dev/schnell) to be the best text‑to‑image model they’ve used, especially for prompt adherence and complex scenes.
  • Several people say it outperforms SDXL, DALL‑E 3, and Midjourney at following detailed prompts, citing Elo-based leaderboards as evidence.
  • Others push back on the “state of the art” marketing language, arguing it’s vague, overused jargon even if the underlying metrics are strong.

Censorship, faces, and ethical constraints

  • Users report that Flux tends to converge on a “flux face” archetype, with limited diversity among realistic male and female faces; celebrity names have little influence on the output.
  • Multiple comments claim the model is notably worse at following detailed descriptions of women’s faces than men’s, and that this is likely due to dataset captioning and anti‑lewd tuning, not technical limits.
  • Some complain this “Big Prude” style censorship makes it harder to create non‑lewd female characters for games and art.
  • Broader frustration exists with opaque, proprietary safety policies (Flux, Ideogram, DALL‑E, etc.) and blocked “copyrighted” content; others defend companies’ need to manage legal/PR risk.

Training data and artistic styles

  • Several users say Flux is weak on “art‑art”: impressionist painters (e.g., Degas) and specific photographic styles, even when public‑domain sources exist.
  • Hypothesis: artist and photographer names (and certain datasets) were broadly stripped from the training data; Flux often returns anime‑ish or generic illustration when asked for painterly oil styles.
  • LoRAs are seen as the main workaround; commenters share detailed workflows and cost figures for training Flux LoRAs, reporting some success with illustration styles but less with classic painting.
  • One view: narrower, more homogeneous training data might be part of why Flux is so consistent.

Local running, tools, and performance

  • Flux.dev/schnell can run locally but are resource‑hungry; the unquantized weights fit comfortably on 24 GB GPUs, though people report success on 10–12 GB cards with slow or quantized runs.
  • Quantized GGUF variants (e.g., Q8_0) are praised as nearly indistinguishable visually while fitting on more hardware.
  • Recommended tooling: ComfyUI, Forge, InvokeAI, Hugging Face Diffusers, Draw Things (Mac), DiffusionBee (with caveats about stale source), stable‑diffusion.cpp.
  • Some share environment tweaks to avoid CUDA OOM, and note that modern Macs with sufficient unified memory run Flux well.
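The local-running recipe above (Hugging Face Diffusers, CPU offload to fit under 24 GB, and an allocator env tweak against CUDA OOM) can be sketched roughly as follows. This is a minimal sketch, not a workflow endorsed in the thread: the specific env-var value, step count, and output filename are illustrative assumptions, and the model id refers to the openly distributed FLUX.1-schnell checkpoint.

```python
import os

# Allocator tweak some commenters use to avoid CUDA OOM fragmentation
# on 10-12 GB cards; must be set before torch initializes CUDA.
# (The exact value here is one common choice, not the only one.)
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"


def generate(prompt: str, out_path: str = "flux_out.png") -> None:
    """Generate one image with FLUX.1-schnell via Diffusers (sketch)."""
    # Heavy imports kept inside the function so the module loads cheaply.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell",
        torch_dtype=torch.bfloat16,
    )
    # Offload sub-models to CPU between steps so the pipeline fits on
    # GPUs well under 24 GB, trading speed for memory.
    pipe.enable_model_cpu_offload()

    image = pipe(
        prompt,
        num_inference_steps=4,  # schnell is distilled for very few steps
        guidance_scale=0.0,     # schnell does not use classifier-free guidance
    ).images[0]
    image.save(out_path)


if __name__ == "__main__":
    generate("an impressionist oil painting of a harbor at dusk")
```

Quantized GGUF variants like Q8_0 are typically loaded through ComfyUI, stable‑diffusion.cpp, or Diffusers' GGUF support rather than this plain pipeline, but the offload call above is the simplest first step when an unquantized run hits OOM.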

Model behavior, creativity, and biases

  • Flux excels at literal, detailed prompt adherence and associative understanding (e.g., “Friends” → “Central Perk”, metaphorical “rose of passion”), but some miss earlier models’ more surprising, under‑specified outputs.
  • Users observe persistent weaknesses: periodic structures (keyboards), accurate missile/rocket launchers, and some instruments, though accordions seem improved.
  • One commenter notes gendered bias in outputs (e.g., “someone playing accordion” often yields a woman).
  • There is concern that many users mainly want NSFW content and that mainstream models’ restrictions push them to uncensored services or local models.