FLUX.2: Frontier Visual Intelligence
Competition and Positioning
- Many see FLUX.2 as much-needed competition to Google’s new image model (“Nano Banana Pro”) and Chinese offerings, especially valuable for Europe and regions where US services (OpenAI, Google, Anthropic) are restricted.
- There’s debate on “openness”: weights are downloadable and a VAE is Apache 2.0, but the main FLUX.2-dev model is non‑commercial and IP-filtered, so commenters stress it’s “open weights,” not open source.
- Some argue BFL should have waited for their fully Apache 2.0 distilled model, especially given Alibaba/Qwen and other Chinese models that are both strong and more permissively licensed.
Architecture, Size, and Local Use
- FLUX.2 switches to a large multimodal text encoder (Mistral-Small 24B) instead of the previous CLIP+T5 setup; several say CLIP contributed little in prior models.
- At 16-bit precision, the text encoder (~48 GB) plus the 32B generator (~64 GB) add up to more than 100 GB of weights, so running the unquantized model locally is hard except on very high-end or multi-GPU setups.
- NVIDIA/ComfyUI fp8 optimizations and VRAM–RAM swapping reportedly let a 4090/5090 run it (slowly, ~1 minute for 1024×1024). Quantized variants (e.g., 4‑bit ~18 GB) are emerging, but quality impact is unknown.
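
The weight sizes quoted above follow from simple arithmetic: parameter count times bits per parameter. The sketch below is a back-of-the-envelope estimate only. It assumes 1 GB = 1e9 bytes, counts weights alone (no activations, no quantization scales, no layers kept at higher precision), and uses a hypothetical helper name, `weight_size_gb`, that does not come from any library.

```python
def weight_size_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# Parameter counts as reported in the thread.
TEXT_ENCODER_B = 24.0  # Mistral-Small 24B text encoder
GENERATOR_B = 32.0     # FLUX.2 32B generator

for bits in (16, 8, 4):
    enc = weight_size_gb(TEXT_ENCODER_B, bits)
    gen = weight_size_gb(GENERATOR_B, bits)
    print(f"{bits:>2}-bit: encoder {enc:.0f} GB, "
          f"generator {gen:.0f} GB, total {enc + gen:.0f} GB")
```

At 16 bits this reproduces the ~48 GB and ~64 GB figures. At 4 bits the generator alone comes to about 16 GB, in the same range as the ~18 GB variant mentioned in the thread; the gap may be due to quantization metadata or layers left unquantized, but that is a guess rather than something the thread confirms.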
Quality, Aesthetics, and Benchmarks
- Some users praise FLUX.2’s naturalistic look and understanding; others find outputs plasticky with “AI aura,” especially skin and faces, and clearly below Midjourney and even SDXL for aesthetics.
- Benchmarks shared in the thread place FLUX.2 Pro roughly middle-of-the-pack for image editing, only slightly better than BFL’s older Kontext model, and behind Google’s model on many tasks.
- Strengths: better prompt adherence than FLUX 1.x, JSON-structured prompts, hex color control, and optional “prompt upsampling” via an LLM to improve reasoning-heavy prompts.
- Weaknesses: struggles with some editing tasks (e.g., TV stills, line-art coloring), costly multi-image reference use, and inconsistent style transfer. High resolution can introduce unwanted “upscale-like” artifacts.
Pricing and Business Strategy
- Pricing per megapixel (including per-input-image fees) is widely criticized; adding reference images quickly makes FLUX.2 Pro more expensive than Google’s model.
- BFL is seen as pivoting away from an abandoned or paused video line to focus on images; some commenters argue that image models are more foundational and controllable for now.
- Some worry BFL is getting squeezed between hyperscalers and Chinese labs; others point to large enterprise deals and developer focus as evidence they’re doing well.