FLUX.2: Frontier Visual Intelligence

Competition and Positioning

  • Many see FLUX.2 as much-needed competition to Google’s new image model (“Nano Banana Pro”) and Chinese offerings, especially valuable for Europe and regions where US services (OpenAI, Google, Anthropic) are restricted.
  • There’s debate on “openness”: weights are downloadable and a VAE is Apache 2.0, but the main FLUX.2-dev model is non‑commercial and IP-filtered, so commenters stress it’s “open weights,” not open source.
  • Some argue BFL should have waited for their fully Apache 2.0 distilled model, especially given Alibaba/Qwen and other Chinese models that are both strong and more permissively licensed.

Architecture, Size, and Local Use

  • FLUX.2 switches to a large multimodal text encoder (Mistral-Small 24B) instead of the previous CLIP+T5 setup; several say CLIP contributed little in prior models.
  • At 16-bit precision (2 bytes per parameter), the 24B text encoder (~48 GB) plus the 32B generator (~64 GB) comes to more than 100 GB of weights; running the model unquantized locally is hard except on very high‑end or multi‑GPU setups.
  • NVIDIA/ComfyUI fp8 optimizations and VRAM–RAM swapping reportedly let a 4090/5090 run it (slowly, ~1 minute for 1024×1024). Quantized variants (e.g., 4‑bit ~18 GB) are emerging, but quality impact is unknown.
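
The size figures above follow from simple arithmetic: parameter count times bytes per parameter. The sketch below reproduces them under stated assumptions. The parameter counts are the ones quoted in the thread, the byte widths (2 for 16-bit, 1 for fp8, 0.5 for 4-bit) are the nominal ones, and the names `PARAMS_BILLIONS`, `weight_gb`, and `total_gb` are made up for illustration. Activations, the VAE, and runtime overhead are ignored, so real memory use will be higher.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Parameter counts are the figures quoted in the thread; activations,
# the VAE, and runtime overhead are deliberately ignored.

PARAMS_BILLIONS = {
    "text_encoder": 24,  # Mistral-Small 24B
    "generator": 32,     # FLUX.2 generator, 32B
}

# Nominal storage width per parameter for each precision.
BYTES_PER_PARAM = {"16bit": 2.0, "fp8": 1.0, "4bit": 0.5}


def weight_gb(params_billions: float, precision: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def total_gb(precision: str) -> float:
    """Combined weight size of text encoder and generator in GB."""
    return sum(weight_gb(p, precision) for p in PARAMS_BILLIONS.values())


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        parts = ", ".join(
            f"{name}: {weight_gb(p, precision):.0f} GB"
            for name, p in PARAMS_BILLIONS.items()
        )
        print(f"{precision:>5}: {parts}, total: {total_gb(precision):.0f} GB")
```

The nominal 4-bit figure for the generator alone is 16 GB, a little under the ~18 GB reported for quantized files. A plausible explanation is that quantized checkpoints store scaling metadata and keep some layers at higher precision, but the thread does not confirm this.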

Quality, Aesthetics, and Benchmarks

  • Some users praise FLUX.2’s naturalistic look and understanding; others find outputs plasticky, with an “AI aura” most visible in skin and faces, and rate it clearly below Midjourney and even SDXL on aesthetics.
  • Benchmarks shared in the thread place FLUX.2 Pro roughly middle-of-the-pack for image editing, only slightly better than BFL’s older Kontext model, and behind Google’s model on many tasks.
  • Strengths: better prompt adherence than FLUX 1.x, JSON-structured prompts, hex color control, and optional “prompt upsampling” via an LLM to improve reasoning-heavy prompts.
  • Weaknesses: struggles with some editing tasks (e.g., TV stills, line-art coloring), costly multi-image reference use, and inconsistent style transfer. High resolution can introduce unwanted “upscale-like” artifacts.
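
To make the JSON-structured prompt and hex color points concrete, here is a minimal sketch of building such a prompt. The thread does not give a schema, so the keys `scene`, `subjects`, and `color_palette` and the helper `build_prompt` are hypothetical, chosen only to show the idea of passing structured fields and exact hex colors instead of free text.

```python
import json
import re

# Matches a 6-digit hex color such as "#1A2B3C".
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def build_prompt(scene: str, subjects: list[str], palette: list[str]) -> str:
    """Serialize a structured prompt as JSON.

    The key names are hypothetical; the thread does not specify a schema.
    """
    bad = [c for c in palette if not HEX_COLOR.fullmatch(c)]
    if bad:
        raise ValueError(f"not 6-digit hex colors: {bad}")
    prompt = {
        "scene": scene,
        "subjects": subjects,
        "color_palette": palette,
    }
    return json.dumps(prompt, indent=2)


if __name__ == "__main__":
    print(build_prompt(
        scene="product photo on a plain studio backdrop",
        subjects=["ceramic mug", "folded linen napkin"],
        palette=["#1A2B3C", "#F4E9D8"],
    ))
```

Validating the palette before sending the prompt catches typos such as a color name in place of a hex code, which would otherwise fail silently as ordinary prompt text.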

Pricing and Business Strategy

  • Pricing per megapixel (including per-input-image fees) is widely criticized; adding reference images quickly makes FLUX.2 Pro more expensive than Google’s model.
  • BFL is seen as pivoting from an abandoned/paused video line to focus on images, with arguments that image models are more foundational and controllable for now.
  • Some worry BFL is getting squeezed between hyperscalers and Chinese labs; others point to large enterprise deals and developer focus as evidence they’re doing well.
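
The pricing complaint above is easier to see with numbers. The sketch below models a per-megapixel price with a separate fee for each reference image. The rates are placeholders, not BFL's actual prices, and billing reference images by megapixel is an assumption; `Rates`, `megapixels`, and `request_cost` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Placeholder per-megapixel rates in USD; not actual BFL pricing."""
    output_per_mp: float
    input_per_mp: float


def megapixels(width: int, height: int) -> float:
    """Image size in megapixels (1 MP = 1,000,000 pixels)."""
    return width * height / 1_000_000


def request_cost(rates, output_size, reference_sizes=()):
    """Cost of one request: output fee plus a fee per reference image."""
    out_mp = megapixels(*output_size)
    in_mp = sum(megapixels(w, h) for w, h in reference_sizes)
    return rates.output_per_mp * out_mp + rates.input_per_mp * in_mp


if __name__ == "__main__":
    rates = Rates(output_per_mp=0.03, input_per_mp=0.015)  # hypothetical
    for n_refs in range(0, 5):
        refs = [(1000, 1000)] * n_refs
        cost = request_cost(rates, (1000, 1000), refs)
        print(f"{n_refs} reference image(s): ${cost:.3f}")
```

Under these placeholder rates, two reference images already double the cost of a request, which is the shape of the effect commenters describe when comparing against a flat per-image price.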