Computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku
Computer Use: What It Is and How It Works
- Claude can now control a sandboxed desktop through a loop of screenshots and mouse/keyboard actions.
- Reference implementation uses Docker/VMs; not a native “Claude Desktop” app.
- It can scroll, click, type, open apps/browsers, and even persist through slow app startups, but struggles with finer actions like dragging/zooming.
- Many see this as ideal for GUI-based automation, end-to-end tests, and “agents that actually do work,” including on legacy Windows/Mac apps.
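The screenshot-and-action loop described above can be sketched as a small simulation. This is a minimal illustration, not the reference implementation: `FakeDesktop`, `Action`, `scripted_policy`, and `run_agent` are hypothetical names, the "screenshot" is a text description rather than an image, and the policy function stands in for the model call.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str        # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a sandboxed VM or container."""
    focused: bool = False
    typed: str = ""
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real harness would return image bytes; text keeps the sketch runnable.
        return f"focused={self.focused} typed={self.typed!r}"

    def apply(self, action: Action) -> None:
        self.log.append(action.kind)
        if action.kind == "click":
            self.focused = True
        elif action.kind == "type" and self.focused:
            self.typed += action.text


def scripted_policy(goal: str, observation: str) -> Action:
    """Placeholder for the model: choose the next action from the observation."""
    if "focused=False" in observation:
        return Action("click", x=100, y=200)
    if goal not in observation:
        return Action("type", text=goal)
    return Action("done")


def run_agent(desktop, policy, goal: str, max_steps: int = 10) -> int:
    """Observe, decide, act; stop when the policy says done or the budget runs out."""
    for step in range(1, max_steps + 1):
        action = policy(goal, desktop.screenshot())
        if action.kind == "done":
            return step
        desktop.apply(action)
    raise RuntimeError("step budget exhausted")


if __name__ == "__main__":
    desktop = FakeDesktop()
    steps = run_agent(desktop, scripted_policy, "hello")
    print(steps, desktop.log, desktop.typed)
```

The step budget matters in practice: because every decision depends on a fresh observation, a confused policy can otherwise loop indefinitely.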
Privacy, Security, and Safety Concerns
- Strong worries about sending screenshots and granting remote control, especially on real workstations with PII/PHI or corporate data.
- Multiple commenters advocate strict sandboxing (VMs, remote desktops, limited accounts) and “read-only” or confirm-before-click modes.
- People anticipate incidents: accidental deletion, being tricked by phishing UIs, or exfiltration of sensitive data.
- Some expect this to become a common way to bypass CAPTCHAs and other web anti-bot defenses.
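The “read-only” and confirm-before-click modes that commenters ask for could look roughly like the wrapper below. It is a sketch under stated assumptions: the action dictionaries, the `READ_ONLY` set, and the names `gated_executor` and `ActionBlocked` are all hypothetical and do not correspond to any shipped API.

```python
from typing import Callable

# Assumption: these action kinds only observe or navigate and never change state.
READ_ONLY = {"screenshot", "scroll", "mouse_move"}


class ActionBlocked(Exception):
    """Raised when an action is refused by policy or by the operator."""


def gated_executor(
    execute: Callable[[dict], None],
    confirm: Callable[[dict], bool],
    read_only: bool = False,
) -> Callable[[dict], None]:
    """Wrap `execute` so state-changing actions need approval first."""

    def run(action: dict) -> None:
        kind = action["kind"]
        if kind in READ_ONLY:
            execute(action)          # harmless actions pass straight through
            return
        if read_only:
            raise ActionBlocked(f"{kind} blocked in read-only mode")
        if not confirm(action):      # ask a human (or a policy) before acting
            raise ActionBlocked(f"{kind} rejected by operator")
        execute(action)

    return run


if __name__ == "__main__":
    performed = []
    # Hypothetical approval rule: refuse anything that targets "Delete".
    run = gated_executor(performed.append, confirm=lambda a: a.get("target") != "Delete")
    run({"kind": "scroll", "dy": -300})
    run({"kind": "click", "target": "Save"})
    try:
        run({"kind": "click", "target": "Delete"})
    except ActionBlocked as exc:
        print("blocked:", exc)
```

A gate like this complements sandboxing rather than replacing it: it limits what the agent may do, while a VM or limited account limits what a mistaken action can reach.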
RPA, Legacy Software, and Accessibility
- Widely compared to Robotic Process Automation (UiPath, etc.): same idea of automating GUIs when no clean API exists.
- Many note this may be the only practical way to integrate with entrenched, GUI-only enterprise tools (medical, tax, ERP, banking).
- Others highlight accessibility potential: AI as a powerful screen-reader / voice-driven operator for people with visual or motor impairments.
Model Quality: Coding and Reasoning
- Several users report that the new Claude 3.5 Sonnet (“New”/20241022) is much better at coding than GPT-4o, with fewer hallucinations and cleaner Python/Rust.
- Benchmarks cited: big gains on SWE-bench Verified and Aider’s coding/refactor leaderboards; competitive but below o1-preview on some reasoning tests.
- Haiku 3.5 is said to roughly match the prior Opus at much lower cost, though its pricing relative to GPT-4o mini draws some criticism.
Versioning, Product Positioning, and UX
- Heavy confusion/annoyance over naming: “Claude 3.5 Sonnet (New)” instead of 3.6 or 4.0, plus dated model IDs.
- Opus 3.5’s status is unclear; some think Sonnet 3.5 has effectively displaced Opus 3.0 for most tasks.
- Rate limits on the chat UI frustrate frequent users; many route through APIs or third‑party tools.
- Branding and UX are praised as warmer and less “dramatic” than competitors, but missing features (e.g., robust LaTeX, real-time voice) are noted.
Developer Workflow and Tools
- Strong migration pattern: many coders report switching from GPT-based tools to Claude, especially via editors and coding tools such as Cursor, Continue.dev, Cody, and Aider.
- Desired next step: tight integration between code edits and browser results using Computer Use, so agents can iteratively debug UIs on their own.
Broader Implications and Skepticism
- Some see this as a step toward “FSD for computers” (by analogy with Full Self-Driving for cars) and a threat to many remote/white-collar roles.
- Others argue reliability, error handling, and organizational constraints will keep humans heavily in the loop for the foreseeable future.