Show HN: Agent.exe, a cross-platform app to let 3.5 Sonnet control your machine
Overview & Concept
- The app exposes Anthropic’s new “Computer Use” capability through an Electron desktop client, letting Claude 3.5 Sonnet operate your OS (opening a browser, clicking, typing, etc.).
- Use cases discussed include flight and gift shopping, filling Amazon carts, monitoring sites (e.g., CD rates, pandemic-era Amazon delivery slots), UI testing, and general “assistant” work.
Setup, UX & Platform Issues
- Some were disappointed that setup isn’t as simple as running a single binary.
- Coordinate mapping is buggy on macOS, on Windows, and in sandboxes/VMs; clicks often miss their targets, especially at non-standard resolutions.
- The app’s own window frequently obscures parts of the UI, causing the model to misread state; suggestions include auto-hiding the window, capturing only a target window, or using a less intrusive frame/toolbar.
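One plausible reason clicks miss at non-standard resolutions can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the screenshot sent to the model is smaller than the physical screen, so every coordinate the model returns must be rescaled per axis (and, on HiDPI displays, converted from physical pixels to the logical units many input APIs expect) before a click is issued; skipping or miscomputing either step shifts clicks off target. The function name `screenshot_to_screen` and its `dpi_scale` parameter are illustrative only, not taken from Agent.exe or Anthropic’s API.

```python
def screenshot_to_screen(x, y, shot_size, screen_size, dpi_scale=1.0):
    """Map a point in screenshot pixels to a click position on the real screen.

    Hypothetical helper: shot_size and screen_size are (width, height) tuples,
    screen_size is in physical pixels, and dpi_scale is the OS display scaling
    factor (e.g. 2.0 on many HiDPI setups).
    """
    shot_w, shot_h = shot_size
    screen_w, screen_h = screen_size
    # Scale each axis independently, since the aspect ratios may differ.
    px = x * screen_w / shot_w
    py = y * screen_h / shot_h
    # Clamp to the physical screen so an edge prediction stays on screen.
    px = min(max(px, 0), screen_w - 1)
    py = min(max(py, 0), screen_h - 1)
    # Convert physical pixels to logical units and round to whole pixels.
    return round(px / dpi_scale), round(py / dpi_scale)


if __name__ == "__main__":
    print(screenshot_to_screen(512, 384, (1024, 768), (2560, 1440)))
    print(screenshot_to_screen(512, 384, (1024, 768), (2560, 1440), dpi_scale=2.0))
```

Under these assumptions, a point the model picks at (512, 384) on a 1024x768 screenshot maps to (1280, 720) on a 2560x1440 display, or to (640, 360) in logical units at 2x scaling; sending the unscaled (512, 384) would land the click far from the intended element.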
Security, Privacy & Banking
- Many call it “malware-like” or a “botnet waiting to happen,” warning not to run on a primary machine.
- Recommended mitigations: separate user account, no sudo, dedicated VM, and possibly network isolation.
- Debate over banking risk: EU posters cite strong 2FA (PSD2, hardware tokens, apps); US posters note “trusted device” flows and weak per-transaction 2FA.
- Concern that LLMs can easily leak sensitive data while “doing useful work,” and that no antivirus- or firewall-style protection model exists for this risk.
Behavior, Reliability & Safety Rails
- Claude shows a quirky “personality”: a preference for Firefox and occasional detours into browsing Yellowstone photos.
- Safety rails block some actions (e.g., sending Discord/WhatsApp messages), but users question the rationale and completeness.
- Reliability is poor: wrong flight dates, mis-clicks in CAD/3D tools, confusion between search and message fields, and failure to verify results while still declaring success.
- It works for trivial tasks (e.g., reading the system time) but fails surprisingly often on slightly more complex ones.
Cost & Performance
- Reported costs: ~$0.38–$0.50 for simple multi-step tasks; latency is several seconds per action.
- Some liken current costs to a high hourly rate for an unreliable assistant; others expect prices and quality to improve, though this is disputed.
Broader Implications & Ethics
- Reactions split between excitement (“start of Skynet,” major productivity and testing potential, a big shift in software work) and alarm (Pandora’s box, intentional malware, normalizing undefined and non-deterministic behavior).
- Several see strong accessibility potential (hands-free computer use, voice+LLM hybrids), while others argue ethics and safety training for developers are lagging far behind these capabilities.