Show HN: Agent.exe, a cross-platform app to let 3.5 Sonnet control your machine

Overview & Concept

  • App exposes Anthropic’s new “Computer Use” capability via an Electron desktop client, letting Claude 3.5 Sonnet operate your OS (open browser, click, type, etc.).
  • Intended uses discussed: flight and gift shopping, Amazon cart filling, monitoring sites (e.g., CD rates, pandemic-era Amazon slots), UI testing, and general “assistant” work.

Setup, UX & Platform Issues

  • Some are disappointed that setup isn’t as simple as running a single binary.
  • Coordinate mapping is buggy on macOS, on Windows, and in sandboxes/VMs; clicks often miss their targets, especially at non-standard resolutions.
  • The app’s own window frequently obscures parts of the UI, causing the model to misread state; suggestions include auto-hiding the window, capturing only a target window, or using a less intrusive frame/toolbar.
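One plausible source of the missed clicks, offered here as an assumption rather than a diagnosis from the thread, is that the model picks coordinates on a downscaled screenshot and those coordinates are then used unscaled on the real display. A minimal Python sketch of the rescaling step follows; `Size` and `to_screen_coords` are hypothetical names, not part of Agent.exe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    width: int
    height: int


def to_screen_coords(x: float, y: float, shot: Size, screen: Size) -> tuple[int, int]:
    """Map a point chosen on the screenshot to a pixel on the real screen.

    Assumes the screenshot covers the whole screen and was only resized,
    not cropped or offset (an assumption, not something the thread states).
    """
    scale_x = screen.width / shot.width
    scale_y = screen.height / shot.height
    # Round to the nearest pixel, then clamp so the click stays on screen.
    px = min(max(round(x * scale_x), 0), screen.width - 1)
    py = min(max(round(y * scale_y), 0), screen.height - 1)
    return px, py


# Hypothetical example: a 1280x800 screenshot of a 2560x1600 display.
shot = Size(1280, 800)
screen = Size(2560, 1600)
print(to_screen_coords(640, 400, shot, screen))
```

Scaling each axis separately matters when the screenshot's aspect ratio differs from the display's, which is one way non-standard resolutions could make the error worse.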

Security, Privacy & Banking

  • Many call it “malware-like” or a “botnet waiting to happen,” warning against running it on a primary machine.
  • Recommended mitigations: separate user account, no sudo, dedicated VM, and possibly network isolation.
  • Debate over banking risk: EU posters cite strong 2FA (PSD2, hardware tokens, apps); US posters note “trusted device” flows and weak per-transaction 2FA.
  • Concerns that LLMs can easily leak sensitive data while “doing useful work,” with no antivirus- or firewall-style defense model for this risk.

Behavior, Reliability & Safety Rails

  • Claude shows quirky “personality”: preference for Firefox, occasional wandering into Yellowstone photos.
  • Safety rails block some actions (e.g., sending Discord/WhatsApp messages), but users question the rationale and completeness.
  • Reliability is poor: wrong flight dates, mis-clicks in CAD/3D tools, confusion between search and message fields, and declaring success despite being unable to verify results.
  • Works for trivial tasks (reading the system time) but fails surprisingly often on slightly more complex ones.

Cost & Performance

  • Reported costs: ~$0.38–$0.50 for simple multi-step tasks; latency is several seconds per action.
  • Some liken current costs to a high hourly rate for an unreliable assistant; others expect prices and quality to improve, though this is disputed.
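The “hourly rate” comparison can be made concrete with a little arithmetic. The sketch below uses the reported per-task costs; the step count and seconds per step are illustrative assumptions (the thread only says “multi-step” and “several seconds per action”), and `implied_hourly_rate` is a hypothetical helper.

```python
def implied_hourly_rate(cost_per_task_usd: float, steps: int, seconds_per_step: float) -> float:
    """Convert a per-task cost into dollars per hour of continuous use."""
    seconds_per_task = steps * seconds_per_step
    tasks_per_hour = 3600 / seconds_per_task
    return cost_per_task_usd * tasks_per_hour


# ASSUMPTION: 10 actions per task at 6 seconds each (60 seconds per task).
# These two numbers are not from the thread; change them to match real usage.
for cost in (0.38, 0.50):
    rate = implied_hourly_rate(cost, steps=10, seconds_per_step=6)
    print(f"${cost:.2f}/task -> ${rate:.2f}/hour")
```

Under these assumptions the implied rate lands in the low tens of dollars per hour; slower or longer tasks lower the hourly figure but raise the cost per completed task.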

Broader Implications & Ethics

  • Split between excitement (“start of Skynet,” major productivity and testing potential, big shift in software work) and alarm (Pandora’s box, intentional malware, undefined/non-deterministic behavior normalized).
  • Several see strong accessibility potential (hands-free computer use, voice+LLM hybrids), while others argue ethics and safety training for developers are lagging far behind these capabilities.