Show HN: My LLM CLI tool can run tools now, from Python code or plugins

Core Capabilities and CLI Use Cases

  • Single CLI interface to “hundreds” of models, with automatic logging of prompts/responses in SQLite for experiment tracking.
  • Strong shell integration: pipe files and command output into models for transformations and explanations (e.g., add type hints to code, generate commit messages from git diff, explain complex CSS).
  • Supports multimodal input via attachments (e.g., llm 'describe this photo' -a photo.jpg).
  • Tool plugins enable natural-language-to-command workflows (e.g., have the model propose ffmpeg commands, then confirm before they run), and combining multiple input files/URLs in a single prompt provides substantial coding assistance (sketched below).
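
For readers who haven't seen it, a minimal sketch of tool calling through llm's Python API might look like the following; the chain()/tools interface and the model ID are based on the tool-calling release and should be treated as assumptions rather than a definitive recipe.

```python
import llm
from pathlib import Path


def file_size(path: str) -> int:
    """Return the size of a file in bytes (a deliberately harmless, read-only tool)."""
    return Path(path).stat().st_size


# Assumed API: get_model() and chain(..., tools=[...]) as introduced with
# tool support; exact names may differ between versions.
model = llm.get_model("gpt-4.1-mini")
response = model.chain(
    "How large is README.md, in kilobytes?",
    tools=[file_size],
)
print(response.text())
```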

Plugins, Ecosystem, and UIs

  • Rich plugin ecosystem: model backends (Anthropic, Gemini, Ollama, llama.cpp), MCP experiments, QuickJS and SQLite tools, terminal helpers, tmux-based assistants, Zsh/Fish helpers that turn English into shell commands, and an external GTK desktop chat UI that integrates with llm (a minimal plugin sketch follows this list).
  • Streaming Markdown rendering (Streamdown) is highlighted as a nontrivial but important UX component; there’s interest in “semantic routing” of streamed output.
  • Some users maintain shell completion plugins and small wrappers for “quick answer” or “conceptual grep” workflows.
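
As a rough illustration of how small these plugins can be, a tool plugin is essentially a module exposing a plugin hook; the register_tools hook name below follows llm's plugin documentation, but treat the details as an assumption and check the current docs.

```python
import llm


def word_count(text: str) -> int:
    """Count whitespace-separated words in the supplied text."""
    return len(text.split())


@llm.hookimpl
def register_tools(register):
    # Assumed hook: exposes word_count as a tool the model can call
    # once the plugin is installed (e.g., via llm install).
    register(word_count)
```

Model backends, extra CLI commands, and the shell helpers mentioned above follow the same hook-based pattern with different hook names.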

Installation, Upgrades, and Performance

  • Users report plugins disappearing on upgrade (when llm is installed with uv tool or Homebrew); the recommended workaround is llm install -U llm or reinstalling with --with flags. There’s a proposal to auto-restore plugins from a plugins.txt file.
  • Some see slow startup (even for --help), possibly due to heavy plugin imports; profiling and lazy-import guidance are suggested.
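
On the startup point, CPython's -X importtime flag is the standard way to see which imports dominate, and the usual lazy-import fix for a plugin is to defer heavy dependencies into the command body. A generic sketch, assuming llm's register_commands hook (not taken from any specific plugin):

```python
import click
import llm


@llm.hookimpl
def register_commands(cli):
    @cli.command(name="heavy-report")
    def heavy_report():
        """Example subcommand that defers an expensive import until it actually runs."""
        import numpy  # stand-in for any heavy dependency; `llm --help` no longer pays this cost
        click.echo(f"numpy {numpy.__version__} loaded only when needed")
```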

Tool Calling Behavior and Reliability

  • Tool calling is seen as powerful but finicky: some report models “gaslighting” them about tool execution (e.g., claiming a calendar event was created when no tool was actually called).
  • One key insight: high-quality tool use often depends on very detailed system prompts and examples (thousands of tokens), which some find unsettling and brittle.
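
To make that concrete, the instructions that keep tool use honest typically live in the system prompt. A sketch along these lines, assuming the system= and chain() parameters of llm's Python API; a production prompt would reportedly be far longer and include worked examples:

```python
import llm


def create_calendar_event(title: str, start: str, end: str) -> str:
    """Illustrative tool; a real version would call a calendar API."""
    return f"created: {title} ({start}-{end})"


SYSTEM = """Only claim an event was created after create_calendar_event
has been called and returned successfully. If you did not call the tool,
say so explicitly instead of describing the event as booked."""

model = llm.get_model("gpt-4.1-mini")
response = model.chain(
    "Book a 30-minute project sync tomorrow at 10:00",
    system=SYSTEM,
    tools=[create_calendar_event],
)
print(response.text())
```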

Safety, Footguns, and Responsibility

  • Strong concern that tools, especially with authenticated actions (e.g., brokerage accounts, GitHub MCP), massively increase “footgun” risk.
  • Debate over whether this is “just another tool” vs. qualitatively new risk because LLM decisions are non-deterministic and opaque.
  • Extended ethical discussion: who is responsible when an LLM-enabled system causes harm, even if builders followed “best practices”? Opinions range from “clearly the human” to deeper critiques of deploying non-verifiable models in safety-critical contexts.
  • Proposed mitigations: sandboxing, explicit user confirmation for dangerous actions, read-only tools, and designs where tools hold credentials and only expose scoped tokens/symbols to the model.
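
Several of these mitigations can live in the tool layer itself rather than in the model. The following is a generic Python sketch (not an llm API) of a tool that gates a state-changing action behind explicit confirmation and keeps the credential out of the model's context entirely:

```python
import os


def _confirm(action: str) -> bool:
    """Require an explicit human 'y' before any state-changing action."""
    return input(f"Allow the model to {action}? [y/N] ").strip().lower() == "y"


def place_order(symbol: str, quantity: int) -> str:
    """Illustrative brokerage tool: the model only ever sees symbol and quantity;
    the API key lives in the tool's environment and never enters the prompt."""
    api_key = os.environ["BROKER_API_KEY"]  # scoped to the tool, invisible to the model
    if not _confirm(f"place an order for {quantity} x {symbol}"):
        return "Order declined by the user."
    # ... call the brokerage API with api_key here (omitted) ...
    return f"Order submitted: {quantity} x {symbol}"
```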

Models, Local Backends, and Cost

  • GPT‑4.1 mini is praised as very cheap and surprisingly capable; heavier models (e.g., o3/o4) are used selectively for coding.
  • Local tool calling via llama.cpp + llm-llama-server is demonstrated; users note they can also enable tools via extra-openai-models.yaml with flags like supports_tools: true (see the sketch after this list).
  • Some experiment with local multimodal models and ask about latency for real-time UI automation, though actual performance remains unclear in the thread.
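
For the extra-openai-models.yaml route mentioned above, an entry for a local llama.cpp server might look roughly like this; the field names follow llm's documentation for OpenAI-compatible endpoints, but the exact keys and the config path should be treated as assumptions:

```yaml
# extra-openai-models.yaml (in llm's config directory; location varies by OS)
- model_id: llama-server
  model_name: local-model          # whatever name the local server expects
  api_base: http://localhost:8080/v1
  supports_tools: true             # the flag mentioned in the thread
```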

Broader Reflections and Limitations

  • Some see llm turning the terminal into an “AI playground,” simpler than frameworks like LangChain or OpenAI Agents for many use cases.
  • Others are uneasy: long hidden prompts for tools, lack of deterministic behavior, and inability to write strong automated tests make this feel unlike previous abstraction jumps (e.g., assembly → C).
  • There’s philosophical disagreement over whether LLMs “understand” language vs. merely simulate it—but several participants emphasize that even as “language toys,” they’re already extremely useful.
  • Minor critiques: the project name (llm) is too generic, documentation is scattered across multiple sources, and there’s a desire for more canonical, consolidated docs and a web UI.