Show HN: My LLM CLI tool can run tools now, from Python code or plugins
Core Capabilities and CLI Use Cases
- Single CLI interface to “hundreds” of models, with automatic logging of prompts/responses in SQLite for experiment tracking.
- Strong shell integration: pipe files and command output into models for transformations and explanations (e.g., add type hints to code, generate commit messages from `git diff`, explain complex CSS); see the examples after this list.
- Supports multimodal input (e.g., `llm 'describe this photo' -a photo.jpg`).
- Tool plugins allow natural-language -> command workflows (e.g., propose `ffmpeg` commands, then confirm to run), and substantial coding assistance by combining multiple input files/URLs.
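A minimal sketch of these workflows with the `llm` CLI. The `-s` (system prompt), `-a` (attachment), and `-f` (fragment) options are as documented for recent `llm` releases; file names and URLs here are placeholders.

```bash
# Pipe code or command output into a model (file names are placeholders)
cat utils.py | llm -s 'Add type hints to this Python code'
git diff | llm -s 'Write a concise one-line commit message for this diff'

# Multimodal: attach an image to the prompt
llm 'describe this photo' -a photo.jpg

# Combine several inputs for coding help; -f accepts paths or URLs in newer
# llm releases (the URL and file here are illustrative)
llm -f https://example.com/spec.md -f main.py 'Implement the missing endpoint'
```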
Plugins, Ecosystem, and UIs
- Rich plugin ecosystem: model backends (Anthropic, Gemini, Ollama, llama.cpp), MCP experiments, QuickJS and SQLite tools, terminal helpers, tmux-based assistants, Zsh/Fish helpers that turn English into shell commands, and an external GTK desktop chat UI integrating with `llm`.
- Streaming Markdown rendering (Streamdown) is highlighted as a nontrivial but important UX component; there’s interest in “semantic routing” of streamed output.
- Some users maintain shell completion plugins and small wrappers for “quick answer” or “conceptual grep” workflows.
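For reference, a hedged sketch of how backends from this ecosystem are typically added and inspected; the plugin names are those mentioned above, and exact availability depends on the plugin directory.

```bash
# Install model-backend plugins, then inspect what they expose
llm install llm-anthropic llm-gemini llm-ollama
llm plugins    # list installed plugins
llm models     # list models now available through those backends
```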
Installation, Upgrades, and Performance
- Users report plugins disappearing on upgrade (with `uv tool` or Homebrew); the recommended workaround is `llm install -U llm` or reinstalling with `--with` flags. There’s a proposal to auto-restore plugins from a `plugins.txt`.
- Some see slow startup (even for `--help`), possibly due to heavy plugin imports; profiling and lazy-import guidance are suggested (see the sketch below).
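A sketch of both points, assuming a `uv`-managed install; the plugin names are illustrative and the profiling commands only roughly indicate where import time goes.

```bash
# Reinstall with plugins declared up front so they survive upgrades
uv tool install llm --with llm-gemini --with llm-anthropic --force

# Rough startup profiling: time the CLI, then look for slow imports
time llm --help
python -X importtime -c "import llm" 2> import-times.txt
sort -t'|' -k2 -n import-times.txt | tail -20   # largest cumulative import times
```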
Tool Calling Behavior and Reliability
- Tool-calling is seen as powerful but finicky: some users experience models “gaslighting” them about tool execution (e.g., claiming calendar events were created) when no tool was actually called.
- One key insight: high-quality tool use often depends on very detailed system prompts and examples (thousands of tokens), which some find unsettling and brittle.
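A hedged sketch of that pattern: keep the long tool-use instructions in a file and pass them as the system prompt. The tool name and file are hypothetical; `--td` (tool debug output in recent `llm` releases) makes it visible whether a tool call actually happened.

```bash
# 'calendar' is a hypothetical tool exposed by some plugin; the .md file holds
# the multi-thousand-token instructions and examples the thread describes
llm -T calendar -s "$(cat calendar-tool-instructions.md)" --td \
  'Add a dentist appointment for Friday at 3pm'
```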
Safety, Footguns, and Responsibility
- Strong concern that tools, especially with authenticated actions (e.g., brokerage accounts, GitHub MCP), massively increase “footgun” risk.
- Debate over whether this is “just another tool” vs. qualitatively new risk because LLM decisions are non-deterministic and opaque.
- Extended ethical discussion: who is responsible when an LLM-enabled system causes harm, even if builders followed “best practices”? Opinions range from “clearly the human” to deeper critiques of deploying non-verifiable models in safety-critical contexts.
- Proposed mitigations: sandboxing, explicit user confirmation for dangerous actions, read-only tools, and designs where tools hold credentials and only expose scoped tokens/symbols to the model.
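One of those mitigations (read-only tools) sketched with inline functions, assuming the `--functions` option from `llm`'s tool-calling support: the function exposed to the model can read but never write, and `--td` shows exactly which calls were made.

```bash
llm --functions '
def read_file(path: str) -> str:
    "Return the contents of a local file (read-only: no writes, no network)."
    with open(path) as f:
        return f.read()
' --td 'Summarize what config.toml configures'
```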
Models, Local Backends, and Cost
- GPT‑4.1 mini is praised as very cheap and surprisingly capable; heavier models (e.g., o3/o4) used selectively for coding.
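A cost-tiering sketch along those lines; model IDs depend on installed plugins and provider naming, so check `llm models` for the exact strings.

```bash
# Cheap default for everyday prompts, heavier model only on request
llm models default gpt-4.1-mini
llm 'explain this regex: ^(?=.*\d).{8,}$'
llm -m o3 -f mymodule.py 'Suggest a refactor of this module for testability'
```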
- Local tool-calling via `llama.cpp` + `llm-llama-server` is demonstrated; users note they can also enable tools via `extra-openai-models.yaml` with flags like `supports_tools: true` (see the sketch below).
- Some experiment with local multimodal models and ask about latency for real-time UI automation, though actual performance remains unclear in the thread.
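A hedged sketch of that `extra-openai-models.yaml` route, assuming a local llama.cpp server exposing an OpenAI-compatible endpoint on port 8080; the field names follow the `llm` docs plus the `supports_tools` flag mentioned in the thread, so verify them against your installed version.

```bash
# The config file lives in llm's user directory (alongside keys.json);
# `llm keys path` is one way to locate that directory
CONFIG_DIR="$(dirname "$(llm keys path)")"

cat > "$CONFIG_DIR/extra-openai-models.yaml" <<'EOF'
- model_id: local-llama
  model_name: local-llama
  api_base: http://localhost:8080/v1
  supports_tools: true
EOF

# Use it like any other model; llm_version is a built-in example tool
llm -m local-llama -T llm_version 'What llm version is this?' --td
```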
Broader Reflections and Limitations
- Some see `llm` turning the terminal into an “AI playground,” simpler than frameworks like LangChain or OpenAI Agents for many use cases.
- Others are uneasy: long hidden prompts for tools, lack of deterministic behavior, and inability to write strong automated tests make this feel unlike previous abstraction jumps (e.g., assembly → C).
- There’s philosophical disagreement over whether LLMs “understand” language vs. merely simulate it—but several participants emphasize that even as “language toys,” they’re already extremely useful.
- Minor critiques: the project name (`llm`) is too generic, documentation is scattered across multiple sources, and there’s a desire for more canonical, consolidated docs and a web UI.