Show HN: GuardLLM, hardened tool calls for LLM apps


Most agent frameworks treat prompt injection as a model-level problem. In practice, once your agent ingests untrusted text and has tool access, you need application-layer controls (structural isolation, tool-call gating, exfiltration detection) that don't depend on the model behaving correctly. I built guardllm to fill that gap. It is a small, auditable Python library that provides:

- Inbound hardening: sanitizes and structurally isolates untrusted content (web, email, docs, tool output) so it is treated as data, not instructions (sketched below).
- Tool-call firewall: denies destructive operations by default unless explicitly authorized, and fails closed when no confirmation handler is wired (sketched below).
- Request binding: binds (tool name, canonical args, message hash, TTL) to prevent replay and argument substitution.
- Exfiltration detection: scans outbound tool arguments for secret patterns and flags substantial verbatim overlap with recently ingested untrusted content (sketched below).
- Provenance tracking: enforces stricter no-copy rules on content with a known untrusted origin, independent of the overlap heuristic.
- Canary tokens: generates and detects per-session canaries to catch prompt leakage into outputs.
- Source gating: blocks high-risk sources from being promoted into long-lived memory or KG extraction, to reduce memory poisoning.
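To make "treated as data, not instructions" concrete, here is a minimal sketch of what structural isolation can look like. The class and function names are illustrative, not guardllm's actual API:

    # Illustrative structural isolation for untrusted content; names are assumptions.
    import hashlib
    import re
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class UntrustedBlock:
        source: str   # e.g. "web", "email", "tool:web_search"
        content: str  # sanitized text
        digest: str   # hash reused later for provenance / overlap checks

    def isolate_untrusted(raw: str, source: str) -> UntrustedBlock:
        # Strip control characters that can hide instructions from human review.
        cleaned = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", raw)
        # Neutralize our own delimiter so attacker text cannot close the block early.
        cleaned = cleaned.replace("<<END_UNTRUSTED>>", "<<END_UNTRUSTED_ESCAPED>>")
        digest = hashlib.sha256(cleaned.encode()).hexdigest()
        return UntrustedBlock(source=source, content=cleaned, digest=digest)

    def render_for_prompt(block: UntrustedBlock) -> str:
        # The wrapper marks the content as data for both the model and reviewers.
        return (
            f"<<BEGIN_UNTRUSTED source={block.source} sha256={block.digest[:12]}>>\n"
            f"{block.content}\n"
            f"<<END_UNTRUSTED>>"
        )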
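The tool-call firewall and request binding pieces could look roughly like the sketch below: deny-by-default for destructive tools, fail-closed confirmation, and an HMAC over (tool name, canonical args, message hash, issue time) so that substituted arguments or replayed bindings are rejected. The key handling, tool names, and TTL here are assumptions for illustration, not guardllm's implementation:

    # Illustrative deny-by-default tool gate with request binding; names are assumptions.
    import hashlib
    import hmac
    import json
    import time

    DESTRUCTIVE = {"delete_file", "send_email", "execute_shell"}
    BINDING_KEY = b"per-session-secret"   # would come from session state, not a literal
    TTL_SECONDS = 120
    _seen_sigs = set()                    # one-time-use record; a real store would expire entries

    def bind_request(tool: str, args: dict, message_hash: str) -> dict:
        # Canonicalize args so the signature breaks if any argument is substituted.
        canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
        issued_at = int(time.time())
        payload = f"{tool}|{canonical}|{message_hash}|{issued_at}".encode()
        sig = hmac.new(BINDING_KEY, payload, hashlib.sha256).hexdigest()
        return {"message_hash": message_hash, "issued_at": issued_at, "sig": sig}

    def allow_call(tool: str, args: dict, binding: dict, confirm=None) -> bool:
        # Re-derive the signature from the call as actually issued.
        canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
        payload = f"{tool}|{canonical}|{binding['message_hash']}|{binding['issued_at']}".encode()
        expected = hmac.new(BINDING_KEY, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, binding["sig"]):
            return False                  # tool or argument substitution, or tampering
        if binding["sig"] in _seen_sigs:
            return False                  # replayed binding
        if time.time() - binding["issued_at"] > TTL_SECONDS:
            return False                  # expired binding
        if tool in DESTRUCTIVE and (confirm is None or not confirm(tool, args)):
            return False                  # fail closed: destructive call without explicit approval
        _seen_sigs.add(binding["sig"])
        return True

With no confirmation handler wired, a destructive call like delete_file simply returns False; that is the fail-closed behavior described above, and any change to the tool name or arguments breaks the signature.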
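On the outbound side, a sketch of the exfiltration and canary checks: secret-pattern scanning, a crude verbatim-overlap check against recently ingested untrusted text, and per-session canary generation and detection. The patterns, window size, and function names are illustrative assumptions, not the library's defaults:

    # Illustrative outbound checks; patterns and thresholds are assumptions.
    import re
    import secrets

    SECRET_PATTERNS = [
        re.compile(r"AKIA[0-9A-Z]{16}"),                     # AWS access key id
        re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),   # PEM private key header
        re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S{16,}"),     # generic api_key=... assignments
    ]

    def has_secret(outbound: str) -> bool:
        return any(p.search(outbound) for p in SECRET_PATTERNS)

    def verbatim_overlap(outbound: str, untrusted: str, window: int = 40) -> bool:
        # Flag if any window-sized run of untrusted content appears verbatim in
        # outbound tool arguments (a stand-in for the overlap heuristic).
        step = max(1, window // 2)
        for i in range(0, max(1, len(untrusted) - window + 1), step):
            if untrusted[i:i + window] in outbound:
                return True
        return False

    def new_canary(session_id: str) -> str:
        # Unique per-session marker planted in the system prompt; seeing it in any
        # output or outbound argument means the prompt has leaked.
        return f"CANARY-{session_id}-{secrets.token_hex(8)}"

    def canary_leaked(text: str, canary: str) -> bool:
        return canary in text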

It is intentionally minimal and not framework-specific. It does not replace least-privilege credentials or sandboxing; it sits above them.

Repo: https://github.com/mhcoen/guardllm

I'd like feedback on: what threat-model gaps you see; whether the default overlap thresholds are reasonable for summarization and quoting workflows; and which framework adapters would make this easiest to adopt (LangChain, OpenAI tool calling, MCP proxy, etc.).