2026-03-26
Security
AI Agents
Best Practices
The 10 commands AI agents get wrong (and how to gate them)
AI coding agents are getting good at writing code. They're much less good at knowing when a command is irreversible,
contextually dangerous, or just subtly wrong for your situation. Here's a list of the commands that bite people,
and the approval-gate patterns that prevent it.
This isn't about catastrophic failures — most aren't. It's about the class of mistake where the agent did
exactly what you asked, at the wrong time, in the wrong environment, or with slightly wrong scope. No model
is immune. Even with a thoughtful prompt, agents operate under uncertainty about context that humans resolve
by instinct.
Each entry below includes the failure mode, why it happens, and a recommended approval gate strategy.
Risk scores are from expacti's scoring engine — a rough guide to how quickly a command should escalate.
rm -rf with an unchecked variable
The classic. An agent trying to clean up build artifacts runs rm -rf "$OUTDIR"/*.
If OUTDIR is unset or empty, that expands to rm -rf /*. If it's set to /home/user
instead of /home/user/build, that's a year of work gone.
Why it happens: Agents reason about the intent ("delete the build directory") but don't
always trace the full expansion of variables, especially when the runtime environment differs from the one the agent assumed.
Gate strategy
Require manual approval for any rm -rf outside /tmp and a small set of known
safe build directories. Use the whitelist to fast-path rm -rf /tmp/build-* after it has
been reviewed once. Flag anything containing a variable expansion as HIGH regardless of path.
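A minimal sketch of that rule in Python. The allowlist, the function name gate_rm, and the return labels are illustrative assumptions, not expacti's actual configuration or API.

```python
import fnmatch
import posixpath
import shlex

# Hypothetical allowlist of locations that have already been reviewed once.
SAFE_GLOBS = ["/tmp/*", "/home/user/build", "/home/user/build/*"]

def gate_rm(command: str) -> str:
    """Return 'HIGH', 'REVIEW', 'FAST_PATH', or 'NOT_RM_RF' for one command."""
    argv = shlex.split(command)
    if not argv or argv[0] != "rm":
        return "NOT_RM_RF"
    short = "".join(a[1:] for a in argv[1:]
                    if a.startswith("-") and not a.startswith("--"))
    recursive = "r" in short.lower() or "--recursive" in argv
    force = "f" in short or "--force" in argv
    if not (recursive and force):
        return "NOT_RM_RF"
    # Variable expansion or command substitution is HIGH regardless of path.
    if "$" in command or "`" in command:
        return "HIGH"
    # Normalize first so /tmp/../etc cannot pass as a /tmp path.
    paths = [posixpath.normpath(a) for a in argv[1:] if not a.startswith("-")]
    if paths and all(any(fnmatch.fnmatchcase(p, g) for g in SAFE_GLOBS)
                     for p in paths):
        return "FAST_PATH"
    return "REVIEW"
```

Relative paths fall through to REVIEW here, since the gate cannot know the working directory.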
git push --force
An agent rebasing a branch to resolve conflicts decides it needs to force-push. If this is a shared branch,
it just rewrote history for everyone. If it's main, your CI pipeline is broken and your team's local copies
have now diverged.
Why it happens: Force push is the correct solution to "I need to push after a rebase."
The agent knows the technical answer. It doesn't always verify branch protection rules or whether
the branch is shared.
Gate strategy
Never whitelist git push --force or git push --force-with-lease on production
or shared branches. Make every force-push a manual review, with the reviewer seeing the target branch
in the command. Consider a CRITICAL score override for any force push to main/master/production.
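One way to sketch the force-push check, assuming the gate sees the raw command string. The PROTECTED set and the gate_push name are hypothetical, and treating a force push with no explicit branch as CRITICAL is a conservative assumption rather than something the rule above specifies.

```python
import shlex
from typing import Optional

PROTECTED = {"main", "master", "production"}  # hypothetical protected branches

def gate_push(command: str) -> Optional[str]:
    """Return 'CRITICAL' or 'REVIEW' for a force push, None for anything else."""
    argv = shlex.split(command)
    if argv[:2] != ["git", "push"]:
        return None
    args = argv[2:]
    positional = [a for a in args if not a.startswith("-")]
    refspecs = positional[1:]  # the first positional argument is the remote
    forced = (
        any(a in ("-f", "--force") or a.startswith("--force-with-lease") for a in args)
        or any(r.startswith("+") for r in refspecs)  # +refspec also forces
    )
    if not forced:
        return None
    # The destination is the part after ':' in a refspec, else the refspec itself.
    targets = {
        r.lstrip("+").split(":")[-1].removeprefix("refs/heads/") for r in refspecs
    }
    # No explicit branch means the target is unknown: assume the worst.
    if not targets or targets & PROTECTED:
        return "CRITICAL"
    return "REVIEW"
```

This does not handle options that take a separate value (such as -o), so a real gate would want a proper argument parser.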
DROP TABLE and bare DELETE FROM
An agent migrating a schema drops a table that still has foreign key references, or a data cleanup task
deletes all rows when it was supposed to delete rows matching a condition. The difference between
DELETE FROM events WHERE created_at < '2024-01-01' and DELETE FROM events
is one missing clause.
Why it happens: Agents draft SQL based on the described intent. "Delete old events"
becomes "DELETE FROM events WHERE..." but the WHERE clause might be wrong, off by one, or missing entirely,
or the statement might target the wrong table.
Gate strategy
Score any DROP TABLE, DROP DATABASE, or bare DELETE FROM as CRITICAL.
Require multi-party approval (two reviewers) for production databases. Allow DELETE FROM ... WHERE
with a time-window filter on staging after one review. Never auto-approve DDL on prod.
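A rough sketch of the SQL scoring, using regular expressions. The function name, the environment strings, and the return labels are assumptions. It only checks that a WHERE clause exists, not that it is a sensible time-window filter, and a production gate would want a real SQL parser.

```python
import re
from typing import Optional

DROP_RE = re.compile(r"^\s*DROP\s+(TABLE|DATABASE)\b", re.IGNORECASE)
DELETE_RE = re.compile(r"^\s*DELETE\s+FROM\s+\S+(?P<rest>.*)$",
                       re.IGNORECASE | re.DOTALL)

def gate_sql(statement: str, environment: str) -> Optional[str]:
    """Score one SQL statement; None means this rule does not apply."""
    stmt = statement.strip().rstrip(";")
    if DROP_RE.match(stmt):
        return "CRITICAL"
    m = DELETE_RE.match(stmt)
    if m is None:
        return None
    if not re.search(r"\bWHERE\b", m.group("rest"), re.IGNORECASE):
        return "CRITICAL"  # bare DELETE FROM: every row goes
    # Filtered deletes: one review on staging, full review elsewhere.
    return "REVIEW_ONCE" if environment == "staging" else "REVIEW"
```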
curl | bash
Installing a dependency or tool via the standard quick-install pattern:
curl https://example.com/install.sh | bash. This runs arbitrary remote code with no integrity check.
Most of the time it works fine. When the CDN is compromised, or the URL resolves to a different host than
intended, it's a full system compromise.
Why it happens: This is how most CLI tools tell you to install them. Agents follow
installation instructions literally.
Gate strategy
Require manual review for any pipe-to-shell pattern. The reviewer should verify the URL, check that
it's the official installer for the tool, and ideally substitute with a checksum-verified download.
Whitelist specific trusted patterns (curl https://sh.rustup.rs | sh) only after manual
verification.
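Both halves of that strategy can be sketched briefly: spotting the pipe-to-shell shape, and checking a downloaded installer against a checksum the reviewer obtained out of band. The function names are hypothetical, and the pattern only covers curl or wget piped into sh, bash, zsh, or dash.

```python
import hashlib
import re

# curl/wget, then a pipe, then optionally sudo, then a shell.
PIPE_TO_SHELL = re.compile(r"\b(curl|wget)\b[^|]*\|\s*(sudo\s+)?(ba|z|da)?sh\b")

def is_pipe_to_shell(command: str) -> bool:
    """True if the command downloads something and pipes it into a shell."""
    return PIPE_TO_SHELL.search(command) is not None

def matches_checksum(payload: bytes, expected_sha256: str) -> bool:
    """True if the downloaded installer matches the published SHA-256 digest."""
    return hashlib.sha256(payload).hexdigest() == expected_sha256.lower()
```

The safer flow is to download the script to a file, call matches_checksum on its bytes, and only then run it.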
chmod 777
An agent debugging a permission error takes the fastest path: chmod 777 /var/www.
Permissions problem solved. Security model also solved: now any process on the system can write there.
On a multi-tenant server, that's a lateral movement path.
Why it happens: chmod 777 almost always makes the immediate permission error go away.
Agents optimize for the immediate problem without reasoning about the secondary security effects.
Gate strategy
Flag all chmod 777, chmod o+w, and chmod a+w as HIGH.
Require manual review. In the review UI, the reviewer should see a suggested safer alternative
(e.g., chmod 755 or chown www-data) to approve instead.
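A sketch of how the world-writable check might look, covering both octal and symbolic modes. The names and the wording of the suggestion are illustrative. Only explicit o or a grants are flagged, so a bare +w is not covered.

```python
import re
import shlex
from typing import Optional, Tuple

def world_writable(mode: str) -> bool:
    """True if a chmod mode grants write permission to 'other'."""
    if re.fullmatch(r"[0-7]{3,4}", mode):
        return int(mode[-1]) & 2 != 0  # last octal digit is 'other'; 2 is write
    for clause in mode.split(","):
        m = re.fullmatch(r"([ugoa]+)([+=])([rwxXst]*)", clause)
        if m and "w" in m.group(3) and set(m.group(1)) & {"o", "a"}:
            return True
    return False

def gate_chmod(command: str) -> Optional[Tuple[str, str]]:
    """Return ('HIGH', suggestion) for world-writable chmod, else None."""
    argv = shlex.split(command)
    if not argv or argv[0] != "chmod":
        return None
    operands = [a for a in argv[1:] if not a.startswith("-")]
    if operands and world_writable(operands[0]):
        return ("HIGH", "consider chmod 755 or chown www-data instead")
    return None
```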
docker run --privileged
An agent spinning up a container for testing uses --privileged because it saw that flag
in an example. A privileged container runs with all Linux capabilities and access to the host's devices,
with most of Docker's default isolation switched off. Escaping to the host is trivial from a privileged container.
Why it happens: --privileged appears in legitimate examples (Docker-in-Docker,
certain testing setups). Models that were trained on Stack Overflow answers will reproduce it without
understanding when it's appropriate.
Gate strategy
Score docker run --privileged as HIGH. Require review. Also watch for
-v /:/host (full host mount) and --cap-add=SYS_ADMIN — same risk class.
Whitelist specific test containers with --privileged only after a human verifies the use case.
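The three flags named above can be collected in one pass over the arguments. This is a sketch with hypothetical names. It scans everything after docker run, including the command passed to the container, so it can over-flag.

```python
import shlex
from typing import List

def _value(args: List[str], i: int) -> str:
    """Value of option args[i], written as --opt=value or as --opt value."""
    if "=" in args[i]:
        return args[i].split("=", 1)[1]
    return args[i + 1] if i + 1 < len(args) else ""

def docker_risk_flags(command: str) -> List[str]:
    """List the risky options found in a docker run command."""
    argv = shlex.split(command)
    if argv[:2] != ["docker", "run"]:
        return []
    args, findings = argv[2:], []
    for i, arg in enumerate(args):
        name = arg.split("=", 1)[0]
        if name == "--privileged" and arg != "--privileged=false":
            findings.append("privileged")
        elif name == "--cap-add" and _value(args, i).upper() == "SYS_ADMIN":
            findings.append("cap-add SYS_ADMIN")
        elif name in ("-v", "--volume") and _value(args, i).split(":")[0] == "/":
            findings.append("host root mount")
    return findings
```

Any non-empty result would be scored HIGH and sent to review.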
systemctl stop and systemctl disable
An agent restarting a service to pick up a config change runs systemctl stop nginx without
first verifying it can start again. Or it disables a service that a monitoring system expects to be up,
silently breaking alerting. On a production host, stopping nginx means your site is down.
Why it happens: "Restart service to apply config" is a standard DevOps pattern.
Agents don't always distinguish between staging (where it's safe) and production (where it affects users).
Gate strategy
Allow systemctl restart on known safe services after one review. Require manual approval
for systemctl stop and systemctl disable always. Consider tagging these with
an environment label in the review UI so the reviewer sees "PROD" immediately.
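As a sketch, the policy is a small lookup on the verb plus an environment label for the review UI. SAFE_TO_RESTART and the environment argument are assumptions about how a gate would be configured.

```python
import shlex
from typing import Optional, Tuple

SAFE_TO_RESTART = {"nginx"}  # hypothetical: services already reviewed once

def gate_systemctl(command: str, environment: str) -> Optional[Tuple[str, str]]:
    """Return (decision, environment label), or None if the rule does not apply."""
    argv = shlex.split(command)
    if not argv or argv[0] != "systemctl":
        return None
    words = [a for a in argv[1:] if not a.startswith("-")]
    if not words:
        return None
    verb, units = words[0], words[1:]
    label = environment.upper()  # shown prominently to the reviewer
    if verb in ("stop", "disable"):
        return ("REVIEW", label)  # always manual, whatever the service
    if verb == "restart":
        known = units and all(
            u.removesuffix(".service") in SAFE_TO_RESTART for u in units
        )
        return ("FAST_PATH" if known else "REVIEW", label)
    return None
```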
git reset --hard
An agent trying to undo some changes runs git reset --hard HEAD~3. It just dropped
three unpushed commits from the branch and wiped any uncommitted changes along with them.
Or it ran git reset --hard origin/main on a branch that had local-only commits you were planning to push later.
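Why it happens: git reset --hard is the correct way to discard local changes.
The agent doesn't check whether the commits it drops exist anywhere else (pushed) or only locally, where
they survive only in the reflog until it expires. Uncommitted changes are not recoverable at all. It treats
git history as a state machine, not a collaboration medium.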
Gate strategy
Always require review for git reset --hard, git clean -f, and git clean -fd.
The reviewer should verify what commits would be lost before approving. Consider a 60-second timeout with
auto-deny as a speed bump.
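To help the reviewer see what would be lost, a gate could show the output of a git log range next to the pending command. The helper below is a hypothetical sketch that only builds that preview command. It does not run git, and it does not cover uncommitted changes (git status --short shows those).

```python
import shlex
from typing import List, Optional

def lost_commit_preview(command: str) -> Optional[List[str]]:
    """For 'git reset --hard <target>', return the argv that lists the commits
    reachable from HEAD but not from <target>."""
    argv = shlex.split(command)
    if argv[:2] != ["git", "reset"] or "--hard" not in argv:
        return None
    positional = [a for a in argv[2:] if not a.startswith("-")]
    target = positional[0] if positional else "HEAD"
    return ["git", "log", "--oneline", f"{target}..HEAD"]
```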
aws s3 rm --recursive
An agent cleaning up old artifacts runs aws s3 rm s3://your-bucket/backups/ --recursive.
Unless versioning is enabled on the bucket, S3 has no trash. If the path was wrong, or the bucket was wrong,
or "backups" turned out to contain your production database snapshots, that data is gone.
Why it happens: Cloud CLI commands look just like local commands to a model. The agent
has no intuition that a delete in a remote object store takes effect immediately and, without versioning
or backups, cannot be undone.
Gate strategy
Score all aws s3 rm, aws s3 sync --delete, gsutil rm, and similar
cloud storage delete operations as CRITICAL. Require explicit manual approval. In the review UI, have
the reviewer confirm the bucket name matches the intended target. Never auto-approve cloud delete operations.
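The bucket confirmation step could work like the sketch below: extract the bucket the delete targets, then require the reviewer to type it. The function names are hypothetical. Options that take a separate value (for example --exclude) would confuse this simple positional parsing.

```python
import re
import shlex
from typing import Optional

URL_RE = re.compile(r"^(s3|gs)://([^/]+)")

def delete_target_bucket(command: str) -> Optional[str]:
    """Bucket that a cloud delete would remove objects from, if any."""
    argv = shlex.split(command)
    words = [a for a in argv if not a.startswith("-")]
    if words[:3] == ["aws", "s3", "rm"]:
        target = words[3:4]
    elif words[:3] == ["aws", "s3", "sync"] and "--delete" in argv:
        target = words[4:5]  # sync --delete removes files at the destination
    elif words[:2] == ["gsutil", "rm"]:
        target = words[2:3]
    else:
        return None
    m = URL_RE.match(target[0]) if target else None
    return m.group(2) if m else None

def reviewer_confirms(command: str, typed_bucket: str) -> bool:
    """Approve only if the reviewer typed the exact bucket being deleted from."""
    bucket = delete_target_bucket(command)
    return bucket is not None and typed_bucket == bucket
```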
eval with untrusted input
An agent constructing a shell command dynamically uses eval to run it. If any part of the
input came from an external source (a file, an API response, a git commit message), you now have
a code injection vector. The agent trusted the source. The source was poisoned.
Why it happens: eval is the fastest way to run dynamically-constructed commands.
Agents generating scripts often reach for it without considering that the data flowing in might not be
safe. This is a classic prompt injection path: poisoned source → agent constructs eval → execution.
Gate strategy
Flag any eval, exec, or sh -c "$(…)" as CRITICAL when the command
contains variable expansion or command substitution. Require mandatory human review. The reviewer should
understand what data flows into the string before approving. This is one case where auto-deny-on-timeout
is strongly recommended — if the reviewer doesn't actively approve an eval, it shouldn't run.
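A sketch of the detection side, assuming a regex pass over the raw command. The patterns are deliberately narrow (eval or exec in command position, or sh/bash -c) and the names are illustrative.

```python
import re
from typing import Optional

# Variable expansion ($VAR, ${VAR}), command substitution ($(...)), or backticks.
DYNAMIC = re.compile(r"\$\{?\w|\$\(|`")
# eval/exec at the start of a command, or a shell invoked with -c.
EVALISH = re.compile(r"(^|[;&|]\s*)(eval|exec)\b|\b(ba)?sh\s+-c\b")

def gate_eval(command: str) -> Optional[str]:
    """Return 'CRITICAL' when a dynamic string is handed to eval, exec, or sh -c."""
    if EVALISH.search(command) and DYNAMIC.search(command):
        return "CRITICAL"
    return None
```

Anything this returns CRITICAL for should also be auto-denied if the review times out.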
The pattern behind all of these
Each of these commands shares a common failure structure: the agent's model of the world is correct
about the immediate goal and wrong about the context. The agent knows how to delete a directory.
It doesn't know that $DIR is unset in the current environment. It knows how to push a rebase.
It doesn't know your team's branch protection conventions.
The solution isn't a smarter agent. Context-sensitivity of this kind is genuinely hard — it requires knowing
things about your organization, your environment, and your conventions that aren't in any training dataset.
The solution is a human who does know those things, in the loop at decision time.
The whitelist is the product
The goal of an approval gate isn't to make humans review every command forever. It's to build a whitelist
that reflects your organization's risk tolerance. The first time you see docker ps, you
approve it — it's whitelisted. After a week, the only things reaching a human are commands that are
genuinely novel or high-risk. That's the right steady state.
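That steady state can be sketched in a few lines. This version assumes an exact-match whitelist and a "LOW" label for everything below HIGH. Both are simplifications for illustration, not a description of expacti's whitelist.

```python
class Whitelist:
    """Approve once, then fast-path. High-risk commands never get whitelisted."""

    ALWAYS_REVIEW = ("HIGH", "CRITICAL")

    def __init__(self) -> None:
        self._approved = set()

    def needs_review(self, command: str, risk: str) -> bool:
        return risk in self.ALWAYS_REVIEW or command not in self._approved

    def record_approval(self, command: str, risk: str) -> None:
        # Only low-risk commands are remembered; the rest are reviewed every time.
        if risk not in self.ALWAYS_REVIEW:
            self._approved.add(command)

wl = Whitelist()
first_time = wl.needs_review("docker ps", "LOW")    # novel, so a human sees it
wl.record_approval("docker ps", "LOW")
second_time = wl.needs_review("docker ps", "LOW")   # whitelisted now
```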
What "gated" looks like in practice
With expacti, the agent blocks on each of these commands, waiting for a reviewer decision. The reviewer sees:
- The full command, unexpanded
- The risk score and risk category (what triggered the HIGH/CRITICAL classification)
- The session context — what commands ran before this one
- Any anomaly signals — off-hours, unusual rate, exfiltration pattern
If the reviewer approves, the command runs immediately. If they deny, the agent receives an error and can try
a different approach. If no one responds within the configured timeout (default: 60 seconds), the command
is auto-denied.
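The deny-by-default behaviour is simple to sketch with a blocking queue: the agent waits for a decision, and silence counts as a denial. The queue-based transport and the await_decision name are assumptions for illustration, not how expacti is implemented.

```python
import queue

def await_decision(decisions: "queue.Queue[str]", timeout_s: float = 60.0) -> str:
    """Block until a reviewer decides; auto-deny if nobody answers in time."""
    try:
        decision = decisions.get(timeout=timeout_s)
    except queue.Empty:
        return "deny"  # no response within the timeout
    # Anything other than an explicit approval is treated as a denial.
    return "approve" if decision == "approve" else "deny"
```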
For LangChain agents, this is a two-line integration:
from expacti import ExpactiTool
tools = [ExpactiTool(backend_url="wss://api.expacti.com/shell/ws", token=SHELL_TOKEN)]
# Agent now routes all shell commands through expacti for approval
For direct shell use, expacti-sh is a drop-in replacement for your shell that intercepts
commands at the prompt level. No agent code changes required.
One more thing: the commands you don't expect
This list covers the obvious ones. The more interesting problem is the command you didn't anticipate.
An agent trying to check disk space runs df -h | awk '...' | xargs rm -rf. A cleanup script
that started as simple becomes complex because the agent added a pipe. A "read-only" health check
accidentally modifies state because the tool the agent called has side effects.
The value of an approval gate isn't just that it blocks the known-bad patterns. It's that it creates
a forcing function: before anything executes, a human who understands the context has a chance to see it.
The reviewer catches the things you didn't know to put on a blocklist.
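Automated scoring can still help with pipelines like the df example: score every stage, looking through wrappers such as xargs, and take the highest result instead of classifying by the first word. A naive sketch follows. The splitting ignores shell quoting, and the wrapper list and two-level scale are assumptions.

```python
import re
from typing import List

LEVELS = ["LOW", "HIGH"]
WRAPPERS = {"xargs", "sudo", "env", "nohup"}  # commands that run another command

def effective_argv(stage: str) -> List[str]:
    """Strip leading wrappers (and their flags) to find what actually runs."""
    argv = stage.split()
    while argv and (argv[0] in WRAPPERS or argv[0].startswith("-")):
        argv = argv[1:]
    return argv

def score_stage(stage: str) -> str:
    argv = effective_argv(stage)
    if argv and argv[0] == "rm":
        flags = "".join(a[1:] for a in argv[1:] if a.startswith("-"))
        if "r" in flags.lower() and "f" in flags:
            return "HIGH"
    return "LOW"

def score_pipeline(command: str) -> str:
    """Score each stage of a pipeline or command list and return the worst."""
    stages = re.split(r"\|\||&&|[|;]", command)
    return max((score_stage(s) for s in stages), key=LEVELS.index)
```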
The window matters
The most dangerous commands are the ones an agent runs at 3am during an automated pipeline when no
reviewer is watching. Configure your approval gate to auto-deny anything CRITICAL outside business hours
unless there's an on-call reviewer actively connected. The cost of a delayed pipeline is far lower
than the cost of an unreviewed production delete.
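The off-hours rule reduces to one predicate. The Monday to Friday, 09:00 to 18:00 window below is an assumed default, and on_call_connected stands in for however your gate tracks reviewer presence.

```python
from datetime import datetime, time

def auto_deny_off_hours(
    risk: str,
    now: datetime,
    on_call_connected: bool,
    start: time = time(9, 0),
    end: time = time(18, 0),
) -> bool:
    """True if a CRITICAL command should be denied without waiting for review."""
    if risk != "CRITICAL":
        return False
    in_hours = now.weekday() < 5 and start <= now.time() < end
    return not in_hours and not on_call_connected
```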