2026-03-26 Security AI Agents Best Practices

The 10 commands AI agents get wrong (and how to gate them)

AI coding agents are getting good at writing code. They're much less good at knowing when a command is irreversible, contextually dangerous, or just subtly wrong for your situation. Here's a list of the commands that bite people, and the approval-gate patterns that prevent the damage.

This isn't about catastrophic failures; most of these mistakes aren't catastrophic. It's about the class of mistake where the agent did exactly what you asked, at the wrong time, in the wrong environment, or with slightly wrong scope. No model is immune. Even with a thoughtful prompt, agents operate under uncertainty about context that humans resolve by instinct.

Each entry below includes the failure mode, why it happens, and a recommended approval gate strategy. Risk scores are from expacti's scoring engine — a rough guide to how quickly a command should escalate.

#1 rm -rf with a variable path CRITICAL

The classic. An agent trying to clean up build artifacts runs rm -rf $OUTDIR/*. If OUTDIR is unset, the command expands to rm -rf /*. If it's set to /home/user instead of /home/user/build, that's a year of work gone.

Why it happens: Agents reason about the intent ("delete the build directory") but don't always trace the full expansion of variables, especially when the runtime environment differs from the one the agent assumes.

Gate strategy

Require manual approval for any rm -rf outside /tmp and a small set of known safe build directories. Use the whitelist to fast-path rm -rf /tmp/build-* after a reviewer has approved it once. Flag anything containing a variable expansion as HIGH regardless of path.
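
The rule above can be sketched roughly as follows. This is a minimal illustration, not expacti's scoring engine: classify_rm, SAFE_ROOTS, and the ALLOW/REVIEW/HIGH labels are hypothetical names, and the check works on the raw command text before the shell expands anything.

```python
import re
import shlex
from pathlib import PurePosixPath

# Hypothetical allowlist of roots where recursive deletes are considered low-risk.
SAFE_ROOTS = (PurePosixPath("/tmp"),)

# Matches $VAR, ${VAR}, $(cmd) and backtick substitution in the raw command text.
EXPANSION = re.compile(r"\$[A-Za-z_{(]|`")


def classify_rm(command: str) -> str:
    """Return "ALLOW", "REVIEW", or "HIGH" for a single rm invocation."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "rm":
        return "ALLOW"
    flags = [t for t in tokens[1:] if t.startswith("-")]
    targets = [t for t in tokens[1:] if not t.startswith("-")]
    recursive = any(
        f == "--recursive" or (not f.startswith("--") and ("r" in f or "R" in f))
        for f in flags
    )
    if not recursive:
        return "ALLOW"
    if EXPANSION.search(command):
        return "HIGH"  # unexpanded variable: the real target is unknown
    for target in targets:
        path = PurePosixPath(target)
        # Anything with ".." or outside the safe roots goes to a human.
        if ".." in path.parts or not any(root in path.parents for root in SAFE_ROOTS):
            return "REVIEW"
    return "ALLOW"
```

Under these assumptions, rm -rf /tmp/build-123 passes, rm -rf /home/user goes to review, and rm -rf $OUTDIR is flagged HIGH before anyone has to guess what the variable holds.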

#2 git push --force CRITICAL

An agent rebasing a branch to resolve conflicts decides it needs to force-push. If this is a shared branch, it just rewrote history for everyone. If it's main, your CI pipeline is broken and your team's local copies have now diverged.

Why it happens: Force push is the correct solution to "I need to push after a rebase." The agent knows the technical answer. It doesn't always verify branch protection rules or whether the branch is shared.

Gate strategy

Never whitelist git push --force or git push --force-with-lease on production or shared branches. Make every force-push a manual review, with the reviewer seeing the target branch in the command. Consider a CRITICAL score override for any force push to main/master/production.
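
As a rough sketch of that override, assuming a hypothetical classify_push helper and a hard-coded PROTECTED set: the function looks for a force flag or a "+" refspec and then checks whether any refspec targets a protected branch. It cannot see the current branch, so a bare git push --force still goes to review rather than being scored by target.

```python
import shlex

# Hypothetical list of branches that trigger the CRITICAL override.
PROTECTED = {"main", "master", "production"}


def classify_push(command: str) -> str:
    """Return "ALLOW", "REVIEW", or "CRITICAL" for a git push command."""
    tokens = shlex.split(command)
    if tokens[:2] != ["git", "push"]:
        return "ALLOW"
    args = tokens[2:]
    forced = any(
        a in ("-f", "--force")
        or a.startswith("--force-with-lease")
        or a.startswith("+")  # a leading "+" on a refspec also forces the update
        for a in args
    )
    if not forced:
        return "ALLOW"
    positional = [a for a in args if not a.startswith("-")]
    # positional[0] is the remote; the rest are refspecs such as "main" or "+HEAD:main".
    for refspec in positional[1:]:
        target = refspec.lstrip("+").split(":")[-1]
        if target.removeprefix("refs/heads/") in PROTECTED:
            return "CRITICAL"
    return "REVIEW"
```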

#3 DROP TABLE or DELETE FROM without WHERE CRITICAL

An agent migrating a schema drops a table that still has foreign key references, or a data cleanup task deletes all rows when it was supposed to delete rows matching a condition. The difference between DELETE FROM events WHERE created_at < '2024-01-01' and DELETE FROM events is one missing clause.

Why it happens: Agents draft SQL based on the described intent. "Delete old events" becomes "DELETE FROM events WHERE..." but the WHERE clause might be wrong, off by one, or missing entirely if the model generated code for the wrong table.

Gate strategy

Score any DROP TABLE, DROP DATABASE, or bare DELETE FROM as CRITICAL. Require multi-party approval (two reviewers) for production databases. Allow DELETE FROM ... WHERE with a time-window filter on staging after one review. Never auto-approve DDL on prod.
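
A deliberately naive sketch of the scoring rule, assuming a hypothetical classify_sql function. It splits on semicolons and uses regular expressions, so a semicolon or the word WHERE inside a string literal will confuse it. A real gate would more likely use a proper SQL parser.

```python
import re

DROP = re.compile(r"^\s*DROP\s+(TABLE|DATABASE)\b", re.IGNORECASE)
DELETE = re.compile(r"^\s*DELETE\s+FROM\b", re.IGNORECASE)
WHERE = re.compile(r"\bWHERE\b", re.IGNORECASE)


def classify_sql(script: str) -> str:
    """Return the highest risk level found across all statements in the script."""
    worst = "ALLOW"
    for statement in script.split(";"):  # naive split, see the caveat above
        if DROP.search(statement):
            return "CRITICAL"
        if DELETE.search(statement):
            if not WHERE.search(statement):
                return "CRITICAL"  # bare DELETE FROM removes every row
            worst = "REVIEW"  # filtered delete: still worth one look
    return worst
```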

#4 curl | sh HIGH

Installing a dependency or tool via the standard quick-install pattern: curl https://example.com/install.sh | bash. This runs arbitrary remote code with no integrity check. Most of the time it works fine. When the CDN is compromised, or the URL resolves to a different host than intended, it's a full system compromise.

Why it happens: This is how most CLI tools tell you to install them. Agents follow installation instructions literally.

Gate strategy

Require manual review for any pipe-to-shell pattern. The reviewer should verify the URL, check that it's the official installer for the tool, and ideally substitute with a checksum-verified download. Whitelist specific trusted patterns (curl https://sh.rustup.rs | sh) only after manual verification.
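
Both halves of that advice can be sketched briefly: spotting the pipe-to-shell pattern, and verifying a downloaded installer against a checksum the reviewer obtained from the vendor. The function names are hypothetical and the regular expression only covers the common curl/wget spellings.

```python
import hashlib
import re

# curl or wget, anything up to the first pipe, then an optional sudo and a shell.
PIPE_TO_SHELL = re.compile(r"\b(curl|wget)\b[^|]*\|\s*(sudo\s+)?(ba|z|da)?sh\b")


def is_pipe_to_shell(command: str) -> bool:
    """True when a download is piped straight into a shell."""
    return PIPE_TO_SHELL.search(command) is not None


def checksum_matches(payload: bytes, expected_sha256: str) -> bool:
    """Compare the SHA-256 of a downloaded installer with a published value."""
    return hashlib.sha256(payload).hexdigest() == expected_sha256.lower()
```

The safer flow is then to download to a file, run checksum_matches on its bytes, and execute it only if the digest matches.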

#5 chmod 777 / world-writable permissions HIGH

An agent debugging a permission error takes the fastest path: chmod 777 /var/www. Permissions problem solved. Security model also solved — now any process on the system can write there. On a multi-tenant server, that's a lateral movement path.

Why it happens: chmod 777 almost always makes the permission error go away. Agents optimize for the immediate problem without reasoning about the secondary security effects.

Gate strategy

Flag all chmod 777, chmod o+w, and chmod a+w as HIGH. Require manual review. In the review UI, the reviewer should see a suggested safer alternative (e.g., chmod 755 or chown www-data) to approve instead.
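
One way to detect world-writable modes, as a sketch with hypothetical function names. It handles octal modes and symbolic clauses that name "o" or "a" explicitly. A bare +w, whose effect depends on the umask, is not flagged here.

```python
import re
import shlex


def grants_world_write(mode: str) -> bool:
    """True if a chmod mode string sets the write bit for "other"."""
    if re.fullmatch(r"[0-7]{1,4}", mode):
        return (int(mode[-1], 8) & 0o2) != 0  # last octal digit is "other"
    for clause in mode.split(","):
        match = re.fullmatch(r"([ugoa]*)([+=])([rwxXst]*)", clause)
        if match and "w" in match.group(3) and set(match.group(1)) & {"o", "a"}:
            return True
    return False


def classify_chmod(command: str) -> str:
    """Return "HIGH" for world-writable chmod calls, otherwise "ALLOW"."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "chmod":
        return "ALLOW"
    # The first non-flag argument is the mode; the rest are paths.
    operands = [t for t in tokens[1:] if not t.startswith("-")]
    return "HIGH" if operands and grants_world_write(operands[0]) else "ALLOW"
```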

#6 docker run --privileged HIGH

An agent spinning up a container for testing uses --privileged because it saw that flag in an example. A privileged container has access to the host's kernel, devices, and namespaces. Escaping to the host is trivial from a privileged container.

Why it happens: --privileged appears in legitimate examples (Docker-in-Docker, certain testing setups). Models that were trained on Stack Overflow answers will reproduce it without understanding when it's appropriate.

Gate strategy

Score docker run --privileged as HIGH. Require review. Also watch for -v /:/host (full host mount) and --cap-add=SYS_ADMIN — same risk class. Whitelist specific test containers with --privileged only after a human verifies the use case.
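
The three flags named above can be checked with a small token scan. This is a sketch with a hypothetical docker_risk_flags helper. It scans the whole argument list, so a flag passed to the program inside the container would also be reported, which errs on the side of review.

```python
import shlex


def docker_risk_flags(command: str) -> list[str]:
    """List the host-escape risk factors found in a docker run command."""
    tokens = shlex.split(command)
    if tokens[:2] != ["docker", "run"]:
        return []
    # Normalise "--opt=value" into "--opt", "value" so both spellings are handled.
    args: list[str] = []
    for tok in tokens[2:]:
        if tok.startswith("--") and "=" in tok:
            args.extend(tok.split("=", 1))
        else:
            args.append(tok)
    findings = []
    for i, arg in enumerate(args):
        value = args[i + 1] if i + 1 < len(args) else ""
        if arg == "--privileged" and value != "false":
            findings.append("privileged")
        elif arg in ("-v", "--volume") and value.split(":")[0] == "/":
            findings.append("host-root-mount")
        elif arg == "--cap-add" and value.upper().removeprefix("CAP_") == "SYS_ADMIN":
            findings.append("cap-sys-admin")
    return findings
```

Any non-empty result would map to HIGH under the strategy above.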

#7 systemctl stop / systemctl disable HIGH

An agent restarting a service to pick up a config change runs systemctl stop nginx without first verifying it can start again. Or it disables a service that a monitoring system expects to be up, silently breaking alerting. On a production host, stopping nginx means your site is down.

Why it happens: "Restart service to apply config" is a standard DevOps pattern. Agents don't always distinguish between staging (where it's safe) and production (where it affects users).

Gate strategy

Allow systemctl restart on known safe services after one review. Require manual approval for systemctl stop and systemctl disable always. Consider tagging these with an environment label in the review UI so the reviewer sees "PROD" immediately.
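
A sketch of that policy, assuming the gate is told the environment name and keeps a set of services whose restart a reviewer has already approved. classify_systemctl and its parameters are hypothetical.

```python
import shlex

# Verbs that always need a human, per the strategy above.
ALWAYS_REVIEW = {"stop", "disable"}


def classify_systemctl(command: str, environment: str, reviewed: set[str]) -> tuple[str, str]:
    """Return (decision, label); the label is what the reviewer sees first."""
    label = environment.upper()
    tokens = shlex.split(command)
    if tokens[:1] == ["sudo"]:
        tokens = tokens[1:]
    if tokens[:1] != ["systemctl"]:
        return ("ALLOW", label)
    words = [t for t in tokens[1:] if not t.startswith("-")]
    if not words:
        return ("ALLOW", label)
    verb = words[0]
    units = {u.removesuffix(".service") for u in words[1:]}
    if verb in ALWAYS_REVIEW:
        return ("REVIEW", label)
    if verb == "restart" and not units <= reviewed:
        return ("REVIEW", label)  # restart of a service nobody has reviewed yet
    return ("ALLOW", label)
```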

#8 git reset --hard HIGH

An agent trying to undo some changes runs git reset --hard HEAD~3. It just discarded three commits of work that hadn't been pushed. Or it ran git reset --hard origin/main on a branch that had local-only commits you were planning to push later.

Why it happens: git reset --hard is the correct way to discard local changes. The agent doesn't know which commits are easy to recover (already pushed) and which exist only locally, where recovery depends on the reflog and any uncommitted changes are gone for good. It treats git history as a state machine, not a collaboration medium.

Gate strategy

Always require review for git reset --hard, git clean -f, and git clean -fd. The reviewer should verify what commits would be lost before approving. Consider a 60-second timeout with auto-deny as a speed bump.
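
To help the reviewer see what would be lost, a gate could run read-only preview commands before asking for a decision. The sketch below only builds those commands. preview_commands is a hypothetical helper, while the git invocations it returns (git log with a range, git status --porcelain, git clean -n) are standard.

```python
import shlex


def preview_commands(command: str) -> list[list[str]]:
    """Return read-only git commands that show what a destructive command would discard."""
    tokens = shlex.split(command)
    if tokens[:2] == ["git", "reset"] and "--hard" in tokens:
        targets = [t for t in tokens[2:] if not t.startswith("-")]
        target = targets[0] if targets else "HEAD"
        return [
            # Commits reachable from HEAD but not from the reset target.
            ["git", "log", "--oneline", f"{target}..HEAD"],
            # Uncommitted changes, which the reset would also discard.
            ["git", "status", "--porcelain"],
        ]
    if tokens[:2] == ["git", "clean"]:
        # -n turns the same invocation into a dry run that only lists files.
        return [["git", "clean", "-n"] + tokens[2:]]
    return []
```

For git reset --hard HEAD~3 this yields git log --oneline HEAD~3..HEAD, which lists exactly the three commits about to leave the branch.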

#9 aws s3 rm --recursive CRITICAL

An agent cleaning up old artifacts runs aws s3 rm s3://your-bucket/backups/ --recursive. Unless versioning is enabled on the bucket, S3 has no trash. If the path was wrong, or the bucket was wrong, or "backups" turned out to contain your production database snapshots, that data is gone.

Why it happens: Cloud CLI commands look just like local commands to a model. The agent doesn't have an intuition that deletes against a remote object store are permanent, and that the data is often expensive or impossible to recreate.

Gate strategy

Score all aws s3 rm, aws s3 sync --delete, gsutil rm, and similar cloud storage delete operations as CRITICAL. Require explicit manual approval. In the review UI, have the reviewer confirm the bucket name matches the intended target. Never auto-approve cloud delete operations.
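
A sketch of the detection step that also pulls out the bucket names, so the review UI can ask the reviewer to confirm them. classify_cloud_delete is a hypothetical name and only the commands listed above are covered.

```python
import shlex
from urllib.parse import urlparse


def classify_cloud_delete(command: str) -> tuple[str, list[str]]:
    """Return ("CRITICAL", buckets) for storage deletes, else ("ALLOW", [])."""
    tokens = shlex.split(command)
    destructive = (
        tokens[:3] == ["aws", "s3", "rm"]
        or (tokens[:3] == ["aws", "s3", "sync"] and "--delete" in tokens)
        or (tokens[:1] == ["gsutil"] and "rm" in tokens[1:])
    )
    if not destructive:
        return ("ALLOW", [])
    # The bucket is the host part of an s3:// or gs:// URL.
    urls = [t for t in tokens if t.startswith(("s3://", "gs://"))]
    return ("CRITICAL", [urlparse(u).netloc for u in urls])
```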

#10 eval with dynamic input CRITICAL

An agent constructing a shell command dynamically uses eval to run it. If any part of the input came from an external source — a file, an API response, a git commit message — you now have a code injection vector. The agent trusted the source. The source was poisoned.

Why it happens: eval is the fastest way to run dynamically-constructed commands. Agents generating scripts often reach for it without considering that the data flowing in might not be safe. This is a classic prompt injection path: poisoned source → agent constructs eval → execution.

Gate strategy

Flag any eval, exec, or sh -c "$(…)" as CRITICAL when the command contains variable expansion or command substitution. Require mandatory human review. The reviewer should understand what data flows into the string before approving. This is one case where auto-deny-on-timeout is strongly recommended — if the reviewer doesn't actively approve an eval, it shouldn't run.
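
A regex-level sketch of that rule, with hypothetical names. It treats eval or exec at the start of a command (or after a shell separator) and any sh -c as an evaluator, then escalates to CRITICAL when the raw text contains a variable expansion or command substitution. Text matching like this is easy to evade, so it complements human review rather than replacing it.

```python
import re

# $VAR, ${VAR}, $(cmd) or backticks anywhere in the raw command text.
DYNAMIC = re.compile(r"\$[A-Za-z_{(]|`")
# eval/exec as a command word, or a shell invoked with -c.
EVALUATOR = re.compile(r"(^|[;&|]\s*)(eval|exec)\b|\b(ba|z|da)?sh\s+-c\b")


def classify_eval(command: str) -> str:
    """Return "ALLOW", "REVIEW", or "CRITICAL" for dynamic-evaluation commands."""
    if not EVALUATOR.search(command):
        return "ALLOW"
    return "CRITICAL" if DYNAMIC.search(command) else "REVIEW"
```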

The pattern behind all of these

Each of these commands shares a common failure structure: the agent's model of the world is correct about the immediate goal and wrong about the context. The agent knows how to delete a directory. It doesn't know that $DIR is unset in the current environment. It knows how to push a rebase. It doesn't know your team's branch protection conventions.

The solution isn't a smarter agent. Context-sensitivity of this kind is genuinely hard — it requires knowing things about your organization, your environment, and your conventions that aren't in any training dataset. The solution is a human who does know those things, in the loop at decision time.

The whitelist is the product

The goal of an approval gate isn't to make humans review every command forever. It's to build a whitelist that reflects your organization's risk tolerance. The first time you see docker ps, you approve it — it's whitelisted. After a week, the only things reaching a human are commands that are genuinely novel or high-risk. That's the right steady state.
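
A minimal sketch of such a whitelist, assuming patterns are stored as shell-style globs. The Whitelist class is hypothetical. Matching token by token, rather than against the whole string, keeps a pattern like rm -rf /tmp/build-* from also matching a command with something extra chained on the end. A wildcard can still match path separators, so patterns should stay narrow.

```python
import fnmatch
import shlex


class Whitelist:
    """Reviewer-approved command patterns that skip the human queue."""

    def __init__(self) -> None:
        self._patterns: list[list[str]] = []

    def approve(self, pattern: str) -> None:
        """Record a pattern such as "docker ps" or "rm -rf /tmp/build-*"."""
        tokens = shlex.split(pattern)
        if tokens not in self._patterns:
            self._patterns.append(tokens)

    def allows(self, command: str) -> bool:
        """True if every token of the command matches an approved pattern."""
        tokens = shlex.split(command)
        return any(
            len(tokens) == len(pattern)
            and all(fnmatch.fnmatchcase(t, p) for t, p in zip(tokens, pattern))
            for pattern in self._patterns
        )
```

The first docker ps goes to a human; once approve("docker ps") has been called, allows("docker ps") returns True and later runs pass straight through.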

What "gated" looks like in practice

With expacti, the agent blocks on each of these commands, waiting for a reviewer decision. The reviewer sees:

The full command, unexpanded
The risk score and risk category (what triggered the HIGH/CRITICAL classification)
The session context — what commands ran before this one
Any anomaly signals — off-hours, unusual rate, exfiltration pattern

If the reviewer approves, the command runs immediately. If they deny, the agent receives an error and can try a different approach. If no one responds within the configured timeout (default: 60 seconds), the command is auto-denied.
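
The deny-by-default timeout can be sketched with a blocking queue. This is an illustration of the behaviour described above, not expacti's implementation; await_decision and the "approve"/"deny" strings are hypothetical.

```python
import queue


def await_decision(decisions: "queue.Queue[str]", timeout_seconds: float = 60.0) -> str:
    """Block until a reviewer responds; treat silence as a denial."""
    try:
        decision = decisions.get(timeout=timeout_seconds)
    except queue.Empty:
        return "deny"  # nobody answered within the window
    # Anything other than an explicit approval is also a denial.
    return "approve" if decision == "approve" else "deny"
```

The important design choice is that the only path to execution is an explicit "approve". Timeouts and unrecognised responses both deny.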

For LangChain agents, this is a two-line integration:

from expacti import ExpactiTool

tools = [ExpactiTool(backend_url="wss://api.expacti.com/shell/ws", token=SHELL_TOKEN)]
# Agent now routes all shell commands through expacti for approval

For direct shell use, expacti-sh is a drop-in replacement for your shell that intercepts commands at the prompt level. No agent code changes required.

One more thing: the commands you don't expect

This list covers the obvious ones. The more interesting problem is the command you didn't anticipate. An agent trying to check disk space runs df -h | awk '...' | xargs rm -rf. A cleanup script that started out simple becomes complex because the agent added a pipe. A "read-only" health check accidentally modifies state because the tool the agent called has side effects.

The value of an approval gate isn't just that it blocks the known-bad patterns. It's that it creates a forcing function: before anything executes, a human who understands the context has a chance to see it. The reviewer catches the things you didn't know to put on a blocklist.

The window matters

The most dangerous commands are the ones an agent runs at 3am during an automated pipeline when no reviewer is watching. Configure your approval gate to auto-deny anything CRITICAL outside business hours unless there's an on-call reviewer actively connected. The cost of a delayed pipeline is far lower than the cost of an unreviewed production delete.
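
That policy fits in a few lines. The sketch assumes business hours of 09:00 to 18:00 on weekdays in the server's local time, and that the gate knows whether an on-call reviewer is connected. should_auto_deny and its parameters are hypothetical.

```python
from datetime import datetime, time


def should_auto_deny(
    risk: str,
    now: datetime,
    on_call_connected: bool,
    start: time = time(9, 0),
    end: time = time(18, 0),
) -> bool:
    """Deny CRITICAL commands outside business hours unless on-call is present."""
    if risk != "CRITICAL":
        return False
    in_hours = now.weekday() < 5 and start <= now.time() < end  # Mon-Fri only
    return not in_hours and not on_call_connected
```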
