2026-03-26 · Engineering · Rust · Security

How we built the whitelist engine (and what we got wrong the first time)

The whitelist is the core mechanic of expacti. Everything else — the approval flow, the risk scoring, the anomaly detection — is built on top of it. Here's how it works, what design decisions we made, and what we had to redo.

What the whitelist needs to do

The fundamental problem: an AI agent submits docker ps --format '{{.Names}}'. Is this safe? Should it require human review?

A whitelist needs to answer that question in milliseconds, for every command, across a constantly-changing set of rules. It also needs to:

  - support exact, glob, and regex patterns
  - pick up rule changes at runtime without blocking in-flight checks
  - expire temporary rules automatically
  - stay fast under load

The last point matters more than you'd think. The whitelist check sits in the hot path of every shell command. If it's slow, users notice.

V1: what we built first

The first version used a simple Vec<Rule> — load all rules from the DB on startup, iterate through them linearly on every check.

// V1 — simple but slow for large rule sets
pub fn check(&self, command: &str) -> WhitelistResult {
    for rule in &self.rules {
        if rule.matches(command) {
            return if rule.allow {
                WhitelistResult::Allow
            } else {
                WhitelistResult::Deny
            };
        }
    }
    WhitelistResult::NoMatch
}

This worked fine for 50 rules. At 500 rules the latency became noticeable. And we were expecting teams to accumulate thousands of rules after months of use. Linear scan doesn't scale.

More importantly, V1 had a concurrency problem: rules lived in a Vec that could be modified via the API at any time. We guarded it with a Mutex<Vec<Rule>>, which prevented the race but meant every whitelist check held a lock — reads contended with writes and with each other.

V2: what we actually ship

The current engine uses arc_swap::ArcSwap<Vec<Rule>> — a lock-free reader-writer mechanism. Reads never block (zero contention), writes swap the entire rule set atomically.

use arc_swap::ArcSwap;
use std::sync::Arc;

pub struct WhitelistEngine {
    rules: Arc<ArcSwap<Vec<Rule>>>,
}

impl WhitelistEngine {
    pub async fn check(&self, command: &str, cwd: Option<&str>) -> WhitelistResult {
        let rules = self.rules.load();
        let now = unix_now();

        for rule in rules.iter() {
            // Skip expired rules
            if let Some(expires_at) = rule.expires_at {
                if now >= expires_at { continue; }
            }
            // Skip if CWD filter doesn't match
            if let Some(ref cwd_filter) = rule.cwd_filter {
                if let Some(cwd) = cwd {
                    if !cwd.starts_with(cwd_filter.as_str()) { continue; }
                } else {
                    continue;
                }
            }
            // Pattern match
            if rule.matches(command) {
                return if rule.allow {
                    WhitelistResult::Allow
                } else {
                    WhitelistResult::Deny
                };
            }
        }
        WhitelistResult::NoMatch
    }
}

The key insight: reads are free. When a rule is added, we build a new Arc<Vec<Rule>>, swap it in, and the old one gets dropped once all readers finish. No lock, no contention.
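The publish side is small. Here's the idea approximated with std-only types — a RwLock guarding an Arc pointer, held just long enough to clone or replace the pointer, never for the scan itself (ArcSwap removes even that brief lock; the Rule fields here are illustrative, not our real schema):

```rust
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone, PartialEq)]
struct Rule {
    pattern: String,
    allow: bool,
}

// std-only approximation of the swap idea: the lock guards only the
// pointer, not the scan, so readers hold it for nanoseconds.
struct Engine {
    rules: RwLock<Arc<Vec<Rule>>>,
}

impl Engine {
    fn check(&self, command: &str) -> Option<bool> {
        // Clone the Arc (cheap), drop the lock, then scan at leisure.
        let rules = self.rules.read().unwrap().clone();
        rules.iter().find(|r| r.pattern == command).map(|r| r.allow)
    }

    fn replace_rules(&self, new_rules: Vec<Rule>) {
        // Publish the new set atomically; the old Vec is freed when the
        // last in-flight reader drops its Arc.
        *self.rules.write().unwrap() = Arc::new(new_rules);
    }
}
```

With ArcSwap, `check` calls `self.rules.load()` instead and the write side becomes a single `store` — same shape, no lock at all.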

Pattern matching: exact, glob, regex

Three match types, ordered by how often they're used in practice:

Exact match

Default for all rules added via approval. Byte-for-byte string equality. Fast and unambiguous.

PatternType::Exact => command == self.pattern,

Glob match

Uses the glob crate. Useful for patterns like docker logs * or git checkout feature/*. The pattern is currently compiled from its string on every check — we come back to this in "What we'd do differently".

PatternType::Glob => {
    if let Ok(pattern) = glob::Pattern::new(&self.pattern) {
        pattern.matches(command)
    } else {
        false
    }
}

Regex match

Full regex via the regex crate. Powerful but requires manual approval — you can't auto-add a regex rule from an approval decision, because a regex that's "too broad" would silently permit commands the reviewer didn't intend.

Design decision

Auto-approval always adds an exact match. If you approve docker logs abc123, that exact string gets whitelisted — nothing else. Upgrading to a glob (docker logs *) is a deliberate, manual action. This prevents "approval creep" where a broadly-written rule silently permits dangerous variants.
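The rule-creation side of that decision fits in a few lines. A sketch (the Rule and PatternType fields are assumptions, not our real schema):

```rust
#[derive(Debug, PartialEq)]
enum PatternType {
    Exact,
    Glob,
    Regex,
}

#[derive(Debug)]
struct Rule {
    pattern: String,
    pattern_type: PatternType,
    allow: bool,
}

// Auto-approval always produces an exact-match allow rule for the
// literal command string; widening it to a glob is a manual action.
fn rule_from_approval(command: &str) -> Rule {
    Rule {
        pattern: command.to_string(),
        pattern_type: PatternType::Exact,
        allow: true,
    }
}
```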

First-match-wins semantics

The engine returns on the first match. This means rule order matters, and it's intentional.

You can create a deny rule that blocks a specific command even if a broader allow rule would match it later in the list. Example: allow all docker commands via glob, but explicitly deny docker run --privileged * by putting the deny rule first.

// Rule 1 (deny): docker run --privileged*  → Deny
// Rule 2 (allow): docker *                  → Allow
//
// "docker run --privileged ubuntu" matches Rule 1 first → Deny
// "docker ps" doesn't match Rule 1, matches Rule 2 → Allow

This mirrors how firewall rules work — most network engineers intuitively understand it. We considered "most-specific-wins" semantics but rejected it: the specificity calculation for regex patterns is ambiguous, and firewall-style ordering is simpler to reason about.
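First-match-wins is easy to demonstrate end to end. A tiny std-only sketch, where a trailing * stands in for real glob matching:

```rust
struct Rule {
    pattern: String,
    allow: bool,
}

// A trailing '*' matches any suffix — a simplified stand-in for the
// glob crate used by the real engine.
fn matches(rule: &Rule, cmd: &str) -> bool {
    match rule.pattern.strip_suffix('*') {
        Some(prefix) => cmd.starts_with(prefix),
        None => cmd == rule.pattern,
    }
}

// First match wins: position in the slice is the rule's priority.
fn check(rules: &[Rule], cmd: &str) -> Option<bool> {
    rules.iter().find(|r| matches(r, cmd)).map(|r| r.allow)
}
```

Running the example from above through it: the privileged deny rule sits first, so it shadows the broad docker allow rule.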

TTL-based expiry

Every rule has an optional expires_at Unix timestamp. Expired rules are skipped during matching (not deleted, to preserve audit history).

This solves a real security problem: permission accumulation. Over time, whitelists grow. Commands that were safe to whitelist during a migration become risky once the migration is done. TTL makes temporary permissions revoke themselves.

In the reviewer UI, you can set an expiry when adding a rule ("allow this pattern for the next 7 days"). Rules show orange/red badges as they approach expiry.
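The expiry check itself mirrors the skip in the engine loop above: a rule is active if it has no expiry or its timestamp is still in the future. A sketch:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

struct Rule {
    expires_at: Option<u64>, // Unix seconds; None = never expires
}

fn unix_now() -> u64 {
    SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs()
}

// Expired rules are skipped during matching, not deleted — the row
// stays in the DB so the audit history is preserved.
fn is_active(rule: &Rule, now: u64) -> bool {
    rule.expires_at.map_or(true, |t| now < t)
}
```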

The reload problem

Rules are stored in the database, but the engine works from an in-memory copy. When a rule is added via API, the engine needs to reload.

The naive approach — reload on every API write — works, but there's a brief window between the DB write and the reload completing during which the new rule hasn't taken effect. We keep that window small with a two-step process:

  1. Write the rule to the DB
  2. Call engine.reload_from_db() — this fetches all non-expired rules and does an atomic swap

The reload is async and typically takes <1ms for rule sets under 10k. During the reload, the old rule set stays active (no gap, no lock). After the swap, new checks see the new rules immediately.

Risk scoring: a separate concern

The whitelist answers "is this command known-safe?". Risk scoring answers "how dangerous is this command?". They're separate modules.

For a command that matches the whitelist (allow), it still gets a risk score — reviewers see it in the history. For a command that needs approval, the risk score is shown prominently in the review card.

The risk scorer looks at the command string and assigns scores based on categories (base score) and modifiers. A simplified version:

// Base score by command category
let cmd_lower = cmd.to_lowercase();
let mut score = match base_command(cmd) {
    c if READ_ONLY.contains(&c)       => 0,
    c if DANGEROUS.contains(&c)       => 50,
    c if PRIVILEGE_ESC.contains(&c)   => 40,
    c if PACKAGE_MGMT.contains(&c)    => 35,
    c if DATABASE_TOOLS.contains(&c)  => 15,
    _                                  => 0,
};

// Modifiers stack on top
if cmd_lower.contains("| bash") || cmd_lower.contains("| sh") {
    score += 30; // pipe to shell
}
if contains_sql_destructive(&cmd_lower) {
    score += 65; // DROP TABLE, TRUNCATE, DELETE FROM
}
// ... 23 more modifiers
score.min(100)

A few critical patterns bypass the scoring entirely and return 100 (CRITICAL) unconditionally: rm -rf /, fork bomb, writing to a block device with dd, formatting with mkfs.
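That override can be sketched as a substring scan that runs before any scoring (the exact pattern strings below are assumptions based on the examples, not our production list):

```rust
// Patterns that return 100 (CRITICAL) unconditionally, before any
// category or modifier scoring runs. The list here is illustrative.
fn critical_override(cmd: &str) -> Option<u8> {
    const CRITICAL: &[&str] = &[
        "rm -rf /",      // recursive delete from root
        ":(){ :|:& };:", // classic fork bomb
        "of=/dev/sd",    // dd writing to a raw block device
        "mkfs",          // formatting a filesystem
    ];
    if CRITICAL.iter().any(|p| cmd.contains(p)) {
        Some(100)
    } else {
        None
    }
}
```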

What we'd do differently

Three things we'd change with hindsight:

1. Store compiled patterns. Currently, glob and regex patterns are compiled from the string on every match. For a whitelist with many glob rules, this is wasteful. We should compile patterns when rules are loaded into memory and store the compiled form.

2. Semantic similarity for suggestions. The AI suggestions engine currently uses heuristics (group by base command, extract UUID patterns). A small embedding model could produce better suggestions by grouping semantically similar commands. docker logs abc123 and docker logs def456 are obviously the same command type — but docker exec -it abc123 sh and docker exec -it abc123 bash also are, even though the heuristic doesn't catch it.

3. Rule conflict detection. Currently you can add two rules for the same pattern (one allow, one deny). The first one wins silently. We should detect and warn on conflicting rules.
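Conflict detection could be as simple as a one-pass scan over the ordered rules, flagging any pattern that appears with both verdicts. A sketch (rules reduced to (pattern, allow) pairs; the real Rule has more fields):

```rust
use std::collections::HashMap;

// Flag patterns that appear with both an allow and a deny verdict.
// Because of first-match-wins, the later rule is dead code — worth a
// warning in the UI rather than a silent no-op.
fn find_conflicts(rules: &[(String, bool)]) -> Vec<String> {
    let mut seen: HashMap<String, bool> = HashMap::new();
    let mut conflicts = Vec::new();
    for (pattern, allow) in rules {
        // Record the first verdict seen for this pattern.
        let first = *seen.entry(pattern.clone()).or_insert(*allow);
        if first != *allow && !conflicts.contains(pattern) {
            conflicts.push(pattern.clone());
        }
    }
    conflicts
}
```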

Numbers

In production, most checks hit the exact-match path — a single string comparison. Reload happens only on rule writes, which are rare. The overall latency impact on the command approval flow is negligible compared to WebSocket round-trip time and human review time.

See it in action

The whitelist builds up as you use expacti — try the interactive demo to watch it happen.
