# Keep AI On Spec

Agents drift, models decay, context becomes polluted, and user behavior evolves, all of which can lead to incorrect or even dangerous actions. Because AI is non-deterministic, a system that performed well at launch can quietly degrade in production.

AI is supposed to scale what humans can accomplish. That only works if we trust the output, and trust requires supervision. But humans can't supervise at scale. As AI use grows, reviewing every output becomes impossible.

Swept Supervision is the active middle layer between your people and your AI agents: the infrastructure that lets you scale AI use without being capped by how much human oversight is available.

[Contact us](/contact)

## Supervision and Governance Are Related But Distinct

Governance is a framework and infrastructure that enforces your organization's rules across all AI systems. Supervision monitors specific agents in production and operates within the guidelines of your Governance system. This page covers Supervision. If you have AI agents running in production, you likely need both. [Learn more about Governance](/offering/governance).

## Why Doesn't Traditional Supervision Work?

A classic approach to supervision is to assign an AI agent or two to a human on the IT or Security team, or to combine tools such as observability, pre-production evals, documentation, and orchestration tooling. The trouble with these methods:

- Limiting AI use to what a human can actively supervise caps AI in a way that functionally eliminates its scalability and ROI.
- These tools and methods are passive rather than active. Each is useful, but none of them supervises agents as they act in production.

Swept Supervision is active: continuous measurement, outlier detection, and policy enforcement with targeted human oversight.
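To make "policy enforcement with targeted human oversight" concrete, here is a minimal sketch of a pre-action gate in Python. It is illustrative only, not Swept's implementation or API: the `Policy` fields, tool names such as `issue_refund`, and the `amount` argument are hypothetical. The gate inspects each proposed tool call before it runs and returns one of three outcomes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Mapping


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK = "block"


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: tools that are never allowed, tools that always
    need a human approver, and a spend limit for calls carrying an amount."""
    blocked_tools: frozenset[str]
    approval_tools: frozenset[str]
    max_amount_without_approval: float


def check_action(tool: str, args: Mapping[str, Any], policy: Policy) -> Decision:
    """Evaluate a proposed tool call before it executes (circuit-breaker style)."""
    if tool in policy.blocked_tools:
        return Decision.BLOCK  # hard stop: the call never runs
    if tool in policy.approval_tools:
        return Decision.REQUIRE_APPROVAL  # always routed to a human
    if float(args.get("amount", 0)) > policy.max_amount_without_approval:
        return Decision.REQUIRE_APPROVAL  # high-value call, escalate
    return Decision.ALLOW


policy = Policy(
    blocked_tools=frozenset({"delete_records"}),
    approval_tools=frozenset({"issue_refund"}),
    max_amount_without_approval=500.0,
)
print(check_action("lookup_order", {"order_id": "A1"}, policy))  # Decision.ALLOW
```

Routing only `REQUIRE_APPROVAL` decisions to a reviewer is what keeps human oversight targeted: routine calls pass straight through, and blocked calls never execute.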

## What We Do

- **Set a Baseline.** We measure behavior across representative and noisy inputs and record expected ranges for accuracy, escalation rate, token cost and usage, and more. This becomes the bespoke standard for your organization.
- **Monitor.** We capture inputs, outputs, plans, and tool calls from live traffic so internal and external behavior is continuously compared to the baseline.
- **Detect.** Our Supervision layer flags outliers automatically, looking for subtle extraction mistakes, unusual refusal patterns, and slow increases in escalations or cost. Targeted reviews and role-based approvals bring humans into the loop only for true anomalies.
- **Investigate.** We automatically send alerts with a replayable bundle: agent and model version, prompt changes, recent data updates, and user context.
- **Enforce.** Based on your internal guidelines, we apply hard stops and approvals for high-risk actions. Think of it as a circuit breaker for potentially harmful AI behavior.
- **Improve.** We feed confirmed incidents back into evaluations, update prompts and policies, and refresh baselines. Supervision is iterative.
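The Set a Baseline and Detect steps can be sketched as follows, assuming each metric's expected range is the baseline mean plus or minus `k` standard deviations. That band rule, the metric names, and the function names are assumptions for illustration, not Swept's method; a production system would likely use more robust statistics and per-endpoint baselines.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Band:
    """Expected range for one metric, derived from a baseline run."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def build_baseline(samples: dict[str, list[float]], k: float = 3.0) -> dict[str, Band]:
    """Turn baseline measurements (at least two per metric) into mean +/- k*stdev bands."""
    bands = {}
    for metric, values in samples.items():
        mu, sigma = mean(values), stdev(values)
        bands[metric] = Band(mu - k * sigma, mu + k * sigma)
    return bands


def detect(window: dict[str, float], bands: dict[str, Band]) -> list[str]:
    """Return metrics whose live value falls outside its baseline band.

    Metrics with no baseline band are ignored rather than flagged.
    """
    return [m for m, v in window.items() if m in bands and not bands[m].contains(v)]


baseline = build_baseline({
    "accuracy": [0.91, 0.93, 0.92, 0.90, 0.94],
    "escalation_rate": [0.05, 0.06, 0.04, 0.05, 0.05],
})
print(detect({"accuracy": 0.92, "escalation_rate": 0.12}, baseline))  # ['escalation_rate']
```

In this example, accuracy stays inside its band while the escalation rate more than doubles, so only the escalation metric is flagged for human review.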

## What You Get

- **Sampling That Fits Your Risk.** Random sampling for broad coverage, stratified sampling by intent or user segment, and burst sampling during spikes. Swept also sets redaction rules for sensitive fields, plus encryption in transit and at rest.
- **Monitoring at a Glance.** Baseline bands for accuracy and refusal hygiene, safety and hallucination flags by endpoint and role, drift scores for language patterns and output mix, and token cost and usage monitoring with caps and warnings.
- **Drift, Bias, and Variance Detection.** Tracking of semantic drift across intent mix and language patterns, outcome drift against ground truth, bias checks across sensitive attributes and cohorts, and variance by prompt, agent or model version, or time of day.
- **Alerts and Triage Workflows.** Threshold-based alerts with severity levels, incidents grouped with examples and reproduction steps, and one-click issue creation with links to failing examples and baseline comparisons, sent to Slack, Teams, or a custom destination.
- **Collaboration and Governance.** Roles and permissions for who can change thresholds and approve fixes, comment threads on incidents with mentions and attachments, and a full audit log of changes to prompts, models, and thresholds.
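The three sampling modes under Sampling That Fits Your Risk can be combined in a single decision. This Python sketch assumes a per-segment rate table (stratified), a default rate for every other segment (random), and a sliding-window request counter that multiplies rates during traffic spikes (burst). The class, parameter names, and numbers are hypothetical, and redaction and encryption are out of scope here.

```python
import random
from collections import deque
from typing import Optional


class Sampler:
    """Hypothetical sampler combining stratified, random, and burst sampling.

    - stratified: `segment_rates` gives each segment (intent, user tier, ...) its own rate
    - random: any segment without an explicit rate uses `default_rate`
    - burst: if more than `burst_threshold` requests arrived in the last `window_s`
      seconds, the rate is multiplied by `burst_multiplier` (capped at 1.0)
    """

    def __init__(self, segment_rates: dict[str, float], default_rate: float,
                 burst_threshold: int, window_s: float = 60.0,
                 burst_multiplier: float = 5.0, seed: Optional[int] = None):
        self.segment_rates = dict(segment_rates)
        self.default_rate = default_rate
        self.burst_threshold = burst_threshold
        self.window_s = window_s
        self.burst_multiplier = burst_multiplier
        self._recent = deque()  # arrival times that fall inside the window
        self._rng = random.Random(seed)

    def should_sample(self, segment: str, now: float) -> bool:
        """Record a request arriving at `now` (seconds) and decide whether to capture it."""
        self._recent.append(now)
        # Drop arrivals that have slid out of the window.
        while self._recent and self._recent[0] <= now - self.window_s:
            self._recent.popleft()
        rate = self.segment_rates.get(segment, self.default_rate)
        if len(self._recent) > self.burst_threshold:
            rate = min(1.0, rate * self.burst_multiplier)
        return self._rng.random() < rate


sampler = Sampler({"payments": 0.5, "faq": 0.01}, default_rate=0.05,
                  burst_threshold=1000, seed=7)
captured = sampler.should_sample("payments", now=0.0)
```

Giving a high-stakes segment such as payments a much higher rate than low-risk FAQ traffic follows the same principle as the sampling guidance in the FAQ: match coverage to risk.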

## FAQ

- **How much traffic should I sample?** It depends on your risk tolerance. High-stakes endpoints (sensitive data, financial decisions, patient-facing outputs) warrant higher sampling rates. Lower-risk, high-volume endpoints can run on lighter random sampling. Swept helps you set rates at kickoff and adjust them over time.
- **How are baselines set?** Baselines are locked from your last approved evaluation. They capture expected ranges for accuracy, escalation rate, refusal patterns, cost, and latency across representative and intentionally noisy inputs.
- **What counts as drift?** Any meaningful deviation from baseline behavior: semantic drift (intent mix or language patterns), outcome drift (drops in accuracy or rises in hallucinations), bias (score deltas across sensitive attributes or cohorts), and variance (instability by prompt, model version, or time of day). Swept monitors all four.
- **Can I monitor token cost and usage?** Yes. Token usage and cost are tracked continuously, with configurable caps and warnings. You set the thresholds, and Swept alerts you when usage approaches or crosses them.
- **How do I keep data private?** Sampling includes redaction rules for sensitive fields and encryption in transit and at rest. Your data does not leave your environment without your authorization. For strict data sovereignty requirements, Swept can run inside your private cloud; the same deployment option is available with our [Governance offering](/offering/governance).
- **What is the difference between Supervision and Governance?** Supervision is the AI middle layer between your people and your agents: it samples live traffic, measures behavior against a baseline, detects drift, and enforces policy at the moment an action is about to happen. [Governance](/offering/governance) is broader: the continuous practice of enforcing your organization's rules across all AI systems, including vendor tools, access controls, and risk-tolerance expectations. Supervision is one of the systems that Governance oversees. Swept also offers [Evaluation](/offering/evaluation) and Implementation.
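As one way to turn "semantic drift in the intent mix" from the drift question above into a number, the sketch below computes the Population Stability Index (PSI) between a baseline intent distribution and a live one. PSI is a swapped-in, commonly used drift statistic, not necessarily the score Swept reports; the intent labels are hypothetical, and the 0.1 and 0.25 cut-offs are a widely used rule of thumb that should be treated as an assumption.

```python
import math
from collections import Counter
from typing import Iterable


def intent_mix(labels: Iterable[str]) -> dict[str, float]:
    """Convert a stream of intent labels into a probability distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {intent: n / total for intent, n in counts.items()}


def psi(baseline: dict[str, float], live: dict[str, float], eps: float = 1e-4) -> float:
    """Population Stability Index: sum of (live - base) * ln(live / base) per category.

    `eps` floors each probability so an intent seen in only one window
    does not produce log(0) or a division by zero.
    """
    score = 0.0
    for intent in set(baseline) | set(live):
        p_base = max(baseline.get(intent, 0.0), eps)
        p_live = max(live.get(intent, 0.0), eps)
        score += (p_live - p_base) * math.log(p_live / p_base)
    return score


def drift_level(score: float) -> str:
    """Rule-of-thumb bands often used with PSI (an assumption; tune per endpoint)."""
    if score < 0.1:
        return "stable"
    if score < 0.25:
        return "moderate"
    return "significant"


base = intent_mix(["billing"] * 50 + ["refund"] * 50)
live = intent_mix(["billing"] * 20 + ["refund"] * 80)
print(round(psi(base, live), 3), drift_level(psi(base, live)))  # 0.416 significant
```

Each PSI term is non-negative, so the score is zero only when the two distributions match and grows as the live intent mix moves away from the baseline.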