# What is AI Monitoring?

_AI monitoring is the ongoing tracking, analysis, and interpretation of AI system behavior and performance so teams can detect issues early and keep outcomes dependable._

Definitions across the industry emphasize continuous measurement of models, inputs, outputs, and supporting infrastructure, with attention to drift, bias, latency, and cost.

Why it matters:

- Prevent incidents before users feel them (e.g., rising error or hallucination rates).
- Control spend by watching tokens, call rates, and model selection.
- Shorten mean time to resolution (MTTR) with trace-level visibility into prompts, contexts, tool calls, and responses.

## Monitoring vs. Supervision (Why Supervision Wins)

**TL;DR:** Monitoring tells you what happened; **[Supervision](/ai-supervision)** controls what's allowed to happen.

### Prevention vs. Detection

- *Monitoring* detects issues after they occur (alerts, dashboards).
- *Supervision* prevents bad outputs/actions with in-line policies, guardrails, and approvals.

### Unit of Control

- *Monitoring* works with metrics, logs, and traces.
- *Supervision* works with **policies**, **schemas**, and **decision gates** that must be satisfied.

### Timing

- *Monitoring* is reactive (alert → investigate → fix).
- *Supervision* is proactive (block/allow/redo/approve at runtime).

### Quality Assurance

- *Monitoring* observes hallucinations, refusals, and regressions.
- *Supervision* **enforces** groundedness, citation accuracy, and strict output formats before responses ship.
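
As a rough illustration of enforcing a strict output format before a response ships, the sketch below checks a raw model response against a small schema. It assumes a hypothetical assistant that must return a JSON object with an `answer` string and a non-empty `citations` list; the field names and the `check_output` function are illustrative, not part of any specific product.

```python
import json

# Hypothetical output schema: field name -> required Python type.
REQUIRED_FIELDS = {"answer": str, "citations": list}


def check_output(raw: str) -> tuple[bool, str]:
    """Return (allowed, reason) for one raw model response."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(data, dict):
        return False, "top-level value is not an object"
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            return False, f"missing or mistyped field: {name}"
    if not data["citations"]:
        return False, "no citations supplied"
    return True, "ok"
```

A monitoring-only setup would log the failure reason and count it. A supervising layer would use the same result to block or regenerate the response before the user sees it.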

### Safety & Misuse

- *Monitoring* surfaces prompt-injection or jailbreak signals.
- *Supervision* **denies/strips/contains** unsafe content and isolates untrusted context by default.

### Tool & Data Access

- *Monitoring* measures error rates, latency, and cost across tools.
- *Supervision* **constrains** tools via allowlists, scoped keys, rate/cost guards, and human-in-the-loop for sensitive actions.
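
The constraints above can be sketched as a small gate that every tool request passes through. This is a minimal example under stated assumptions: the tool names, the per-session call limit, and the `ToolGate` class are all hypothetical, and a real deployment would also scope credentials per tool.

```python
from dataclasses import dataclass

# Hypothetical tool names, for illustration only.
ALLOWED_TOOLS = {"search_docs", "create_ticket", "issue_refund"}
NEEDS_APPROVAL = {"issue_refund"}  # sensitive actions go to a human reviewer


@dataclass
class ToolGate:
    max_calls: int = 5   # per-session rate guard
    calls_made: int = 0

    def decide(self, tool: str) -> str:
        """Return 'deny', 'needs_approval', or 'allow' for a requested tool call."""
        if tool not in ALLOWED_TOOLS:
            return "deny"            # not on the allowlist
        if self.calls_made >= self.max_calls:
            return "deny"            # rate guard exhausted
        self.calls_made += 1
        if tool in NEEDS_APPROVAL:
            return "needs_approval"  # human-in-the-loop
        return "allow"
```

Each decision is also a natural event to log, which is where monitoring picks up: denial and approval counts become metrics.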

### Compliance & Auditability

- *Monitoring* proves SLO health over time.
- *Supervision* proves **policy conformance** with auditable traces (who approved, which rule triggered, what was blocked).

### [Drift](/ai-model-drift) & Decay Response

- *Monitoring* alerts when trends slip.
- *Supervision* **automatically re-checks and regenerates** outputs under policy until they meet quality thresholds.
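
One way to picture the supervision side is a bounded retry loop: generate, score, and regenerate until the score clears a threshold or the attempt budget runs out. The sketch below assumes a caller-supplied `generate` function and a `score` function that returns a quality value between 0 and 1; both, along with the default threshold, are hypothetical.

```python
from typing import Callable, Optional


def generate_until_ok(
    generate: Callable[[int], str],
    score: Callable[[str], float],
    threshold: float = 0.8,
    max_attempts: int = 3,
) -> Optional[str]:
    """Retry generation until the score clears the threshold.

    Returns the first acceptable candidate, or None when every attempt
    falls short, which the caller can treat as "escalate to a human".
    """
    for attempt in range(max_attempts):
        candidate = generate(attempt)
        if score(candidate) >= threshold:
            return candidate
    return None
```

Capping the attempts matters: without a limit, a degraded model could loop indefinitely and drive up cost.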

### Cost Governance

- *Monitoring* spots anomalies (tokens per task, spend spikes).
- *Supervision* **routes** to cheaper models when policy allows and enforces budget caps in real time.
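
A budget cap with policy-gated routing might look like the sketch below. The model names and per-token prices are made up for illustration, and `choose_model` is a hypothetical helper, not a real provider API.

```python
from typing import Optional

# Hypothetical model names and prices (USD per 1,000 tokens).
PRICE_PER_1K = {"large-model": 0.03, "small-model": 0.002}


def choose_model(
    est_tokens: int,
    spent_usd: float,
    budget_usd: float,
    allow_downgrade: bool,
) -> Optional[str]:
    """Pick the preferred model that fits the remaining budget, or None to block."""
    remaining = budget_usd - spent_usd
    # Policy decides whether the cheaper model is an acceptable fallback.
    candidates = ["large-model", "small-model"] if allow_downgrade else ["large-model"]
    for model in candidates:
        cost = est_tokens / 1000 * PRICE_PER_1K[model]
        if cost <= remaining:
            return model
    return None  # budget cap reached: block the request
```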

### Outcome

- *Monitoring* delivers faster triage and learning loops.
- *Supervision* delivers **fewer incidents** and **stronger guarantees** by design.

### What to use when

- Choose **Supervision** when you need **assurances** (regulated workflows, customer-facing assistants, financial/clinical decisions).
- Use **Monitoring** everywhere to improve reliability, performance, and spend, and to inform how you tune supervision policies.

### How they fit together

- **Supervision = policy + enforcement + human-in-the-loop** at the moment of decision.
- **Monitoring** surrounds supervision with visibility (KPIs, SLOs, trends) so you can iterate on prompts, models, and policies intelligently.

> For the full approach, see **[AI Supervision](/ai-supervision)**.

## Monitoring vs. Observability vs. APM

- **Monitoring** tracks known signals and thresholds for health, cost, and quality.
- **[Observability](/ai-observability)** provides the deeper, correlated picture across data, models, and infra to explain why behavior changed. In practice, that means continuous instrumentation that detects drift, decay, and bias early.
- **APM for AI** extends classic application monitoring with model-aware traces, prompt/response inspection, and model comparisons across environments.

## Where It Matters

- **[Customer-facing assistants](/solutions/customer-experience) and search**: protect CX KPIs while controlling LLM spend.
- **Operational and IT systems**: unify visibility across cloud, data pipelines, and model services to reduce downtime and speed incident response.
- **Predictive and time-series workloads**: use continuous signals to anticipate failures and performance regressions.

## The AI Monitoring Stack

### 1. Data layer

- Data freshness, schema drift, PII leakage, source coverage.
- Time-series pipelines for high-resolution metrics.
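
Two of the data-layer checks, schema drift and freshness, are simple enough to sketch directly. The example assumes a hypothetical input record with `user_id`, `query`, and `locale` fields; the expected schema and both function names are illustrative.

```python
from datetime import datetime, timedelta

# Hypothetical expected schema: field name -> required Python type.
EXPECTED_SCHEMA = {"user_id": int, "query": str, "locale": str}


def schema_drift(record: dict) -> dict:
    """Report fields that are missing, unexpected, or of the wrong type."""
    missing = sorted(set(EXPECTED_SCHEMA) - set(record))
    unexpected = sorted(set(record) - set(EXPECTED_SCHEMA))
    mistyped = sorted(
        name
        for name, expected in EXPECTED_SCHEMA.items()
        if name in record and not isinstance(record[name], expected)
    )
    return {"missing": missing, "unexpected": unexpected, "mistyped": mistyped}


def is_stale(last_updated: datetime, now: datetime, max_age: timedelta) -> bool:
    """True when the source has not refreshed within the allowed window."""
    return now - last_updated > max_age
```

Emitting the drift report as a metric per source, rather than failing the pipeline outright, keeps this check on the monitoring side of the line.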

### 2. Model layer

- **Quality**: groundedness, citation accuracy, refusal rate, hallucination trend.
- **Safety**: toxicity, bias indicators, prompt-injection attempts.
- **Performance**: latency p50/p95, throughput, error codes.
- **Cost**: tokens, per-request and per-feature cost.
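
For the performance and cost metrics, a minimal sketch is shown below. It uses the nearest-rank definition of a percentile (other definitions interpolate and give slightly different values), and the per-token prices passed to `request_cost` are placeholders the caller supplies, not real provider rates.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def request_cost(
    prompt_tokens: int,
    completion_tokens: int,
    usd_per_1k_prompt: float,
    usd_per_1k_completion: float,
) -> float:
    """Cost of one request from token counts and per-1,000-token prices."""
    return (
        prompt_tokens * usd_per_1k_prompt
        + completion_tokens * usd_per_1k_completion
    ) / 1000
```

Tracking p95 alongside p50 matters because a few slow requests can leave the median untouched while a meaningful share of users wait much longer.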

### 3. Application & tools

- Tool call success rate, retries, guardrail denials, human-approval hits.
- Session traces that tie user steps to model events for root-cause analysis.

### 4. Infrastructure & operations

- GPU/CPU utilization, queue depth, saturation, network errors.
- Cross-stack correlation for faster triage and fewer blind spots.

## KPIs, SLOs, and Alerts

- **Availability SLOs**: model event success rate, tool success rate.
- **Latency SLOs**: p95 end-to-end response under target by route or feature.
- **Quality SLOs**: groundedness score, citation accuracy, hallucination rate per domain.
- **Cost SLOs**: tokens per successful task, cost per resolved ticket or per lead qualified.
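
As a sketch of how an availability SLO and a cost SLO could be evaluated from the same event stream, the example below computes success rate and tokens per successful task. The `TaskEvent` shape and the default targets are assumptions for illustration. Note one design choice: tokens spent on failed attempts are counted against successful tasks, since those tokens were still paid for.

```python
from dataclasses import dataclass


@dataclass
class TaskEvent:
    succeeded: bool
    tokens: int


def success_rate(events: list[TaskEvent]) -> float:
    if not events:
        raise ValueError("no events")
    return sum(e.succeeded for e in events) / len(events)


def tokens_per_successful_task(events: list[TaskEvent]) -> float:
    """All tokens spent, including failed attempts, divided by successful tasks."""
    successes = sum(e.succeeded for e in events)
    if successes == 0:
        return float("inf")
    return sum(e.tokens for e in events) / successes


def check_slos(
    events: list[TaskEvent],
    min_success_rate: float = 0.99,
    max_tokens_per_success: float = 2000,
) -> dict:
    """Return pass/fail per SLO for one evaluation window."""
    return {
        "availability": success_rate(events) >= min_success_rate,
        "cost": tokens_per_successful_task(events) <= max_tokens_per_success,
    }
```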

Alert examples:

- Spike in refusal or hallucination rate for a specific model version.
- Drift detected in input distribution for a key workflow.
- Cost anomaly: tokens per task up 30% after a prompt change.
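
The cost-anomaly alert can be expressed as a comparison of mean tokens per task before and after a change. This is a deliberately simple sketch: the function name and the sample windows are hypothetical, and a production rule would likely also require a minimum sample size before firing.

```python
from statistics import mean


def tokens_per_task_alert(
    baseline: list[int],
    current: list[int],
    max_increase: float = 0.30,
) -> bool:
    """True when mean tokens per task rose by more than max_increase over the baseline."""
    base = mean(baseline)
    change = (mean(current) - base) / base
    return change > max_increase
```

For example, a baseline window averaging 1,000 tokens per task and a post-change window averaging 1,350 is a 35% increase, so the alert fires.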

## How Swept Implements AI Monitoring

[Swept AI Supervise](/product/supervise) combines monitoring with active enforcement:

- **End-to-end traces for AI events**: prompt template ID, context objects, model/version, sampling params, tool calls, guardrail decisions, outputs, and costs. Works across OpenAI, Bedrock, and other providers.
- **Quality analytics**: groundedness and citation accuracy scoring with per-source coverage, refusal analysis, and red-flag patterns.
- **Safety & misuse signals**: injection and jailbreak indicators surfaced from inputs and retrieved context, with block/allow outcomes logged.
- **Cost governance**: usage budgets, per-feature spend dashboards, model-comparison views to pick the right cost-quality curve.
- **Operational integration**: unify infra metrics and logs with model events so on-call can correlate GPU saturation, queueing, and user impact.
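
To make the trace fields listed above concrete, here is a hypothetical record shape for a single AI event. It is an illustrative sketch only and does not represent Swept's actual schema or API; the class name, field names, and types are all assumptions.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AIEventTrace:
    """Hypothetical trace record for one model call (not a real product schema)."""
    prompt_template_id: str
    model: str
    model_version: str
    sampling_params: dict
    context_ids: list = field(default_factory=list)          # retrieved context objects
    tool_calls: list = field(default_factory=list)
    guardrail_decisions: list = field(default_factory=list)  # e.g. rule name and outcome
    output: str = ""
    cost_usd: float = 0.0

    def to_json(self) -> str:
        """Serialize with stable key order so traces diff cleanly."""
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping guardrail decisions in the same record as the prompt and output is what lets an on-call engineer see, in one place, what was asked, what was blocked, and why.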

## Quick Readiness Checklist

- Model-aware tracing turned on in all environments
- Quality KPIs (groundedness, hallucination, refusal) reported per route
- Cost budgets with anomaly alerts and model comparison views
- Data drift and PII leakage checks on inputs and retrieved context
- Guardrail outcomes and human-approval hits visible in traces
- Infra and AI signals unified for incident triage and MTTR gains