We recommend starting with a higher sampling rate for key endpoints and high-risk roles, then tuning down as you learn your agents’ drift patterns. Swept can adjust sampling dynamically as it detects where risk and change are concentrated, so you get strong coverage without overwhelming reviewers.
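To make the "start high, then tune down" idea concrete, here is a minimal sketch of per-endpoint, per-role sampling. It is not Swept's API: the endpoint and role names, the rate values, and the `sampling_rate`, `should_sample`, and `tune_down` helpers are all hypothetical and only illustrate the approach.

```python
import random

# Hypothetical rates: every key and value here is an illustrative assumption.
DEFAULT_RATE = 0.05
RATES = {
    ("checkout", "admin"): 0.50,  # key endpoint, high-risk role: sample heavily
    ("checkout", "agent"): 0.25,
}

def sampling_rate(endpoint: str, role: str) -> float:
    """Return the configured rate, falling back to the default."""
    return RATES.get((endpoint, role), DEFAULT_RATE)

def should_sample(endpoint: str, role: str, rng: random.Random) -> bool:
    """Decide whether to send one request to review."""
    return rng.random() < sampling_rate(endpoint, role)

def tune_down(endpoint: str, role: str, factor: float = 0.5,
              floor: float = DEFAULT_RATE) -> float:
    """Lower a rate once drift patterns are understood, never below the floor."""
    new_rate = max(floor, sampling_rate(endpoint, role) * factor)
    RATES[(endpoint, role)] = new_rate
    return new_rate
```

Keeping a floor on the rate means a tuned-down endpoint is still sampled occasionally, so new drift can still surface.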
Baselines are locked from your last approved evaluation run. Once a model and prompt set passes your thresholds, Swept saves that as the production baseline to compare live traffic against over time.
Drift is any meaningful shift away from your baseline—changes in language or intent mix, drops in accuracy, higher hallucination or refusal rates, emerging bias across cohorts, or instability by model, prompt, or time of day.
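As a rough illustration of comparing live traffic to a locked baseline, the sketch below flags metrics that moved in the wrong direction by more than a tolerance. The `Metrics` fields, the tolerance values, and the `detect_drift` function are assumptions made for this example, not Swept's actual data model or thresholds.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Metrics:
    accuracy: float
    hallucination_rate: float
    refusal_rate: float

# Hypothetical tolerances: how far each metric may move before it counts as drift.
TOLERANCES = {"accuracy": 0.02, "hallucination_rate": 0.01, "refusal_rate": 0.02}

# +1 means an increase is bad, -1 means a decrease is bad.
BAD_DIRECTION = {"accuracy": -1, "hallucination_rate": +1, "refusal_rate": +1}

def detect_drift(baseline: Metrics, live: Metrics,
                 tolerances: dict = TOLERANCES) -> dict:
    """Return {metric: change} for metrics that worsened beyond tolerance."""
    drifted = {}
    for name, tolerance in tolerances.items():
        change = getattr(live, name) - getattr(baseline, name)
        # Only count movement in the harmful direction.
        if change * BAD_DIRECTION[name] > tolerance:
            drifted[name] = change
    return drifted

baseline = Metrics(accuracy=0.92, hallucination_rate=0.030, refusal_rate=0.05)
live = Metrics(accuracy=0.88, hallucination_rate=0.035, refusal_rate=0.09)
drift = detect_drift(baseline, live)  # flags accuracy and refusal_rate
```

The same comparison could be run per cohort, model, prompt, or time window to catch the instability described above.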
Yes. Swept tracks average and p95 latency and unit cost per endpoint and role, with thresholds and alerts so you see when performance or spend slips out of your acceptable range.
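For readers who want to see what such a check involves, here is a minimal sketch that computes average latency, p95 latency (nearest-rank method), and unit cost for one endpoint, then compares them to thresholds. The function names, the threshold parameters, and the alert labels are hypothetical and not taken from Swept.

```python
def p95(values):
    """Nearest-rank 95th percentile: the value at rank ceil(0.95 * n)."""
    if not values:
        raise ValueError("p95 needs at least one value")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]

def check_endpoint(latencies_ms, total_cost, max_p95_ms, max_unit_cost):
    """Summarize one endpoint and list which thresholds were exceeded."""
    requests = len(latencies_ms)
    summary = {
        "avg_ms": sum(latencies_ms) / requests,
        "p95_ms": p95(latencies_ms),
        "unit_cost": total_cost / requests,  # cost per request
        "alerts": [],
    }
    if summary["p95_ms"] > max_p95_ms:
        summary["alerts"].append("p95_latency")
    if summary["unit_cost"] > max_unit_cost:
        summary["alerts"].append("unit_cost")
    return summary

# Illustrative data: 100 requests with latencies of 1..100 ms costing 2.0 in total.
report = check_endpoint(list(range(1, 101)), total_cost=2.0,
                        max_p95_ms=90, max_unit_cost=0.05)
```

Tracking p95 alongside the average matters because a small share of slow requests can hide behind a healthy mean.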
Apply redaction rules to sensitive fields, enforce encryption in transit and at rest, and control access with roles and permissions—so only the right people can see examples, thresholds, and incident details.
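A redaction rule can be as simple as masking named fields before an example is stored or shown to a reviewer. The sketch below assumes records are nested dictionaries and lists; the field names, the mask text, and the `redact` helper are hypothetical and do not describe Swept's configuration format.

```python
# Hypothetical list of sensitive field names.
SENSITIVE_FIELDS = frozenset({"email", "ssn", "phone"})

def redact(record, sensitive=SENSITIVE_FIELDS, mask="[REDACTED]"):
    """Return a copy of the record with sensitive fields masked at any depth."""
    if isinstance(record, dict):
        return {
            key: mask if key in sensitive else redact(value, sensitive, mask)
            for key, value in record.items()
        }
    if isinstance(record, list):
        return [redact(item, sensitive, mask) for item in record]
    return record  # scalars pass through unchanged

example = {
    "user": {"email": "a@example.com", "name": "Avery"},
    "messages": [{"phone": "555-0100", "text": "hello"}],
}
safe = redact(example)
```

Because the function builds a new structure, the original record is left untouched for systems that are permitted to see it.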