A procurement team at a large financial institution recently shared their AI vendor evaluation process with us. They had a 200-question security questionnaire. SOC 2 compliance, data encryption standards, incident response timelines, penetration testing cadence. Every question you would expect from a mature security organization.
Not a single question addressed model drift. Nothing about output validation. Nothing about how the vendor handles hallucinated responses in production. The questionnaire was thorough, rigorous, and designed for a category of software that did not describe what they were buying.
The SaaS Questionnaire Applied to a Non-SaaS Problem
Security questionnaires evolved alongside enterprise SaaS. They reflect the assumptions of deterministic software: the same input produces the same output, behavior changes only through versioned releases, and security boundaries are well-defined. SOC 2 Type II, ISO 27001, data residency controls, access management, encryption standards. These frameworks work because SaaS applications behave predictably.
AI systems do not behave predictably. A language model can produce different outputs for identical inputs across consecutive requests. Model behavior changes without a software release, sometimes because the vendor updated weights, sometimes because upstream foundation model providers pushed changes. Data flows through training pipelines, embedding stores, inference logs, and fine-tuning loops in ways that traditional data handling questions never anticipated.
Applying a SaaS security questionnaire to an AI vendor is like auditing a restaurant's food safety by inspecting the building's fire exits. The fire exits matter. They just have nothing to do with whether the food will make anyone sick.
Where Traditional Questionnaires Fall Short
The gaps between SaaS questionnaires and AI vendor risk cluster around five areas.
Model Behavior and Reliability
SaaS vendors ship code that does what it does until the next release. AI vendors ship models whose outputs are probabilistic. A claims processing model that performed well during evaluation can degrade within months as input distributions shift. Traditional questionnaires ask about uptime and availability. They do not ask whether the vendor monitors for model drift, measures hallucination rates, or tracks accuracy degradation over time.
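What drift monitoring looks like in practice can be made concrete. One widely used statistic is the population stability index (PSI), which compares the distribution of an input feature at evaluation time against a recent production window. A minimal sketch; the bin count and the 0.2 alert threshold are rule-of-thumb assumptions, not a standard:

```python
# Sketch: population stability index (PSI) between a baseline sample
# (captured at evaluation time) and a production window. Bin count and
# the 0.2 alert threshold are rule-of-thumb assumptions.
import numpy as np

def psi(baseline, production, bins=10):
    """Compare two samples of a numeric input feature; higher PSI = more drift."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    b_frac = np.histogram(baseline, bins=edges)[0] / len(baseline)
    p_frac = np.histogram(production, bins=edges)[0] / len(production)
    # Floor empty bins at a small epsilon to avoid log(0).
    b_frac = np.clip(b_frac, 1e-6, None)
    p_frac = np.clip(p_frac, 1e-6, None)
    return float(np.sum((p_frac - b_frac) * np.log(p_frac / b_frac)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 5000)   # input distribution at evaluation
shifted = rng.normal(0.8, 1.2, 5000)    # the same feature months later

print(psi(baseline, baseline))          # 0.0: identical distributions
print(psi(baseline, shifted) > 0.2)     # True: drift worth alerting on
```

In a vendor conversation, the useful follow-up is not whether they compute PSI specifically, but whether any comparable statistic runs continuously and is wired to alerts.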
Data Handling Beyond Storage
SaaS questionnaires cover data at rest and in transit. AI vendors handle data in additional dimensions: training datasets, fine-tuning data, retrieval-augmented generation corpora, embedding vectors, prompt logs, and inference outputs. Each carries distinct privacy and security implications. Customer data embedded in a vector database persists differently than data stored in a relational table. Prompt logs can contain sensitive information that traditional data classification frameworks do not account for.
Output Accountability
When a SaaS application returns incorrect data, you can trace the bug to a code path. When an AI system produces harmful, biased, or fabricated output, the failure mode is fundamentally different. There is no single line of code responsible. The questionnaire should probe how the vendor validates outputs, what guardrails prevent harmful content, and what happens when the system produces something wrong.
Supply Chain Depth
Most AI vendors build on foundation models they did not create. OpenAI, Anthropic, Google, Meta, and others provide the base models that downstream vendors fine-tune and deploy. A change to GPT or Claude ripples through every application built on top of it. Traditional vendor assessments evaluate the vendor in front of you. AI vendor assessments need to evaluate the model providers behind them: the fourth-party risk.
Governance Maturity
SOC 2 certifies that security controls exist. It says nothing about whether the vendor has a model registry, tracks model versions, maintains rollback capabilities, or runs continuous bias and fairness monitoring. These are governance capabilities specific to AI, and they distinguish vendors who manage AI responsibly from vendors who simply deploy it.
The Questions You Should Be Asking
We work with organizations building governance infrastructure for AI deployments. Through that work, we have identified the categories that separate meaningful AI vendor evaluation from security theater.
Model Governance and Lifecycle
Start with how the vendor manages models across their lifecycle.
- Do you maintain a centralized registry of all AI models in production, including model version, training data sources, and performance baselines?
- What is your process for model updates? Do customers receive advance notice? Can customers pin to specific model versions?
- How do you handle rollbacks when a model update degrades performance?
- Do you publish model cards documenting capabilities, limitations, and known failure modes?
- What governance framework guides your AI development? Do you align with NIST AI RMF, ISO 42001, or comparable standards?
These questions reveal whether the vendor treats models as managed assets or as black boxes they deploy and forget.
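A registry of this kind does not need to be elaborate to be useful. A minimal sketch of the record it might hold; every field name and value here is illustrative, not a prescribed schema:

```python
# Sketch of a model registry entry, the asset record the first question
# above asks about. Field names and values are illustrative, not a schema.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ModelRecord:
    name: str
    version: str
    foundation_model: str                    # upstream (fourth-party) dependency
    training_data_sources: list = field(default_factory=list)
    performance_baseline: dict = field(default_factory=dict)
    rollback_target: Optional[str] = None    # version to restore on degradation

registry: dict = {}

def register(rec: ModelRecord) -> None:
    registry[f"{rec.name}:{rec.version}"] = rec

register(ModelRecord(
    name="claims-triage",                         # hypothetical model
    version="2.3.1",
    foundation_model="upstream-base-model-v4",    # hypothetical identifier
    training_data_sources=["claims-2023-q1", "claims-2023-q2"],
    performance_baseline={"accuracy": 0.91, "hallucination_rate": 0.02},
    rollback_target="2.2.0",
))

print(registry["claims-triage:2.3.1"].rollback_target)  # "2.2.0"
```

A vendor who can produce this record for every production model, on request, has answered most of the lifecycle questions above in one stroke.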
Continuous Monitoring and Observability
A vendor that cannot tell you how their models perform in production cannot tell you much that matters.
- Do you monitor model performance in production? What metrics do you track, and at what frequency?
- How do you detect and respond to model drift? What thresholds trigger alerts or automated interventions?
- Do you maintain audit logs of model inputs and outputs? What retention policies apply, and who has access?
- How do you measure and track hallucination rates?
- What is your mean time to detect model performance degradation?
The difference between a vendor with monitoring and a vendor with continuous supervision is the difference between a vendor who knows when something breaks and a vendor who knows before it breaks.
Data Handling for AI-Specific Workflows
Move beyond "data at rest and in transit" to address how data flows through AI-specific pipelines.
- Is customer data used for model training or fine-tuning? If so, can customers opt out, and how is data deletion enforced?
- How are prompt logs and inference outputs stored, retained, and access-controlled?
- How do you handle data embedded in vector databases or retrieval-augmented generation systems?
- What controls prevent sensitive information in prompts from being logged or persisted?
- For cross-border deployments, where does model inference occur, and how do you handle data residency for AI-specific data types like embeddings?
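The fourth question, preventing sensitive information in prompts from persisting in logs, is worth making concrete. A minimal sketch of redaction-before-logging; the regex patterns are illustrative stand-ins for the dedicated PII detection a production system would use:

```python
import re

# Illustrative patterns only; a production system would use dedicated
# PII detection, not a short regex list.
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),       # US SSN shape
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[CARD]"),     # card-like digit runs
]

def redact(prompt: str) -> str:
    """Scrub sensitive patterns before a prompt is persisted anywhere."""
    for pattern, token in REDACTIONS:
        prompt = pattern.sub(token, prompt)
    return prompt

def log_prompt(prompt: str, sink: list) -> None:
    sink.append(redact(prompt))   # only the redacted form ever reaches the log

log = []
log_prompt("My SSN is 123-45-6789, email jane@example.com", log)
print(log[0])  # "My SSN is [SSN], email [EMAIL]"
```

The design point is the placement: redaction happens inside the logging path, so no code path can write a raw prompt by accident.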
Output Validation and Safety
Probe the vendor's approach to ensuring AI outputs meet quality and safety standards.
- What guardrails prevent the model from producing harmful, biased, or fabricated content?
- How do you validate output accuracy for a customer's specific use case?
- Do you implement content filtering, output scoring, or human-in-the-loop review for high-risk decisions?
- How do you handle adversarial inputs, including prompt injection attacks?
- What is your process when a model produces an output that causes customer harm?
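These layers, guardrails, output scoring, and human-in-the-loop review, compose naturally into a validation pipeline. A minimal sketch; the blocklist, confidence threshold, and risk flag are all illustrative assumptions:

```python
# Sketch of a layered output-validation pipeline: content guardrail first,
# then escalation for high-risk decisions, then a confidence score.
# BLOCKLIST terms, the 0.7 threshold, and the risk flag are assumptions.
BLOCKLIST = {"password", "account number"}   # illustrative blocked terms
SCORE_THRESHOLD = 0.7

def validate_output(text: str, confidence: float, high_risk: bool) -> str:
    """Return a routing decision for a model output."""
    if any(term in text.lower() for term in BLOCKLIST):
        return "block"            # guardrail: never release
    if high_risk:
        return "human_review"     # human-in-the-loop for high-stakes decisions
    if confidence < SCORE_THRESHOLD:
        return "flag"             # release, but queue for sampling and audit
    return "release"

print(validate_output("Your claim is approved.", 0.92, high_risk=False))  # release
print(validate_output("Your claim is approved.", 0.55, high_risk=False))  # flag
print(validate_output("Loan denied.", 0.95, high_risk=True))              # human_review
```

The specific checks matter less than the ordering: hard guardrails first, escalation second, scoring last, so a high confidence score can never override a blocked term or a mandatory human review.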
Bias, Fairness, and Responsible AI
These questions determine whether a vendor's responsible AI commitments extend beyond marketing copy.
- How do you measure bias across protected attributes in model outputs?
- What mitigation mechanisms exist when bias is detected?
- For high-risk use cases (hiring, lending, insurance underwriting), do you implement human review requirements?
- Do you conduct third-party audits of model fairness? How frequently?
- What prohibited-use policies do you enforce, and how are they technically implemented rather than just documented?
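The first question has a concrete metric behind it: demographic parity difference, the gap in favorable-outcome rates across groups of a protected attribute. A sketch on synthetic data; the 0.1 investigation threshold is a common rule of thumb, not a regulatory standard:

```python
# Sketch: demographic parity difference, the gap in favorable-outcome
# rates between groups of a protected attribute. Data is synthetic and
# the 0.1 investigation threshold is a rule-of-thumb assumption.
def parity_difference(outcomes, groups):
    """outcomes: 1 = favorable decision; groups: protected-attribute label."""
    rates = {}
    for g in set(groups):
        members = [o for o, gg in zip(outcomes, groups) if gg == g]
        rates[g] = sum(members) / len(members)
    return max(rates.values()) - min(rates.values()), rates

outcomes = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
groups   = ["a", "a", "a", "a", "b", "b", "b", "b", "b", "b"]
gap, rates = parity_difference(outcomes, groups)

print(rates)       # group "a": 0.75, group "b": ~0.17
print(gap > 0.1)   # True: a gap this size warrants investigation
```

Parity difference is one metric among several (equalized odds and calibration are others); the revealing question is whether the vendor computes any of them on a schedule, against production outputs.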
Fourth-Party and Supply Chain Risk
Understanding the vendor's dependencies is as important as understanding the vendor.
- Which foundation model providers do you depend on? What happens if they change their models?
- How do you evaluate and monitor the security and governance practices of your upstream model providers?
- What concentration risk exists in your model supply chain? Do you have fallback models from alternative providers?
- How do you contractually address liability for failures caused by upstream model changes?
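The fallback question has a simple structural answer worth sketching: route requests to an ordered list of providers and fail over on error. Provider names and client callables here are hypothetical placeholders, not real APIs:

```python
# Sketch of fallback routing across foundation-model providers, the
# concentration-risk mitigation the third question asks about.
# Provider names and client callables are hypothetical placeholders.
from typing import Callable

def with_fallback(prompt: str, providers) -> tuple:
    """Try providers in order; return (provider_name, response)."""
    last_error = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:    # provider outage or breaking model change
            last_error = exc
    raise RuntimeError("all providers failed") from last_error

def primary(prompt: str) -> str:
    raise TimeoutError("primary provider unavailable")   # simulated outage

def secondary(prompt: str) -> str:
    return f"answer to: {prompt}"

name, reply = with_fallback("summarize the claim",
                            [("primary", primary), ("secondary", secondary)])
print(name)  # "secondary"
```

A vendor with this structure in place can answer the concentration-risk question with an architecture diagram rather than a promise; one without it is a single upstream incident away from a full outage.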
From Checklist to Continuous Verification
The deeper problem with security questionnaires, even improved ones, is that they capture a point-in-time snapshot. A vendor's answers on day one of the contract may not reflect their practices on day 180. Models change. Data pipelines evolve. Monitoring capabilities get deprioritized as the vendor scales.
Effective AI vendor governance requires continuous verification, not periodic assessment. Organizations need the ability to independently validate vendor claims: to monitor model performance, track output quality, and detect drift regardless of what the vendor reports.
At Swept AI, we build the governance infrastructure that makes this possible. A centralized model registry that tracks every AI system, whether internal or vendor-provided. Continuous monitoring that detects drift, bias, and performance degradation in real time. Evaluation frameworks that validate AI outputs against defined quality and safety standards. Certification workflows that provide auditable evidence of ongoing governance.
The questionnaire gets the conversation started. The infrastructure keeps it honest.
The Questionnaire Is a Starting Point
A better security questionnaire does not solve AI vendor risk. It surfaces it. The organizations that manage AI vendor relationships well are the ones that treat the questionnaire as the beginning of a governance relationship, not the entirety of one.
That procurement team we mentioned at the start? After restructuring their questionnaire around these categories, they discovered that two of their four AI vendors had no model monitoring in production. One vendor could not tell them which foundation model version was currently running. Another had no process for notifying customers of model updates.
None of these gaps appeared in 200 questions designed for SaaS. They appeared in the first five questions designed for AI.
The security questionnaire was never wrong. It was written for a different kind of software. Evaluating AI vendors properly requires a different kind of question, and a different kind of ongoing scrutiny.
