AI Vendor Risk in Financial Services: How the FS AI RMF Changes Third-Party and Fourth-Party AI Oversight

Ask a financial institution's risk team where their AI risk lives, and they'll point to internal models: the credit scoring system, the fraud detection engine, the customer service chatbot. But for most institutions, the majority of AI risk sits in systems they don't control. Vendor-provided AI powers customer service platforms, document processing pipelines, compliance screening tools, and advisory systems. And those vendors often build on foundation models from a handful of providers.

The FS AI RMF, published in February 2026 by the Cyber Risk Institute and 108 financial institutions, dedicates a substantial share of its control objectives to third-party and fourth-party AI oversight. Governing AI you don't build requires different strategies than governing AI you do.

Most of Your AI Risk Lives Outside Your Organization

Financial institutions have always relied on third-party technology. What changed is the nature of the dependency. A traditional software vendor delivers deterministic functionality: the same input produces the same output. An AI vendor delivers probabilistic functionality: outputs can vary, drift, or degrade over time.

Existing vendor risk frameworks weren't built for this. A SOC 2 report tells you about a vendor's security controls. It tells you nothing about whether their AI hallucinated in 3% of customer interactions last month.

The dependency runs deeper than most institutions realize. Many fintech vendors don't build their own models. They build on foundation models from OpenAI, Anthropic, Google, or others. A foundation model update ripples through every downstream vendor, changing behavior across the chain. That's fourth-party risk: your vendor's vendor changes the AI that powers your operations.

What the FS AI RMF Requires for Third-Party AI

The FS AI RMF's Map and Govern functions include specific control objectives for vendor-provided AI:

Due diligence must include AI-specific assessment. Standard vendor questionnaires ask about security and data handling. The FS AI RMF requires institutions to evaluate vendor AI directly: model performance, bias characteristics, hallucination rates, and security posture against AI-specific attacks.

Contracts must address AI behavior. Agreements with AI vendors should specify performance thresholds (accuracy, error rates), bias testing obligations (testing against protected classes), incident notification (when AI produces harmful outputs), and audit rights (independent testing of vendor AI); one way to encode such thresholds is sketched below.

Monitoring must track AI performance continuously. Annual vendor reviews are insufficient for AI systems that can change behavior between reviews. The framework expects ongoing monitoring against agreed standards.

These requirements extend what model risk teams already do for internal models to systems the institution doesn't own.
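To make the contract and monitoring requirements concrete, here is a minimal sketch of checking a period's monitoring data against contractual thresholds. The `AIVendorSLA` fields and the numbers are hypothetical illustrations, not FS AI RMF prescriptions; a real agreement would define each metric and its measurement window precisely.

```python
from dataclasses import dataclass

@dataclass
class AIVendorSLA:
    """Hypothetical contractual thresholds for a vendor AI system."""
    min_accuracy: float       # agreed task accuracy on a reference set
    max_error_rate: float     # harmful/incorrect output rate
    max_bias_gap: float       # max outcome gap across protected groups
    notify_within_hours: int  # incident notification window

def check_against_sla(sla: AIVendorSLA, observed: dict) -> list[str]:
    """Return a list of SLA breaches found in this period's monitoring data."""
    breaches = []
    if observed["accuracy"] < sla.min_accuracy:
        breaches.append(f"accuracy {observed['accuracy']:.3f} < {sla.min_accuracy}")
    if observed["error_rate"] > sla.max_error_rate:
        breaches.append(f"error rate {observed['error_rate']:.3f} > {sla.max_error_rate}")
    if observed["bias_gap"] > sla.max_bias_gap:
        breaches.append(f"bias gap {observed['bias_gap']:.3f} > {sla.max_bias_gap}")
    return breaches

# Example: this quarter's observed metrics vs. the contract terms
sla = AIVendorSLA(min_accuracy=0.95, max_error_rate=0.02,
                  max_bias_gap=0.05, notify_within_hours=24)
print(check_against_sla(sla, {"accuracy": 0.93, "error_rate": 0.01, "bias_gap": 0.07}))
```

The point of the structure is that a breach becomes a computable event tied to contract language, rather than a judgment call made during an annual review.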

The Fourth-Party Problem: When Your Vendor's AI Isn't Their AI

Fourth-party AI risk introduces a layer that most vendor management programs miss entirely.

Consider the chain: A bank uses a customer service platform (third party) that runs on GPT-4 (fourth party). The bank evaluated the platform in January. In March, OpenAI updates GPT-4. The platform's behavior changes. The bank's evaluation is stale.

This pattern repeats across the industry. Document processing vendors use foundation models for extraction. Compliance tools use LLMs for analysis. Advisory platforms use generative AI for reports. In each case, the vendor controls the application layer, but the foundation model provider controls the core AI.

The FS AI RMF addresses this through downstream dependency documentation (institutions must map their full AI supply chain), change notification requirements (vendors must disclose when underlying models change), and independent validation (when AI changes, validation must be refreshed).
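One way to operationalize dependency documentation is a supply-chain map that ties each third-party system to the fourth-party foundation model it runs on, then flags validations that an upstream model update has invalidated. A minimal sketch, with hypothetical system and model names:

```python
from datetime import date

# Hypothetical AI supply-chain map: each third-party system is tied to its
# fourth-party foundation model and the date of the last in-house validation.
supply_chain = {
    "cust_service_platform": {"foundation_model": "model-provider-v4",
                              "last_validated": date(2026, 1, 15)},
    "doc_processing":        {"foundation_model": "model-provider-v4",
                              "last_validated": date(2026, 2, 1)},
}

# Fourth-party change notifications, e.g. from vendor disclosures
model_updates = {"model-provider-v4": date(2026, 3, 10)}

def stale_validations(chain: dict, updates: dict) -> list[str]:
    """Flag systems whose underlying model changed after the last validation."""
    return [name for name, info in chain.items()
            if updates.get(info["foundation_model"], date.min) > info["last_validated"]]

print(stale_validations(supply_chain, model_updates))
# -> both systems: the March model update postdates both validations
```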

Fourth-party risk is concentration risk disguised as vendor diversity. Five different AI vendors built on the same foundation model provide less diversification than they appear to.

Due Diligence Beyond the Questionnaire

Traditional vendor due diligence runs on trust-but-verify principles, heavily weighted toward trust. A vendor completes a questionnaire, provides a SOC 2 report, and the institution checks the boxes. With AI systems, that process breeds false confidence.

The FS AI RMF pushes toward evidence-based assessment:

Independent testing: Send representative inputs to vendor AI, measure outputs, and compare against institutional standards. Don't accept vendor claims at face value; a minimal harness along these lines is sketched after this list.

Bias audits: Conduct or commission independent bias testing, particularly for AI affecting lending, insurance, or other protected decisions.

Hallucination measurement: For GenAI-powered systems, measure hallucination rates independently. A vendor claiming 99% accuracy may define accuracy differently than regulators require.

Security testing: AI-specific testing (prompt injection, data extraction) should complement traditional penetration testing.

The evidence categories extend to model cards, testing results across demographic groups, production monitoring data, and incident histories documenting when and how AI failures occurred.
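As one illustration of independent testing, the sketch below runs an institution-curated, labeled reference set through a vendor call and computes an error rate against the institution's own correctness criterion rather than the vendor's. Everything here is a placeholder assumption: `call_vendor` stands in for whatever API the vendor actually exposes, and the two test cases stand in for a representative set.

```python
from typing import Callable

def measure_error_rate(call_vendor: Callable[[str], str],
                       labeled_cases: list[tuple[str, str]],
                       is_correct: Callable[[str, str], bool]) -> float:
    """Score vendor AI outputs against the institution's own standard."""
    failures = sum(
        0 if is_correct(call_vendor(prompt), expected) else 1
        for prompt, expected in labeled_cases
    )
    return failures / len(labeled_cases)

# Stand-in vendor call; in practice this would hit the vendor's API
# with a curated reference set covering the institution's use cases.
cases = [("What is the wire cutoff time?", "5pm ET"),
         ("Is this account FDIC insured?", "yes")]
rate = measure_error_rate(lambda p: "5pm ET" if "wire" in p else "unsure",
                          cases,
                          lambda out, exp: exp.lower() in out.lower())
print(f"observed error rate: {rate:.0%}")  # -> 50%
```

The decisive detail is the `is_correct` function: it encodes the institution's (and regulators') definition of accuracy, which is exactly where vendor-reported numbers and independent measurement tend to diverge.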

Continuous Monitoring of Vendor AI

Vendor AI performance is not static. It drifts. It degrades. It changes when vendors update models. The FS AI RMF's Manage function requires ongoing monitoring, not annual snapshots.

Output-level monitoring: Track what the vendor's AI produces in your environment. Monitor the responses a customer service AI gives, the decisions a fraud detection system makes, and the classifications a document processor assigns.

Performance benchmarking: Compare current performance against contractual thresholds. A vendor's AI that drops below agreed accuracy or exceeds bias limits should show up in your monitoring before it shows up in the vendor's quarterly report.

Change detection: Identify when vendor models have been updated or replaced. Output distribution shifts, latency changes, and behavioral differences all signal model changes; one lightweight detection approach is sketched below.

This level of monitoring requires operational tooling, not just policy. Annual reviews don't catch an AI system that started hallucinating last Tuesday.
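As a lightweight illustration of change detection, the sketch below computes a Population Stability Index (PSI), a drift metric model-risk teams already use for internal models, over two windows of vendor-AI output scores. The bin count, the ~0.25 rule of thumb, and the sample windows are illustrative assumptions, not framework requirements.

```python
import math

def psi(baseline: list[float], current: list[float], bins: int = 10) -> float:
    """Population Stability Index between a baseline window and the current
    window of vendor-AI output scores. Values above ~0.25 are commonly read
    as a significant shift, which may signal an upstream model change."""
    lo = min(baseline + current)
    width = (max(baseline + current) - lo) / bins or 1.0
    def dist(xs):
        counts = [0] * bins
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # Small floor avoids log(0) for empty bins
        return [max(c / len(xs), 1e-6) for c in counts]
    b, c = dist(baseline), dist(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))

# Example: model confidence scores before and after a suspected vendor update
before = [0.90, 0.92, 0.88, 0.91, 0.93, 0.89, 0.90, 0.92]
after  = [0.75, 0.80, 0.78, 0.74, 0.79, 0.77, 0.81, 0.76]
print(f"PSI = {psi(before, after):.2f}")  # large value -> investigate
```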

Concentration Risk: When Everyone Uses the Same AI

Picture 40% of the banking sector's customer service AI running on the same foundation model. A vulnerability in that model hits 40% of the sector at once. The FS AI RMF addresses concentration risk explicitly.

AI concentration differs from traditional technology concentration in two ways. First, it is often invisible. Five different customer service platforms look like vendor diversity. Three of them running on the same foundation model? Less diversification than anyone assumed.

Second, AI failures propagate differently. A software bug produces consistent, identifiable errors. An AI vulnerability produces inconsistent, hard-to-detect failures across every deployment simultaneously. A biased training dataset poisons every system built on it. A prompt injection technique effective against one deployment is effective against all deployments of the same model.

The framework requires institutions to monitor concentration across their AI supply chain and consider diversification, mirroring concerns that insurance governance teams have also identified.
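A first-pass concentration check can collapse the vendor inventory to foundation-model shares. The sketch below uses hypothetical vendor and provider names and a simple Herfindahl-style score; a real inventory would weight systems by criticality rather than counting them equally.

```python
from collections import Counter

# Hypothetical inventory: what looks like five-vendor diversity...
vendors = {
    "chat_platform_a": "model-provider-1",
    "doc_extractor_b": "model-provider-1",
    "kyc_screener_c":  "model-provider-1",
    "fraud_scorer_d":  "model-provider-2",
    "advisor_tool_e":  "model-provider-3",
}

def foundation_model_shares(inventory: dict) -> dict:
    """Collapse the vendor list to fourth-party foundation-model shares."""
    counts = Counter(inventory.values())
    total = sum(counts.values())
    return {provider: n / total for provider, n in counts.items()}

shares = foundation_model_shares(vendors)
# Herfindahl-style score: 1.0 means everything runs on one model
hhi = sum(s ** 2 for s in shares.values())
print(shares)  # -> 60% of systems on model-provider-1
print(f"concentration score: {hhi:.2f}")
```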

How Swept AI Enables Vendor AI Oversight

Governing vendor AI without visibility into vendor systems is a structural problem. At Swept AI, we've built our platform to provide institution-side oversight:

Evaluate provides independent assessment. Our evaluation layer tests vendor AI from the outside, sending representative inputs, measuring outputs, and scoring performance against institutional standards. This works regardless of vendor cooperation.

Supervise provides production monitoring. Our supervision layer monitors vendor AI outputs in real time. Drift, degradation, unexpected outputs: supervision catches the change and triggers escalation before it compounds.

Certify generates governance evidence. Our certification capabilities create audit-ready documentation: evaluation results, monitoring history, incident records, and compliance mapping to FS AI RMF control objectives.

Risk teams that have governed internal models under SR 11-7 now face the same obligation for AI they don't control. The 108 institutions that wrote the FS AI RMF understand that in financial services, AI risk is vendor risk. The institutions that build oversight infrastructure for vendor AI will manage that risk. Those that rely on vendor self-reporting will discover they haven't.
