Key Takeaways
The bank's registered AI inventory was a fraction of the AI actually running in production. Most of the gap was embedded high-risk AI under Article 6 and Annex III.
Ninety days closed the audit gap. Week 1 was an inventory sweep beyond labelled AI. Week 12 was a working evidence trail across Test & Detect, Protect & Enforce, and Prove & Comply.
SMCR turns AI accountability from a policy paragraph into a named human. The bank mapped each high-risk agent to a specific SMF holder before audit.
PowerPoint Governance fails the moment internal audit asks for evidence on a specific agent at a specific timestamp. The risk register is not the answer.
The bank's internal audit team reviewed the evidence chain in three days. The previous attempt at the same review had taken four weeks.
The starting state: registered AI vs actual AI in production
A tier-one UK bank opened the engagement with a registered AI inventory in the low double digits. The number on the slide was confident. The number was wrong.
The live AI surface was several times larger. Some of it was the obvious work: scoring models in credit, fraud surveillance, model-risk-managed pipelines that had been through SR 11-7 and the bank's internal validation function for years. That part was governed.
The rest was not. LLM tools quietly added to engineering and operations workflows. Embedded vendor AI inside SaaS the bank already licensed (the contract said "analytics", the release notes said "generative summarisation"). Agentic copilots booking actions inside support and middle-office workflows. None of it was on the inventory because none of it was procured as AI.
This is the gap that defines enterprise AI audit in 2026. The risk register lists what was registered. The audit team needs what is running. Those are two different documents, and PowerPoint Governance is the slide where the firm pretends they are the same.
The brief was direct. Close the gap in 90 days. Produce evidence the FCA and the bank's internal audit function would both read as control.
Week 1: the inventory sweep beyond labelled AI
The week-one work was not a survey. Surveys catch what teams remember to declare. We instrumented the bank's actual AI surface: API traffic to known model providers, vendor SaaS metadata for embedded AI features, internal model registries, and the tool-call traces from any agent already in a workflow.
The output was a single live inventory of every AI system touching the bank's data, customers, or operational decisions. Labelled AI and unlabelled AI on the same sheet.
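The traffic-instrumentation part of that sweep can be sketched in a few lines. This is a minimal illustration, not the tooling used in the engagement. It assumes egress or proxy logs are available as (application, URL) pairs. The provider hostname list is illustrative rather than exhaustive, and the `EgressRecord` type and the `sweep` and `unregistered` helpers are hypothetical. Vendor SaaS metadata and internal registries would feed the same sheet through separate collectors.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Illustrative (not exhaustive) map of model-provider API hosts to watch for.
KNOWN_MODEL_HOSTS = {
    "api.openai.com": "OpenAI",
    "api.anthropic.com": "Anthropic",
    "generativelanguage.googleapis.com": "Google",
}


@dataclass(frozen=True)
class EgressRecord:
    source_app: str  # internal application that made the outbound call
    url: str         # destination URL taken from the proxy/egress log


def sweep(egress: list[EgressRecord], registered: set[str]) -> dict[str, dict]:
    """One inventory row per app that calls a known model provider,
    flagged by whether it already appears on the registered AI inventory."""
    inventory: dict[str, dict] = {}
    for rec in egress:
        provider = KNOWN_MODEL_HOSTS.get(urlparse(rec.url).hostname or "")
        if provider is None:
            continue  # not traffic to a known model provider
        row = inventory.setdefault(
            rec.source_app,
            {"providers": set(), "registered": rec.source_app in registered},
        )
        row["providers"].add(provider)
    return inventory


def unregistered(inventory: dict[str, dict]) -> list[str]:
    """Apps using AI that the registered inventory does not list."""
    return sorted(app for app, row in inventory.items() if not row["registered"])


# Usage: one registered system, one unregistered, one non-AI call.
logs = [
    EgressRecord("credit-scoring", "https://api.openai.com/v1/chat/completions"),
    EgressRecord("ops-summariser", "https://api.anthropic.com/v1/messages"),
    EgressRecord("hr-portal", "https://intranet.example/health"),
]
inv = sweep(logs, registered={"credit-scoring"})
print(unregistered(inv))  # ['ops-summariser']
```

Labelled and unlabelled AI end up on the same sheet because the sweep keys on observed behaviour (calls to a model provider), not on how the system was procured.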
The numbers landed hard. Registered AI assets had been managed through the model risk framework. The unregistered surface was a multiple of that, and the majority of it sat in workflows that would classify as high-risk under Annex III of the EU AI Act: creditworthiness assessment, fraud detection adjacent to access decisions, and employment-related screening run by an HR vendor with a generative module the procurement team had not flagged.
PowerPoint Governance survives until the inventory sweep. After it, the slide deck and the operational reality are visibly different documents, and the only question is which one the supervisor reads.
Week 4: regulatory-grade classification under Article 6 and Annex III
The inventory was the easy part. The classification was the work.
Each system on the live sheet was assessed against Article 6 of the EU AI Act and the Annex III categories that map to financial services use cases. The line between limited-risk and high-risk is not a matter of vendor branding. It turns on the function the system performs in the workflow, the data it operates on, and the decision it influences or makes.
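Because the test turns on function rather than branding, a first-pass triage can key on what the system does and ignore what the vendor calls it. The sketch below is illustrative only. The function labels, the `SystemProfile` type and the `triage` helper are hypothetical, and the mapping covers only the Annex III areas discussed in this article. Its output prompts a legal assessment under Article 6; it is not a classification.

```python
from dataclasses import dataclass

# Hypothetical mapping from the function a system performs to an Annex III point.
# Simplified for illustration; it is not a legal classification.
ANNEX_III_TRIGGERS = {
    "creditworthiness_assessment": "Annex III 5(b)",
    "credit_scoring": "Annex III 5(b)",
    "life_health_insurance_pricing": "Annex III 5(c)",
    "recruitment_screening": "Annex III 4(a)",
}


@dataclass(frozen=True)
class SystemProfile:
    name: str
    vendor_label: str              # what the contract calls it, e.g. "analytics"
    function: str                  # what it actually does in the workflow
    affects_natural_persons: bool  # does it assess or decide about individuals?


def triage(system: SystemProfile) -> str:
    """Provisional tier for review; the vendor label is deliberately ignored."""
    point = ANNEX_III_TRIGGERS.get(system.function)
    if point and system.affects_natural_persons:
        return f"candidate high-risk ({point}); confirm under Article 6"
    return "not flagged by Annex III triage"


summariser = SystemProfile("vendor-summariser", "analytics",
                           "creditworthiness_assessment", True)
print(triage(summariser))
# candidate high-risk (Annex III 5(b)); confirm under Article 6
```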
The exercise reclassified a meaningful share of the bank's AI surface upward. Several systems the business had treated as productivity tools were, on assessment, high-risk under Annex III 5(b) and 5(c). Two embedded vendor features fell inside the Article 9 lifecycle risk management obligation that the bank had not been applying to them, because the bank had not known they were AI.
Classification produced the second hard number of the engagement. The count of high-risk AI systems requiring lifecycle controls was significantly higher than what the AI policy had assumed. The risk register was updated. The policy did not change. The controls did.
Week 8: SMCR responsibilities mapping
UK financial services is the regulatory regime where AI accountability stops being abstract. SMCR requires a named Senior Manager to hold accountability for prescribed responsibilities. The EU AI Act describes what good looks like. SMCR names who is on the hook when it does not.
Week eight mapped each high-risk AI system to the specific SMF holder whose prescribed responsibility covered its outputs. SMF3 for overall conduct. SMF16 for compliance oversight of the controls. SMF4 (Chief Risk) for the risk function's view of the lifecycle. SMF24 for the operational resilience of the systems running underneath. Each agent in production got a name, not a committee.
The distinction between "we have a policy on AI" and "we can name the SMF holder for this agent's outputs at this timestamp" is the distinction SMCR was written for. The first is a document. The second is testimony.
The mapping exercise surfaced control gaps the policy had not. Two high-risk systems had no clean owner under the existing SMF allocations. One was reallocated to the SMF16 holder with revised prescribed responsibilities. The other was paused until ownership was resolved. (That is what continuous risk management looks like in practice: a high-risk system without a named owner does not run.)
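The ownership rule in that last sentence is simple enough to enforce mechanically at deployment time. Here is a minimal sketch, assuming each system record carries its risk tier plus the SMF function and named holder allocated to it. The `AISystem` type, its field names and the `may_run` helper are hypothetical, and "Named holder A" stands in for a real individual.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AISystem:
    name: str
    high_risk: bool
    smf_function: Optional[str] = None  # e.g. "SMF16"; None if unallocated
    smf_holder: Optional[str] = None    # the named individual holding that SMF


def may_run(system: AISystem) -> tuple[bool, str]:
    """Gate: a high-risk system without a named SMF owner does not run."""
    if system.high_risk and not (system.smf_function and system.smf_holder):
        return False, "paused: no named SMF owner"
    return True, "permitted"


systems = [
    AISystem("credit-decision-agent", True, "SMF16", "Named holder A"),
    AISystem("middle-office-copilot", True),     # no owner allocated yet
    AISystem("internal-code-assistant", False),  # not high-risk
]
for s in systems:
    allowed, reason = may_run(s)
    print(f"{s.name}: {reason}")
```

The gate requires both the SMF function and a named holder. A function with nobody allocated to it is still a committee, not a name.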
Week 12: a continuous evidence trail across the AI Assurance Lifecycle
Week twelve delivered the working evidence trail. One Window for the Full AI Assurance Lifecycle, running across the bank's high-risk AI surface.
Test & Detect. Continuous adversarial testing of every agent, both before it reached a live workflow and after. Prompt injection coverage, tool misuse scenarios, threshold-based sign-off, vulnerability detection across the agentic stack. Our continuously updated jailbreak library ran 84 jailbreak techniques and 65 input validators against each agent. Each test run wrote an evidence record the audit function could pull by model version and workflow. Test & Detect applies a find-the-failure-mode-first approach across the whole lifecycle, not just at the deployment gate.
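One way to make every test run retrievable by model version and workflow is to write each run as an immutable record keyed on those fields, then gate sign-off on a pass-rate threshold. This is a minimal sketch, not the product's implementation. The `RunEvidence` schema, the in-memory `EvidenceStore`, the technique names and the 95% default threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RunEvidence:
    agent: str
    model_version: str
    workflow: str
    technique: str  # adversarial scenario, e.g. a prompt-injection variant
    passed: bool
    run_at: datetime


class EvidenceStore:
    """In-memory stand-in; a real store would be durable and access-controlled."""

    def __init__(self) -> None:
        self._records: list[RunEvidence] = []

    def record(self, ev: RunEvidence) -> None:
        self._records.append(ev)

    def query(self, model_version: str, workflow: str) -> list[RunEvidence]:
        return [r for r in self._records
                if r.model_version == model_version and r.workflow == workflow]


def signed_off(runs: list[RunEvidence], threshold: float = 0.95) -> bool:
    """Threshold-based sign-off; the 95% default is an assumed value."""
    if not runs:
        return False  # no evidence, no sign-off
    return sum(r.passed for r in runs) / len(runs) >= threshold


store = EvidenceStore()
now = datetime.now(timezone.utc)
store.record(RunEvidence("kyc-agent", "v1.3", "onboarding",
                         "prompt_injection_basic", True, now))
store.record(RunEvidence("kyc-agent", "v1.3", "onboarding",
                         "tool_misuse_refund", False, now))
store.record(RunEvidence("kyc-agent", "v1.2", "onboarding",
                         "prompt_injection_basic", True, now))
runs = store.query("v1.3", "onboarding")
print(len(runs), signed_off(runs))  # 2 False
```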
Protect & Enforce. Run-time protection at the inference layer. Policy enforcement on every agent decision. Inline blocking of behaviour that breached policy, with a record of what was blocked, when, and against which control. Continuous monitoring while the AI is live sits inside this pillar. The dashboards stopped being the governance artefact and started being one input to it. Protect & Enforce is the layer that runs while the system runs.
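Inline blocking that records what was blocked, when, and against which control can be sketched as a check that runs before each agent action executes. This is a minimal illustration under stated assumptions. The `Decision` shape, the two control IDs and the refund threshold are hypothetical, and a production layer would evaluate far richer policies.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class Decision:
    agent: str
    action: str
    amount: float = 0.0


@dataclass(frozen=True)
class BlockRecord:
    agent: str
    action: str
    control_id: str  # which control the action breached
    blocked_at: datetime


# Hypothetical controls: each predicate returns True when an action breaches policy.
CONTROLS: dict[str, Callable[[Decision], bool]] = {
    "CTRL-REFUND-LIMIT": lambda d: d.action == "issue_refund" and d.amount > 500,
    "CTRL-NO-ACCOUNT-CLOSURE": lambda d: d.action == "close_account",
}


def enforce(decision: Decision, block_log: list[BlockRecord]) -> bool:
    """Check controls in order before the action runs; record and block on the first breach."""
    for control_id, breaches in CONTROLS.items():
        if breaches(decision):
            block_log.append(BlockRecord(decision.agent, decision.action,
                                         control_id, datetime.now(timezone.utc)))
            return False  # blocked: the action must not execute
    return True  # no control breached; the action may proceed


blocks: list[BlockRecord] = []
print(enforce(Decision("support-agent", "issue_refund", 40.0), blocks))    # True
print(enforce(Decision("support-agent", "issue_refund", 1200.0), blocks))  # False
print(blocks[0].control_id)  # CTRL-REFUND-LIMIT
```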
Prove & Comply. Step-level logging mapped to the record-keeping requirements of Article 12 of the EU AI Act. Automated compliance reporting mapped to the EU AI Act, FCA rules, SEC requirements, and ISO/IEC 42001. Audit-ready evidence with enterprise-grade auditability and explainability, backed by SOC 2, SSO/SCIM, and RBAC controls. The evidence is not the final outputs. It is the trail of why each agent action was admissible at the moment it ran. That is the receipt the audit function asked for. Article 72 post-market monitoring obligations sit on top of the Prove & Comply pillar.
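Article 12 requires high-risk systems to allow the automatic recording of events over their lifetime, but it does not prescribe a format. The sketch below shows one common way to make a step-level trail tamper-evident: hash chaining, where each entry stores the SHA-256 of its predecessor. The `StepLog` class, its field names (including `admissible_because`) and the example steps are assumptions for illustration, not the format used in the engagement.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class StepLog:
    """Append-only, hash-chained step log: editing any entry breaks the chain."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, agent: str, step: str, admissible_because: str) -> dict:
        body = {
            "agent": agent,
            "step": step,
            "admissible_because": admissible_because,  # why the step was allowed
            "ts": datetime.now(timezone.utc).isoformat(),
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True


trail = StepLog()
trail.append("kyc-agent", "fetch_customer_record", "data-access control passed")
trail.append("kyc-agent", "flag_for_review", "risk score above review threshold")
print(trail.verify())  # True
trail.entries[0]["admissible_because"] = "edited after the fact"
print(trail.verify())  # False
```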
The three pillars wrote evidence on a shared model, so the audit team could query one trail instead of stitching three. That is the difference between a stack and a layer. For financial services AI compliance, the Assurance Layer for Enterprise AI is the unit of governance now.
What the audit team actually found
The bank's internal audit function reviewed the evidence chain in three days. The previous attempt at the same scope of review, against the old combination of model risk artefacts and runtime telemetry, had taken four weeks.
What changed was not auditor effort. What changed was that the evidence existed in a form the audit team could read directly. Each high-risk agent had a named SMF owner, a documented Article 6 classification, a pre-deployment test record, a runtime protection log, and a step-level monitoring trail. Five artefacts on one query.
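On a shared evidence model, the "five artefacts on one query" check reduces to a completeness test. Here is a minimal sketch, assuming evidence has already been gathered per agent into one dictionary. The artefact keys, the reference strings and the `audit_gaps` helper are hypothetical.

```python
from typing import Optional

# The five artefacts expected for every high-risk agent (hypothetical keys).
REQUIRED_ARTEFACTS = (
    "smf_owner",
    "article_6_classification",
    "pre_deployment_test_record",
    "runtime_protection_log",
    "step_level_trail",
)


def audit_gaps(evidence: dict[str, dict[str, Optional[str]]]) -> dict[str, list[str]]:
    """Per agent, list the required artefacts that are missing or empty."""
    gaps: dict[str, list[str]] = {}
    for agent, artefacts in evidence.items():
        missing = [a for a in REQUIRED_ARTEFACTS if not artefacts.get(a)]
        if missing:
            gaps[agent] = missing
    return gaps


evidence = {
    "credit-decision-agent": {a: f"evidence-ref:{a}" for a in REQUIRED_ARTEFACTS},
    "middle-office-copilot": {
        "smf_owner": "SMF16",
        "article_6_classification": "high-risk",
        "runtime_protection_log": None,
    },
}
print(audit_gaps(evidence))
# {'middle-office-copilot': ['pre_deployment_test_record', 'runtime_protection_log', 'step_level_trail']}
```

An empty result means every high-risk agent can produce all five artefacts on demand. Any entry names the agent and the exact gap the audit team would raise.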
The findings were not zero. Internal audit is not internal audit if it finds nothing. Two controls were tightened. One vendor relationship was renegotiated to bring an embedded AI feature inside our testing framework. One workflow was paused pending re-classification.
The headline was operational, not commercial. Within a quarter, the bank's audit function moved from "we cannot evidence this" to "we can produce the record on demand, agent by agent." That is the bar a regulated buyer is procuring against now.
Bottom Line
Ninety days does not buy AI governance. Ninety days, with the right operational model, closes the gap between registered AI and running AI, and produces the audit-ready evidence the firm can stand behind when the supervisor or the audit committee asks.
Policy is a promise. Assurance is the receipt. The question that lands first in a regulated workflow is the simplest one: who owns this agent, and where is the evidence the action it just took was admissible?
FAQs
What is an AI assurance case study?
An AI assurance case study documents how an enterprise tested, protected, and proved its AI systems across the lifecycle, with compliance evidence that maps to specific regulatory obligations. For regulated buyers, the artefact of interest is the evidence chain itself: classification, testing record, runtime controls, and step-level logs against named owners.
How does an enterprise AI audit differ from a model risk review?
What does an AI model inventory case study look like in financial services?
What does financial services AI compliance require under SMCR?