The challenge
The bank's corporate banking team reviewed contracts manually. Median review time was six hours per contract, against a backlog of ~1.4M pages. The previous attempt — an external vendor's "AI-powered document review" — had been running for eleven months without a production deployment, and an internal proof-of-concept using off-the-shelf retrieval was producing answers that didn't survive audit because they couldn't cite their sources.
The legal and compliance team had two non-negotiables: every answer had to point at a specific clause in a specific document, and the system had to pass internal model risk review before it touched a real contract.
The architecture
We replaced the existing build with a new architecture. Ingestion used structure-aware semantic chunking that preserved section headings, tables, and clause numbering. Chunks were embedded with text-embedding-3-large and stored in Qdrant with per-document filters. Retrieval was hybrid, combining vector search, BM25, and structured filters on contract type and counterparty. A cross-encoder reranker reordered the retrieved chunks before generation, and citation tagging at the prompt layer meant the system refused to emit a claim without a source span.
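As a rough illustration of the hybrid retrieval and citation steps described above, here is a minimal sketch. It assumes reciprocal rank fusion (RRF) as the way to merge the vector and BM25 rankings, and square-bracket chunk IDs as the citation format. Neither detail is stated in the engagement notes, and every name here (`Chunk`, `hybrid_rank`, `unsupported_sentences`, the sample IDs and counterparty) is hypothetical. The production system would push the filters into Qdrant and run a cross-encoder reranker over the fused list, which this sketch omits.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str       # hypothetical ID scheme: "<document>#<clause>"
    contract_type: str
    counterparty: str


def passes_filters(chunk, contract_type=None, counterparty=None):
    """Structured filter: a value of None means 'do not filter on this field'."""
    return ((contract_type is None or chunk.contract_type == contract_type)
            and (counterparty is None or chunk.counterparty == counterparty))


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(id) = sum of 1 / (k + rank) over rankings."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def hybrid_rank(vector_ranking, bm25_ranking, chunks_by_id, **filters):
    """Drop chunks that fail the structured filters, then fuse both rankings."""
    rankings = [
        [cid for cid in ranking if passes_filters(chunks_by_id[cid], **filters)]
        for ranking in (vector_ranking, bm25_ranking)
    ]
    return rrf_fuse(rankings)


CITATION = re.compile(r"\[([^\[\]]+)\]")


def unsupported_sentences(answer, retrieved_ids):
    """Return sentences with no citation, or with a citation to an unretrieved chunk."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    flagged = []
    for sentence in sentences:
        cited = CITATION.findall(sentence)
        if not cited or any(cid not in retrieved_ids for cid in cited):
            flagged.append(sentence)
    return flagged


# Made-up sample data, for illustration only.
chunks_by_id = {
    "msa-17#12.1": Chunk("msa-17#12.1", "MSA", "Acme"),
    "msa-17#9.2": Chunk("msa-17#9.2", "MSA", "Acme"),
    "msa-17#2.1": Chunk("msa-17#2.1", "MSA", "Acme"),
    "nda-03#4": Chunk("nda-03#4", "NDA", "Acme"),
}
top = hybrid_rank(
    ["msa-17#12.1", "nda-03#4", "msa-17#9.2"],    # stand-in for vector search output
    ["msa-17#12.1", "msa-17#2.1", "msa-17#9.2"],  # stand-in for BM25 output
    chunks_by_id,
    contract_type="MSA",
)
answer = ("Either party may terminate on 30 days notice [msa-17#12.1]. "
          "Liability is capped at fees paid.")
flagged = unsupported_sentences(answer, set(top))
```

In this sketch `flagged` holds the second sentence, which carries no citation. A gate of this kind could run after generation as a backstop to the prompt-layer tagging, rejecting or regenerating any answer with a non-empty `flagged` list.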
The eval harness ran a curated golden set of 240 queries across contract types. It reported retrieval metrics (hit rate at k, MRR, NDCG) and generation metrics (faithfulness, answer relevancy), with the generation metrics scored by an LLM-as-judge. Every PR ran the harness in CI, and any regression blocked the merge.
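The retrieval half of such a harness can be sketched in a few lines. This is an illustrative version only: it assumes binary relevance labels (a chunk is either relevant to a query or not), unique IDs in each ranked list, and a fixed tolerance for the regression gate. The function names, the `tolerance` value, and the shape of the golden set are all hypothetical. The LLM-as-judge generation metrics are not shown.

```python
import math


def hit_at_k(ranked, relevant, k):
    """1.0 if any relevant ID appears in the top k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """NDCG with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1)
              if doc in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0


def evaluate(golden_set, retrieve, k=5):
    """Average each metric over (query, relevant_ids) pairs in the golden set."""
    totals = {"hit_rate": 0.0, "mrr": 0.0, "ndcg": 0.0}
    for query, relevant in golden_set:
        ranked = retrieve(query)
        totals["hit_rate"] += hit_at_k(ranked, relevant, k)
        totals["mrr"] += reciprocal_rank(ranked, relevant)
        totals["ndcg"] += ndcg_at_k(ranked, relevant, k)
    return {name: total / len(golden_set) for name, total in totals.items()}


def regressions(current, baseline, tolerance=0.01):
    """Metrics that dropped more than `tolerance` below the stored baseline."""
    return {name: (baseline[name], current[name])
            for name in baseline
            if current[name] < baseline[name] - tolerance}
```

A CI job built on this would call `evaluate` against the golden set, compare the result with the baseline stored from the main branch, and exit with a non-zero status whenever `regressions` returns a non-empty dict, which is what blocks the merge.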
The handover
The pod ran for nine weeks. Deliverables were the running system, the eval harness (including the golden set, which the bank had extended internally), the runbook, and a knowledge transfer to the bank's existing engineering team. The bank's MLOps function took ownership in week ten; we stayed on retainer for thirty days of incident support.
The outcome
Median review time fell from six hours to ~95 minutes, a reduction of roughly 73%, across the first 8,200 contracts reviewed under the system. Zero unsupported claims appeared in the audit window, and internal model risk review passed on first submission.
The system is now being extended to two adjacent legal review workflows under the same architecture, run by the bank's internal team.