A production multi-agent orchestration platform built from scratch — sole architect, coder, and operator — that transforms unstructured client communications into executable, auditable financial workflows. Live across three global regions.
No firm names. No promotional language. Just the architecture, the decisions, and the proof.
The Vocabulary Shift
Generative AI — Content
Action AI — Execution
“I shifted the engineering org's vocabulary from Generative AI to Action AI. Once people had the right word, the right design patterns followed.”
The Problem This Platform Solved
Hundreds of Emails / Day
Unstructured client communications with PDFs, Excel attachments, images — each requiring human interpretation before any action.
Manual Re-keying
A human team reading, triaging, and manually initiating transactions in backend systems — error-prone, latency-heavy, unscalable.
The platform replaced this entire loop — from inbox to executed transaction — with a supervised, validated, auditable multi-agent system that humans oversee rather than operate.
Core Processing Pipeline — 4 Phases
Parse every signal, lose nothing.
Every client communication arrives as multi-modal noise — email body, PDF attachments, Excel files, scanned images. Phase 1 routes each modality through the appropriate extraction pipeline, preserving fidelity that downstream reasoning depends on.
A2A Agent Network
Supervisor Agent
PRIMARY ORCHESTRATOR
GPT-4o powered. Generates Workflow JSON. Manages A2A delegation via A2AClient. Enforces prerequisite_task_ids and prerequisite_step_ids.
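A minimal sketch of how prerequisite fields in a Workflow JSON could be turned into an execution order. Only prerequisite_task_ids and prerequisite_step_ids come from the description above; the rest of the JSON shape, the agent labels, and the execution_waves helper are hypothetical and shown for illustration only.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical Workflow JSON. Only the two prerequisite_* field names
# are taken from the text; every other key is an assumption.
WORKFLOW_JSON = """
{
  "workflow_id": "wf-001",
  "tasks": [
    {"task_id": "t1", "agent": "cashflow", "prerequisite_task_ids": [],
     "steps": [
       {"step_id": "s1", "action": "validate", "prerequisite_step_ids": []},
       {"step_id": "s2", "action": "initiate", "prerequisite_step_ids": ["s1"]}
     ]},
    {"task_id": "t2", "agent": "contact_data", "prerequisite_task_ids": [], "steps": []},
    {"task_id": "t3", "agent": "cashflow", "prerequisite_task_ids": ["t1", "t2"], "steps": []}
  ]
}
"""


def execution_waves(items, id_key, prereq_key):
    """Group items into waves; each wave only depends on earlier waves."""
    graph = {item[id_key]: set(item[prereq_key]) for item in items}
    unknown = {p for deps in graph.values() for p in deps} - set(graph)
    if unknown:
        raise ValueError(f"unknown prerequisite ids: {sorted(unknown)}")
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything here may run concurrently
        waves.append(ready)
        sorter.done(*ready)
    return waves


workflow = json.loads(WORKFLOW_JSON)
task_waves = execution_waves(workflow["tasks"], "task_id", "prerequisite_task_ids")
step_waves = execution_waves(workflow["tasks"][0]["steps"], "step_id", "prerequisite_step_ids")
print(task_waves)
print(step_waves)
```

Items in the same wave have no unmet prerequisites, so a supervisor could delegate them in parallel and hold the next wave until all of them finish.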
CashFlow Agent
PRODUCTION
LLM-to-Backend feedback loop with auto-fix for up to 4 retry attempts. Five-tier account-name search. Validator → Initiator pattern with gate checks. Sanitizes inputs for dates, amounts, and duplicates.
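The Validator → Initiator gate with a bounded auto-fix loop could be sketched as follows. This is an illustration under assumptions, not the production agent: the Outcome type, the run_with_autofix helper, and the toy validate and fix functions are hypothetical, and in a real system the fix step would presumably be an LLM call fed with backend validation errors.

```python
from dataclasses import dataclass, field
from datetime import date

MAX_FIX_ATTEMPTS = 4  # "up to 4 retry attempts" from the description


@dataclass
class Outcome:
    status: str  # "INITIATED" or "ACTION_REQUIRED"
    attempts: int
    errors: list = field(default_factory=list)


def run_with_autofix(instruction, validate, fix, initiate):
    """Validate, feed errors back to a fixer, and initiate only when clean."""
    errors = validate(instruction)
    attempts = 0
    while errors and attempts < MAX_FIX_ATTEMPTS:
        instruction = fix(instruction, errors)  # assumed: an LLM call in production
        attempts += 1
        errors = validate(instruction)
    if errors:  # gate check: an invalid instruction never reaches the initiator
        return Outcome("ACTION_REQUIRED", attempts, errors)
    initiate(instruction)
    return Outcome("INITIATED", attempts)


# Toy validator and fixer, hypothetical and for demonstration only.
def validate(instr):
    errors = []
    amount = instr.get("amount")
    if not isinstance(amount, (int, float)) or amount <= 0:
        errors.append("amount")
    try:
        date.fromisoformat(str(instr.get("value_date")))
    except ValueError:
        errors.append("value_date")
    return errors


def fix(instr, errors):
    fixed = dict(instr)
    if "amount" in errors and isinstance(instr.get("amount"), str):
        fixed["amount"] = float(instr["amount"].replace(",", ""))
    return fixed


sent = []
ok = run_with_autofix({"amount": "1,250.00", "value_date": "2024-03-01"}, validate, fix, sent.append)
stuck = run_with_autofix({"amount": 10, "value_date": "not a date"}, validate, fix, sent.append)
print(ok.status, ok.attempts)
print(stuck.status, stuck.attempts, stuck.errors)
```

The second call exhausts its retries and is handed to a human instead of being initiated, which is the point of placing the gate between validation and initiation.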
Signatory Verification Agent
STUB
Authorized signatory verification in-line between validation and initiation. Reduces fraud and authorization risk. Designed for full integration sequencing.
Contact Data Management Agent
PLANNED
Manages updates to email, address, and phone records. Operates independently via A2A protocol without disrupting concurrent flows.
A2A Protocol Design
Dependency & Concurrency
MongoDB — State & Audit Store
Every workflow, task, step, and agent decision persisted. Query APIs power SLA dashboards. Partial workflow state survives for post-mortem and re-submission after partial fix.
W3C Trace Engine — Full Observability
W3C trace context propagates end-to-end across every agent and service hop. metric_summary captures phase-level timings, LLM call breakdowns, and checkpoint outcomes. Reconstructable for regulators.
White-Box Audit Trail — Not Black-Box AI
Typical AI System — Black Box
This Platform — White Box
Why This Matters in Regulated Environments
Regulators will ask: “Why did your AI system authorize that transaction?” “We tested it” is not an answer. Every decision in this platform produces a W3C trace ID, a MongoDB audit record with agent_id and decision metadata, and a metric_summary with LLM call breakdowns. The answer is a trace ID, a timestamp, and a reconstructable call graph, not hand-waving.
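To make "reconstructable call graph" concrete, here is a small sketch that records one audit document per decision and rebuilds the decision tree for a trace. An in-memory list stands in for the MongoDB collection; apart from agent_id, every field name and helper here is an assumption for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

AUDIT_LOG = []  # stand-in for a MongoDB collection (assumption)


def record_decision(trace_id, span_id, parent_span_id, agent_id, decision, metadata):
    """Append one audit document per agent decision."""
    doc = {
        "trace_id": trace_id,
        "span_id": span_id,
        "parent_span_id": parent_span_id,
        "agent_id": agent_id,
        "decision": decision,
        "metadata": metadata,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    AUDIT_LOG.append(doc)
    return doc


def call_graph(trace_id):
    """Map each parent span to its child documents, in recorded order."""
    children = defaultdict(list)
    for doc in AUDIT_LOG:
        if doc["trace_id"] == trace_id:
            children[doc["parent_span_id"]].append(doc)
    return dict(children)


def explain(trace_id, parent=None, depth=0, graph=None):
    """Render the decision tree for one trace as indented lines."""
    graph = call_graph(trace_id) if graph is None else graph
    lines = []
    for doc in graph.get(parent, []):
        lines.append("  " * depth + f"{doc['agent_id']}: {doc['decision']}")
        lines.extend(explain(trace_id, doc["span_id"], depth + 1, graph))
    return lines


record_decision("trace-a", "s1", None, "supervisor", "delegate_to_cashflow", {"tasks": 1})
record_decision("trace-a", "s2", "s1", "cashflow_validator", "passed", {"checks": 3})
record_decision("trace-a", "s3", "s1", "cashflow_initiator", "initiated", {})
record_decision("trace-b", "s9", None, "supervisor", "no_action", {})
report = explain("trace-a")
print("\n".join(report))
```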
Human in Lead — Not Just Human in the Loop
Reactive
Human in the Loop
The conventional model. Humans react to AI outputs — approving, correcting, overriding. Humans are gatekeepers, but the AI drives. Trust is earned slowly; errors are caught late.
Proactive Steering
Human in Lead
This platform's model. Humans set intent, constraints, and tolerance bounds. Agents execute within those bounds autonomously. Humans are notified for ACTION_REQUIRED states and can steer, halt, or re-submit at any checkpoint.
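One way to picture "humans set bounds, agents execute within them" is a pre-execution check that either clears a batch for autonomous execution or raises ACTION_REQUIRED with reasons. The Bounds fields, the AUTO_EXECUTE label, and the decide function are hypothetical; only the ACTION_REQUIRED state is named in the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Constraints set by a human ahead of time (illustrative fields)."""
    max_amount: float
    allowed_currencies: frozenset
    max_batch_size: int


def decide(batch, bounds):
    """Return AUTO_EXECUTE inside the bounds, else ACTION_REQUIRED with reasons."""
    reasons = []
    if len(batch) > bounds.max_batch_size:
        reasons.append(f"batch size {len(batch)} exceeds {bounds.max_batch_size}")
    for i, txn in enumerate(batch):
        if txn["amount"] > bounds.max_amount:
            reasons.append(f"txn {i}: amount {txn['amount']} exceeds {bounds.max_amount}")
        if txn["currency"] not in bounds.allowed_currencies:
            reasons.append(f"txn {i}: currency {txn['currency']} not allowed")
    state = "ACTION_REQUIRED" if reasons else "AUTO_EXECUTE"
    return {"state": state, "reasons": reasons}


bounds = Bounds(max_amount=50_000.0, allowed_currencies=frozenset({"USD", "EUR"}), max_batch_size=10)
inside = decide([{"amount": 1_250.0, "currency": "USD"}], bounds)
outside = decide([{"amount": 75_000.0, "currency": "GBP"}], bounds)
print(inside)
print(outside)
```

The reasons list is what a human would see when notified, so the steer, halt, or re-submit decision starts from a specific violation instead of a generic alert.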
Multi-Day Resilience
Long-Running Processes
Multi-day business processes — where a partial fix triggers re-submission, or an upstream dependency resolves after human intervention — are supported via persistent workflow state and re-entry at the last successful checkpoint.
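Re-entry at the last successful checkpoint can be sketched with a state dictionary that records completed steps, so a re-submission skips work that already succeeded. The step names, state keys, and run_workflow helper are assumptions; in production the state would be persisted (for example in MongoDB) instead of held in memory.

```python
def run_workflow(state, steps):
    """Run steps in order, skipping any step already checkpointed in state."""
    for name, fn in steps:
        if name in state["completed"]:
            continue  # re-entry: do not repeat checkpointed work
        try:
            state["data"] = fn(state["data"])
        except Exception as exc:
            state["status"] = "ACTION_REQUIRED"
            state["failed_at"] = name
            state["error"] = str(exc)
            return state
        state["completed"].append(name)  # checkpoint; persist here in production
    state["status"] = "COMPLETED"
    state.pop("failed_at", None)
    state.pop("error", None)
    return state


calls = []  # records which steps actually ran


def extract(data):
    calls.append("extract")
    return {**data, "amount": 1250.0}


def validate(data):
    calls.append("validate")
    if not data.get("account_code"):
        raise ValueError("missing account_code")
    return data


def initiate(data):
    calls.append("initiate")
    return {**data, "initiated": True}


STEPS = [("extract", extract), ("validate", validate), ("initiate", initiate)]
state = {"completed": [], "data": {"email_id": "m-1"}, "status": "NEW"}

run_workflow(state, STEPS)                 # stops at validate
first_status = state["status"]
state["data"]["account_code"] = "AB-1029"  # partial fix supplied by a human
run_workflow(state, STEPS)                 # resumes at validate, not at extract
print(first_status, state["status"], calls)
```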
Conversation Threading — One Thread, Many Workflows
A client email conversation is managed as a persistent Conversation thread. Each email or reply can spawn one or more Workflows, each decomposed into Tasks and Steps — all linked to the same Conversation ID for continuity, context, and audit.
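The Conversation → Workflow → Task → Step hierarchy maps naturally onto a few nested record types. The sketch below uses hypothetical field names and an in-process ID counter purely to show the linkage; it does not reflect the platform's actual schema.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # simple in-process ID source, for illustration only


@dataclass
class Step:
    step_id: str
    action: str


@dataclass
class Task:
    task_id: str
    steps: list = field(default_factory=list)


@dataclass
class Workflow:
    workflow_id: str
    conversation_id: str    # link back to the thread for context and audit
    source_message_id: str  # the email or reply that spawned this workflow
    tasks: list = field(default_factory=list)


@dataclass
class Conversation:
    conversation_id: str
    workflows: list = field(default_factory=list)

    def spawn_workflow(self, message_id):
        wf = Workflow(f"wf-{next(_ids)}", self.conversation_id, message_id)
        self.workflows.append(wf)
        return wf


thread = Conversation("conv-42")
first = thread.spawn_workflow("msg-1")   # original email
second = thread.spawn_workflow("msg-1")  # same email, second instruction
third = thread.spawn_workflow("msg-2")   # client reply
first.tasks.append(Task("t1", [Step("s1", "validate"), Step("s2", "initiate")]))
print([w.workflow_id for w in thread.workflows])
```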
Resilience, SLA Accountability & Sync/Async Modes
8+ Checkpoint Layers
Proactive checkpoint framework spans 8+ pipeline locations, enabling precise failure isolation and safe resumability without full restart.
4-Layer Timeout Nesting
Caller → Supervisor → A2A Transport → Backend API. Dynamic timeout scaling based on transaction count and complexity.
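The nesting rule is that each outer layer must wait strictly longer than the layer it wraps, otherwise the outer layer gives up before the inner one can report its own timeout. A minimal sketch, with made-up base, per-transaction, and margin values (the real scaling formula is not given in the text):

```python
# Innermost layer first; layer names follow the chain described above.
LAYERS = ["backend_api", "a2a_transport", "supervisor", "caller"]


def nested_timeouts(transaction_count, base_s=10.0, per_txn_s=2.0, margin_s=5.0):
    """Scale the innermost timeout with load, then add a margin per outer layer."""
    inner = base_s + per_txn_s * transaction_count  # hypothetical scaling rule
    return {layer: inner + margin_s * depth for depth, layer in enumerate(LAYERS)}


budgets = nested_timeouts(transaction_count=5)
for layer in reversed(LAYERS):  # print outermost to innermost
    print(f"{layer:>14}: {budgets[layer]:.1f}s")
```

With these illustrative numbers a five-transaction request gets 20 seconds at the backend and 35 seconds at the caller, so a backend timeout always surfaces as a structured error instead of a dropped connection upstream.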
Sync + Async Execution
Short-duration requests handled synchronously. Long-running and multi-day business processes execute asynchronously with persistent state and re-entry points.
SLA Accountability
metric_summary captures phase-level timings and checkpoint outcomes per agent. MongoDB dashboards expose SLA compliance per sub-domain agent and transaction type.
Timeout Nesting — Outermost to Innermost
Quality Engineering — Designed for LLM Non-Determinism
One-Way Door Test Corpus
Test corpus only grows — never shrinks. Every regression scenario becomes a permanent test case. Coverage compounds over time.
Five-Pass Consistency
Each test case runs five passes against the LLM. Results must be consistent across passes — catching flakiness that single-run testing misses entirely.
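A five-pass consistency check can be as simple as running the same input five times and comparing canonicalized outputs. In this sketch the two model functions are deterministic stand-ins for an LLM call, and the helper names and report keys are hypothetical.

```python
import json

PASSES = 5


def canonical(output):
    """Serialize with sorted keys so key order does not count as a difference."""
    return json.dumps(output, sort_keys=True, separators=(",", ":"))


def five_pass_check(run_case, case_input, passes=PASSES):
    """Run one test case several times and report whether outputs agree."""
    outputs = [run_case(case_input) for _ in range(passes)]
    distinct = {canonical(o) for o in outputs}
    return {"consistent": len(distinct) == 1, "distinct_outputs": len(distinct), "passes": passes}


# Stand-ins for an LLM extraction call (assumptions, not real model output).
def stable_model(text):
    return {"account_code": "AB-1029", "amount": "1250.00"}


_flaky_amounts = iter(["1250.00", "1250.00", "1250.00", "1250.01", "1250.00"])


def flaky_model(text):
    return {"account_code": "AB-1029", "amount": next(_flaky_amounts)}


email = "Please pay 1,250.00 from AB-1029"
stable_report = five_pass_check(stable_model, email)
flaky_report = five_pass_check(flaky_model, email)
print(stable_report)
print(flaky_report)
```

The flaky stand-in would pass a single-run test four times out of five, which is exactly the failure mode the five-pass rule is meant to expose.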
Four-Dimensional Classification
Assertions on structure and semantics — not string equality. Four dimensions: account code accuracy, date logic correctness, amount fidelity, and duplicate detection.
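Asserting on semantics instead of string equality might look like the sketch below, which scores one extracted instruction on the four dimensions named above. The field names, normalization rules, and classify function are assumptions chosen for illustration.

```python
from datetime import date
from decimal import Decimal, InvalidOperation


def _parse_date(value):
    try:
        return date.fromisoformat(str(value))
    except ValueError:
        return None


def _parse_amount(value):
    try:
        return Decimal(str(value).replace(",", ""))
    except InvalidOperation:
        return None


def classify(expected, actual):
    """Compare by meaning: normalized codes, parsed dates, exact decimals."""
    return {
        "account_code": expected["account_code"] == str(actual.get("account_code", "")).strip().upper(),
        "date_logic": date.fromisoformat(expected["value_date"]) == _parse_date(actual.get("value_date")),
        "amount": Decimal(expected["amount"]) == _parse_amount(actual.get("amount")),
        "duplicate": expected["is_duplicate"] == bool(actual.get("is_duplicate", False)),
    }


expected = {"account_code": "AB-1029", "value_date": "2024-03-01", "amount": "1250.00", "is_duplicate": False}
good = classify(expected, {"account_code": " ab-1029 ", "value_date": "2024-03-01",
                           "amount": "1,250.0", "is_duplicate": False})
bad = classify(expected, {"account_code": "AB-1029", "value_date": "2024-03-01",
                          "amount": "1250.01", "is_duplicate": False})
print(good)
print(bad)
```

A string comparison would reject the first output over whitespace and formatting, while this check passes it and still catches the one-cent drift in the second.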
Feature-Flag-First Delivery
Every release shadows live traffic via feature flags before cutover. Any execution path can be killed in seconds. No deployment required to halt a failing path.
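Shadowing and the kill switch can both hang off a single flag with three modes. The sketch below uses an in-memory dictionary as the flag store and hypothetical names throughout; it also assumes the candidate path is side-effect free while in shadow mode, since a shadow run must never initiate a real transaction.

```python
FLAGS = {"cashflow_v2": "shadow"}  # "off" | "shadow" | "live"; in-memory stand-in


def handle(request, current_path, candidate_path, flag="cashflow_v2", shadow_log=None):
    """Serve from the current path unless the flag is live; shadow-compare otherwise."""
    mode = FLAGS.get(flag, "off")
    if mode == "live":
        return candidate_path(request)
    result = current_path(request)
    if mode == "shadow":
        try:
            shadow = candidate_path(request)  # assumed side-effect free in shadow mode
            entry = {"request": request, "match": shadow == result}
        except Exception as exc:  # a shadow failure must never affect the caller
            entry = {"request": request, "error": repr(exc)}
        if shadow_log is not None:
            shadow_log.append(entry)
    return result


def current(n):
    return n * 2


def candidate(n):
    return n * 2 if n < 10 else -1  # deliberately wrong for large inputs


log = []
r1 = handle(3, current, candidate, shadow_log=log)   # shadow: results match
r2 = handle(20, current, candidate, shadow_log=log)  # shadow: mismatch logged
FLAGS["cashflow_v2"] = "live"
r3 = handle(20, current, candidate, shadow_log=log)  # cutover: candidate serves
FLAGS["cashflow_v2"] = "off"                         # kill switch, no deployment
r4 = handle(20, current, candidate, shadow_log=log)
print(r1, r2, r3, r4, log)
```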
Walk-Through Q&A — SMART STAR Approach
Each question below surfaces the thought process behind the architectural decisions, presented as Situation, Task, Action, and Result, together with the architect's note on why the answer lands in a regulated-industry context.
Technology Stack
Every section above is derived from production experience, not theory. The decisions, the tradeoffs, and the scars are all mine.