Use Case · Adaptive Service Orchestrator

Action AI, Not Generative AI.

A production multi-agent orchestration platform built from scratch — sole architect, coder, and operator — that transforms unstructured client communications into executable, auditable financial workflows. Live across three global regions.

No firm names. No promotional language. Just the architecture, the decisions, and the proof.

3 Global Regions
170+ E2E Tests
8+ Checkpoints
5-pass Consistency
4-layer Timeouts
Feb 2026 GA Launch

The Vocabulary Shift

Generative AI — Content

Summarize. Suggest. Wait for humans.

Receives unstructured input
Generates a summary or draft
Human reads and decides
Human manually executes
No audit trail on the AI decision

Action AI — Execution

Comprehend. Plan. Validate. Execute. Audit.

Comprehends intent across multi-modal inputs
Generates executable Workflow JSON with dependency chains
Validator sub-agent gates before any initiation
Initiator sub-agent executes with idempotency and retry logic
Every decision traced — white-box audit trail in MongoDB

“I shifted the engineering org's vocabulary from Generative AI to Action AI. Once people had the right word, the right design patterns followed.”

The Problem This Platform Solved

📧

Hundreds of Emails / Day

Unstructured client communications with PDFs, Excel attachments, images — each requiring human interpretation before any action.

👤

Manual Re-keying

A human team reading, triaging, and manually initiating transactions in backend systems — error-prone, latency-heavy, unscalable.

The platform replaced this entire loop — from inbox to executed transaction — with a supervised, validated, auditable multi-agent system that humans oversee rather than operate.

Core Processing Pipeline — 4 Phases

📥
PHASE 1 OF 4

Instruction Comprehension

Parse every signal, lose nothing.

Every client communication arrives as multi-modal noise — email body, PDF attachments, Excel files, scanned images. Phase 1 routes each modality through the appropriate extraction pipeline, preserving fidelity that downstream reasoning depends on.

PDF → Vision API + native text reconciliation (dual-pass for accuracy)
Excel → GPT-4o structured analysis with column inference and value extraction
HTML → dual-extraction: structural inference combined with plain-text ground truth
Images → vision model pipeline with semantic annotation
FastAPI · Vision API · GPT-4o · Multi-modal Extraction
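The per-modality routing above can be sketched as a small dispatch table. This is a minimal illustration, not the production code: `route_attachment`, the extractor functions, and the pipeline labels are hypothetical names, and the extractors are stubs standing in for the real Vision API and GPT-4o calls.

```python
from pathlib import PurePath
from typing import Callable, Dict

# Hypothetical stub extractors; a real system would call the vision/LLM services here.
def extract_pdf(name: str) -> dict:
    return {"source": name, "pipeline": "pdf_dual_pass"}

def extract_excel(name: str) -> dict:
    return {"source": name, "pipeline": "excel_structured"}

def extract_html(name: str) -> dict:
    return {"source": name, "pipeline": "html_dual_extraction"}

def extract_image(name: str) -> dict:
    return {"source": name, "pipeline": "image_vision"}

# Assumed mapping from file extension to extraction pipeline.
EXTRACTORS: Dict[str, Callable[[str], dict]] = {
    ".pdf": extract_pdf,
    ".xlsx": extract_excel,
    ".xls": extract_excel,
    ".html": extract_html,
    ".png": extract_image,
    ".jpg": extract_image,
}

def route_attachment(filename: str) -> dict:
    """Pick the extraction pipeline from the (case-insensitive) file extension."""
    suffix = PurePath(filename).suffix.lower()
    try:
        extractor = EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"no extraction pipeline for {suffix!r}") from None
    return extractor(filename)
```

Failing loudly on an unknown modality, rather than silently skipping it, is one way to honor "lose nothing".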

A2A Agent Network

🧠

Supervisor Agent

PRIMARY ORCHESTRATOR

GPT-4o powered. Generates Workflow JSON. Manages A2A delegation via A2AClient. Enforces prerequisite_task_ids and prerequisite_step_ids.

💰

CashFlow Agent

PRODUCTION

LLM-to-Backend feedback loop. Auto-fix with up to 4 retry attempts. Five-tier account-name search. Validator → Initiator pattern with gate checks. Sanitizes dates and amounts, and checks for duplicates.
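The Validator → Initiator gate with bounded auto-fix might look roughly like the sketch below. It assumes "up to 4" means four validation attempts in total; `run_with_gate`, `Outcome`, and the callback signatures are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

MAX_AUTO_FIX_ATTEMPTS = 4  # assumption: four validation attempts in total

@dataclass
class Outcome:
    status: str                       # "COMPLETED" or "ACTION_REQUIRED"
    attempts: int
    errors: List[str] = field(default_factory=list)

def run_with_gate(
    instruction: dict,
    validate: Callable[[dict], Optional[str]],  # returns an error message, or None if valid
    auto_fix: Callable[[dict, str], dict],      # returns a corrected instruction
    initiate: Callable[[dict], None],           # only ever called after validation passes
) -> Outcome:
    errors: List[str] = []
    for attempt in range(1, MAX_AUTO_FIX_ATTEMPTS + 1):
        error = validate(instruction)
        if error is None:
            initiate(instruction)               # gate passed: hand off to the Initiator
            return Outcome("COMPLETED", attempt, errors)
        errors.append(error)
        instruction = auto_fix(instruction, error)  # feed the backend error back for a fix
    # Attempts exhausted: escalate to a human instead of initiating anything.
    return Outcome("ACTION_REQUIRED", MAX_AUTO_FIX_ATTEMPTS, errors)
```

The point of the shape is that `initiate` is unreachable unless `validate` returned no error on the exact instruction being sent.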

✍️

Signatory Verification Agent

STUB

Authorized signatory verification in-line between validation and initiation. Reduces fraud and authorization risk. Designed for full integration sequencing.

📋

Contact Data Management Agent

PLANNED

Manages updates to email, address, and phone records. Operates independently via A2A protocol without disrupting concurrent flows.

A2A Protocol Design

ID-based resolution — stable identifiers at runtime
Skills abstracted at LLM layer by name, not hard-coded
Cross-agent data sharing via hierarchical action_meta_data
Status lifecycle: IN_PROGRESS → COMPLETED / FAILED / ACTION_REQUIRED / FAILED_AUTH
Idempotent retries with risk gating on every state transition
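The status lifecycle above can be expressed as a guarded state machine. The status names come from the list; the transition table itself (including re-submission from ACTION_REQUIRED back to IN_PROGRESS) and the `transition` helper are assumptions made for this sketch.

```python
from enum import Enum

class Status(str, Enum):
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    ACTION_REQUIRED = "ACTION_REQUIRED"
    FAILED_AUTH = "FAILED_AUTH"

# Assumed transition table: IN_PROGRESS can reach any outcome, and
# ACTION_REQUIRED can be re-submitted after a human fix.
ALLOWED = {
    Status.IN_PROGRESS: {
        Status.COMPLETED,
        Status.FAILED,
        Status.ACTION_REQUIRED,
        Status.FAILED_AUTH,
    },
    Status.ACTION_REQUIRED: {Status.IN_PROGRESS},
}

def transition(current: Status, target: Status) -> Status:
    """Return the new status, rejecting transitions outside the table."""
    if current == target:
        return current  # idempotent retry: replaying the same state is a no-op
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```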

Dependency & Concurrency

prerequisite_task_ids encode cross-agent sequencing
prerequisite_step_ids enforce intra-agent step ordering
Steps with no shared prerequisites run in parallel within an agent
WorkflowProcessor coordinates without sequential round-trips to Supervisor
Four-layer timeout nesting: Caller → Supervisor → A2A Transport → Backend API
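One way to picture the intra-agent ordering rule: steps whose prerequisites are all complete form a wave and run concurrently. The sketch below uses `asyncio.gather` and a plain dict mapping step id to its `prerequisite_step_ids`; `run_steps` and the wave model are illustrative assumptions, not the actual WorkflowProcessor.

```python
import asyncio
from typing import Dict, List

async def run_steps(steps: Dict[str, List[str]]) -> List[List[str]]:
    """Run steps in waves; every step in a wave has all prerequisites done.

    `steps` maps a step id to its prerequisite_step_ids.
    Returns the waves in execution order.
    """
    done: set = set()
    waves: List[List[str]] = []

    async def execute(step_id: str) -> str:
        await asyncio.sleep(0)  # placeholder for the real step work
        return step_id

    while len(done) < len(steps):
        ready = sorted(
            s for s, pre in steps.items()
            if s not in done and all(p in done for p in pre)
        )
        if not ready:
            raise ValueError("cycle or unknown prerequisite in step graph")
        # Steps in the same wave share no unmet prerequisites, so run them together.
        finished = await asyncio.gather(*(execute(s) for s in ready))
        done.update(finished)
        waves.append(list(finished))
    return waves
```

For example, `asyncio.run(run_steps({"s1": [], "s2": [], "s3": ["s1", "s2"]}))` runs `s1` and `s2` together and `s3` afterwards.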
🗄️

MongoDB — State & Audit Store

Every workflow, task, step, and agent decision persisted. Query APIs power SLA dashboards. Partial workflow state survives for post-mortem and re-submission after partial fix.

🔍

W3C Trace Engine — Full Observability

W3C trace context propagates end-to-end across every agent and service hop. metric_summary captures phase-level timings, LLM call breakdowns, and checkpoint outcomes. Reconstructable for regulators.
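For readers unfamiliar with W3C Trace Context: propagation relies on a `traceparent` header of the form `version-trace_id-parent_id-flags`. The sketch below shows how a hop can keep the trace ID while minting a new span ID; the helper names are hypothetical and this is not the platform's tracing code.

```python
import re
import secrets

# version "00", 32-hex trace-id, 16-hex parent-id, 2-hex flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace at the edge of the system."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"

def child_traceparent(parent: str) -> str:
    """Keep the trace-id, mint a new span id for the next agent or service hop."""
    match = TRACEPARENT_RE.match(parent)
    if match is None:
        raise ValueError("malformed traceparent header")
    trace_id, _parent_span, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

Because every hop shares the same trace ID, the full call graph can be rebuilt later by grouping spans on that ID.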

White-Box Audit Trail — Not Black-Box AI

Typical AI System — Black Box

Input · Client email arrives
? · Model processes (opaque)
Output · Decision or action taken
Audit · No trace of reasoning

This Platform — White Box

Phase 1 · Comprehension: PDF extracted via Vision API (0.8s)
Phase 2 · Assembly: Instruction object validated (Pydantic)
Phase 3 · Supervisor: Workflow JSON generated, 3 tasks, 7 steps
Phase 4 · CashFlow Agent: Validator PASSED, Initiator sent
Trace · W3C context ID: abc-123 (full call graph reconstructable)
Audit · MongoDB: workflow_id, agent_id, decision, timestamp
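An audit record of the kind listed above could be shaped as follows. Only `workflow_id`, `agent_id`, `decision`, and `timestamp` come from the text; the `trace_id` field, the `AuditRecord` class, and `make_audit_record` are assumptions for illustration, and the sketch stops short of the actual MongoDB write.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class AuditRecord:
    workflow_id: str
    agent_id: str
    decision: str
    trace_id: str    # assumed link back to the W3C trace
    timestamp: str   # ISO 8601, UTC

def make_audit_record(
    workflow_id: str,
    agent_id: str,
    decision: str,
    trace_id: str,
    now: Optional[datetime] = None,
) -> dict:
    """Build a plain dict that a document store could persist as-is."""
    now = now or datetime.now(timezone.utc)
    record = AuditRecord(workflow_id, agent_id, decision, trace_id, now.isoformat())
    return asdict(record)
```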

Why This Matters in Regulated Environments

Regulators will ask: “Why did your AI system authorize that transaction?” “We tested it” is not an answer. Every decision in this platform produces a W3C trace ID, a MongoDB audit record with agent_id and decision metadata, and a metric_summary with LLM call breakdowns. The answer is a trace ID, a timestamp, and a reconstructable call graph, not arm-waving.

Human in Lead — Not Just Human in the Loop

🔄

Reactive

Human in the Loop

The conventional model. Humans react to AI outputs — approving, correcting, overriding. Humans are gatekeepers, but the AI drives. Trust is earned slowly; errors are caught late.

🎯

Proactive Steering

Human in Lead

This platform's model. Humans set intent, constraints, and tolerance bounds. Agents execute within those bounds autonomously. Humans are notified for ACTION_REQUIRED states and can steer, halt, or re-submit at any checkpoint.

🤝

Multi-Day Resilience

Long-Running Processes

Multi-day business processes — where a partial fix triggers re-submission, or an upstream dependency resolves after human intervention — are supported via persistent workflow state and re-entry at the last successful checkpoint.

Conversation Threading — One Thread, Many Workflows

A client email conversation is managed as a persistent Conversation thread. Each email or reply can spawn one or more Workflows, each decomposed into Tasks and Steps — all linked to the same Conversation ID for continuity, context, and audit.

📧 Conversation · conv-4821 · Portfolio Operations
├─ 📋 Workflow #1: CashFlow Transfer · COMPLETED
│  ├─ Task 1.1: Validate accounts (CashFlow) · 0.8s
│  ├─ Task 1.2: Verify signatory (Signatory Agent) · 0.4s
│  └─ Task 1.3: Initiate transfer (CashFlow) · 1.2s
├─ 📋 Workflow #2: Contact Address Update · COMPLETED
│  └─ Task 2.1: Update address (Contact Agent) · 0.3s
└─ 📋 Workflow #3: Follow-up Amendment · ACTION_REQUIRED
   ├─ ⚠️ Task 3.1: CashFlow · retry 2/4, partial fix
   └─ 👤 Human-in-Lead: reviewing → re-submission queued
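The Conversation → Workflow → Task hierarchy can be modeled with a few nested records. The class names mirror the terms in the text, but the fields, the status roll-up order, and `needing_attention` are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    task_id: str
    agent: str
    status: str = "IN_PROGRESS"

@dataclass
class Workflow:
    workflow_id: str
    name: str
    tasks: List[Task] = field(default_factory=list)

    @property
    def status(self) -> str:
        """Roll task statuses up to one workflow status (assumed priority order)."""
        statuses = {t.status for t in self.tasks}
        for candidate in ("FAILED_AUTH", "FAILED", "ACTION_REQUIRED", "IN_PROGRESS"):
            if candidate in statuses:
                return candidate
        return "COMPLETED"  # every task completed (or, by assumption, no tasks)

@dataclass
class Conversation:
    conversation_id: str
    workflows: List[Workflow] = field(default_factory=list)

    def needing_attention(self) -> List[str]:
        """Workflow ids a human should look at."""
        return [w.workflow_id for w in self.workflows if w.status == "ACTION_REQUIRED"]
```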

Resilience, SLA Accountability & Sync/Async Modes

🛡️

8+ Checkpoint Layers

Proactive checkpoint framework spans 8+ pipeline locations, enabling precise failure isolation and safe resumability without full restart.
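Resumability without a full restart can be illustrated with a tiny runner that records each checkpoint it passes and skips those on the next run. The in-memory `state` dict stands in for the persisted workflow document; `run_from_checkpoint` and the stage tuples are hypothetical names.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[dict], None]]

def run_from_checkpoint(stages: List[Stage], state: dict) -> dict:
    """Run stages in order, skipping any already recorded as completed.

    If a stage raises, the exception propagates and state["completed"] still
    lists the checkpoints reached so far, so a later call resumes from there.
    """
    completed = state.setdefault("completed", [])
    for name, stage in stages:
        if name in completed:
            continue  # checkpoint already passed on an earlier run
        stage(state)
        completed.append(name)
    return state
```

A failure in a late stage therefore leaves earlier work intact, and the failed stage name is simply the first one missing from `state["completed"]`.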

⏱️

4-Layer Timeout Nesting

Caller → Supervisor → A2A Transport → Backend API. Dynamic timeout scaling based on transaction count and complexity.

Sync + Async Execution

Short-duration requests handled synchronously. Long-running and multi-day business processes execute asynchronously with persistent state and re-entry points.

📊

SLA Accountability

metric_summary captures phase-level timings and checkpoint outcomes per agent. MongoDB dashboards expose SLA compliance per sub-domain agent and transaction type.

Timeout Nesting — Outermost to Innermost

Caller · Overall request timeout
Supervisor · Orchestration budget per workflow
A2A Transport · Per-agent delegation timeout
Backend API · Individual system call timeout
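The nesting only works if each inner layer times out before the layer that wraps it, so the outer layer can still report a clean failure. The sketch below enforces that invariant and scales the budget with transaction count; the class, the function, and every constant are illustrative assumptions, not production values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TimeoutBudget:
    caller: float          # seconds, outermost
    supervisor: float
    a2a_transport: float
    backend_api: float     # seconds, innermost

    def __post_init__(self) -> None:
        layers = [self.caller, self.supervisor, self.a2a_transport, self.backend_api]
        if any(outer <= inner for outer, inner in zip(layers, layers[1:])):
            raise ValueError("each inner layer must time out before the layer that wraps it")

def scaled_budget(
    transaction_count: int,
    base_backend: float = 5.0,      # assumed base backend timeout
    per_transaction: float = 0.5,   # assumed extra seconds per transaction
    headroom: float = 1.5,          # assumed multiplier between adjacent layers
) -> TimeoutBudget:
    """Derive all four timeouts from the innermost one outward."""
    backend = base_backend + per_transaction * max(transaction_count, 1)
    transport = backend * headroom
    supervisor = transport * headroom
    caller = supervisor * headroom
    return TimeoutBudget(caller, supervisor, transport, backend)
```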

Quality Engineering — Designed for LLM Non-Determinism

🚪

One-Way Door Test Corpus

Test corpus only grows — never shrinks. Every regression scenario becomes a permanent test case. Coverage compounds over time.

🔁

Five-Pass Consistency

Each test case runs five passes against the LLM. Results must be consistent across passes — catching flakiness that single-run testing misses entirely.
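A five-pass check can be as small as the sketch below: run the same case several times and require one canonical result. `consistent_across_passes` is a hypothetical helper, and comparing canonical JSON is an assumption about what "consistent" means here.

```python
import json
from typing import Callable

def consistent_across_passes(run_case: Callable[[], dict], passes: int = 5) -> bool:
    """Return True only if every pass yields the same structured result."""
    # sort_keys makes the comparison independent of key order in the output.
    canonical = {json.dumps(run_case(), sort_keys=True) for _ in range(passes)}
    return len(canonical) == 1
```

In a real suite `run_case` would invoke the LLM-backed pipeline; here it is any zero-argument callable returning a JSON-serializable dict.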

📐

Four-Dimensional Classification

Assertions on structure and semantics — not string equality. Four dimensions: account code accuracy, date logic correctness, amount fidelity, and duplicate detection.
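Semantic assertions along the four dimensions might be scored like this. The field names (`account_code`, `value_date`, `amount`, `is_duplicate`) and the `classify` helper are hypothetical; the point is that amounts and dates are compared as values, not as strings.

```python
from datetime import date
from decimal import Decimal
from typing import Dict

def classify(expected: dict, actual: dict) -> Dict[str, bool]:
    """Score one LLM output against the expected result on four dimensions."""
    return {
        "account_code": actual.get("account_code") == expected["account_code"],
        # Dates compared as dates, so only the calendar value matters.
        "date_logic": date.fromisoformat(actual["value_date"])
        == date.fromisoformat(expected["value_date"]),
        # Decimal comparison treats "100.5" and "100.50" as the same amount.
        "amount": Decimal(str(actual["amount"])) == Decimal(str(expected["amount"])),
        "duplicate": actual.get("is_duplicate", False)
        == expected.get("is_duplicate", False),
    }
```

A test then asserts on each dimension separately, which makes it clear which kind of error a regression introduced.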

🚩

Feature-Flag-First Delivery

Every release shadows live traffic via feature flags before cutover. Any execution path can be killed in seconds. No deployment required to halt a failing path.
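Shadowing and the kill switch can both hang off the same flag lookup. This is a minimal sketch with an in-memory flag dict; the flag names `new_path_shadow` and `new_path_live` and the `handle` function are assumptions, not the platform's flag system.

```python
from typing import Any, Callable, Dict, List

def handle(
    request: Any,
    flags: Dict[str, bool],
    current_path: Callable[[Any], Any],
    new_path: Callable[[Any], Any],
    shadow_log: List[dict],
) -> Any:
    """Serve from the current path unless the new path has been cut over."""
    if flags.get("new_path_live", False):
        return new_path(request)  # cutover; flipping the flag off reverts instantly
    result = current_path(request)
    if flags.get("new_path_shadow", False):
        try:
            # Shadow run: compare outputs, never return the new path's result.
            shadow_log.append({"match": new_path(request) == result})
        except Exception as exc:  # shadow failures must never affect the live response
            shadow_log.append({"match": False, "error": repr(exc)})
    return result
```

Because the flags are read on every request, turning a flag off halts that path without a deployment.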

Technology Stack

Python · FastAPI
GPT-4o · Azure OpenAI
MongoDB
W3C Distributed Tracing
Docker · Kubernetes
Terraform
Spinnaker · Harness
OAuth2 C2C
AWS Secrets Manager
A2A Protocol
Feature Flags
Pydantic · JSON Schema
APM Dashboards

Ready to discuss the architecture?

Every section above is derived from production experience, not theory. The decisions, the tradeoffs, and the scars are all mine.