Before agentic orchestration, I built the foundation: RAG pipelines over dense financial documents — mutual fund prospectuses, SAIs, fact sheets — using vector embeddings, KNN retrieval, and LLMs to deliver grounded, cited answers to natural language questions.
Demonstrated with the Invesco QQQ Trust (Nasdaq-100 ETF) — one of the world's largest and most actively traded ETFs.
The Gen AI Evolution — Where This Fits
Phase 1 — Gen AI / RAG Era · This Use Case
Phase 2 — Agentic AI Era · Use Case 1
“RAG taught me how LLMs ground themselves in evidence. Agentic AI taught me how to make them act on it.”
The Use Case — Fund Document Q&A
Dense Documents
Fund prospectuses, SAIs, and fact sheets are 100–400 page PDFs. Keyword search cannot surface nuanced answers about fees, risks, or methodology.
Natural Language Q&A
Analysts and advisors ask plain English questions and get answers grounded in the actual document text, with citations.
QQQ (Invesco QQQ Trust) is used as the demonstration corpus. It is one of the world's largest ETFs, with ~$260B AUM, and tracks the Nasdaq-100 Index. Its prospectus covers fees, holdings, index methodology, risks, and legal structure across 200+ pages: the exact challenge RAG is built to solve.
RAG Architecture — Two Pipelines
RAG operates in two distinct phases: an offline Indexing Pipeline that builds the knowledge base, and a real-time Query Pipeline that answers questions.
Source truth from every format.
The QQQ prospectus PDF is uploaded to S3 and routed through Amazon Textract for structure-aware text extraction. Unlike naive PDF parsing, Textract preserves table structure, column relationships, and section hierarchy — essential for financial documents with embedded tables of holdings, fees, and risk factors.
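As a rough illustration of what happens after extraction, the sketch below groups text lines by page from a Textract-style response. It assumes the response is already available as a dictionary with a "Blocks" list whose entries carry "BlockType", "Text", and "Page" fields. The function name lines_by_page and the sample response are hypothetical and hand-written for this example, and table reconstruction (which would use the TABLE and CELL blocks) is left out for brevity.

```python
from collections import defaultdict


def lines_by_page(textract_response: dict) -> dict:
    """Group the text of LINE blocks by page number.

    Assumes a Textract-style response: a "Blocks" list whose items have
    "BlockType", "Text", and (for multi-page documents) "Page".
    """
    pages = defaultdict(list)
    for block in textract_response.get("Blocks", []):
        # WORD blocks repeat the content of LINE blocks, so keep LINE only.
        if block.get("BlockType") == "LINE":
            pages[block.get("Page", 1)].append(block["Text"])
    return dict(pages)


# Hand-written sample, not real Textract output.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Page": 1},
        {"BlockType": "LINE", "Page": 1, "Text": "Annual Fund Operating Expenses"},
        {"BlockType": "WORD", "Page": 1, "Text": "Annual"},
        {"BlockType": "LINE", "Page": 2, "Text": "Principal Risks"},
    ]
}

pages = lines_by_page(sample_response)
```

Keeping the page number alongside each line is what later makes page-level citations possible.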
Chunking — Splitting the Document with Overlap
The QQQ prospectus is split into 512-token chunks with a 50-token sliding overlap. Overlap prevents context loss at chunk boundaries — sentences that straddle a boundary are captured by both adjacent chunks.
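A minimal sketch of the sliding-window split, assuming the document has already been tokenized into a list. The function name chunk_tokens and the placeholder tokens are illustrative only; a real pipeline would count tokens with the embedding model's own tokenizer.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=50):
    """Split a token list into fixed-size chunks that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each chunk starts this many tokens after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this chunk already reaches the end of the document
    return chunks


# Placeholder tokens standing in for a tokenized prospectus section.
tokens = [f"t{i}" for i in range(1200)]
chunks = chunk_tokens(tokens)
```

With these defaults each chunk begins 462 tokens after the previous one, so the last 50 tokens of one chunk are the first 50 of the next.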
"The Invesco QQQ Trust seeks to track the investment results, before fees and expenses, of the Nasdaq-100 Index®. The Fund generally invests at least 99% of its total assets..."
"...invests at least 99% of its total assets in the securities that comprise the Index. The Nasdaq-100 Index includes the 100 largest non-financial companies listed..."
"...listed on the Nasdaq Stock Market. The Index is a modified market-capitalization-weighted index reviewed and rebalanced quarterly by Nasdaq."
Vector Embedding — Meaning as Math
Each chunk is converted to a 1,024-dimensional dense vector by Amazon Titan Embeddings V2. The model places semantically similar text nearby in vector space — even when exact words differ. This is what makes semantic search possible.
Chunk Text
“The Fund's total annual fund operating expense ratio is 0.20% of average daily net assets...”
Titan Embeddings V2
AWS Bedrock
1,024-dim Vector
[0.12, -0.87, 0.34, 0.91, -0.23, 0.67, 0.24, -0.55, 0.67, 0.42 ... ×1,024]
HIGH SIMILARITY — cos: 0.94
“expense ratio is 0.20%” ↔ “annual management fee of 20 basis points”
Different words. Same meaning. Small angle between vectors.
LOW SIMILARITY — cos: 0.23
“expense ratio is 0.20%” ↔ “quarterly index rebalancing schedule”
Different meaning. Large angle between vectors. Not retrieved.
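The similarity scores above are cosine similarities: the dot product of two vectors divided by the product of their lengths. The sketch below shows the calculation on tiny 3-dimensional toy vectors. The vectors are made up for illustration and are not real Titan embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors: `same_direction` is a scaled copy of `base`, `orthogonal` is perpendicular.
base = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]
orthogonal = [3.0, 0.0, -1.0]

high = cosine_similarity(base, same_direction)
low = cosine_similarity(base, orthogonal)
```

Vectors pointing the same way score close to 1 regardless of length, and perpendicular vectors score close to 0, which is why the measure captures direction (meaning) rather than magnitude.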
KNN Retrieval — Finding the Nearest Neighbors
At query time, the question is embedded and searched against the index of 187 chunk vectors using cosine similarity. OpenSearch's HNSW index returns the K=5 approximate nearest neighbors in under 50ms, without scoring every vector. Chunks below a 0.60 cosine threshold are discarded.
USER QUERY
“What expense ratio does QQQ charge?”
→ embedded → [0.08, -0.91, 0.41, ...]
1,024 dimensions · <100ms
K=5 chunks assembled as context window · ~2,200 tokens · sent to Claude 3 Haiku on Bedrock
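The retrieval logic can be sketched as a brute-force search: score every chunk, drop anything under the threshold, keep the best K. This is an exact scan for illustration only, not the HNSW graph search that OpenSearch performs. The chunk ids and 2-dimensional vectors are invented toy values.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, index, k=5, min_score=0.60):
    """Return up to k (score, chunk_id) pairs at or above min_score, best first."""
    scored = ((cosine(query_vec, vec), chunk_id) for chunk_id, vec in index.items())
    kept = [pair for pair in scored if pair[0] >= min_score]
    return heapq.nlargest(k, kept)


# Toy index: hypothetical chunk ids mapped to made-up 2-dimensional vectors.
index = {
    "fees": [1.0, 0.0],
    "fees_table": [0.8, 0.3],
    "risks": [0.5, 0.5],
    "rebalance": [0.0, 1.0],
}
query = [1.0, 0.05]

results = top_k(query, index)
result_ids = [chunk_id for _, chunk_id in results]
```

For a corpus this small an exact scan is cheap. An approximate index such as HNSW earns its keep once the corpus grows to many documents.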
Live Simulation — Ask QQQ Anything
The simulation runs the full RAG pipeline: embedding → KNN search → context assembly → Claude 3 Haiku on Bedrock → grounded answer with citations. All steps are simulated client-side using pre-indexed QQQ prospectus data.
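The context-assembly step can be sketched as plain string building: number each retrieved chunk, then instruct the model to answer only from those excerpts and cite them by number. The prompt wording, the build_prompt name, and the source labels are assumptions made for this example, not the prompt used in the demo. The model call itself is omitted.

```python
def build_prompt(question, chunks):
    """Assemble a grounded-answer prompt from retrieved chunks.

    Each chunk is a dict with "source" (a label used for citation)
    and "text" (the chunk content).
    """
    context_lines = [
        f"[{i}] (source: {c['source']}) {c['text']}"
        for i, c in enumerate(chunks, start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the numbered excerpts below. "
        "Cite excerpts by number, e.g. [1]. "
        "If the excerpts do not contain the answer, say so.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Chunk texts are taken from the examples on this page; the source labels are hypothetical.
retrieved = [
    {
        "source": "chunk-A",
        "text": "The Fund's total annual fund operating expense ratio is "
                "0.20% of average daily net assets...",
    },
    {
        "source": "chunk-B",
        "text": "The Index is a modified market-capitalization-weighted index "
                "reviewed and rebalanced quarterly by Nasdaq.",
    },
]

prompt = build_prompt("What expense ratio does QQQ charge?", retrieved)
```

Telling the model to say when the excerpts lack the answer is a common way to discourage it from filling gaps with unsupported claims.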
Technology Stack
The Bridge — RAG Becomes a Skill
RAG is a pattern, not a product. In the pre-agentic era, RAG was the system — question in, answer out. In the agentic era, RAG becomes one tool an agent can invoke when it needs to ground a decision in source documents. The same vector index, the same retrieval pipeline — now called by an orchestrator rather than a human.
RAG Era
Question → Answer
Bridge
RAG as a Skill
Agentic Era
Agent → Tool → Action
The Adaptive Service Orchestrator (Use Case 1) uses vector retrieval as one of its validation steps — querying a policy knowledge base before an agent decides whether to initiate a financial transaction. The indexing pipeline I built here became the knowledge layer that agentic systems query at runtime.
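One way to picture "RAG as a skill" is a retrieval function wrapped as a named tool that an orchestrator calls before acting. Everything in the sketch below is hypothetical: the Tool class, the policy snippets, the keyword matcher that stands in for real vector retrieval, and the rule that an action counts as grounded only when evidence comes back. It is not the Adaptive Service Orchestrator's actual design.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical policy snippets standing in for a policy knowledge base.
POLICY_SNIPPETS = [
    "Wire transfers above the daily limit require a second approval.",
    "Account closures require identity verification.",
]


def retrieve_policy(query: str) -> List[str]:
    """Stand-in retriever: keyword overlap instead of vector search."""
    terms = set(query.lower().split())
    return [s for s in POLICY_SNIPPETS if terms & set(s.lower().split())]


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], List[str]]


policy_lookup = Tool(
    name="policy_lookup",
    description="Retrieve policy text relevant to a proposed action.",
    run=retrieve_policy,
)


def gather_evidence(proposed_action: str, tool: Tool) -> dict:
    """Call the retrieval tool and report whether any evidence was found."""
    evidence = tool.run(proposed_action)
    return {"action": proposed_action, "evidence": evidence, "grounded": bool(evidence)}


wire_check = gather_evidence("wire transfer limit", policy_lookup)
unknown_check = gather_evidence("crypto staking", policy_lookup)
```

The orchestrator depends only on the tool's interface, so the keyword matcher could be replaced by the vector retrieval pipeline without changing the calling code.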
See how RAG-era patterns evolved into a production multi-agent system — or let's talk about what a RAG or agentic pipeline could do for your organization.