Use Case 2 · Pre-Agentic Gen AI · RAG + Vector Embeddings

Gen AI Before Agents: RAG-Powered Fund Intelligence

Before agentic orchestration, I built the foundation: RAG pipelines over dense financial documents — mutual fund prospectuses, SAIs, fact sheets — using vector embeddings, KNN retrieval, and LLMs to deliver grounded, cited answers to natural language questions.

Demonstrated with the Invesco QQQ Trust (Nasdaq-100 ETF) — one of the world's largest and most actively traded ETFs.

QQQ Chunks: 187
Tokens/Chunk: 512
Embed Dims: 1,024
KNN Retrieved: K=5
LLM Platform: Bedrock
E2E Latency: < 2s

The Gen AI Evolution — Where This Fits

Phase 1 — Gen AI / RAG Era · This Use Case

Ask a question. Get a grounded answer.

Documents chunked, embedded, and indexed in a vector store
User question embedded in the same vector space
KNN retrieval surfaces the most semantically relevant chunks
LLM synthesizes an answer grounded in the retrieved context
Lower hallucination risk: the model is instructed to answer only from what was retrieved

Phase 2 — Agentic AI Era · Use Case 1

RAG becomes one tool in an agent's toolkit.

Supervisor agent receives unstructured client communications
RAG is now a sub-skill — one tool among many that agents invoke
Agent decides when to retrieve context, when to execute, when to validate
Retrieved context informs decisions, not just answers
The LLM orchestrates — RAG supports reasoning along the way

“RAG taught me how LLMs ground themselves in evidence. Agentic AI taught me how to make them act on it.”

The Use Case — Fund Document Q&A

📄

Dense Documents

Fund prospectuses and SAIs are 100–400 page PDFs, accompanied by shorter fact sheets. Keyword search cannot surface nuanced answers about fees, risks, or methodology.

💬

Natural Language Q&A

Analysts and advisors ask plain English questions and get answers grounded in the actual document text, with citations.

QQQ (Invesco QQQ Trust) is used as the demonstration corpus. It is one of the world's largest ETFs, with ~$260B AUM, and tracks the Nasdaq-100 Index. Its prospectus covers fees, holdings, index methodology, risks, and legal structure across 200+ pages, which is the exact challenge RAG is built to solve.

RAG Architecture — Two Pipelines

RAG operates in two distinct phases: an offline Indexing Pipeline that builds the knowledge base, and a real-time Query Pipeline that answers questions.

📄
PHASE 1 OF 4 · INDEXING

Document Ingestion

Source truth from every format.

The QQQ prospectus PDF is uploaded to S3 and routed through Amazon Textract for structure-aware text extraction. Unlike naive PDF parsing, Textract preserves table structure, column relationships, and section hierarchy — essential for financial documents with embedded tables of holdings, fees, and risk factors.

PDF stored in S3 with versioning — every prospectus update creates a new indexed version
Amazon Textract extracts text with structure: tables, headings, and column relationships preserved
Section metadata tagged: Risk Factors, Holdings, Fees, Index Methodology, Legal
Raw text passed to the chunking stage with document ID and section lineage attached
Amazon S3 · Amazon Textract · PDF Parsing · Metadata Tagging
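The section-tagging step can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: it assumes a Textract-style response whose `Blocks` entries carry `BlockType` and `Text`, and it assumes section headings appear as standalone lines. The `SECTION_HEADINGS` map and the `tag_sections` helper are hypothetical names introduced here.

```python
from collections import defaultdict

# Hypothetical heading-to-section map; the labels mirror the tags listed above.
SECTION_HEADINGS = {
    "risk factors": "Risk Factors",
    "holdings": "Holdings",
    "fees": "Fees",
    "index methodology": "Index Methodology",
    "legal": "Legal",
}


def tag_sections(textract_response, document_id):
    """Group LINE blocks under the most recent heading that matches a known section."""
    current = "Untagged"
    sections = defaultdict(list)
    for block in textract_response.get("Blocks", []):
        # PAGE, WORD, TABLE, and CELL blocks are skipped in this simplified sketch.
        if block.get("BlockType") != "LINE":
            continue
        text = block.get("Text", "").strip()
        heading = SECTION_HEADINGS.get(text.lower())
        if heading:
            current = heading  # a heading line switches the active section
            continue
        sections[current].append(text)
    # Each record keeps the document ID and section lineage for the chunking stage.
    return [
        {"document_id": document_id, "section": name, "text": " ".join(lines)}
        for name, lines in sections.items()
    ]
```

A production version would also need to walk TABLE and CELL blocks to keep table structure, which this sketch leaves out.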

Chunking — Splitting the Document with Overlap

The QQQ prospectus is split into 512-token chunks with a 50-token sliding overlap. Overlap prevents context loss at chunk boundaries — sentences that straddle a boundary are captured by both adjacent chunks.

CHUNK 001 · 512 tokens · § Fund Overview · ← 50-token overlap →

"The Invesco QQQ Trust seeks to track the investment results, before fees and expenses, of the Nasdaq-100 Index®. The Fund generally invests at least 99% of its total assets..."

CHUNK 002 · 512 tokens · § Fund Overview · ← 50-token overlap →

"...invests at least 99% of its total assets in the securities that comprise the Index. The Nasdaq-100 Index includes the 100 largest non-financial companies listed..."

CHUNK 003 · 512 tokens · § Index Methodology

"...listed on the Nasdaq Stock Market. The Index is a modified market-capitalization-weighted index reviewed and rebalanced quarterly by Nasdaq."

Total Chunks: 187
Tokens / Chunk: 512
Overlap Tokens: 50
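The sliding-window split with overlap can be sketched in a few lines. This assumes the document has already been tokenized into a list (the tokenizer itself is not specified in the text), and `chunk_tokens` is a hypothetical helper name.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=50):
    """Split a token list into fixed-size chunks that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    stride = chunk_size - overlap  # 462 new tokens per chunk with the defaults
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the final chunk already reaches the end of the document
    return chunks


# Usage: integers stand in for real tokens.
chunks = chunk_tokens(list(range(1000)))
# The last 50 tokens of one chunk are the first 50 of the next.
assert chunks[0][-50:] == chunks[1][:50]
```

With a 512-token window and a 50-token overlap, each chunk after the first contributes 462 new tokens, so the final chunk is usually shorter than 512.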

Vector Embedding — Meaning as Math

Each chunk is converted to a 1,024-dimensional dense vector by Amazon Titan Embeddings V2. The model places semantically similar text nearby in vector space — even when exact words differ. This is what makes semantic search possible.

Chunk Text

“The Fund's total annual fund operating expense ratio is 0.20% of average daily net assets...”

Titan Embeddings V2

AWS Bedrock

1,024-dim Vector

···

[0.12, -0.87, 0.34, 0.91, -0.23, 0.67, 0.24, -0.55, 0.67, 0.42 ... ×1,024]
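A hedged sketch of the embedding call: it assumes a Boto3 `bedrock-runtime` client is passed in and uses the Titan Text Embeddings V2 request fields (`inputText`, `dimensions`, `normalize`). The `embed_text` wrapper is a hypothetical name, and enabling normalization is an assumption rather than something the text states.

```python
import json

# Bedrock model ID for Titan Text Embeddings V2.
MODEL_ID = "amazon.titan-embed-text-v2:0"


def embed_text(bedrock_runtime, text, dimensions=1024):
    """Return the embedding vector for `text` using the supplied Bedrock runtime client."""
    body = json.dumps({
        "inputText": text,
        "dimensions": dimensions,  # 1,024 matches the index described above
        "normalize": True,         # assumption: unit-length vectors for cosine search
    })
    response = bedrock_runtime.invoke_model(
        modelId=MODEL_ID,
        body=body,
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]


# Usage (requires AWS credentials and Bedrock model access):
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   vector = embed_text(client, "The Fund's total annual fund operating expense ratio is 0.20%")
```

Passing the client in as a parameter keeps the function easy to exercise with a stub in unit tests.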

HIGH SIMILARITY — cos: 0.94

“expense ratio is 0.20%” ↔ “annual management fee of 20 basis points”

Different words. Same meaning. Small angle between vectors.

LOW SIMILARITY — cos: 0.23

“expense ratio is 0.20%” ↔ “quarterly index rebalancing schedule”

Different meaning. Large angle between vectors. Not retrieved.
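For reference, the similarity score used in these comparisons is the cosine of the angle between two vectors. A minimal pure-Python sketch follows; the `cosine_similarity` name is introduced here for illustration and the toy vectors are not real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


# Usage with toy 2-dimensional vectors.
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
```

A score of 0.94 corresponds to an angle of roughly 20 degrees, while 0.23 corresponds to roughly 77 degrees.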

KNN Retrieval — Finding the Nearest Neighbors

At query time, the question is embedded and matched against the 187 indexed chunk vectors by cosine similarity. Rather than scanning every vector, OpenSearch's HNSW index runs an approximate nearest-neighbor search and returns the K=5 nearest neighbors in under 50ms. Chunks below a 0.60 cosine threshold are discarded.
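The search request can be sketched as an OpenSearch `knn` query body. The field names `embedding`, `text`, and `section` are assumptions about the index mapping, and `build_knn_query` is a hypothetical helper.

```python
def build_knn_query(query_vector, k=5, vector_field="embedding"):
    """Build an OpenSearch k-NN search body that asks for the k nearest chunks."""
    return {
        "size": k,
        "query": {
            "knn": {
                vector_field: {
                    "vector": list(query_vector),
                    "k": k,
                }
            }
        },
        # Assumed stored fields; only these are returned with each hit.
        "_source": ["text", "section"],
    }


# Usage: a short toy vector stands in for the 1,024-dimensional query embedding.
body = build_knn_query([0.08, -0.91, 0.41])
```

With the `opensearch-py` client, a body like this would typically be passed to `client.search(index=..., body=body)`. The index name is left open here because the text does not give one.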

USER QUERY

“What expense ratio does QQQ charge?”

→ embedded → [0.08, -0.91, 0.41, ...]

1,024 dimensions · <100ms

OpenSearch k-NN Search · 187 chunks indexed · HNSW · cosine similarity
Rank · Retrieved chunk · Cosine similarity
1 · "The Fund's total annual fund operating expense ratio is 0.20%..." · 0.942
2 · "Management fees are paid...at an annual rate of 0.20%..." · 0.879
3 · "The Fund does not charge any front-end sales loads..." · 0.761
4 · "Expense ratios for similar Nasdaq-100 ETFs range from 0.15% to 0.33%..." · 0.703
5 · "Invesco Capital Management LLC serves as the investment adviser..." · 0.648
6 · "The Index is reviewed and rebalanced quarterly by Nasdaq..." · 0.521
7 · "The Fund is non-diversified and may invest a significant portion..." · 0.489
··· 180 more chunks — all below threshold
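The selection rule behind this ranking (keep at most K hits, then drop anything under the threshold) can be sketched as below. The scores are the ones shown in the ranking; the `select_hits` helper and the hit dictionary shape are assumptions made for illustration.

```python
def select_hits(hits, k=5, min_score=0.60):
    """Keep the k highest-scoring hits, then discard any below min_score."""
    ranked = sorted(hits, key=lambda hit: hit["score"], reverse=True)
    return [hit for hit in ranked[:k] if hit["score"] >= min_score]


# Scores taken from the seven ranked chunks above.
hits = [
    {"rank": rank, "score": score}
    for rank, score in enumerate(
        [0.942, 0.879, 0.761, 0.703, 0.648, 0.521, 0.489], start=1
    )
]
selected = select_hits(hits)
# Ranks 1 through 5 survive; ranks 6 and 7 fall below the 0.60 threshold.
```

Applying the threshold after the top-K cut means fewer than five chunks can reach the LLM when a question has little support in the document.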

K=5 chunks assembled as context window · ~2,200 tokens · sent to Claude 3 Haiku on Bedrock
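One way to assemble the retrieved chunks into a grounded prompt is sketched here. The prompt wording, the `[n]` citation format, and the `build_grounded_prompt` helper are illustrative assumptions, not the prompt actually used.

```python
def build_grounded_prompt(question, chunks):
    """Number each retrieved chunk and instruct the model to cite by number."""
    excerpts = "\n\n".join(
        f"[{i}] (§ {chunk['section']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the numbered excerpts below. "
        "Cite the excerpts you rely on as [n]. "
        "If the excerpts do not contain the answer, say so.\n\n"
        f"Excerpts:\n{excerpts}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Usage with one chunk; the section label "Fees" is an assumed metadata value.
prompt = build_grounded_prompt(
    "What expense ratio does QQQ charge?",
    [{"section": "Fees",
      "text": "The Fund's total annual fund operating expense ratio is 0.20%..."}],
)
```

Numbering the excerpts gives the model a stable handle for citations, and the explicit fallback instruction is what discourages answers that go beyond the retrieved text.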

Live Simulation — Ask QQQ Anything

Select a question to run the full RAG pipeline simulation: embedding → KNN search → context assembly → Claude 3 Haiku on Bedrock → grounded answer with citations. All steps simulated client-side using pre-indexed QQQ prospectus data.


Technology Stack

Amazon S3 · Amazon Textract · Amazon Titan Embeddings V2 · AWS Bedrock · Amazon OpenSearch Serverless · HNSW k-NN Index · Claude 3 Haiku · AWS Lambda · Python · Boto3 · LangChain · Cosine Similarity · Prompt Engineering · Structured JSON Output

The Bridge — RAG Becomes a Skill

RAG is a pattern, not a product. In the pre-agentic era, RAG was the system — question in, answer out. In the agentic era, RAG becomes one tool an agent can invoke when it needs to ground a decision in source documents. The same vector index, the same retrieval pipeline — now called by an orchestrator rather than a human.

💬

RAG Era

Question → Answer

🔧

Bridge

RAG as a Skill

🤖

Agentic Era

Agent → Tool → Action

The Adaptive Service Orchestrator (Use Case 1) uses vector retrieval as one of its validation steps — querying a policy knowledge base before an agent decides whether to initiate a financial transaction. The indexing pipeline I built here became the knowledge layer that agentic systems query at runtime.
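The idea of retrieval as an agent tool can be illustrated with a small registry and dispatcher. This is a generic sketch, not the Adaptive Service Orchestrator's design: the tool name `search_policy_kb`, the call format, and both helper functions are hypothetical.

```python
def make_tool_registry(retrieve):
    """Expose a retrieval function as one named tool an orchestrator can call."""
    return {
        "search_policy_kb": {
            "description": "Retrieve policy passages relevant to a query.",
            "handler": retrieve,
        }
    }


def dispatch(registry, tool_call):
    """Run the tool named in tool_call with its keyword arguments."""
    tool = registry.get(tool_call["name"])
    if tool is None:
        raise KeyError(f"unknown tool: {tool_call['name']}")
    return tool["handler"](**tool_call["arguments"])


# Usage: a stub stands in for the real retrieval pipeline.
def fake_retrieve(query, k=5):
    return [f"passage about {query}"][:k]


registry = make_tool_registry(fake_retrieve)
result = dispatch(
    registry,
    {"name": "search_policy_kb", "arguments": {"query": "wire transfer limits"}},
)
```

The retrieval function is unchanged from the question-answering case. What changes is the caller: an orchestrator chooses when to invoke it and what to do with the passages it returns.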

Want to explore the full evolution?

See how RAG-era patterns evolved into a production multi-agent system — or let's talk about what a RAG or agentic pipeline could do for your organization.