Architecture
System design diagrams and key architectural decisions behind my production systems. These illustrate how I think about scale, reliability, and maintainability.
IDP System — Batch Processing Pipeline
The 14-step document pipeline processes uploaded documents through OCR, AI extraction, validation, and human review. Each page goes through a 7-step sub-pipeline. State is tracked in Redis for real-time WebSocket updates.
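The gate between validation and human review can be sketched as a simple threshold check. This is a minimal illustration, not the production logic: the `ExtractedField` type, the 0.90 cutoff, and the "weakest field decides" rule are all assumptions, since the source does not say how confidence is aggregated.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the source does not state the real threshold.
AUTO_APPROVE_THRESHOLD = 0.90


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # model-reported confidence in [0, 1]


def route_document(fields, threshold=AUTO_APPROVE_THRESHOLD):
    """Return 'auto_approve' only if every field meets the threshold.

    Anything else, including a document with no extracted fields,
    goes to the human-in-the-loop (HITL) review queue.
    """
    if not fields:
        return "hitl_review"
    weakest = min(f.confidence for f in fields)
    return "auto_approve" if weakest >= threshold else "hitl_review"
```

Routing on the weakest field is the conservative choice: one doubtful value is enough to send the whole document to a reviewer.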
Upload -> Page Split -> Enhancement -> OCR/Vision -> AI Extraction
| |
v v
Metadata Sub-doc Detection -> Merge -> Field Validation -> Confidence Check
| |
v v
Batch Low Confidence -----> HITL Review Queue High Confidence
State | |
(Redis) v v
| Human Review Auto-Approve
v | |
WebSocket <--- Status Updates <-----+------------------------+
| |
v v
Frontend Approved Data -> PDB (separate DB)
IDP System — RAG Document Q&A
Documents are chunked, embedded via the configured AI provider, and stored in PostgreSQL with pgvector. User queries trigger similarity search, and retrieved chunks are passed as context to the LLM for freeform answers.
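As a rough sketch of the two mechanical steps in that flow, the snippet below shows overlapping-window chunking and a top-k search by cosine distance. It is an in-memory stand-in: in the real system the search runs inside PostgreSQL (pgvector exposes cosine distance as the `<=>` operator). The chunk size, overlap, and function names are assumptions chosen for illustration.

```python
import math


def chunk_text(text, size=200, overlap=50):
    """Split text into windows of `size` characters that share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0  # treat a zero vector as unrelated
    return 1.0 - dot / norm


def top_k(query_vec, rows, k=3):
    """rows is a list of (chunk, embedding); return the k closest as (chunk, distance)."""
    scored = [(chunk, cosine_distance(query_vec, vec)) for chunk, vec in rows]
    scored.sort(key=lambda item: item[1])
    return scored[:k]
```

The overlap exists so that a sentence cut at a window boundary still appears whole in at least one chunk.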
Document Upload
|
v
Chunking (overlapping windows)
|
v
Embedding (Ollama / OpenAI / Anthropic)
|
v
pgvector Storage (PostgreSQL)
|
+---> User Query
| |
| v
| Embed Query
| |
| v
+---> Similarity Search (cosine distance)
|
v
Top-K Chunks + Query -> LLM
|
v
Freeform Answer (with source citations)
IDP System — Multi-Tenant RBAC
Five-role hierarchy with JWT authentication. Each role inherits permissions from the roles below it. Organizations contain teams, and data access is scoped accordingly.
super_admin ─── Cross-org access, system settings, user management
|
org_admin ─── Organization-wide access, team management, email config
|
manager ─── Team-level access, schema management, batch operations
|
operator ─── Document upload, extraction, HITL review within team
|
viewer ─── Read-only access to assigned team data
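One way to express the hierarchy above is a rank comparison for permissions plus a separate scope check for data access. This is a sketch under assumptions: the claim keys (`role`, `org_id`, `team_id`) and both function names are hypothetical, and `claims` stands for an already verified JWT payload.

```python
# Lowest to highest; a higher role inherits everything below it.
ROLE_ORDER = ["viewer", "operator", "manager", "org_admin", "super_admin"]
ROLE_RANK = {role: rank for rank, role in enumerate(ROLE_ORDER)}


def has_role(user_role, required_role):
    """True if user_role is the required role or any role above it."""
    return ROLE_RANK[user_role] >= ROLE_RANK[required_role]


def can_access_team(claims, org_id, team_id):
    """Scope check: super_admin is cross-org, org_admin is org-wide, the rest are team-level."""
    role = claims["role"]
    if role == "super_admin":
        return True
    if claims["org_id"] != org_id:
        return False
    if role == "org_admin":
        return True
    return claims["team_id"] == team_id
```

Keeping the two checks separate means a route can require, say, `manager` rank while still confining that manager to their own team's data.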
Auth Flow:
Login -> JWT (httpOnly cookie) -> Middleware extracts role + org + team
Image URLs -> sessionStorage token -> ?token= query param
WebSocket -> Token-based authentication for real-time updates
Key Architectural Decisions
asyncpg over ORMs
Raw parameterized SQL gives full control over every query, avoids ORM overhead, and sidesteps leaky abstractions. Every query is explicit and auditable.
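A minimal sketch of what such a query might look like. The table, columns, and function name are hypothetical, since the source does not show its schema. `conn` is assumed to be an asyncpg connection, whose `fetch(query, *args)` method binds values to the `$1`, `$2`, ... placeholders; a stub is used here so the example runs without a database.

```python
import asyncio

# Hypothetical table and column names.
# asyncpg uses PostgreSQL-native positional placeholders ($1, $2, ...).
LIST_DOCUMENTS_SQL = """
    SELECT id, filename, status
    FROM documents
    WHERE org_id = $1 AND team_id = $2
    ORDER BY created_at DESC
    LIMIT $3
"""


async def list_team_documents(conn, org_id, team_id, limit=50):
    """Run the query with bound parameters; values are never spliced into the SQL text."""
    rows = await conn.fetch(LIST_DOCUMENTS_SQL, org_id, team_id, limit)
    return [dict(row) for row in rows]


class StubConnection:
    """Stands in for an asyncpg connection so the sketch runs without a database."""

    def __init__(self):
        self.calls = []

    async def fetch(self, query, *args):
        self.calls.append((query, args))
        return [{"id": 1, "filename": "a.pdf", "status": "approved"}]


if __name__ == "__main__":
    print(asyncio.run(list_team_documents(StubConnection(), 7, 3, limit=10)))
```

Because the SQL lives in one named constant, it can be reviewed, searched for, and explained with `EXPLAIN` exactly as written.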
Multi-provider AI with LiteLLM abstraction
Supports five AI providers (Ollama, OpenAI, Anthropic, Gemini, Groq) through a unified interface. Providers can be switched per schema or per organization without code changes.
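Switching providers through configuration could look roughly like the sketch below. The config shape, the precedence (schema over organization over default), and the example model names are assumptions. The resolved string follows LiteLLM's provider-prefixed model naming and would be passed as the `model` argument to `litellm.completion`.

```python
# Hypothetical system-wide default: a local Ollama model.
DEFAULT_MODEL = "ollama/llama3"


def resolve_model(schema_cfg, org_cfg, default=DEFAULT_MODEL):
    """Pick the model string: schema setting first, then organization, then default.

    The precedence order is an assumption made for this sketch.
    """
    return schema_cfg.get("model") or org_cfg.get("model") or default


# Example configs (model names are illustrative only).
org_cfg = {"model": "anthropic/claude-3-haiku-20240307"}
invoice_schema_cfg = {"model": "groq/llama3-8b-8192"}
passport_schema_cfg = {}

invoice_model = resolve_model(invoice_schema_cfg, org_cfg)
passport_model = resolve_model(passport_schema_cfg, org_cfg)
# Either string could then be used as:
#   litellm.completion(model=invoice_model, messages=[...])
```

Since the provider is just part of a string in configuration, moving a schema from a cloud model to a local one is a data change, not a deploy.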
Redis for batch state, PostgreSQL for persistence
Transient pipeline state lives in Redis (fast reads, TTL-based cleanup). Final results persist in PostgreSQL. WebSocket updates stream from Redis pub/sub.
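The write path for transient state might be sketched as follows. The key layout, channel name, TTL value, and function name are assumptions. `r` is assumed to be a redis-py client; only its `hset`, `expire`, and `publish` methods are used, and a small stub replaces it here so the example runs without a server.

```python
import json

# Hypothetical TTL: stale batch state disappears after a day.
STATE_TTL_SECONDS = 24 * 60 * 60


def record_page_status(r, batch_id, page, step, status):
    """Store the page's current state, refresh its TTL, and notify subscribers."""
    key = f"batch:{batch_id}:page:{page}"
    state = {"step": step, "status": status}
    r.hset(key, mapping=state)          # fast-read state for the UI
    r.expire(key, STATE_TTL_SECONDS)    # TTL-based cleanup
    event = {"batch_id": batch_id, "page": page, **state}
    r.publish(f"batch:{batch_id}:events", json.dumps(event))  # feeds WebSocket updates
    return event


class StubRedis:
    """Records calls in memory; stands in for a redis-py client."""

    def __init__(self):
        self.hashes = {}
        self.ttls = {}
        self.published = []

    def hset(self, name, mapping):
        self.hashes.setdefault(name, {}).update(mapping)

    def expire(self, name, time):
        self.ttls[name] = time

    def publish(self, channel, message):
        self.published.append((channel, message))
```

A WebSocket handler would subscribe to the batch's channel and forward each JSON event to the browser, while the hash lets a client that connects late read the current state directly.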
PDB in separate database
Approved extraction data lives in a dedicated Production Database, not the IDP application DB. Clean separation of processing state from production data.
On-prem AI via Ollama for data sovereignty
Government documents never leave the network. GPU-accelerated local inference with Ollama provides privacy guarantees that cloud APIs cannot.