Architecture

System design diagrams and the key architectural decisions behind my production systems. Together they show how I approach scale, reliability, and maintainability.

IDP System — Batch Processing Pipeline

The 14-step document pipeline takes uploaded documents through OCR, AI extraction, validation, and, for low-confidence results, human review. Each page goes through a 7-step sub-pipeline. Batch state is tracked in Redis so the frontend receives real-time status updates over WebSocket.

Upload -> Page Split -> Enhancement -> OCR/Vision -> AI Extraction
   |                                                        |
   v                                                        v
Metadata    Sub-doc Detection -> Merge -> Field Validation -> Confidence Check
   |                                                        |
   v                                                        v
Batch       Low Confidence -----> HITL Review Queue         High Confidence
State                               |                        |
(Redis)                              v                        v
   |                           Human Review              Auto-Approve
   v                               |                        |
WebSocket <--- Status Updates <-----+------------------------+
   |                                                        |
   v                                                        v
Frontend                        Approved Data -> PDB (separate DB)
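
The confidence check that splits the pipeline into auto-approval and human review can be sketched as follows. This is a minimal illustration, not the production logic: the `FieldResult` shape, the 0.85 threshold, and the rule that a single weak field sends the whole document to review are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Route(str, Enum):
    AUTO_APPROVE = "auto_approve"
    HITL_REVIEW = "hitl_review"


@dataclass(frozen=True)
class FieldResult:
    """Hypothetical shape of one extracted field."""
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extraction step


def route_document(fields: list[FieldResult], threshold: float = 0.85) -> Route:
    """Route to human review if any field falls below the threshold."""
    if not fields:
        # Nothing was extracted, so a human should look at the document.
        return Route.HITL_REVIEW
    lowest = min(f.confidence for f in fields)
    return Route.AUTO_APPROVE if lowest >= threshold else Route.HITL_REVIEW
```

Gating on the weakest field keeps the rule conservative: a document is auto-approved only when every field clears the bar.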

IDP System — RAG Document Q&A

Documents are chunked, embedded via the configured AI provider, and stored in PostgreSQL with pgvector. User queries trigger similarity search, and retrieved chunks are passed as context to the LLM for freeform answers.

Document Upload
     |
     v
  Chunking (overlapping windows)
     |
     v
  Embedding (Ollama / OpenAI / Anthropic)
     |
     v
  pgvector Storage (PostgreSQL)
     |
     +---> User Query
     |        |
     |        v
     |     Embed Query
     |        |
     |        v
     +---> Similarity Search (cosine distance)
              |
              v
          Top-K Chunks + Query -> LLM
              |
              v
          Freeform Answer (with source citations)
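
The chunking and retrieval steps above can be illustrated in plain Python. This is a sketch under stated assumptions: the window size, overlap, and in-memory ranking are illustrative only. In the real system the ranking would run inside PostgreSQL, where pgvector exposes cosine distance as the `<=>` operator.

```python
import math


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0  # treat a zero vector as unrelated to everything
    return 1.0 - dot / norm


def top_k(query_vec: list[float],
          stored: list[tuple[str, list[float]]],
          k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are closest to the query."""
    ranked = sorted(stored, key=lambda item: cosine_distance(query_vec, item[1]))
    return [chunk for chunk, _ in ranked[:k]]
```

Overlapping windows mean a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.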

IDP System — Multi-Tenant RBAC

Five-role hierarchy with JWT authentication. Each role inherits permissions from the roles below it. Organizations contain teams, and data access is scoped to the user's organization and team.

super_admin  ─── Cross-org access, system settings, user management
     |
org_admin    ─── Organization-wide access, team management, email config
     |
manager      ─── Team-level access, schema management, batch operations
     |
operator     ─── Document upload, extraction, HITL review within team
     |
viewer       ─── Read-only access to assigned team data

Auth Flow:
  Login -> JWT (httpOnly cookie) -> Middleware extracts role + org + team
  Image URLs -> sessionStorage token -> ?token= query param
  WebSocket -> Token-based authentication for real-time updates
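
The role inheritance and scoping rules above can be expressed as a rank comparison plus an organization/team check. This is a minimal sketch: the `Claims` dataclass and both function names are hypothetical, and it assumes the middleware has already verified the JWT and extracted role, org, and team.

```python
from dataclasses import dataclass

# Lowest to highest; a role inherits everything from the roles before it.
ROLE_ORDER = ["viewer", "operator", "manager", "org_admin", "super_admin"]
ROLE_RANK = {role: rank for rank, role in enumerate(ROLE_ORDER)}


@dataclass(frozen=True)
class Claims:
    """Hypothetical result of decoding a verified JWT."""
    role: str
    org_id: int
    team_id: int


def has_role(claims: Claims, required_role: str) -> bool:
    """True if the user's role is the required role or any role above it."""
    return ROLE_RANK[claims.role] >= ROLE_RANK[required_role]


def can_access_team(claims: Claims, org_id: int, team_id: int) -> bool:
    """Scope data access by organization and team."""
    if claims.role == "super_admin":
        return True  # cross-org access
    if claims.org_id != org_id:
        return False
    if claims.role == "org_admin":
        return True  # organization-wide access
    return claims.team_id == team_id  # manager, operator, viewer
```

Keeping the hierarchy in one ordered list means adding a role is a single-line change and no permission check needs to enumerate roles by name.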

Key Architectural Decisions

asyncpg over ORMs

Raw parameterized SQL for full query control, lower overhead than an ORM layer, and no leaky abstractions. Every query is explicit and auditable.
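
A query in this style might look like the sketch below. The `documents` table, its columns, and the function name are hypothetical. The only asyncpg behavior relied on is that `Connection.fetch` and `Pool.fetch` accept a query string followed by positional arguments bound to `$1`, `$2`, and so on.

```python
from typing import Any, Protocol


class Fetcher(Protocol):
    """The part of asyncpg's Connection/Pool interface this sketch uses."""

    async def fetch(self, query: str, *args: Any) -> list[Any]: ...


# Hypothetical table and columns, for illustration only.
LOW_CONFIDENCE_SQL = """
    SELECT id, filename, confidence
    FROM documents
    WHERE team_id = $1 AND confidence < $2
    ORDER BY confidence ASC
    LIMIT $3
"""


async def fetch_low_confidence(conn: Fetcher, team_id: int,
                               threshold: float, limit: int = 50) -> list[Any]:
    # Values travel as bind parameters, never through string formatting,
    # so user input cannot change the shape of the query.
    return await conn.fetch(LOW_CONFIDENCE_SQL, team_id, threshold, limit)
```

Because the SQL is a module-level constant, each statement the application can issue is visible in one place and easy to review.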

Multi-provider AI with LiteLLM abstraction

Supports five AI providers (Ollama, OpenAI, Anthropic, Gemini, Groq) through a single LiteLLM interface. Providers can be switched per schema or per organization without code changes.
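
One way to get per-schema and per-organization switching is to resolve a LiteLLM model string from configuration before each call. In this sketch the `AIConfig` dataclass, the default model, and the precedence order (schema, then organization, then default) are assumptions. The `provider/model` prefix format is LiteLLM's naming convention, and the model names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical fallback; "ollama/..." follows LiteLLM's provider/model naming.
DEFAULT_MODEL = "ollama/llama3"


@dataclass(frozen=True)
class AIConfig:
    org_model: Optional[str] = None     # organization-level override
    schema_model: Optional[str] = None  # schema-level override, most specific


def resolve_model(cfg: AIConfig) -> str:
    """Pick the most specific configured model: schema, organization, default."""
    return cfg.schema_model or cfg.org_model or DEFAULT_MODEL
```

The resolved string would then be passed as the `model` argument to `litellm.completion(model=..., messages=[...])`, so changing provider is a configuration update only.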

Redis for batch state, PostgreSQL for persistence

Transient pipeline state lives in Redis (fast reads, TTL-based cleanup). Final results persist in PostgreSQL. WebSocket updates stream from Redis pub/sub.
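
A status update in this design writes the state, refreshes its TTL, and publishes an event for WebSocket subscribers. The sketch below assumes a key layout, channel name, and 24-hour TTL that are purely illustrative. The `hset`, `expire`, and `publish` calls mirror the redis-py client methods of the same names.

```python
import json
from typing import Any, Protocol


class RedisLike(Protocol):
    """Subset of the redis-py client interface used below."""

    def hset(self, name: str, mapping: dict) -> Any: ...
    def expire(self, name: str, time: int) -> Any: ...
    def publish(self, channel: str, message: str) -> Any: ...


STATE_TTL_SECONDS = 24 * 60 * 60  # hypothetical retention for transient state


def record_step(r: RedisLike, batch_id: str, page: int,
                step: str, status: str) -> dict:
    """Store the page's current step and notify subscribers."""
    key = f"batch:{batch_id}:page:{page}"  # hypothetical key layout
    state = {"step": step, "status": status}
    r.hset(key, mapping=state)
    r.expire(key, STATE_TTL_SECONDS)  # Redis drops stale state on its own
    event = {"batch_id": batch_id, "page": page, **state}
    r.publish(f"batch:{batch_id}:events", json.dumps(event))
    return event
```

With a real client this would be called as `record_step(redis.Redis(), "b42", 3, "ocr", "done")`. A WebSocket handler subscribed to the batch channel can forward each event to the frontend unchanged.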

PDB in separate database

Approved extraction data lives in a dedicated Production Database, not the IDP application DB. Clean separation of processing state from production data.

On-prem AI via Ollama for data sovereignty

Government documents never leave the network. GPU-accelerated local inference with Ollama provides privacy guarantees that cloud APIs cannot.
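
A sovereignty rule like this can also be enforced in code by refusing any inference endpoint that is not on the local network. The guard below is a hypothetical sketch, not part of the described system: it accepts only `localhost` or private and loopback IP literals, and it deliberately rejects DNS names because they could resolve anywhere.

```python
import ipaddress
from urllib.parse import urlparse


def is_on_prem_endpoint(base_url: str) -> bool:
    """True if the URL host is localhost or a private/loopback IP literal."""
    host = urlparse(base_url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name could resolve anywhere; reject it rather than guess.
        return False
    return addr.is_private or addr.is_loopback
```

For example, `is_on_prem_endpoint("http://localhost:11434")` accepts Ollama's default local address, while a public cloud API URL is rejected.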