AI / RAG Pipeline

CogniGen FAQ Chatbot

Document-based FAQ chatbot with RAG pipeline, multi-tenant vector search, and confidence-scored responses.

Python · FastAPI · ChromaDB · OpenAI · React · MySQL

Overview

CogniGen FAQ Chatbot is a production RAG chatbot built for real business document Q&A. The system ingests PDF, DOCX, and TXT documents, chunks them while preserving semantic coherence, embeds the chunks via OpenAI, and stores them in ChromaDB with per-organization vector isolation.

The RAG pipeline uses distance-based confidence scoring (high/medium/low/none) to transparently indicate retrieval quality. When retrieved context isn't strong enough, the system says so instead of hallucinating. Each tenant's documents are isolated in dynamic ChromaDB collections using a `project_{id}` naming pattern.

Background document processing handles ingestion with status tracking (pending/processing/completed/failed), rate-limited API endpoints protect against abuse, and a React SPA provides the chat and document management interface.
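A minimal sketch of how that wiring could look with FastAPI's `BackgroundTasks` and SlowAPI. The endpoint paths and the in-memory status store are illustrative stand-ins, not the project's actual code:

```python
from fastapi import BackgroundTasks, FastAPI, Request, UploadFile
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

# Illustrative in-memory store; the real system would persist status durably.
status: dict[str, str] = {}

def ingest(doc_id: str, data: bytes) -> None:
    """Runs after the response is sent; flips status as the pipeline advances."""
    status[doc_id] = "processing"
    try:
        # The real pipeline chunks, embeds, and stores here (see sketches below).
        data.decode("utf-8", errors="ignore")
        status[doc_id] = "completed"
    except Exception:
        status[doc_id] = "failed"

@app.post("/documents")
@limiter.limit("10/minute")  # rate-limit uploads per client IP
async def upload(request: Request, file: UploadFile, tasks: BackgroundTasks):
    doc_id = file.filename or "upload"
    status[doc_id] = "pending"
    tasks.add_task(ingest, doc_id, await file.read())
    return {"id": doc_id, "status": "pending"}

@app.get("/documents/{doc_id}/status")
async def document_status(doc_id: str):
    return {"id": doc_id, "status": status.get(doc_id, "unknown")}
```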

The platform also includes a companion Productivity Assistant module. It routes natural-language requests through keyword-based intent classification to four specialized services (meeting planner, email drafter, task breakdown, followup extractor) and generates downloadable artifacts in multiple formats: ICS calendar invites, CSV task lists, JSON data, and Markdown action items. Structured output is extracted from LLM responses with a regex-based parser.
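A condensed sketch of what keyword-based intent routing can look like; the keyword sets below are assumptions for illustration, not the module's real vocabulary:

```python
# Each service name maps to the keywords that vote for it.
INTENT_KEYWORDS: dict[str, set[str]] = {
    "meeting_planner":    {"meeting", "schedule", "calendar", "invite"},
    "email_drafter":      {"email", "draft", "reply", "compose"},
    "task_breakdown":     {"task", "break down", "subtasks", "milestones"},
    "followup_extractor": {"follow up", "followup", "action items"},
}

def classify_intent(request: str) -> str | None:
    """Score each service by keyword hits in the request; best score wins."""
    lowered = request.lower()
    scores = {
        intent: sum(kw in lowered for kw in keywords)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None -> fallback response

print(classify_intent("Schedule a quick sync meeting for Friday"))
# -> meeting_planner
```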

Hard Problems

Challenge

Multi-tenant vector store isolation — documents from Organization A must never appear in Organization B's search results.

Solution

Dynamic ChromaDB collection naming with `project_{id}` pattern. Each tenant gets its own vector space with independent embedding indices.
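A minimal sketch of the isolation pattern using Chroma's standard client API; the persistence setup and document contents are assumed for illustration:

```python
import chromadb

client = chromadb.Client()  # in-memory here; a PersistentClient in production

def tenant_collection(project_id: int):
    """Each tenant reads and writes only its own `project_{id}` collection."""
    return client.get_or_create_collection(name=f"project_{project_id}")

# Writes land in separate vector spaces with independent indices.
tenant_collection(1).add(ids=["a0"], documents=["Org A refunds take 5 days."])
tenant_collection(2).add(ids=["b0"], documents=["Org B ships in 48 hours."])

# A query scoped to tenant 2 can only ever surface tenant 2's chunks.
hits = tenant_collection(2).query(query_texts=["refund policy"], n_results=1)
print(hits["documents"])  # [['Org B ships in 48 hours.']]
```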

Challenge

Preventing LLM hallucinations when retrieved context is insufficient.

Solution

Distance-based confidence scoring (high/medium/low/none) using average embedding distances and chunk counts. System prompt enforces context-only responses with source citations.
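A small sketch of what the tiering logic could look like; the thresholds below are illustrative guesses, not the project's tuned values:

```python
def confidence(distances: list[float], min_chunks: int = 2) -> str:
    """Map average embedding distance and chunk count to a confidence tier."""
    if not distances:
        return "none"
    avg = sum(distances) / len(distances)
    if avg < 0.35 and len(distances) >= min_chunks:
        return "high"
    if avg < 0.55:
        return "medium"
    if avg < 0.75:
        return "low"
    return "none"  # triggers the "can't answer from these documents" path

print(confidence([0.28, 0.31, 0.40]))  # high
print(confidence([0.82]))              # none
```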

Challenge

Processing diverse document types while preserving semantic coherence for embedding.

Solution

Integrated PyPDF2, python-docx, and LangChain's RecursiveCharacterTextSplitter with 1000-character chunks and 200-character overlap. Background task processing with status tracking.
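A sketch of format-aware extraction feeding the splitter; import paths follow current library layouts and may differ across versions:

```python
from pathlib import Path

from docx import Document  # python-docx
from langchain_text_splitters import RecursiveCharacterTextSplitter
from PyPDF2 import PdfReader

def extract_text(path: Path) -> str:
    """Pick an extractor by extension; fall back to plain-text reading."""
    if path.suffix.lower() == ".pdf":
        return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    if path.suffix.lower() == ".docx":
        return "\n".join(p.text for p in Document(str(path)).paragraphs)
    return path.read_text(encoding="utf-8")

# Overlap keeps sentences that straddle a boundary retrievable from either chunk.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_text(extract_text(Path("handbook.pdf")))
```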

Challenge

Reliably extracting structured data from free-form LLM text output for artifact generation.

Solution

Regex-based markdown code block parser with type markers (meeting_json, tasks_json). The LLM is prompted for a specific output format; the parser extracts and validates typed objects.
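A compact sketch of such a parser; the exact fence format the LLM is prompted to emit is an assumption based on the type markers named above:

```python
import json
import re

FENCE = "`" * 3  # markdown code fence
BLOCK_RE = re.compile(
    FENCE + r"(meeting_json|tasks_json)\s*\n(.*?)" + FENCE, re.DOTALL
)

def extract_typed_blocks(llm_output: str) -> dict:
    """Find typed fenced blocks in LLM output and parse/validate their JSON."""
    blocks = {}
    for marker, body in BLOCK_RE.findall(llm_output):
        try:
            blocks[marker] = json.loads(body)  # validation step
        except json.JSONDecodeError:
            continue  # malformed block: skip so the caller can re-prompt
    return blocks

sample = f'{FENCE}meeting_json\n{{"title": "Standup"}}\n{FENCE}'
print(extract_typed_blocks(sample))  # {'meeting_json': {'title': 'Standup'}}
```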

Key Decisions

| Decision | Chose | Over | Because |
| --- | --- | --- | --- |
| Vector database | ChromaDB | Pinecone | Local-first deployment, zero external dependencies, HNSW indexing out of the box. No API costs for vector operations. |
| Task processing | FastAPI background tasks | Celery | Simpler deployment, no Redis/RabbitMQ dependency. Document ingestion workload doesn't justify a distributed task queue. |
| Embedding model | OpenAI text-embedding-3-small | Sentence Transformers | Production-grade quality, consistent with the LLM provider, cost-efficient for the retrieval workload. |

Tech Stack

Languages

Python 3.12 · JavaScript

Frameworks

FastAPI · React 18 · LangChain · SQLAlchemy

Data

ChromaDB · MySQL 8 · OpenAI Embeddings

Tools

Docker Compose · Uvicorn · SlowAPI · Vite