
Knowledge Base (KB) and RAG Flow Guide

This document provides an overview of the Knowledge Base (KB) system and explains how Retrieval-Augmented Generation (RAG) is implemented within Voicing AI.
It covers RAG fundamentals, KB architecture, indexing workflows, and the retrieval API.


1. Overview

The Knowledge Base subsystem allows users to attach documents (URLs or files) and convert them into searchable vector embeddings stored in Qdrant.
These embeddings are later retrieved using similarity search to support Retrieval-Augmented Generation (RAG).


2. What is RAG?

RAG stands for Retrieval-Augmented Generation.

RAG enables an AI system to answer user queries using external knowledge, such as documentation, PDFs, webpages, or manuals.

2.1 How RAG Works

Step 1: Indexing

Documents are processed into smaller text chunks.
Each chunk is converted into a vector embedding and stored in a vector database such as Qdrant.

Step 2: Retrieval

A user query is embedded and compared against stored vectors.
The most relevant chunks are returned.

Step 3: Generation

An LLM uses these retrieved chunks as context to produce an answer.
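
The snippet below is a toy, self-contained illustration of these three steps (not Voicing AI code): a bag-of-words similarity stands in for real embeddings, and a Python list stands in for the vector database.

from math import sqrt

def embed(text):
    # Crude "embedding": word counts. Real systems use dense model embeddings.
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Step 1: Indexing - chunk documents and store (vector, chunk) pairs.
chunks = ["To reset your password, open Settings.", "Billing is handled monthly."]
index = [(embed(c), c) for c in chunks]

# Step 2: Retrieval - embed the query and rank chunks by similarity.
query = "How do I reset my password?"
top_chunk = max(index, key=lambda item: cosine(embed(query), item[0]))[1]

# Step 3: Generation - an LLM would now answer using the retrieved chunk as context.
print(f"Context passed to the LLM: {top_chunk}")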

2.2 RAG in Voicing AI

The Knowledge Base subsystem implements only the Retrieval portion of RAG:

  • Indexing → delegated to an external knowledge_base.Indexer library
  • Storage → handled by Qdrant
  • Retrieval → exposed via /knowledge-base/query/{kb_id}

Actual text generation using LLMs happens in downstream services (e.g. RAGService).


3. High-Level KB Architecture

The Knowledge Base system consists of the following components:

Component              Description
---------------------  -----------------------------------------------------------------
KB API                 Exposes endpoints for creating KBs, adding sources, and querying
KnowledgeBaseService   Core orchestration logic
KnowledgeBaseIndexer   External library that chunks, embeds, and indexes data
Qdrant                 Vector database storing embeddings
DocumentRetriever      External library used for similarity search
Schemas                Pydantic models for request/response formats

3.1 System Flow

  1. Add source (URL/file)
  2. Store source entry in DB
  3. Background indexing task
  4. KnowledgeBaseIndexer
  5. Chunks + embeddings → Qdrant collection
  6. Query endpoint performs retrieval via DocumentRetriever

4. Knowledge Base Data Model

A KB record contains:

{
  "id": "uuid",
  "name": "Help Center",
  "collection_name": "kb_xxxxxxxxxxxxxxxxxxxx",
  "sources": [...],
  "stats": {
    "documents": 12,
    "chunks": 187
  },
  "status": "active"
}

Each source entry contains:

{
  "id": "uuid",
  "type": "url",
  "name": "FAQ",
  "content": "https://example.com/faq",
  "status": "completed",
  "metadata": {
    "pages": 12,
    "chunks": 48
  }
}
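
The exact Pydantic models live in the Schemas module; a minimal sketch of what they might look like, with field names taken from the JSON above (class names and defaults are assumptions), is:

from typing import Any
from pydantic import BaseModel

# Minimal sketch of the KB schemas; field names follow the JSON examples above,
# but the actual Pydantic models in the codebase may differ.
class KnowledgeBaseSource(BaseModel):
    id: str
    type: str                       # "url" or "file"
    name: str
    content: str                    # URL or storage path
    status: str                     # "pending" | "completed" | "failed"
    metadata: dict[str, Any] = {}

class KnowledgeBase(BaseModel):
    id: str
    name: str
    collection_name: str            # Qdrant collection backing this KB
    sources: list[KnowledgeBaseSource] = []
    stats: dict[str, int] = {}      # e.g. {"documents": 12, "chunks": 187}
    status: str = "active"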

5. KB Workflow

This section describes the workflow from KB creation to indexing and retrieval.


5.1 Create a Knowledge Base

Endpoint

POST /api/v1/knowledge-base/

Request

{
  "name": "Support Docs",
  "description": "Internal documentation"
}

Internal Behavior

  • KnowledgeBaseService.create_knowledge_base() generates a unique Qdrant collection_name.
  • A new KB entry is created in Postgres.
  • The KB is ready to accept sources for indexing.
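
For example, a client could create a KB like this (the local base URL and bearer-token header are assumptions, not documented values):

import httpx

# Illustrative client call; base URL and auth header are assumptions.
response = httpx.post(
    "http://localhost:8000/api/v1/knowledge-base/",
    headers={"Authorization": "Bearer <token>"},
    json={"name": "Support Docs", "description": "Internal documentation"},
)
response.raise_for_status()
kb = response.json()
# Response fields are assumed to mirror the KB record shown in section 4.
print(kb["id"], kb["collection_name"])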

5.2 Add URL Source

Endpoint

POST /api/v1/knowledge-base/{kb_id}/sources/url

Request

{
  "url": "https://docs.example.com/faq",
  "name": "FAQ",
  "indexer_config": {
    "chunk_size": 1000,
    "chunk_overlap": 200
  }
}

Workflow

  1. A source record is appended to the KB with status = "pending".
  2. A background task _index_url_source(...) begins indexing.
  3. The external KnowledgeBaseIndexer performs:
    • Web scraping
    • Text extraction
    • Chunking
    • Embedding
    • Upserting vectors into Qdrant (collection_name)
  4. Source status is updated to "completed" with metadata (pages, chunks).
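
A simplified sketch of that background task is shown below; the index_url method name, the update_source_status helper, and the stats attributes are assumptions based on the workflow above, not the actual implementation.

# Simplified sketch of the background indexing task. KnowledgeBaseIndexer is the
# external library; index_url, update_source_status, and the stats attributes are
# assumptions based on the workflow above.
async def _index_url_source(kb, source, indexer_config):
    indexer = KnowledgeBaseIndexer(indexer_config)
    try:
        # Scrape, extract, chunk, embed, and upsert into the KB's Qdrant collection.
        stats = await indexer.index_url(source.content, collection_name=kb.collection_name)
        await update_source_status(
            kb.id, source.id, status="completed",
            metadata={"pages": stats.pages, "chunks": stats.chunks},
        )
    except Exception:
        await update_source_status(kb.id, source.id, status="failed")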

5.3 Add File Source

Endpoint

POST /api/v1/knowledge-base/{kb_id}/sources/file

Internals

  • File is validated, uploaded to storage, and saved as a KB source.
  • Background indexing reads the file and processes it:
    stats = await indexer.index_file(uploaded_file)
  • Chunks and embeddings are stored in Qdrant.
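
An upload from the client side might look like this (the multipart field name, base URL, and auth header are assumptions):

import httpx

# Illustrative file upload; the "file" multipart field, base URL, and auth header
# are assumptions. Replace <kb_id> and <token> with real values.
with open("manual.pdf", "rb") as f:
    response = httpx.post(
        "http://localhost:8000/api/v1/knowledge-base/<kb_id>/sources/file",
        headers={"Authorization": "Bearer <token>"},
        files={"file": ("manual.pdf", f, "application/pdf")},
    )
response.raise_for_status()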

6. Retrieval (R in RAG)

Retrieval performs vector similarity search against the KB’s Qdrant collection.

6.1 Query Endpoint

POST /api/v1/knowledge-base/query/{kb_id}

Request

{
  "query": "How do I reset my password?",
  "top_results": 5
}

Internal Workflow (KnowledgeBaseService.search())

retriever = DocumentRetriever(self.indexer_config)

result = await retriever.search(
    query=query,
    collection_name=kb.collection_name,
    limit=top_results
)

Retrieval Behavior

  • Embeds the query.
  • Searches Qdrant using cosine similarity.
  • Returns an ordered list of chunks.
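
DocumentRetriever is an external library, but a bare-bones equivalent of this step using qdrant-client directly could look like the sketch below; the embedding function is a placeholder and must match the model used at indexing time.

from qdrant_client import QdrantClient

def embed_query(text: str) -> list[float]:
    # Placeholder: the same embedding model used during indexing must produce this
    # vector, with a dimension matching the collection.
    raise NotImplementedError

# Rough equivalent of the retrieval step; this is not the DocumentRetriever implementation.
client = QdrantClient(url="http://localhost:6333")
hits = client.search(
    collection_name="kb_xxxxxxxxxxxxxxxxxxxx",
    query_vector=embed_query("How do I reset my password?"),
    limit=5,
)
for hit in hits:
    print(hit.score, hit.payload.get("content"))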

Example Response

{
  "documents": [
    {
      "content": "To reset your password, open Settings...",
      "metadata": { "filename": "help.pdf" },
      "score": 0.93,
      "chunk_id": "chunk-uuid",
      "source_url": "https://docs.example.com/help",
      "title": "Help Center"
    }
  ],
  "query": "How do I reset my password?",
  "total_found": 42,
  "retrieval_time": 0.12
}

7. Augmentation (A in RAG)

Augmentation is the step where retrieved KB chunks are transformed into LLM-ready context.

Although the detailed orchestration lives outside this repo, the typical flow in Voicing AI looks like:

  1. Call the KB retrieval API

    • A higher-level service (e.g. RAGService, an orchestration layer for AI assistants) calls
      POST /api/v1/knowledge-base/query/{kb_id} and receives a RetrievalResult.
  2. Select and post-process chunks

    • Optionally re-rank, deduplicate, or filter by score or metadata (for example, limit to certain document types or recency).
    • Truncate or summarize content to fit within the target model’s context window (based on max tokens/characters).
  3. Build the augmented prompt

    • Serialize chunks into a prompt section such as:
      • CONTEXT:\n[chunk1]\n\n[chunk2]\n...
    • Attach them to the LLM request as:
      • System messages (instructions + context),
      • Tool/function parameters, or
      • Extra fields in a JSON payload interpreted by the LLM orchestration layer.
    • Include identifiers (titles, filenames, URLs) so the model can reference or cite specific sources.
  4. Add guardrails and formatting

    • Prepend clear instructions like:
      “Answer only using the provided context. If the answer is not present, say you don’t know.”
    • Optionally inject citation markers (e.g. [1], [2]) tied to retrieved documents so the UI can show which snippet supported which part of the answer.

In summary, the Augmentation layer is responsible for turning the raw documents[] from the Knowledge Base retrieval API into a structured, well-constrained prompt that the LLM can safely and effectively use for final answer generation.
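
A minimal sketch of that transformation, assuming the retrieval response shape shown in section 6.1 (function and parameter names are illustrative):

# Minimal augmentation sketch: turn the retrieval response (section 6.1) into an
# LLM-ready prompt with numbered citation markers. Names here are illustrative.
def build_augmented_prompt(retrieval: dict, question: str, max_chars: int = 6000) -> str:
    context_parts, used = [], 0
    for i, doc in enumerate(retrieval["documents"], start=1):
        snippet = (
            f"[{i}] {doc.get('title', 'Untitled')} ({doc.get('source_url', 'n/a')})\n"
            f"{doc['content']}"
        )
        if used + len(snippet) > max_chars:  # crude stand-in for token budgeting
            break
        context_parts.append(snippet)
        used += len(snippet)
    return (
        "Answer only using the provided context. "
        "If the answer is not present, say you don't know.\n\n"
        "CONTEXT:\n" + "\n\n".join(context_parts) + "\n\n"
        f"QUESTION: {question}"
    )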


8. Generation

The Knowledge Base module does not generate answers.

After retrieval:

  • RAGService
  • RagOpenAILLMService
  • Or LLM orchestrators

inject KB chunks into prompts and generate the final natural-language response.

The KB is responsible only for supplying relevant context.
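
For completeness, a downstream generation step might look roughly like this; an OpenAI-style chat completion is shown purely as an example, and the model name and prompt are illustrative rather than the actual RAGService implementation.

from openai import OpenAI

# Illustrative generation step (outside the KB subsystem). The augmented prompt is
# assumed to come from the Augmentation step in section 7; the model name is an example.
client = OpenAI()

augmented_prompt = (
    "Answer only using the provided context. If the answer is not present, say you don't know.\n\n"
    "CONTEXT:\n[1] Help Center (https://docs.example.com/help)\n"
    "To reset your password, open Settings...\n\n"
    "QUESTION: How do I reset my password?"
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "system", "content": augmented_prompt}],
)
print(completion.choices[0].message.content)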


9. End-to-End Example

  1. User uploads a PDF
  2. KB stores the file → background indexing
  3. KnowledgeBaseIndexer chunks + embeds → Qdrant
  4. User queries the KB
  5. DocumentRetriever fetches the top-k relevant chunks
  6. LLM consumes the chunks → final answer (outside the KB)

10. Summary

  • Voicing AI KB implements the Retrieval component of RAG.
  • URL and file sources are indexed asynchronously into Qdrant.
  • Retrieval API returns relevant chunks using vector similarity search.
  • LLM-based answer generation is performed outside the KB subsystem.

This forms the foundation for all RAG-assisted workflows within Voicing AI.


Last updated: [04/12/2025]
Version: 1.0