
Purpose

The document-api is the knowledge backend for the Rapida platform. It ingests documents (PDF, Word, CSV, and others), splits them into chunks, generates vector embeddings, and indexes everything in OpenSearch. At call time, assistant-api queries this service to inject relevant context into the LLM prompt.

Port

9010 — HTTP (FastAPI / uvicorn)

Language

Python 3.11+, FastAPI + Celery

Storage

PostgreSQL (assistant_db), Redis (Celery broker), OpenSearch (vectors + text)
Document processing is asynchronous. Upload returns immediately with status: processing. Text extraction, chunking, and embedding generation run as Celery background tasks.

Document Processing Pipeline


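The stages follow the Purpose section above: upload → text extraction → chunking → embedding → OpenSearch indexing. The chunking stage can be sketched as a fixed-size splitter with overlap, so that sentences cut at a boundary still appear whole in one chunk. This is an illustrative sketch; the service's actual chunk size, overlap, and splitting strategy are not specified here.

```python
# Hypothetical sketch of the chunking stage; chunk_size and overlap
# defaults here are illustrative, not the service's real settings.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by less than chunk_size so consecutive chunks overlap.
        start += chunk_size - overlap
    return chunks

parts = chunk_text("a" * 1200, chunk_size=500, overlap=50)
print(len(parts))  # 3 chunks: [0:500], [450:950], [900:1200]
```

The overlap means the tail of each chunk is repeated at the head of the next, which helps a query match content that would otherwise straddle a chunk boundary.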
Supported File Formats

| Format | Library | What is Extracted |
| --- | --- | --- |
| PDF | PyPDF2, pdfplumber | Text content + metadata |
| Word (.docx) | python-docx | Text + paragraph structure |
| Excel (.xlsx) | openpyxl, pandas | Cell values as text |
| CSV | pandas | Row data as text |
| Markdown (.md) | built-in | Text preserving structure |
| HTML | BeautifulSoup | Cleaned text from HTML |
| Plain text (.txt) | built-in | Direct read |
| Images | pytesseract (OCR) | OCR-extracted text |
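Extraction of this kind is typically dispatched on file extension. The registry and function names below are hypothetical, and only the built-in plain-text path is shown runnable; formats that need third-party libraries (PyPDF2, python-docx, and so on) would register their own extractor functions.

```python
# Hypothetical sketch of extension-based extractor dispatch; the
# service's real extractor registry may be organized differently.
from pathlib import Path

def extract_plain(path: Path) -> str:
    """Direct read — the .txt / .md path from the table above."""
    return path.read_text(encoding="utf-8")

EXTRACTORS = {
    ".txt": extract_plain,
    ".md": extract_plain,
    # ".pdf", ".docx", ... would map to library-backed extractors.
}

def extract(path: Path) -> str:
    try:
        extractor = EXTRACTORS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {path.suffix}") from None
    return extractor(path)
```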

At call time, assistant-api sends a text query to document-api. The service performs vector similarity search and returns the top-k most relevant chunks.

Request:
curl -X POST http://localhost:9010/api/v1/document/search \
  -H "Authorization: Bearer <jwt>" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "customer billing issue",
    "knowledge_base_id": "kb_123",
    "top_k": 5,
    "threshold": 0.5
  }'
Response:
{
  "results": [
    {
      "chunk_id": "chunk_123",
      "document_id": "doc_456",
      "content": "Billing errors are handled by submitting a refund request...",
      "similarity_score": 0.87,
      "metadata": {
        "page_no": 5,
        "section": "Billing Policy"
      }
    }
  ]
}
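A caller such as assistant-api can post-filter the returned chunks by similarity_score before building the prompt context. The second result below is invented purely to illustrate filtering; only the first mirrors the sample response above.

```python
# Sketch of client-side filtering on the search response. The second
# result (chunk_789) is a hypothetical low-score entry for illustration.
response = {
    "results": [
        {"chunk_id": "chunk_123", "similarity_score": 0.87,
         "content": "Billing errors are handled by submitting a refund request..."},
        {"chunk_id": "chunk_789", "similarity_score": 0.42,
         "content": "Unrelated onboarding notes..."},
    ]
}

def relevant_chunks(resp: dict, threshold: float = 0.5) -> list[str]:
    """Keep only chunk contents at or above the similarity threshold."""
    return [r["content"] for r in resp["results"]
            if r["similarity_score"] >= threshold]

print(relevant_chunks(response))  # only the 0.87 chunk passes the 0.5 threshold
```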

Embedding Models

Embeddings are generated using sentence-transformers. The model is configurable via EMBEDDINGS_MODEL in config.yaml:
| Model | Dimensions | Notes |
| --- | --- | --- |
| all-MiniLM-L6-v2 | 384 | Default — ~80 MB, fast |
| all-mpnet-base-v2 | 768 | Higher quality, larger |
| all-MiniLM-L12-v2 | 384 | Lighter variant of L6 |
| multilingual-e5-base | 768 | 100+ languages |
If you change EMBEDDINGS_MODEL, you must also update EMBEDDINGS_DIMENSION to match and re-index all existing documents. Existing embeddings stored with a different dimension will not match.
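The dimension requirement follows from how similarity is computed: cosine similarity is only defined for vectors of equal length, so a 384-dimension stored embedding cannot be compared against a 768-dimension query embedding. A minimal pure-Python illustration:

```python
# Why dimensions must match: cosine similarity requires equal-length vectors.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        # This is the situation after changing EMBEDDINGS_MODEL
        # without re-indexing existing documents.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
```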

Running

make up-document

make logs-document

make rebuild-document

Health Endpoints

| Endpoint | Purpose |
| --- | --- |
| GET /readiness/ | Service ready |
| GET /healthz/ | Liveness probe |
curl http://localhost:9010/readiness/

Troubleshooting

If an upload stays in status: processing indefinitely, the Celery worker is likely not running.
# Docker
make logs-document

# Local — confirm Celery worker is running
PYTHONPATH=api/document-api celery -A app.worker inspect active
If embedding tasks run out of memory, or throughput is too low, adjust the Celery worker batch size:
EMBEDDINGS_BATCH_SIZE=8     # Low memory
EMBEDDINGS_BATCH_SIZE=64    # High throughput (GPU recommended)
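The batch size bounds how many chunks are embedded per model call, trading memory for throughput. A sketch of the slicing this implies (EMBEDDINGS_BATCH_SIZE is shown as a plain parameter; the worker's actual loop may differ):

```python
# Hypothetical sketch of batched embedding: chunks are processed in
# slices of at most batch_size, so peak memory scales with batch_size.
def batched(items: list[str], batch_size: int):
    """Yield successive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

chunks = [f"chunk-{i}" for i in range(20)]
batches = list(batched(chunks, batch_size=8))
print([len(b) for b in batches])  # [8, 8, 4]
```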
If search fails after changing the embedding model, the stored index dimension may no longer match. Delete the stale index and re-upload the documents:
# List existing indices
curl http://localhost:9200/_cat/indices

# Delete a stale index and allow re-indexing
curl -X DELETE http://localhost:9200/documents-<index-name>
If the worker consumes too much memory overall, lower its concurrency and monitor container usage:
CELERY_CONCURRENCY=2
docker stats document-api
