
Overview

The assistant-api is the voice orchestration engine of the Rapida platform. Every active call — whether inbound from a phone, a browser SDK session, or an Asterisk trunk — runs through this service. It owns the complete real-time pipeline: audio streaming, speech-to-text, LLM inference (via integration-api), text-to-speech, and call event emission (via endpoint-api).

Port

  • 9007: HTTP, gRPC, and WebSocket (multiplexed with cmux)
  • 4573: Asterisk AudioSocket (TCP)

Language

  • Go 1.25
  • gRPC for inter-service calls
  • cmux for port multiplexing

Storage

  • PostgreSQL (assistant_db)
  • Redis (session, cache)
  • OpenSearch (transcripts)

The assistant-api does not store provider credentials. It delegates all LLM, STT, and TTS calls to integration-api, which manages encrypted credential access. The assistant-api never touches raw API keys.
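
Serving HTTP, gRPC, and WebSocket on the single port 9007 depends on inspecting the first bytes of each connection. The sketch below illustrates that idea using only the standard library. It is not the cmux API and not the service's actual code: `classify` is a hypothetical helper, and the matching rules are deliberately simplified (any cleartext HTTP/2 connection is treated as gRPC, and WebSocket upgrades arrive as ordinary HTTP/1.1 GET requests).

```go
package main

import (
	"bytes"
	"fmt"
)

// http2Preface is the fixed client connection preface defined by HTTP/2.
// gRPC runs over HTTP/2, so cleartext gRPC connections begin with these bytes.
var http2Preface = []byte("PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n")

// http1Methods lists request-line prefixes that identify HTTP/1.x traffic.
var http1Methods = []string{"GET ", "POST ", "PUT ", "HEAD ", "DELETE ", "OPTIONS ", "PATCH "}

// classify is a hypothetical helper: it looks at the first bytes read from a
// new connection and names the handler family that should receive it.
func classify(prefix []byte) string {
	if bytes.HasPrefix(prefix, http2Preface) {
		return "grpc"
	}
	for _, m := range http1Methods {
		if bytes.HasPrefix(prefix, []byte(m)) {
			return "http1"
		}
	}
	return "unknown"
}

func main() {
	fmt.Println(classify([]byte("PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n")))
	fmt.Println(classify([]byte("GET /readiness/ HTTP/1.1\r\n")))
	fmt.Println(classify([]byte{0x00, 0x01}))
}
```

A real multiplexer also has to replay the bytes it peeked at to the chosen handler; that part is omitted here.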

Components

The assistant-api is composed of four internal subsystems. Each subsystem has a well-defined boundary and can be extended independently.
The pipeline runs per-utterance within a conversation. When voice activity is detected, audio frames are forwarded to the configured STT provider. The transcript is sent to integration-api for LLM inference, and the response is synthesized by the configured TTS provider before being streamed back to the caller.

| Stage | Component | What Happens |
| --- | --- | --- |
| Audio In | Audio receiver | Raw PCM / G.711 frames arrive over WebSocket or SIP |
| VAD | Voice Activity Detector | Silence detection determines utterance start/end |
| STT | internal/transformer/<provider>/stt.go | Audio chunks streamed to provider; transcript returned |
| LLM | integration-api:9004 via gRPC | Transcript + system prompt + conversation history sent; tokens streamed back |
| TTS | internal/transformer/<provider>/tts.go | LLM tokens synthesized to audio in real time |
| Audio Out | Audio sender | Synthesized audio chunks streamed back to caller |

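
The stage order in the table can be sketched as three interfaces chained per utterance. Everything named here (`STT`, `LLM`, `TTS`, `Pipeline`, `HandleUtterance`, and the fake providers) is hypothetical and chosen for illustration. The real service streams audio and tokens between stages, whereas this sketch processes one whole utterance at a time to keep the control flow visible.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stage contracts; the service's real interfaces differ.
type STT interface{ Transcribe(audio []byte) (string, error) }
type LLM interface{ Complete(history []string, transcript string) (string, error) }
type TTS interface{ Synthesize(text string) ([]byte, error) }

// Pipeline wires the three stages together and keeps conversation history.
type Pipeline struct {
	stt     STT
	llm     LLM
	tts     TTS
	history []string
}

// HandleUtterance runs one utterance through STT, then LLM, then TTS.
func (p *Pipeline) HandleUtterance(audio []byte) ([]byte, error) {
	transcript, err := p.stt.Transcribe(audio)
	if err != nil {
		return nil, fmt.Errorf("stt: %w", err)
	}
	reply, err := p.llm.Complete(p.history, transcript)
	if err != nil {
		return nil, fmt.Errorf("llm: %w", err)
	}
	p.history = append(p.history, "user: "+transcript, "assistant: "+reply)
	out, err := p.tts.Synthesize(reply)
	if err != nil {
		return nil, fmt.Errorf("tts: %w", err)
	}
	return out, nil
}

// Fake providers so the sketch runs without any external service.
type fakeSTT struct{}

func (fakeSTT) Transcribe(audio []byte) (string, error) { return "hello", nil }

type fakeLLM struct{}

func (fakeLLM) Complete(history []string, t string) (string, error) {
	return strings.ToUpper(t), nil
}

type fakeTTS struct{}

func (fakeTTS) Synthesize(text string) ([]byte, error) { return []byte(text), nil }

func main() {
	p := &Pipeline{stt: fakeSTT{}, llm: fakeLLM{}, tts: fakeTTS{}}
	out, err := p.HandleUtterance([]byte{0, 1, 2})
	fmt.Println(string(out), err, len(p.history))
}
```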
Each STT and TTS provider is implemented as a Go struct that satisfies one of two interfaces:
// Speech-to-Text
type Transformers[AudioPacket] interface { ... }

// Text-to-Speech
type Transformers[TextPacket] interface { ... }
Provider directories live at api/assistant-api/internal/transformer/<provider>/. Each directory contains:

| File | Purpose |
| --- | --- |
| <provider>.go | Client initialization and configuration |
| stt.go | Implements the Transformers[AudioPacket] interface |
| tts.go | Implements the Transformers[TextPacket] interface |
| normalizer.go | Audio format normalization (sample rate, encoding) |

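
To make the provider contract more concrete, here is a minimal sketch of a struct satisfying a generic interface parameterized by packet type. The source elides the interface body, so the method set (`Name`, `Transform`), the packet fields, and the `echoSTT` provider are all assumptions made for illustration. The interface is named `Transformer` here to avoid implying it matches the real `Transformers` definition.

```go
package main

import (
	"errors"
	"fmt"
)

// AudioPacket and TextPacket reuse the packet names from the text;
// their fields are assumptions.
type AudioPacket struct{ PCM []byte }
type TextPacket struct{ Text string }

// Transformer is a hypothetical generic contract. In is the packet type
// a provider consumes: AudioPacket for STT, TextPacket for TTS.
type Transformer[In any] interface {
	Name() string
	Transform(in In) error
}

// echoSTT is a made-up provider that "recognizes" audio by reporting its size.
type echoSTT struct{ onTranscript func(string) }

func (e *echoSTT) Name() string { return "echo-stt" }

func (e *echoSTT) Transform(in AudioPacket) error {
	if len(in.PCM) == 0 {
		return errors.New("empty audio packet")
	}
	e.onTranscript(fmt.Sprintf("%d bytes of audio", len(in.PCM)))
	return nil
}

// Compile-time check that echoSTT satisfies the speech-to-text form.
var _ Transformer[AudioPacket] = (*echoSTT)(nil)

func main() {
	stt := &echoSTT{onTranscript: func(t string) { fmt.Println("transcript:", t) }}
	if err := stt.Transform(AudioPacket{PCM: []byte{1, 2, 3}}); err != nil {
		fmt.Println("error:", err)
	}
}
```

The `var _ Transformer[AudioPacket] = ...` line is a common Go idiom: the build fails if the struct stops satisfying the interface.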

Supported STT providers

| Provider | Notes |
| --- | --- |
| Google Cloud STT | Streaming recognition, 100+ languages |
| Azure Cognitive Services | Microsoft Neural Speech |
| Deepgram | Low-latency streaming, Nova models |
| AssemblyAI | Streaming and batch |
| Cartesia | Real-time with speaker diarization |
| Sarvam AI | Indian language support |

Supported TTS providers

| Provider | Notes |
| --- | --- |
| Google Cloud TTS | WaveNet / Neural2 voices |
| Azure Cognitive Services | Neural voices, 140+ languages |
| ElevenLabs | High-fidelity cloned voices |
| Deepgram Aura | Low-latency streaming synthesis |
| Cartesia | Streaming synthesis |
| Sarvam AI | Indian language support |

Each telephony provider is implemented as a channel adapter under api/assistant-api/internal/channel/<provider>/. Channels handle connection negotiation, audio codec translation, and call lifecycle events (answer, transfer, hangup).

Supported telephony channels

| Provider | Integration Method | Audio Protocol |
| --- | --- | --- |
| Twilio | HTTP webhook + Media Streams | WebSocket (µ-law) |
| Vonage | HTTP webhook + WebSocket | WebSocket (PCM) |
| Exotel | HTTP webhook | WebSocket |
| Asterisk | AudioSocket TCP | Raw G.711 µ-law frames |
| SIP (direct) | SIP INVITE | RTP |
| WebRTC (browser) | WebSocket / gRPC-web | PCM / Opus |

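
Codec translation for µ-law channels (Twilio, Asterisk) comes down to expanding each 8-bit G.711 byte into a 16-bit linear PCM sample. The sketch below uses the standard G.711 µ-law expansion. The function names are hypothetical, and it is not taken from the channel adapters themselves.

```go
package main

import "fmt"

// muLawBias is the bias added during µ-law encoding and removed on decode.
const muLawBias = 0x84

// muLawToPCM16 expands one G.711 µ-law byte to a signed 16-bit linear sample.
func muLawToPCM16(b byte) int16 {
	u := ^b                      // µ-law bytes are stored bit-inverted
	exponent := (u >> 4) & 0x07  // 3-bit segment number
	mantissa := int(u & 0x0F)    // 4-bit step within the segment
	magnitude := ((mantissa << 3) + muLawBias) << exponent
	magnitude -= muLawBias
	if u&0x80 != 0 { // sign bit set means a negative sample
		return int16(-magnitude)
	}
	return int16(magnitude)
}

// decodeFrame converts a whole µ-law frame to linear PCM samples.
func decodeFrame(frame []byte) []int16 {
	out := make([]int16, len(frame))
	for i, b := range frame {
		out[i] = muLawToPCM16(b)
	}
	return out
}

func main() {
	fmt.Println(decodeFrame([]byte{0xFF, 0x80, 0x00}))
}
```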
The Communication interface in api/assistant-api/internal/type/ is the top-level contract that ties together the pipeline. It exposes:
  • OnAudioReceived — called with each audio frame from the client
  • OnTranscript — called when STT returns a transcript segment
  • OnLLMResponse — called with each streamed LLM token
  • OnAudioSend — called with each TTS audio chunk to deliver to the client
  • OnCallEnd — called when the conversation terminates
All telephony channels and WebSocket handlers call into this interface. Swapping a channel type does not change pipeline logic.
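
As an illustration of why swapping channels leaves pipeline logic untouched, the sketch below declares an interface with the five callbacks listed above and drives it from a stand-in caller. The parameter types, the `recorder` implementation, and `simulateCall` are assumptions; the real signatures in api/assistant-api/internal/type/ are not shown in this document.

```go
package main

import "fmt"

// Communication lists the five callbacks named in the text.
// The parameter types are assumptions made for this sketch.
type Communication interface {
	OnAudioReceived(frame []byte)
	OnTranscript(segment string)
	OnLLMResponse(token string)
	OnAudioSend(chunk []byte)
	OnCallEnd(reason string)
}

// recorder is a hypothetical implementation that records event order.
type recorder struct{ events []string }

func (r *recorder) OnAudioReceived(frame []byte) { r.events = append(r.events, "audio_received") }
func (r *recorder) OnTranscript(segment string)  { r.events = append(r.events, "transcript") }
func (r *recorder) OnLLMResponse(token string)   { r.events = append(r.events, "llm_response") }
func (r *recorder) OnAudioSend(chunk []byte)     { r.events = append(r.events, "audio_send") }
func (r *recorder) OnCallEnd(reason string)      { r.events = append(r.events, "call_end") }

// Compile-time check that recorder satisfies the interface.
var _ Communication = (*recorder)(nil)

// simulateCall stands in for any driver (a telephony channel or a WebSocket
// handler). It depends only on the interface, never on a concrete type.
func simulateCall(c Communication) {
	c.OnAudioReceived([]byte{0x00})
	c.OnTranscript("hi")
	c.OnLLMResponse("hello")
	c.OnAudioSend([]byte{0x01})
	c.OnCallEnd("hangup")
}

func main() {
	r := &recorder{}
	simulateCall(r)
	fmt.Println(r.events)
}
```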

Call Routing

Browser clients and server SDKs connect via WebSocket through the Nginx gateway. Nginx upgrades the HTTP connection and proxies it to assistant-api:9007 with keepalive.

Connection URL
ws://<host>:8080/talk_api.TalkService/AssistantTalk
In production with TLS:
wss://<host>/talk_api.TalkService/AssistantTalk
Required headers

| Header | Value |
| --- | --- |
| Authorization | Bearer <jwt> |
| X-Assistant-Id | Assistant UUID |

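
For a non-browser client, the connection URL and required headers can be assembled as in the sketch below. It only builds the values, and leaves the WebSocket handshake to whichever client library is in use. `talkURL` and `talkHeaders` are hypothetical helper names, and the host, token, and UUID values are placeholders.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// talkPath is the WebSocket route given in this section.
const talkPath = "/talk_api.TalkService/AssistantTalk"

// talkURL builds the ws:// or wss:// connection URL for a gateway host.
func talkURL(host string, useTLS bool) string {
	scheme := "ws"
	if useTLS {
		scheme = "wss"
	}
	u := url.URL{Scheme: scheme, Host: host, Path: talkPath}
	return u.String()
}

// talkHeaders returns the two headers the gateway requires.
func talkHeaders(jwt, assistantID string) http.Header {
	h := http.Header{}
	h.Set("Authorization", "Bearer "+jwt)
	h.Set("X-Assistant-Id", assistantID)
	return h
}

func main() {
	fmt.Println(talkURL("localhost:8080", false))
	fmt.Println(talkHeaders("<jwt>", "<assistant-uuid>"))
}
```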
Nginx routing rule (from nginx.conf)
location ~ ^/(talk_api.TalkService/AssistantTalk) {
    proxy_pass http://assistant-talk;   # → assistant-api:9007
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}
The rapida-react SDK manages the WebSocket connection, audio capture, VAD, and playback automatically. Use it for browser integrations instead of raw WebSocket.

Voice Pipeline Flow

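A complete utterance cycle within an active call proceeds in this order: the caller's audio arrives, VAD marks the utterance boundaries, the STT provider returns a transcript, integration-api streams LLM tokens back, the TTS provider synthesizes them, and the resulting audio is streamed to the caller.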

Configuration

The assistant-api reads its configuration from docker/assistant-api/.assistant.env (Docker) or environment variables (local).

Required variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| SECRET | Yes | rpd_pks | JWT signing secret; must match all other services |
| POSTGRES__HOST | Yes | postgres | PostgreSQL host |
| POSTGRES__DB_NAME | Yes | assistant_db | Database name |
| POSTGRES__AUTH__USER | Yes | rapida_user | Database user |
| POSTGRES__AUTH__PASSWORD | Yes | (none) | Database password |
| REDIS__HOST | Yes | redis | Redis host |
| OPENSEARCH__HOST | Yes | opensearch | OpenSearch host |
| INTEGRATION_HOST | Yes | integration-api:9004 | gRPC address of integration-api |
| ENDPOINT_HOST | Yes | endpoint-api:9005 | gRPC address of endpoint-api |

Tuning variables

| Variable | Default | Description |
| --- | --- | --- |
| LOG_LEVEL | debug | debug · info · warn · error |
| ENV | development | development · staging · production |
| POSTGRES__MAX_OPEN_CONNECTION | 50 | Database connection pool size |
| POSTGRES__MAX_IDEAL_CONNECTION | 25 | Idle connections to keep open |
| REDIS__MAX_CONNECTION | 10 | Redis connection pool size |
| OPENSEARCH__MAX_RETRIES | 3 | Retry attempts on OpenSearch failure |
| OPENSEARCH__MAX_CONNECTION | 10 | OpenSearch connection pool size |
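
The sketch below shows one way to read a few of these variables with their documented defaults and to fail fast when a required value without a default is missing. It is not the service's actual loader: the `Config` struct and the `load` function are hypothetical, and only a subset of variables is covered.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds a small, illustrative subset of the documented settings.
type Config struct {
	PostgresHost     string
	PostgresPassword string
	IntegrationHost  string
	LogLevel         string
	MaxOpenConns     int
}

// load reads settings through lookup (for example os.LookupEnv) so it can be
// exercised without touching the real process environment.
func load(lookup func(string) (string, bool)) (Config, error) {
	get := func(key, def string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return def
	}
	cfg := Config{
		PostgresHost:     get("POSTGRES__HOST", "postgres"),
		PostgresPassword: get("POSTGRES__AUTH__PASSWORD", ""),
		IntegrationHost:  get("INTEGRATION_HOST", "integration-api:9004"),
		LogLevel:         get("LOG_LEVEL", "debug"),
	}
	// The password is required and has no documented default.
	if cfg.PostgresPassword == "" {
		return Config{}, fmt.Errorf("POSTGRES__AUTH__PASSWORD is required")
	}
	n, err := strconv.Atoi(get("POSTGRES__MAX_OPEN_CONNECTION", "50"))
	if err != nil {
		return Config{}, fmt.Errorf("POSTGRES__MAX_OPEN_CONNECTION: %w", err)
	}
	cfg.MaxOpenConns = n
	return cfg, nil
}

func main() {
	cfg, err := load(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println(cfg.PostgresHost, cfg.IntegrationHost, cfg.LogLevel, cfg.MaxOpenConns)
}
```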

Full environment file

# ── Service identity ──────────────────────────────────────────────
SERVICE_NAME=workflow-api
HOST=0.0.0.0
PORT=9007
LOG_LEVEL=debug
SECRET=rpd_pks
ENV=development

# ── Asset storage ─────────────────────────────────────────────────
ASSET_STORE__STORAGE_TYPE=local
ASSET_STORE__STORAGE_PATH_PREFIX=/app/rapida-data/assets/workflow

# ── PostgreSQL ────────────────────────────────────────────────────
POSTGRES__HOST=postgres
POSTGRES__PORT=5432
POSTGRES__DB_NAME=assistant_db
POSTGRES__AUTH__USER=rapida_user
POSTGRES__AUTH__PASSWORD=rapida_db_password
POSTGRES__MAX_OPEN_CONNECTION=50
POSTGRES__MAX_IDEAL_CONNECTION=25
POSTGRES__SSL_MODE=disable

# ── Redis (second-level GORM cache) ───────────────────────────────
POSTGRES__SLC_CACHE__HOST=redis
POSTGRES__SLC_CACHE__PORT=6379
POSTGRES__SLC_CACHE__DB=1
POSTGRES__SLC_CACHE__MAX_CONNECTION=10

# ── Redis ─────────────────────────────────────────────────────────
REDIS__HOST=redis
REDIS__PORT=6379
REDIS__MAX_CONNECTION=10
REDIS__MAX_DB=0

# ── OpenSearch ────────────────────────────────────────────────────
OPENSEARCH__SCHEMA=http
OPENSEARCH__HOST=opensearch
OPENSEARCH__PORT=9200
OPENSEARCH__MAX_RETRIES=3
OPENSEARCH__MAX_CONNECTION=10

# ── Internal service addresses (gRPC) ─────────────────────────────
INTEGRATION_HOST=integration-api:9004
ENDPOINT_HOST=endpoint-api:9005
ASSISTANT_HOST=assistant-api:9007
WEB_HOST=web-api:9001
Change SECRET to a cryptographically random value before any production deployment. All services must share the same SECRET. Generate one with openssl rand -hex 32. A mismatch will cause JWT validation to fail across services.
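
The following sketch generates a secret equivalent to `openssl rand -hex 32` and shows why a mismatch breaks validation: a signature produced with one secret does not verify under another. It assumes an HMAC-SHA256 style signature for illustration only; this document does not state which JWT algorithm the services use, and `newSecret`, `sign`, and `verify` are hypothetical helpers.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// newSecret returns 32 random bytes as 64 hex characters.
func newSecret() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// sign computes an HMAC-SHA256 over payload (assumed algorithm, see lead-in).
func sign(secret, payload string) []byte {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(payload))
	return mac.Sum(nil)
}

// verify reports whether sig matches payload under secret.
func verify(secret, payload string, sig []byte) bool {
	return hmac.Equal(sign(secret, payload), sig)
}

func main() {
	a, err := newSecret()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	b, err := newSecret()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	sig := sign(a, "header.payload")
	fmt.Println("same secret verifies:", verify(a, "header.payload", sig))
	fmt.Println("different secret verifies:", verify(b, "header.payload", sig))
}
```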

Running

# Start assistant-api and its dependencies (postgres, redis, opensearch)
make up-assistant

# Follow logs
make logs-assistant

# Rebuild image after code changes (no cache)
make rebuild-assistant

# Open a shell inside the container
make shell-assistant

Health & Observability

| Endpoint | Purpose |
| --- | --- |
| GET /readiness/ | Reports whether the service is ready to accept traffic (DB + Redis + OpenSearch connected) |
| GET /healthz/ | Liveness probe; confirms the Go process is alive |

curl http://localhost:9007/readiness/
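
The same readiness check can be made from Go, for example in a deployment script. The sketch below treats HTTP 200 as "ready", which is an assumption since this document does not specify the status codes. `isReady` and `newStubServer` are hypothetical, and the stub server exists only so the example runs without the real service.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// isReady issues GET <baseURL>/readiness/ and treats HTTP 200 as ready.
// The 200-means-ready rule is an assumption, not documented behavior.
func isReady(baseURL string) (bool, error) {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(baseURL + "/readiness/")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

// newStubServer is a stand-in for assistant-api used only by this sketch.
func newStubServer(ready bool) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/readiness/" && ready {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	}))
}

func main() {
	srv := newStubServer(true)
	defer srv.Close()
	// Against a real deployment, pass "http://localhost:9007" instead.
	ok, err := isReady(srv.URL)
	fmt.Println("ready:", ok, "err:", err)
}
```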

Troubleshooting

WebSocket connection or authentication failures

  • Verify the JWT token in the Authorization header is valid and not expired.
  • Confirm the assistant UUID exists in assistant_db.
  • Run make logs-assistant and look for connection rejected or auth failed entries.

Speech-to-text errors

  • Confirm the STT provider credentials are stored in integration-api and pass the credential test.
  • Verify the audio sample rate matches the provider’s expectation (most require 8 kHz or 16 kHz).
  • Check that OPENSEARCH__HOST is reachable, because OpenSearch indexing failures can surface as STT errors in logs.

LLM errors or slow responses

  • Check integration-api is healthy: curl http://localhost:9004/readiness/
  • Verify the LLM provider API key is valid and has sufficient quota.
  • Increase POSTGRES__MAX_OPEN_CONNECTION if database contention is visible in logs.

Asterisk AudioSocket connection problems

  • Confirm the port is exposed: docker port assistant-api 4573
  • Check the Asterisk dialplan passes a valid assistant UUID.
  • Verify firewall rules allow TCP from the Asterisk host to port 4573.

Telephony webhook problems

  • Confirm the webhook URL is publicly reachable from the provider’s servers (not localhost).
  • Check that the TwiML / Vonage NCCO returns the correct stream URL.
  • Review make logs-assistant for codec negotiation errors.
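
Several of the checks above amount to "can a TCP connection be opened to this address", for example the AudioSocket port 4573 from the Asterisk host. A minimal sketch of such a probe follows. `canConnect` and `startStubListener` are hypothetical helpers, and the stub listener is there only so the example runs on a machine without the service.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// dialTimeout bounds how long a single probe may take.
const dialTimeout = 2 * time.Second

// canConnect opens and immediately closes a TCP connection to addr.
// A nil result means the port accepted the connection.
func canConnect(addr string) error {
	conn, err := net.DialTimeout("tcp", addr, dialTimeout)
	if err != nil {
		return fmt.Errorf("cannot reach %s: %w", addr, err)
	}
	return conn.Close()
}

// startStubListener listens on an ephemeral local port as a stand-in target.
func startStubListener() (net.Listener, error) {
	return net.Listen("tcp", "127.0.0.1:0")
}

func main() {
	ln, err := startStubListener()
	if err != nil {
		fmt.Println("listen error:", err)
		return
	}
	defer ln.Close()
	// Against a real deployment, probe "<assistant-api host>:4573" instead.
	if err := canConnect(ln.Addr().String()); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("reachable:", ln.Addr())
}
```

A successful probe only shows that the port accepts connections; it does not confirm that the dialplan passes a valid assistant UUID.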
