An assistant is the core unit of Rapida. It packages everything needed to run a production voice AI conversation — the LLM and prompt, the voice pipeline (STT, VAD, TTS), knowledge sources, tools, and deployment channels — into a single versioned object. One assistant configuration drives every channel. The same prompt, model, and voice settings that handle your inbound phone calls also power your web widget and WhatsApp deployment. Change something once and it propagates everywhere.
Assistants are version-controlled. Every prompt or model change creates a new draft version. Versions must be explicitly released — live deployments are never changed automatically.

Anatomy of an assistant

Prompt & Model

The system prompt defines persona, scope, and behaviour. The LLM provider and model (OpenAI, Anthropic, Gemini, Mistral, Bedrock, or a custom AgentKit backend) power the reasoning. Model parameters — temperature, max tokens, stop sequences — are tunable per version.
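The per-version settings described above can be pictured as a single configuration object. The field names below are illustrative assumptions, not Rapida's documented schema:

```python
# Hypothetical shape of one assistant version (keys are assumptions,
# not Rapida's actual API).
assistant_version = {
    "version": "v2",
    "state": "draft",               # drafts must be released explicitly
    "system_prompt": (
        "You are a support agent for Acme Telecom. "
        "Answer only billing and account questions."
    ),
    "llm": {
        "provider": "anthropic",
        "model": "claude-sonnet",
        "temperature": 0.3,         # lower = more deterministic replies
        "max_tokens": 512,          # cap per-turn response length
        "stop_sequences": ["\nCaller:"],
    },
}
```

Because the version is a single object, releasing it updates every channel at once, per the versioning rules above.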

Voice Pipeline

Converts caller audio to text and assistant text back to speech. Configurable STT provider, VAD sensitivity, noise cancellation, end-of-speech detection, TTS provider, voice model, and pronunciation rules — independently tunable per deployment channel.
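Since each setting is tunable per channel, a deployment's pipeline is effectively the assistant-wide defaults plus channel overrides. A minimal sketch of that merge, with assumed key names:

```python
# Assistant-wide voice pipeline defaults (field names are illustrative).
default_pipeline = {
    "stt": {"provider": "deepgram", "language": "en-US"},
    "vad_sensitivity": 0.6,
    "noise_cancellation": True,
    "eos_timeout_ms": 700,          # end-of-speech detection window
    "tts": {"provider": "elevenlabs", "voice": "alloy", "speed": 1.0},
}

def pipeline_for(channel_overrides):
    """Merge per-channel overrides onto the assistant-wide defaults."""
    return {**default_pipeline, **channel_overrides}

# A phone deployment might shorten the EOS window for a snappier feel:
phone_pipeline = pipeline_for({"eos_timeout_ms": 500})
```

The shallow merge keeps unspecified settings (noise cancellation, TTS voice) shared across channels while letting each deployment override only what it needs.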

Knowledge Bases

One or more knowledge bases attached for retrieval-augmented generation. At call time, the assistant retrieves the most relevant document chunks and injects them as context before calling the LLM.
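The retrieve-then-inject step can be sketched as below. Real knowledge bases rank chunks with vector embeddings; simple keyword overlap stands in here for brevity:

```python
# Minimal sketch of retrieval-augmented prompting. Keyword overlap is a
# stand-in for embedding similarity; the chunks are invented examples.
def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks sharing the most words with the query."""
    q = set(query.lower().split())
    return sorted(chunks, key=lambda c: -len(q & set(c.lower().split())))[:top_k]

chunks = [
    "Refunds are processed within 5 business days.",
    "Our support line is open 9am to 5pm on weekdays.",
    "Premium plans include priority routing.",
]

context = retrieve("when will my refund be processed", chunks)
prompt = "Context:\n" + "\n".join(context) + "\n\nCaller: when will my refund arrive?"
```

The retrieved chunks are prepended as context, so the LLM answers from the knowledge base rather than from its parametric memory alone.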

Tools

Functions the LLM can invoke mid-conversation: query knowledge, call external APIs, invoke endpoint LLM prompts, hold a call, or end the session. The LLM decides when to call each tool based on its description.
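A tool is essentially a name, a description the LLM reads to decide when to call it, and a parameter schema. The sketch below mirrors common LLM function-calling formats; Rapida's exact schema may differ, and both tools here are invented examples:

```python
# Hedged sketch of tool definitions plus a dispatch step (stubbed
# implementations; names and schema shape are assumptions).
tools = {
    "lookup_order": {
        "description": "Fetch order status from the order API by order id.",
        "parameters": {"order_id": "string"},
    },
    "end_session": {
        "description": "End the call once the caller's issue is resolved.",
        "parameters": {"reason": "string"},
    },
}

def dispatch(tool_name, args):
    """Route an LLM-issued tool call to its implementation."""
    if tool_name == "lookup_order":
        return {"order_id": args["order_id"], "status": "shipped"}
    if tool_name == "end_session":
        return {"ended": True, "reason": args["reason"]}
    raise KeyError(f"unknown tool: {tool_name}")

result = dispatch("lookup_order", {"order_id": "A-1001"})
```

Because the LLM chooses tools from their descriptions, a precise one-line description is usually the highest-leverage part of a tool definition.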

Deployments

Each channel — phone, web widget, web app, WhatsApp — is a separate deployment attached to the assistant. Deployments share the assistant’s brain but can have per-channel voice and experience settings.

Webhooks

Webhooks fire at conversation start, completion, and failure. Deliver transcripts, metadata, and tool call results to your CRM, data warehouse, or alerting system in real time.
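On the receiving side, a consumer typically branches on the event type. The event names and payload fields below are illustrative assumptions; consult the actual webhook payload reference for the real schema:

```python
import json

# Sketch of a webhook consumer (event names and fields are invented).
def handle_event(raw_body):
    """Decode one webhook delivery and decide what to do with it."""
    event = json.loads(raw_body)
    if event["type"] == "conversation.completed":
        return ("store_transcript", event["transcript"])
    if event["type"] == "conversation.failed":
        return ("page_oncall", event.get("error", "unknown"))
    return ("ignore", None)

action, payload = handle_event(json.dumps({
    "type": "conversation.completed",
    "transcript": "Caller: hi\nAssistant: hello",
}))
```

Returning an action rather than performing the side effect inline keeps the handler easy to test and retry-safe.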

Post-call Analysis

Analysis pipelines run LLM prompts against completed transcripts to produce sentiment scores, intent labels, CSAT predictions, compliance flags, and custom metrics.
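One step of such a pipeline is simply a prompt template applied to the completed transcript. The metric names below are examples from the list above, not a fixed output schema:

```python
# Illustrative post-call analysis step: wrap a transcript in an
# analysis prompt that asks the LLM for structured JSON.
def build_analysis_prompt(transcript):
    return (
        "Rate the following call transcript.\n"
        "Return JSON with keys: sentiment (pos/neg/neutral), "
        "csat_prediction (1-5), compliance_ok (true/false).\n\n"
        + transcript
    )

prompt = build_analysis_prompt(
    "Caller: my bill is wrong\nAssistant: let me fix that"
)
```

The LLM's JSON response would then be parsed and stored alongside the conversation's logs as custom metrics.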

How a voice conversation works

Every conversation follows the same pipeline from audio-in to audio-out. Understanding this flow helps you tune latency, accuracy, and behaviour at each stage.
The EOS timeout (default 700ms) is the primary latency control between the caller finishing speaking and the assistant beginning to respond. Reduce it for snappy IVR-style interactions; increase it for conversational use cases where callers pause mid-thought.
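The EOS mechanic can be illustrated with a toy detector: the turn ends once accumulated silence exceeds the timeout. Timings here are simulated frame flags, not real audio:

```python
# Toy end-of-speech (EOS) detector. Real detection runs on audio frames
# from the VAD; here each frame is just a speech/silence boolean.
def detect_eos(frame_is_speech, eos_timeout_ms, frame_ms=20):
    """Return the frame index at which EOS fires, or None if it never does."""
    silence_ms = 0
    for i, speech in enumerate(frame_is_speech):
        silence_ms = 0 if speech else silence_ms + frame_ms
        if silence_ms >= eos_timeout_ms:
            return i
    return None

# 10 speech frames followed by sustained silence:
frames = [True] * 10 + [False] * 60
fast = detect_eos(frames, eos_timeout_ms=500)  # IVR-style: responds sooner
slow = detect_eos(frames, eos_timeout_ms=700)  # tolerates mid-thought pauses
```

A shorter timeout fires earlier after the caller stops, cutting response latency, but risks interrupting a caller who is merely pausing.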

Deployment channels

The same assistant is deployable across every channel. Each deployment is independently configured for voice settings and conversation experience while sharing the assistant’s prompt, model, and tools.

The assistant lifecycle

1. Create — Define the assistant with an LLM provider and initial system prompt. The first version (v1) is created automatically in draft state.
2. Configure — Attach knowledge bases, add tools, tune the voice pipeline, and set up deployments. Each deployment channel has its own voice settings and experience configuration.
3. Test — Use the built-in Debugger deployment to run live conversations before any real traffic hits the assistant. Inspect transcripts, latency breakdowns, and tool invocations.
4. Release — Promote a version from draft to live. All active deployments switch to the released version immediately.
5. Monitor — Every conversation generates structured logs: full transcript, per-turn latency, tool call results, LLM token usage, and EOS timing. Webhook events and analysis pipeline outputs flow to your downstream systems.
6. Iterate — Create a new version with updated prompt or model parameters. Test in the Debugger. Release when confident. Previous versions are preserved and can be re-released for instant rollback.
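The release and rollback flow in the lifecycle above amounts to a small state machine over versions. The class and method names below are illustrative, not Rapida's SDK:

```python
# Sketch of the version lifecycle: draft -> live, with previous
# versions preserved so any of them can be re-released for rollback.
class Assistant:
    def __init__(self):
        self.versions = {"v1": "draft"}  # v1 created automatically
        self.live = None

    def release(self, v):
        """Promote a version to live; the old live version is preserved."""
        if self.live is not None:
            self.versions[self.live] = "released"
        self.versions[v] = "live"
        self.live = v

    def new_version(self):
        """Create the next draft version for iteration."""
        v = f"v{len(self.versions) + 1}"
        self.versions[v] = "draft"
        return v

a = Assistant()
a.release("v1")
v2 = a.new_version()   # iterate on the prompt or model
a.release(v2)
a.release("v1")        # instant rollback: re-release a previous version
```

Because no version is ever deleted, rollback is just another release of an older version, which is what makes step 6 safe.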
Use separate assistants for distinct products or personas rather than a single assistant with complex conditional logic in the prompt. Assistants are cheap to create — isolation keeps prompts focused and version history clean.
