Assistants are version-controlled. Every prompt or model change creates a new draft version. Versions must be explicitly released — live deployments are never changed automatically.
Anatomy of an assistant
Prompt & Model
The system prompt defines persona, scope, and behaviour. The LLM provider and model (OpenAI, Anthropic, Gemini, Mistral, Bedrock, or a custom AgentKit backend) power the reasoning. Model parameters — temperature, max tokens, stop sequences — are tunable per version.
Voice Pipeline
Converts caller audio to text and assistant text back to speech. Configurable STT provider, VAD sensitivity, noise cancellation, end-of-speech detection, TTS provider, voice model, and pronunciation rules — independently tunable per deployment channel.
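To make the tunable surface concrete, here is a hypothetical per-deployment voice pipeline configuration. Every field name and value below (including the provider names) is an illustrative assumption, not the platform's documented schema:

```python
# Illustrative voice pipeline settings for one deployment channel.
# Field names, providers, and values are assumptions for this sketch.
voice_pipeline = {
    "stt": {"provider": "example-stt", "noise_cancellation": True},
    "vad": {"sensitivity": 0.6},       # 0 = lax, 1 = aggressive speech detection
    "eos": {"timeout_ms": 700},        # end-of-speech wait before responding
    "tts": {
        "provider": "example-tts",
        "voice": "warm-female-1",
        "pronunciation": {"Rapida": "rah-PEE-dah"},  # invented example rule
    },
}
```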
Knowledge Bases
One or more knowledge bases attached for retrieval-augmented generation. At call time, the assistant retrieves the most relevant document chunks and injects them as context before calling the LLM.
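The retrieve-then-inject step can be sketched as follows. This toy version ranks chunks by keyword overlap in place of the platform's actual retrieval, and all function names are illustrative:

```python
# Minimal sketch of retrieval-augmented generation: rank chunks against
# the caller's query, then inject the top matches as context for the LLM.

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the query and keep the top k."""
    q = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(system: str, query: str, chunks: list[str]) -> str:
    """Prepend the most relevant chunks as context before calling the LLM."""
    context = "\n".join(retrieve(query, chunks))
    return f"{system}\n\nContext:\n{context}\n\nUser: {query}"
```

In production the ranking would come from embeddings rather than keyword overlap, but the shape of the call-time flow is the same.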
Tools
Functions the LLM can invoke mid-conversation: query knowledge, call external APIs, invoke endpoint LLM prompts, hold a call, or end the session. The LLM decides when to call each tool based on its description.
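A sketch of how tool dispatch works: each tool registers a name and a description (which is what the LLM reads when deciding whether to call it), and the runtime executes whatever call the LLM emits. The registry shape and tool names here are assumptions, not the platform's actual schema:

```python
# Hypothetical tool registry. The LLM sees each tool's name and description
# and responds with a call like {"tool": "hold_call", "args": {}}.
TOOLS = {}

def tool(name, description):
    """Decorator that registers a function as an LLM-invocable tool."""
    def register(fn):
        TOOLS[name] = {"fn": fn, "description": description}
        return fn
    return register

@tool("hold_call", "Place the caller on hold while work completes.")
def hold_call():
    return "caller on hold"

@tool("end_session", "End the conversation politely.")
def end_session():
    return "session ended"

def dispatch(call: dict) -> str:
    """Execute a tool call emitted by the LLM."""
    return TOOLS[call["tool"]]["fn"](**call.get("args", {}))
```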
Deployments
Each channel — phone, web widget, web app, WhatsApp — is a separate deployment attached to the assistant. Deployments share the assistant’s brain but can have per-channel voice and experience settings.
Webhooks
Post-call webhooks fire at conversation start, completion, and failure. Deliver transcripts, metadata, and tool call results to your CRM, data warehouse, or alerting system in real time.
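A receiving endpoint might look like the sketch below. The event names and payload fields are assumptions for illustration; check the webhook reference for the actual schema:

```python
import json

# Illustrative post-call webhook handler: route on the event type and
# forward the payload to a downstream system (CRM, warehouse, alerting).

def handle_webhook(raw_body: str) -> str:
    event = json.loads(raw_body)
    kind = event.get("event")  # e.g. "conversation.completed" (assumed name)
    if kind == "conversation.completed":
        transcript = event.get("transcript", [])
        return f"stored {len(transcript)} turns"
    if kind == "conversation.failed":
        return f"alerting on failure: {event.get('reason', 'unknown')}"
    return "ignored"
```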
Post-call Analysis
Analysis pipelines run LLM prompts against completed transcripts to produce sentiment scores, intent labels, CSAT predictions, compliance flags, and custom metrics.
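Structurally, a pipeline is a set of analyzers applied to a completed transcript. In this toy sketch each analyzer is a plain function (in production each would be an LLM prompt); the names and the keyword-based sentiment rule are illustrative assumptions:

```python
# Toy post-call analysis pipeline: each analyzer maps a transcript to a metric.

def sentiment(transcript):
    """Crude keyword stand-in for an LLM sentiment prompt."""
    text = " ".join(turn["text"] for turn in transcript).lower()
    return "negative" if any(w in text for w in ("angry", "refund", "cancel")) else "positive"

def turn_count(transcript):
    return len(transcript)

def run_pipeline(transcript, analyzers):
    """Apply every analyzer and collect the results by name."""
    return {name: fn(transcript) for name, fn in analyzers.items()}
```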
How a voice conversation works
Every conversation follows the same pipeline from audio-in to audio-out. Understanding this flow helps you tune latency, accuracy, and behaviour at each stage.

The EOS timeout (default 700ms) is the primary latency control between the caller finishing speaking and the assistant beginning to respond. Reduce it for snappy IVR-style interactions; increase it for conversational use cases where callers pause mid-thought.
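The EOS decision can be sketched as: the assistant responds once the trailing run of silent VAD frames spans at least the timeout. Function and parameter names here are illustrative, not the platform's API:

```python
# Sketch of end-of-speech detection over voice-activity-detection frames.

def eos_reached(vad_frames: list[bool], frame_ms: int = 20,
                eos_timeout_ms: int = 700) -> bool:
    """True when the trailing silent frames span at least the EOS timeout."""
    silent_ms = 0
    for speaking in reversed(vad_frames):
        if speaking:
            break
        silent_ms += frame_ms
    return silent_ms >= eos_timeout_ms
```

With 20ms frames, 40 trailing silent frames (800ms) crosses the default 700ms timeout; 20 frames (400ms) does not. Lowering `eos_timeout_ms` trades tolerance for mid-thought pauses against snappier responses.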
Deployment channels
The same assistant is deployable across every channel. Each deployment is independently configured for voice settings and conversation experience while sharing the assistant’s prompt, model, and tools.

Phone
Inbound and outbound PSTN calls. Connect Twilio, Vonage, Exotel, Asterisk, or any SIP trunk. The assistant handles full voice conversations with live call transfer and session controls.
Web Widget
Embeddable voice and chat widget. Add to any web page with a script tag. Supports text, voice input, and voice output — configurable per deployment.
Web App (React SDK)
Full-featured voice integration inside your own React application. Audio streams over WebRTC directly — no relay servers, no iframes.
WhatsApp
Conversational AI on WhatsApp Business via the Meta webhook API. Maintains multi-turn context across messages.
API / SDK
Trigger and manage assistants programmatically. Initiate calls, pass runtime variables, stream transcripts, and receive webhook events from Python, Node.js, Go, or React.
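As a shape for what a programmatic trigger looks like, here is a hedged Python sketch. The class, function, and field names are hypothetical, not the actual SDK surface; a real client would send this request to the platform API:

```python
from dataclasses import dataclass, field

# Hypothetical outbound-call request: assistant, destination, and runtime
# variables that the prompt can reference at call time.

@dataclass
class CallRequest:
    assistant_id: str
    phone_number: str
    variables: dict = field(default_factory=dict)

def initiate_call(req: CallRequest) -> dict:
    """Stand-in for an API call: return a queued-call receipt."""
    return {"status": "queued", "assistant": req.assistant_id,
            "to": req.phone_number, "variables": req.variables}
```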
Debugger
Browser-based testing environment. Talk to your assistant before it touches production — inspect transcripts, tool calls, and LLM responses in real time.
The assistant lifecycle
1. Create — Define the assistant with an LLM provider and initial system prompt. The first version (v1) is created automatically in draft state.
2. Configure — Attach knowledge bases, add tools, tune the voice pipeline, and set up deployments. Each deployment channel has its own voice settings and experience configuration.
3. Test — Use the built-in Debugger deployment to run live conversations before any real traffic hits the assistant. Inspect transcripts, latency breakdowns, and tool invocations.
4. Release — Promote a version from draft to live. All active deployments switch to the released version immediately.
5. Monitor — Every conversation generates structured logs: full transcript, per-turn latency, tool call results, LLM token usage, and EOS timing. Webhook events and analysis pipeline outputs flow to your downstream systems.
6. Iterate — Create a new version with updated prompt or model parameters. Test in the Debugger. Release when confident. Previous versions are preserved and can be re-released for instant rollback.
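The versioning model behind steps 1, 4, and 6 can be sketched as a small state machine: versions accumulate as drafts, releasing points all deployments at one version, and rollback is just re-releasing an earlier one. Class and method names are illustrative:

```python
# Sketch of draft/release versioning with instant rollback.

class Assistant:
    def __init__(self, prompt):
        # v1 is created automatically in draft state.
        self.versions = [{"v": 1, "prompt": prompt, "state": "draft"}]
        self.live = None

    def new_version(self, prompt):
        """Create the next draft version; earlier versions are preserved."""
        v = {"v": len(self.versions) + 1, "prompt": prompt, "state": "draft"}
        self.versions.append(v)
        return v["v"]

    def release(self, v):
        """All active deployments switch to the released version immediately."""
        self.live = v

a = Assistant("You are a support agent.")
a.release(1)
v2 = a.new_version("You are a concise support agent.")
a.release(v2)
a.release(1)  # re-release a preserved version: instant rollback
```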
In this section
Create an Assistant
Step-by-step guide through creation, prompt setup, model configuration, voice pipeline, and tools.
Version Control
Create, compare, and release new versions. Roll back instantly if something goes wrong.
Tools
Add knowledge retrieval, API calls, endpoint invocations, hold, and end-of-conversation tools.
Knowledge
Attach knowledge bases and tune retrieval settings for your use case.
Webhooks
Configure event-driven delivery of transcripts and call data to external systems.
Logs
Browse conversation transcripts, tool call traces, LLM token usage, and latency breakdowns for every session.
Post-call Analysis
Run LLM-powered analysis pipelines against completed conversations.
AgentKit
Replace Rapida’s built-in LLM with your own gRPC backend — LangChain, CrewAI, or custom logic.
Twilio Integration
Step-by-step guide for connecting a Twilio phone number to your assistant.