An assistant is the central object in Rapida. It holds your LLM configuration, system prompt, voice pipeline settings, tools, knowledge bases, and deployment channels — all versioned and deployable across phone, web, and messaging from one place.
Before creating an assistant, set up credentials for your LLM, STT, and TTS providers in Integration → Vault. The creation wizard will ask you to select a provider — credentials must exist first.

Create your first assistant

1. Navigate to Assistants

Go to Assistants in the main sidebar. Click Add new assistant to open the creation wizard.
2. Select your LLM provider and model

Choose your LLM provider and the specific model to power this assistant. Supported providers: OpenAI, Anthropic, Azure OpenAI, Google Gemini, Cohere, Vertex AI, or a custom AgentKit gRPC backend. Each provider requires a vault credential. The model you select here sets the default — you can tune all parameters after creation.
3. Write your system prompt

Define the assistant’s persona, scope, and behaviour in the Instructions field. This becomes the system prompt sent to the LLM at the start of every conversation. Use {{variable}} syntax to inject dynamic values at runtime — caller name, account ID, or any context passed when a call is initiated.
4. Add tools (optional)

Attach tools to extend what the assistant can do mid-conversation — query a knowledge base, call an external API, transfer the call, or end the session. Tools can also be added or modified after creation.
5. Name and create

Give the assistant a name and description, then click Create Assistant. The assistant is created at version v1 in draft state. Configure voice, deployments, and advanced settings before going live.
A newly created assistant has no deployment attached. It will not handle live calls or web sessions until you configure at least one deployment under Configure assistant → Deployments.

Configure your assistant

After creation, open Configure assistant from the top-right of the assistant page. Configuration is organized into six areas:

Prompt & Model

System prompt, model selection, temperature, token limits, and advanced LLM parameters.

Voice Pipeline

STT provider and model, VAD sensitivity, noise cancellation, EOS detection, TTS provider and voice, pronunciation rules.

Knowledge & Retrieval

Attach knowledge bases, set retrieval method (hybrid, semantic, text), top-K, score threshold, and reranking.

Tools

Knowledge retrieval, API request, endpoint invocation, call hold, and end-of-conversation tool types.

Deployments

Phone, web widget, web app, WhatsApp, and API deployment channels — each with its own voice and experience settings.

Webhooks & Analysis

Post-call webhooks to downstream systems and analysis pipelines for conversation scoring and custom metrics.

Prompt and model

The prompt and model configuration defines what your assistant knows, how it reasons, and how it generates responses.

System prompt

The system prompt is your primary control surface. It sets the assistant’s persona, scope of knowledge, constraints, and tone. Well-written prompts are the single biggest lever for assistant quality.
For voice: Keep sentences short and natural. Avoid bullet lists, markdown, and symbols — the TTS engine reads punctuation literally. Write instructions the way you’d brief a call centre agent.
Dynamic variables — inject runtime context into your prompt using {{variable_name}} syntax. Variables are automatically detected from your prompt and can be populated via the SDK when initiating a call:
You are an assistant for {{company_name}}. The caller's name is {{caller_name}}.
Their account tier is {{account_tier}}. Always address them by first name.
Multi-turn prompt structure — the prompt editor supports system, user, and assistant roles. Add example exchanges to shape the tone and structure of responses without increasing latency.
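The substitution itself is simple to reason about. A minimal sketch of how {{variable}} placeholders resolve against runtime context — illustrative only, since the actual rendering happens inside Rapida:

```python
import re

def render_prompt(template: str, variables: dict) -> str:
    """Resolve {{name}} placeholders against a context dict.
    Unknown variables are left intact so missing context stays visible."""
    def sub(match):
        return str(variables.get(match.group(1), match.group(0)))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", sub, template)

template = (
    "You are an assistant for {{company_name}}. "
    "The caller's name is {{caller_name}}."
)
rendered = render_prompt(template, {"company_name": "Acme", "caller_name": "Ada"})
# "You are an assistant for Acme. The caller's name is Ada."
```

Leaving unresolved placeholders in place (rather than substituting an empty string) makes it obvious in transcripts when a call was initiated without the expected context.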

LLM model parameters

OpenAI

| Parameter | Default | Range | Notes |
| --- | --- | --- | --- |
| Model | gpt-4o | gpt-4, gpt-4o, gpt-4.1-mini, o3, o4-mini, and more | |
| Temperature | 0.7 | 0–2 | Higher = more creative, lower = more deterministic |
| Top P | 1 | 0–1 | Nucleus sampling; use with temperature, not instead |
| Max completion tokens | 2048 | ≥1 | Caps response length |
| Frequency penalty | 0 | -2 to 2 | Reduces word repetition |
| Presence penalty | 0 | -2 to 2 | Encourages topic diversity |
| Stop sequences | | up to 4 | Tokens that halt generation |
| Tool choice | auto | none / auto / required | Control when tools are called |
| Reasoning effort | | low / medium / high | For o-series models only |
| Response format | | text / json_object / json_schema | For structured output use cases |
Anthropic

| Parameter | Default | Range | Notes |
| --- | --- | --- | --- |
| Model | claude-opus-4 | Opus 4, Sonnet 4, Sonnet 3.7, Haiku 3.5 | |
| Max tokens | 1028 | ≥1 | Caps response length |
| Temperature | 1.0 | 0–1 | |
| Top P | | 0–1 | Nucleus sampling |
| Top K | | | Top-K sampling |
| Stop sequences | | | Tokens that halt generation |
| Extended thinking | | JSON config | Claude’s built-in reasoning mode |
AgentKit replaces the built-in LLM with your own gRPC server. Rapida streams user speech transcripts to your server and synthesizes your text responses to audio in real time.
| Parameter | Notes |
| --- | --- |
| Server URL | host:port of your gRPC service (e.g. my-server.example.com:50051) |
| TLS certificate | Optional — path to CA cert for mutual TLS |
| Metadata | Key-value map passed as gRPC metadata headers |
Your server receives a bidirectional Talk stream. Rapida handles all audio — VAD, STT, TTS, telephony. Your server only handles text in / text out. See the AgentKit guide for implementation examples with LangChain, CrewAI, and Anthropic Claude.
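Stripped of transport details, the server-side loop is text in, text out. A plain-Python sketch of the shape — the `respond` function is a placeholder for your own agent, and a real AgentKit server wraps this logic in a gRPC bidirectional Talk handler:

```python
def talk_handler(transcripts):
    """Text-in / text-out loop for an AgentKit backend. In production
    this body sits inside a gRPC bidirectional Talk stream handler;
    Rapida performs VAD/STT upstream and TTS downstream."""
    def respond(text: str) -> str:
        # Placeholder agent logic — swap in your LangChain/CrewAI agent.
        return f"You said: {text}"

    for transcript in transcripts:   # final user utterances from STT
        yield respond(transcript)    # Rapida speaks each yielded reply

replies = list(talk_handler(["hello", "what are your hours?"]))
```

Because Rapida owns the audio path, your server never touches codecs or telephony — latency budget permitting, you can run arbitrarily complex agent logic per turn.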

Voice pipeline

The voice pipeline controls how audio is captured, cleaned, converted to text, and how text responses are spoken back. These settings directly affect latency, accuracy, and caller experience.

Voice input — STT, VAD, and noise processing

For telephone deployments (8kHz audio), choose STT providers with dedicated telephony models: Deepgram nova-3, Azure Speech (telephony mode), or AssemblyAI. For high-fidelity web audio (16kHz+), all providers perform well.
STT providers: Deepgram, AssemblyAI, Azure Cognitive Speech, Google Speech, OpenAI Whisper, Cartesia, Sarvam AI
| Audio setting | Default | Range | What it controls |
| --- | --- | --- | --- |
| STT language | multi | Provider-dependent | Primary transcription language; multi enables auto-detection |
| STT model | nova-3 | Provider-dependent | Accuracy vs. latency tradeoff per provider |
| VAD threshold | 0.6 | 0.5–1.0 | Speech detection sensitivity. Lower = more sensitive (may catch noise). Higher = stricter (may miss quiet speech) |
| EOS timeout | 700 ms | 500–4000 ms | Silence duration before the assistant treats a pause as end-of-turn. Increase for conversational pauses; decrease for snappy IVR-style responses |
| EOS backoff | 2 | 0–5 | Re-prompts (“Are you there?”) before ending session on silence |
| Noise cancellation | rn_noise | | RNNoise background noise suppression applied before VAD and STT |
EOS timeout is the most impactful latency lever after model selection. At 700ms, the assistant waits 0.7 seconds of silence before responding. Reducing to 500ms cuts perceived latency noticeably but risks cutting off slow speakers mid-sentence.
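To see why the timeout trades latency against cut-offs, here is a toy model of end-of-turn detection. The real pipeline operates on live audio frames, not precomputed gap lengths:

```python
def detect_end_of_turn(speech_gaps_ms, eos_timeout_ms=700):
    """Return the index of the first pause long enough to count as
    end-of-turn, or None if the caller is considered still speaking."""
    for i, gap in enumerate(speech_gaps_ms):
        if gap >= eos_timeout_ms:
            return i
    return None

gaps = [120, 340, 900]  # pauses between phrases, in milliseconds
assert detect_end_of_turn(gaps) == 2   # the 900 ms pause ends the turn
assert detect_end_of_turn(gaps, eos_timeout_ms=4000) is None  # keep listening
```

Lowering the timeout means more in-sentence pauses cross the threshold, so the assistant responds faster but interrupts hesitant speakers more often.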

Voice output — TTS and pronunciation

TTS providers: ElevenLabs, Cartesia, Deepgram Aura, OpenAI TTS, Azure Speech, Google Cloud TTS, PlayHT, Sarvam AI
| Speaker setting | Default | What it controls |
| --- | --- | --- |
| Voice ID | Provider-dependent | The specific voice model. Supports cloned brand voices on ElevenLabs and Resemble |
| Speed / emotion | Normal | Provider-specific; Cartesia exposes slowest → fastest and emotion controls (anger:high, positivity:low, etc.) |
| Sentence boundaries | .!?;:—… | Characters that flush a TTS chunk for immediate playback. Tuning these reduces time-to-first-audio |
| Conjunction break | 240 ms | Pause inserted at conjunctions (and, but, or). Adds natural rhythm to long sentences |
| Pronunciation dictionaries | | Normalize how abbreviations, currencies, dates, URLs, and technical terms are spoken |
Add your product names, acronyms, and technical terms to the pronunciation dictionary to prevent the TTS engine from mispronouncing them. This is especially important for brand names and medical or legal terms.
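Conceptually, the boundary characters act as flush triggers on the streamed LLM output, so playback can begin before the full reply is generated. A simplified sketch of that behaviour (the production chunker is more sophisticated):

```python
BOUNDARIES = set(".!?;:")  # subset of the default boundary characters

def flush_chunks(token_stream):
    """Accumulate streamed text and flush a chunk to TTS whenever a
    sentence boundary arrives, reducing time-to-first-audio."""
    buffer = ""
    for token in token_stream:
        buffer += token
        if buffer and buffer[-1] in BOUNDARIES:
            yield buffer.strip()
            buffer = ""
    if buffer.strip():   # flush any trailing partial sentence
        yield buffer.strip()

chunks = list(flush_chunks(["Hello", " there", ".", " How can I help", "?"]))
# ['Hello there.', 'How can I help?']
```

Adding more boundary characters produces smaller, earlier chunks; removing them produces longer, smoother utterances that start later.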

Knowledge and retrieval

Attach one or more knowledge bases to give your assistant access to documents, FAQs, product data, or any content indexed in Rapida.
| Setting | Default | Range | Notes |
| --- | --- | --- | --- |
| Retrieval method | hybrid | hybrid / semantic / text | Hybrid combines vector similarity and full-text search — best default for most use cases |
| Top K | 5 | 1–20 | Number of document chunks retrieved per query. Higher = more context, higher latency |
| Score threshold | 0.5 | 0–1 | Minimum relevance score. Raise to reduce noise; lower to improve recall |
| Reranking | off | | Cohere reranker re-scores retrieved chunks before passing to the LLM. Improves precision at the cost of a small latency increase |
See Create a Knowledge Base for document ingestion, connector setup, and embedding model configuration.
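A sketch of how hybrid scoring, the score threshold, and top-K interact. The 50/50 blend of semantic and text scores is an assumption for illustration, not Rapida's actual scoring function:

```python
def hybrid_rank(chunks, alpha=0.5, top_k=5, threshold=0.5):
    """Blend vector-similarity and full-text scores, drop chunks below
    the threshold, and keep the top-K by blended score."""
    scored = [
        (c["id"], alpha * c["semantic"] + (1 - alpha) * c["text"])
        for c in chunks
    ]
    kept = [(cid, s) for cid, s in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]

chunks = [
    {"id": "a", "semantic": 0.9, "text": 0.2},  # strong vector match
    {"id": "b", "semantic": 0.4, "text": 0.9},  # strong keyword match
    {"id": "c", "semantic": 0.3, "text": 0.3},  # weak on both — filtered out
]
ranked = hybrid_rank(chunks)
# order: 'b' (0.65) then 'a' (0.55); 'c' falls below the 0.5 threshold
```

Note how a chunk that is mediocre on both signals is removed by the threshold even though top-K would have room for it.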

Tools

Tools extend what your assistant can do mid-conversation without breaking the voice flow. The LLM decides when to call a tool based on its description — write clear, specific descriptions.
Tool names must use only letters, numbers, and underscores (no spaces). The description is passed directly to the LLM — it determines when and whether the tool is called. Be specific: “Search the product knowledge base for pricing information” outperforms “Search knowledge base”.
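A quick validity check matching the stated naming rule. The regex here is our own, inferred from the rule above, not an official Rapida validator:

```python
import re

TOOL_NAME = re.compile(r"^[A-Za-z0-9_]+$")

def valid_tool_name(name: str) -> bool:
    """Letters, numbers, and underscores only — no spaces or symbols."""
    return bool(TOOL_NAME.match(name))

assert valid_tool_name("search_product_kb")
assert not valid_tool_name("search product kb")  # spaces are rejected
```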

Conversation experience

These settings control the runtime behaviour of a live session — what happens when the caller goes silent, how long sessions last, and what the assistant says at the start of a call.
| Setting | Default | Range | Notes |
| --- | --- | --- | --- |
| Greeting message | | | Spoken immediately when a call connects. Supports {{variable}} for personalisation |
| Idle message | Are you there? | | Spoken after the idle timeout expires without user input |
| Idle silence timeout | 30 s | 15–120 s (phone) | Time before the idle message triggers |
| Idle backoff | 2 | 0–5 | How many times the idle message repeats before ending the session |
| Max session duration | 300 s | 180–600 s (phone) | Hard cap on session length — protects against runaway calls |
| Error message | | | Spoken when the assistant encounters an unrecoverable error |
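The idle settings compose into a simple escalation policy. A toy simulation of a caller who stays silent, using the defaults above — the real runtime naturally resets this timer whenever the caller speaks:

```python
def idle_events(silent_seconds, idle_timeout=30, idle_backoff=2):
    """Emit the sequence of runtime actions for an entirely silent
    caller: one idle prompt per elapsed timeout, then session end
    once the backoff count is exhausted."""
    events = []
    prompts = 0
    elapsed = 0
    while elapsed + idle_timeout <= silent_seconds:
        elapsed += idle_timeout
        if prompts < idle_backoff:
            events.append("idle_message")   # "Are you there?"
            prompts += 1
        else:
            events.append("end_session")
            break
    return events

# 95 s of silence with defaults: two idle prompts, then hang up
assert idle_events(95) == ["idle_message", "idle_message", "end_session"]
```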

Webhooks and post-call analysis

Webhooks fire at conversation.begin, conversation.completed, and conversation.failed events. They deliver transcripts, metadata, and analysis results to your CRM, data warehouse, or alerting system in real time.

Analysis pipelines run after a conversation ends. They invoke a configured Rapida endpoint — typically an LLM prompt — against the conversation transcript to produce structured output: sentiment scores, intent labels, CSAT predictions, compliance flags, or any custom metric.

See Webhooks and Analysis for full configuration details.
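On the receiving side, a webhook consumer typically dispatches on the event type. A minimal sketch — the payload fields shown (event, transcript, reason) are assumptions for illustration; consult the Webhooks reference for the actual schema:

```python
import json

def handle_webhook(raw_body: bytes):
    """Route a webhook delivery by event type. Returns the action your
    downstream system would take, plus the relevant payload field."""
    payload = json.loads(raw_body)
    event = payload["event"]
    if event == "conversation.completed":
        return ("store_transcript", payload.get("transcript"))
    if event == "conversation.failed":
        return ("alert_oncall", payload.get("reason"))
    return ("log_only", event)   # e.g. conversation.begin

action, data = handle_webhook(
    b'{"event": "conversation.completed", "transcript": []}'
)
# action == "store_transcript"
```

Returning a fallback action for unrecognised events keeps the consumer forward-compatible if new event types are added.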

Version control

Every change to your assistant’s prompt, model, or parameters creates a new version. Versions let you safely iterate without affecting live traffic.
New versions are not deployed automatically. After creating a version, it stays in draft state until you explicitly release it. Live deployments continue running the previously released version until you promote the new one.
| Action | What it does |
| --- | --- |
| Create version | Saves a new draft with updated prompt/model config and a change description |
| Release version | Promotes the version to live — all active deployments switch immediately |
| Rollback | Re-release a previous version to revert a bad change |
This model lets you run A/B tests, stage changes in a debugger deployment before pushing to phone, and maintain a full audit trail of every prompt change — who made it, when, and why.

Next steps