Cartesia Text-to-Speech

Supported Models

Text-to-Speech Models

Model Name	Description	Use Case
Sonic English	High-quality English voices	Professional TTS applications
Bark Model	Expressive speech synthesis	Creative and dynamic voice outputs
Tortoise TTS	Ultra-realistic voices	High-fidelity voice synthesis

Supported Languages

Cartesia supports 20+ languages including:

English (multiple accents)

Spanish, French, German, Italian

Portuguese, Russian, Polish, Dutch

Japanese, Mandarin, Korean

Hindi and other Indian languages

And more

Voice Features

Multiple Voice Personalities: Choose from various voice options

Emotion Expression: Control emotional tone and expressiveness

Pronunciation Control: Define how specific words are pronounced

Speed Control: Adjust speaking rate from slow to fast

Streaming Support: Stream audio in real-time
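As an illustration of the streaming feature above, the sketch below handles audio chunk by chunk instead of waiting for the full clip. It is not the platform's or Cartesia's API: `fake_tts_stream` is a hypothetical stand-in that yields placeholder bytes, and `consume_stream` is an illustrative helper name.

```python
import io
from typing import Iterable, Iterator


def fake_tts_stream(text: str, chunk_size: int = 4) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response.

    Yields placeholder bytes (not real audio) in small chunks.
    """
    payload = text.encode("utf-8")
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]


def consume_stream(chunks: Iterable[bytes], sink: io.BufferedIOBase) -> int:
    """Write each chunk to the sink as soon as it arrives.

    Returns the total number of bytes written.
    """
    total = 0
    for chunk in chunks:
        sink.write(chunk)
        total += len(chunk)
    return total


# Usage: collect the streamed chunks in an in-memory buffer.
buffer = io.BytesIO()
written = consume_stream(fake_tts_stream("hello world"), buffer)
```

In a real integration the sink would typically be an audio output device or a network socket, so playback can start before synthesis finishes.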

Setting Up Provider Credentials

Access the Integrations Page

Navigate to the “Integration > Models” page to access TTS providers.

Select Cartesia

On the Integrations page, find the Cartesia provider card and click its “Setup Credential” button.

Create Provider Credential

A modal window titled “Create provider credential” appears. Follow these steps:

Select “Cartesia” from the dropdown (if not already selected)
Enter a Key Name: Assign a unique name to this provider key for easy identification
Enter the API Key: Input your Cartesia API key
Click “Configure” to save the credential
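Before pasting the key into the modal, it can help to confirm that Cartesia accepts it. The sketch below is one possible check and is not part of this platform. The base URL, the `/voices` endpoint, the `X-API-Key` and `Cartesia-Version` header names, and the version string are assumptions about Cartesia's public REST API and should be checked against Cartesia's own documentation. `build_voices_request` and `key_is_accepted` are hypothetical helper names.

```python
import urllib.error
import urllib.request

# Assumptions, not taken from this page: base URL, header names, version string.
CARTESIA_BASE_URL = "https://api.cartesia.ai"
CARTESIA_VERSION = "2024-06-10"


def build_voices_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists voices."""
    key = api_key.strip()
    if not key:
        raise ValueError("API key is empty")
    return urllib.request.Request(
        f"{CARTESIA_BASE_URL}/voices",
        headers={"X-API-Key": key, "Cartesia-Version": CARTESIA_VERSION},
        method="GET",
    )


def key_is_accepted(api_key: str, timeout: float = 10.0) -> bool:
    """Send the request; True on HTTP 200, False on 401/403.

    Other HTTP errors are re-raised so that outages are not
    mistaken for a bad key.
    """
    try:
        request = build_voices_request(api_key)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            return False
        raise
```

Calling `key_is_accepted("<your key>")` performs a network request. Building the request separately keeps the header logic testable offline.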

Verify Credential Setup

After setting up the credential, you can verify it’s been added:

The Cartesia provider card should now show “Connected”
If you click on the provider, you’ll see a “View provider credential” modal
This modal displays the credential name, when it was last updated, and options to delete or close

Your Cartesia Text-to-Speech provider credential is now set up.

Integration Features

Ultra-Realistic Voices: Advanced voice synthesis quality

20+ Languages: Comprehensive language coverage

Multiple Voice Models: Choose different TTS models

Expression Control: Fine-tune emotional expression

Real-time Processing: Low-latency audio generation

Streaming TTS: Stream audio as it’s being generated

Professional Quality: Enterprise-grade voice synthesis
