
Overview

VoiceAiTTSService converts text to speech using Voice.ai’s Multi-Context WebSocket API. It maintains a persistent WebSocket connection that streams raw PCM audio (32 kHz, mono) and supports multiple languages, custom voice selection, and temperature and top_p sampling controls.

Source Repository

Source code, examples, and issues for the Voice.ai integration

Voice.ai Website

Sign up and get a Voice.ai API key

API Documentation

Voice.ai Multi-Context WebSocket TTS API reference

Installation

This is a community-maintained package distributed separately from pipecat-ai. It is not published to PyPI; install it from source:
git clone https://github.com/voice-ai/voice-ai-pipecat-tts.git
cd voice-ai-pipecat-tts
pip install -e .

Prerequisites

Voice.ai Account Setup

Before using the Voice.ai TTS service, you need:
  1. Voice.ai Account: Sign up at Voice.ai
  2. API Key: Obtain an API key (format: vk_*) from Voice.ai

Required Environment Variables

  • VOICEAI_API_KEY: Your Voice.ai API key for authentication
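
A minimal sketch of reading the key from the environment and checking the documented vk_ prefix before constructing the service. The helper name load_voiceai_api_key is hypothetical and not part of the integration. The prefix check only mirrors the vk_* format noted above; it does not verify the key with Voice.ai.

```python
import os


def load_voiceai_api_key(env=None):
    """Return the Voice.ai API key from the environment (hypothetical helper).

    Raises ValueError if VOICEAI_API_KEY is missing or lacks the vk_ prefix.
    """
    # Allow a plain dict to be passed in so the helper is easy to test.
    env = os.environ if env is None else env
    key = env.get("VOICEAI_API_KEY", "").strip()
    if not key:
        raise ValueError("VOICEAI_API_KEY is not set")
    if not key.startswith("vk_"):
        raise ValueError("VOICEAI_API_KEY does not start with the expected vk_ prefix")
    return key
```

The returned string can then be passed as the api_key argument. Failing early here tends to give a clearer error than a rejected WebSocket connection later.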

Configuration

  • api_key (str, required): Voice.ai API key for authentication (format: vk_*).
  • voice_id (str, default: None): Voice identifier for synthesis. If not provided, uses the default built-in voice.
  • url (str, default: "wss://dev.voice.ai/api/v1/tts/multi-stream"): WebSocket URL for the Voice.ai multi-context TTS API.
  • sample_rate (int, default: None): Output audio sample rate. Defaults to 32000 Hz (Voice.ai’s native rate) when not set.
  • params (VoiceAiTTSService.InputParams, default: None): Voice synthesis settings. See Input Parameters below.
  • aggregate_sentences (bool, default: True): Whether to aggregate text by sentences before TTS. When True, each sentence is sent separately for lower latency; when False, larger text chunks are batched for more natural flow at the cost of higher latency.
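
To make the sample_rate setting concrete, the sketch below works out how much raw PCM data a given rate produces. It assumes 16-bit samples (2 bytes per sample), which this page does not state, so treat bytes_per_sample as a parameter to confirm against the Voice.ai API reference. Both function names are hypothetical.

```python
def pcm_bytes_per_second(sample_rate=32000, channels=1, bytes_per_sample=2):
    """Raw PCM byte rate. bytes_per_sample=2 is an assumption (16-bit audio)."""
    return sample_rate * channels * bytes_per_sample


def pcm_duration_seconds(num_bytes, sample_rate=32000, channels=1, bytes_per_sample=2):
    """Playback duration of a raw PCM buffer of num_bytes bytes."""
    return num_bytes / pcm_bytes_per_second(sample_rate, channels, bytes_per_sample)


# At the default 32000 Hz mono rate, and under the 16-bit assumption,
# one second of audio occupies 64000 bytes.
print(pcm_bytes_per_second())
print(pcm_duration_seconds(128000))
```

This kind of arithmetic is useful when sizing playback buffers or estimating how long a received chunk will take to play.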

Input Parameters

Synthesis settings passed via the params constructor argument using VoiceAiTTSService.InputParams(...).
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| language | Language | Language.EN | Target language. Supports en, ca, sv, es, fr, de, it, pt, pl, ru, nl. |
| model | str | None | TTS model. Auto-selected based on language when not provided. |
| audio_format | str | "pcm" | Audio output format (raw PCM). |
| temperature | float | 1.0 | Generation temperature (0.0–2.0). Higher values are more random. |
| top_p | float | 0.8 | Top-p sampling (0.0–1.0). Controls output diversity. |
Available parameters and defaults are defined by the integration and the Voice.ai API. See the source repository for the authoritative, up-to-date list.
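
The ranges and language codes listed above can be checked before a request is made. The following is a sketch only: validate_synthesis_settings and SUPPORTED_LANGUAGES are hypothetical names, the values are copied from the table on this page, and the integration may perform its own validation or accept a different set.

```python
# Language codes as listed in the Input Parameters table (may be out of date).
SUPPORTED_LANGUAGES = {"en", "ca", "sv", "es", "fr", "de", "it", "pt", "pl", "ru", "nl"}


def validate_synthesis_settings(language="en", temperature=1.0, top_p=0.8):
    """Return a list of problems found; an empty list means the settings look valid."""
    errors = []
    if language.lower() not in SUPPORTED_LANGUAGES:
        errors.append(f"unsupported language code: {language!r}")
    if not 0.0 <= temperature <= 2.0:
        errors.append(f"temperature {temperature} is outside 0.0-2.0")
    if not 0.0 <= top_p <= 1.0:
        errors.append(f"top_p {top_p} is outside 0.0-1.0")
    return errors
```

Returning a list rather than raising on the first problem lets a caller report every invalid setting at once.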

Usage

import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.transcriptions.language import Language
from pipecat_voice_ai import VoiceAiTTSService

# Read the API key from the environment rather than hard-coding it.
tts = VoiceAiTTSService(
    api_key=os.getenv("VOICEAI_API_KEY"),
    voice_id="your-voice-id",  # Optional: uses the default voice if not provided
    params=VoiceAiTTSService.InputParams(
        language=Language.EN,
        temperature=1.0,
        top_p=0.8,
    ),
)

# Add tts to your pipeline. `transport` is the Pipecat transport your
# application already creates elsewhere; it is not defined in this snippet.
pipeline = Pipeline([
    # ... other processors ...
    tts,
    transport.output(),
])

Compatibility

Tested with Pipecat v0.0.100+. Check the source repository for the latest tested version and changelog.
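
A rough way to check the installed Pipecat version against the tested minimum at startup is sketched below. It uses the standard library's importlib.metadata with the pipecat-ai distribution name mentioned under Installation. The helper names and the simplified version parsing are assumptions, not part of the integration.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_PIPECAT = (0, 0, 100)  # "Tested with Pipecat v0.0.100+"


def parse_version(text):
    """Parse the leading numeric components of 'X.Y.Z' into a tuple of ints.

    Suffixes such as '.dev1' or 'rc1' are ignored (a simplification).
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def pipecat_is_supported(installed=None):
    """True if the given (or installed) pipecat-ai version meets the tested minimum."""
    if installed is None:
        try:
            installed = version("pipecat-ai")
        except PackageNotFoundError:
            return False
    return parse_version(installed) >= MIN_PIPECAT
```

Comparing integer tuples avoids the string-comparison pitfall where "0.0.99" would sort after "0.0.100".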