Voice.ai - Pipecat

Overview

VoiceAiTTSService converts text to speech using Voice.ai’s Multi-Context WebSocket API. It maintains a persistent WebSocket connection that streams raw PCM audio (32 kHz, mono) and supports multiple languages, custom voice selection, and temperature/top_p generation controls.
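To make the audio format concrete, the sketch below estimates how much data one second of the stream carries and how long a received chunk plays. It assumes 16-bit samples, which this page does not state, so check the API reference for the actual sample width. The helper names are illustrative and are not part of the integration.

```python
SAMPLE_RATE_HZ = 32_000  # Voice.ai's native rate, per the overview
CHANNELS = 1             # mono
BYTES_PER_SAMPLE = 2     # ASSUMPTION: 16-bit PCM; not confirmed by this page


def bytes_per_second(sample_rate=SAMPLE_RATE_HZ, channels=CHANNELS,
                     bytes_per_sample=BYTES_PER_SAMPLE):
    """Raw PCM data rate in bytes per second."""
    return sample_rate * channels * bytes_per_sample


def chunk_duration_ms(num_bytes, sample_rate=SAMPLE_RATE_HZ, channels=CHANNELS,
                      bytes_per_sample=BYTES_PER_SAMPLE):
    """Playback time of a raw PCM chunk, in milliseconds."""
    return 1000.0 * num_bytes / bytes_per_second(sample_rate, channels, bytes_per_sample)
```

Under these assumptions the stream carries 64,000 bytes per second, so a 6,400-byte chunk holds 100 ms of audio.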

Source Repository

Source code, examples, and issues for the Voice.ai integration

Voice.ai Website

API Documentation

Voice.ai Multi-Context WebSocket TTS API reference

Installation

This is a community-maintained package distributed separately from pipecat-ai. It is not published to PyPI; install it from source:

git clone https://github.com/voice-ai/voice-ai-pipecat-tts.git
cd voice-ai-pipecat-tts
pip install -e .
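After the editable install, a quick check like the following can confirm that both packages are importable from the active environment. This is a minimal sketch: the module names `pipecat` and `pipecat_voice_ai` are taken from the import lines in the Usage section, and `is_installed` is a hypothetical helper, not part of either package.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    # Module names as used in the Usage section of this page.
    for name in ("pipecat", "pipecat_voice_ai"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```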

Prerequisites

Voice.ai Account Setup

Before using the Voice.ai TTS service, you need:

Voice.ai Account: Sign up at Voice.ai
API Key: Obtain an API key (format: vk_*) from Voice.ai

Required Environment Variables

VOICEAI_API_KEY: Your Voice.ai API key for authentication
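One way to read the key at startup is sketched below. It fails early if the variable is missing or does not carry the documented `vk_` prefix. The function name `load_api_key` and the error messages are assumptions made for illustration; the integration itself may validate the key differently or not at all.

```python
import os


def load_api_key(env=None) -> str:
    """Read VOICEAI_API_KEY and check the documented vk_ prefix.

    `env` defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("VOICEAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("VOICEAI_API_KEY is not set")
    if not key.startswith("vk_"):
        raise ValueError("VOICEAI_API_KEY should start with 'vk_'")
    return key
```

The returned value can then be passed as the `api_key` constructor argument instead of hardcoding the key in source.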

Configuration

api_key

str

required

Voice.ai API key for authentication (format: vk_*).

voice_id

str

default:"None"

Voice identifier for synthesis. If not provided, uses the default built-in voice.

url

str

default:"wss://dev.voice.ai/api/v1/tts/multi-stream"

WebSocket URL for the Voice.ai multi-context TTS API.

sample_rate

int

default:"None"

Output audio sample rate. Defaults to 32000 Hz (Voice.ai’s native rate) when not set.

params

VoiceAiTTSService.InputParams

default:"None"

Voice synthesis settings. See Input Parameters below.

aggregate_sentences

bool

default:"True"

Whether to aggregate text by sentences before TTS. When True, each sentence is sent separately for lower latency; when False, larger text chunks are batched for more natural flow at the cost of higher latency.
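As a rough illustration of what sentence aggregation means for streamed text, the sketch below buffers incoming fragments (for example, LLM tokens) and releases text only once a sentence-ending punctuation mark is followed by whitespace. It is a simplified stand-in, not Pipecat's actual aggregator; the class name `SentenceBuffer` and the splitting rule are assumptions.

```python
import re
from typing import List

# Sentence end: '.', '!' or '?' followed by whitespace. Deliberately naive:
# abbreviations such as "e.g. " would also be treated as sentence ends.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class SentenceBuffer:
    def __init__(self) -> None:
        self._pending = ""

    def feed(self, fragment: str) -> List[str]:
        """Add a text fragment and return any sentences completed so far."""
        self._pending += fragment
        parts = _SENTENCE_END.split(self._pending)
        # The last part may be an unfinished sentence, so keep it buffered.
        self._pending = parts.pop()
        return [p for p in parts if p]

    def flush(self) -> str:
        """Return whatever is left, e.g. at the end of a response."""
        rest, self._pending = self._pending.strip(), ""
        return rest
```

Feeding `"Hello wor"` and then `"ld. How are"` yields nothing on the first call and `["Hello world."]` on the second; the remaining `"How are"` stays buffered until more text arrives or `flush()` is called.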

Input Parameters

Synthesis settings passed via the params constructor argument using VoiceAiTTSService.InputParams(...).

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `language` | `Language` | `Language.EN` | Target language. Supports `en`, `ca`, `sv`, `es`, `fr`, `de`, `it`, `pt`, `pl`, `ru`, `nl`. |
| `model` | `str` | `None` | TTS model. Auto-selected based on language when not provided. |
| `audio_format` | `str` | `"pcm"` | Audio output format (raw PCM). |
| `temperature` | `float` | `1.0` | Generation temperature (0.0–2.0). Higher values are more random. |
| `top_p` | `float` | `0.8` | Top-p sampling (0.0–1.0). Controls output diversity. |

Available parameters and defaults are defined by the integration and the Voice.ai API. See the source repository for the authoritative, up-to-date list.
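The documented ranges can be checked before a value ever reaches the service. The sketch below mirrors the table's language codes, defaults, and ranges in a standalone dataclass. `SynthesisSettings` is a hypothetical stand-in written for illustration; it is not `VoiceAiTTSService.InputParams`, and the real class may or may not enforce these limits.

```python
from dataclasses import dataclass

# Language codes listed in the Input Parameters table.
SUPPORTED_LANGUAGES = {"en", "ca", "sv", "es", "fr", "de", "it", "pt", "pl", "ru", "nl"}


@dataclass(frozen=True)
class SynthesisSettings:
    """Hypothetical mirror of the documented parameters; not the real InputParams."""

    language: str = "en"
    temperature: float = 1.0
    top_p: float = 0.8

    def __post_init__(self):
        # Reject values outside the ranges given in the table.
        if self.language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language: {self.language!r}")
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be within 0.0-2.0")
        if not 0.0 <= self.top_p <= 1.0:
            raise ValueError("top_p must be within 0.0-1.0")
```

Validating up front like this turns an out-of-range `temperature` or `top_p` into an immediate, local error rather than a failure reported over the WebSocket connection.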

Usage

import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.transcriptions.language import Language
from pipecat_voice_ai import VoiceAiTTSService

tts = VoiceAiTTSService(
    api_key=os.getenv("VOICEAI_API_KEY"),  # Key format: vk_*
    voice_id="your-voice-id",  # Optional: uses default if not provided
    params=VoiceAiTTSService.InputParams(
        language=Language.EN,
        temperature=1.0,
        top_p=0.8,
    ),
)

# Add tts to your pipeline. `transport` is your already-configured
# Pipecat transport; its setup is omitted here.
pipeline = Pipeline([
    # ... other processors ...
    tts,
    transport.output(),
])

Compatibility

Tested with Pipecat v0.0.100+. Check the source repository for the latest tested version and changelog.