Overview
SimplismartSTTService is a segmented speech-to-text service that POSTs WAV
audio segments to the Simplismart HTTP /predict
endpoint and emits TranscriptionFrames. It requires upstream VAD (a
VADProcessor or transport/user-aggregator VAD) so speech segments are
delimited before transcription.
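As a rough illustration of the "WAV audio segments" mentioned above, the sketch below wraps raw 16-bit PCM in a WAV container using only the Python standard library. The helper name `pcm_to_wav`, the 16 kHz mono format, and the silent test segment are assumptions made for this example. The request payload the service actually sends to /predict is not shown, because it is not documented here.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM in a WAV container (hypothetical helper, not part of the package)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# One second of silence at 16 kHz mono stands in for a VAD-delimited speech segment.
segment = b"\x00\x00" * 16000
wav_bytes = pcm_to_wav(segment)
```

In a segmented STT service, a buffer like `wav_bytes` is what gets uploaded once VAD reports the end of a speech segment.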
Source Repository
Source code, examples, and issues for the Simplismart integration
Simplismart
Learn more about Simplismart’s AI platform
Installation
This is a community-maintained package distributed separately from pipecat-ai.
It is not published to PyPI, so install it from source, following the instructions in the source repository.
Prerequisites
Simplismart Account Setup
Before using the Simplismart STT service, you need a Simplismart account and an API key. See Simplismart to get started.
Required Environment Variables
- SIMPLISMART_API_KEY: Bearer token used to authenticate requests. May be passed directly via the api_key constructor argument instead.
- SIMPLISMART_STT_URL (optional): Full URL for the STT endpoint. Defaults to https://api.simplismart.live/predict.
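The fallback order described for these variables can be sketched as follows. This is a minimal illustration, not the package's code: `resolve_stt_config` and its `url` parameter are hypothetical names, and raising an error on a missing key is a choice made for this sketch rather than documented behavior.

```python
import os
from typing import Optional, Tuple

DEFAULT_STT_URL = "https://api.simplismart.live/predict"


def resolve_stt_config(api_key: Optional[str] = None, url: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical helper: explicit arguments win, then environment variables, then the default URL."""
    key = api_key or os.environ.get("SIMPLISMART_API_KEY")
    if not key:
        # Assumption for this sketch: fail early instead of sending an unauthenticated request.
        raise ValueError("Set SIMPLISMART_API_KEY or pass api_key explicitly")
    endpoint = url or os.environ.get("SIMPLISMART_STT_URL") or DEFAULT_STT_URL
    return key, endpoint
```

Calling `resolve_stt_config(api_key="...")` with neither variable set returns the explicit key paired with the default endpoint.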
Configuration
- API key (api_key constructor argument): Bearer token. Falls back to the SIMPLISMART_API_KEY environment variable if not provided.
- Endpoint URL: Full URL to the predict endpoint. Falls back to the SIMPLISMART_STT_URL environment variable, then to https://api.simplismart.live/predict.
- HTTP session: Optional shared aiohttp session. If not provided, the service creates and owns its own session.
- Sample rate: Input audio sample rate. Usually supplied by the pipeline StartFrame.
- Settings (settings constructor argument): Runtime-configurable STT settings. See Settings below.
Settings
Runtime-configurable settings passed via the settings constructor argument
using SimplismartSTTService.Settings(...). The settings dataclass extends
Pipecat’s common STTSettings (which includes model and language).
| Parameter | Type | Default | Description |
|---|---|---|---|
| vad_filter | bool | True | Enable server-side VAD filtering when supported. |
| vad_onset | float | 0.5 | VAD onset threshold. |
| beam_size | int | 4 | Beam search size for decoding. |
| temperature | float | 0.0 | Decoding temperature. |
| strict_hallucination_reduction | bool | True | Ask the server to apply extra anti-hallucination logic (Whisper). |
The default model is openai/whisper-large-v3-turbo and the default language is
Language.EN.
See the source
repository for the
authoritative, up-to-date list of settings and defaults.
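To make the defaults above concrete, the sketch below uses a stand-in dataclass that mirrors them and shows how a caller might override a subset. `SettingsSketch` is a hypothetical class written for this example; it is not the package's `Settings` class, and the plain string `"en"` stands in for Language.EN.

```python
from dataclasses import dataclass, replace


@dataclass
class SettingsSketch:
    """Hypothetical stand-in mirroring the documented defaults; not the real Settings class."""

    model: str = "openai/whisper-large-v3-turbo"
    language: str = "en"  # stands in for Language.EN
    vad_filter: bool = True
    vad_onset: float = 0.5
    beam_size: int = 4
    temperature: float = 0.0
    strict_hallucination_reduction: bool = True


defaults = SettingsSketch()
# Override only the fields that differ; every other field keeps its default.
tuned = replace(defaults, beam_size=1, vad_onset=0.6)
```

With the real package, the equivalent step is to build SimplismartSTTService.Settings(...) with the same field names and pass it via the settings constructor argument.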
Usage
Place a VADProcessor before SimplismartSTTService so VAD events reach the
segmented STT layer.
The service then emits TranscriptionFrames downstream.
Compatibility
Tested with Pipecat v1.1.0 (pipecat-ai>=0.0.86). Check the source
repository for the latest
tested version and changelog.