RTVIObserver translates Pipecat’s internal pipeline events into standardized RTVI protocol messages. It monitors frame flow through the pipeline and generates corresponding client messages based on event types.
Purpose
The RTVIObserver primarily serves to convert internal pipeline frames into client-compatible RTVI messages. It is required for any application using RTVI as the client protocol to ensure proper communication of events such as speech start/stop, user transcript, bot output, metrics, and server messages.
Automatic Setup
RTVIObserver is automatically created and attached when you create a PipelineWorker. No manual setup is required for standard usage.
To customize the observer’s behavior, pass RTVIObserverParams to the worker:
Configuration
RTVIObserverParams accepts the following fields:
Indicates if bot output messages should be sent.
Indicates if the bot’s LLM messages should be sent.
Indicates if the bot’s TTS messages should be sent.
Indicates if the bot’s started/stopped speaking messages should be sent.
Indicates if the bot’s audio level messages should be sent.
Indicates if the user’s LLM input messages should be sent.
Indicates if the user’s started/stopped speaking messages should be sent.
Indicates if raw VAD user started/stopped speaking messages should be sent. These reflect the VAD signal directly, independent of turn finalization (unlike user_speaking_enabled, which a turn strategy may gate or defer).
Indicates if user mute started/stopped messages (user-mute-started, user-mute-stopped) should be sent.
Indicates if the user’s transcription messages should be sent.
Indicates if the user’s audio level messages should be sent.
Indicates if metrics messages should be sent.
Indicates if system logs should be sent.
List of aggregation types to skip sending as tts/output messages.
If using this to avoid sending secure information, be sure to also disable
bot_llm_enabled to avoid leaking through LLM messages.
A list of tuples to transform text before sending it to the client. Each tuple should be of the form (aggregation_type, transform_function).
The preferred transform signature also accepts accumulated_text and remaining_text. When accumulated_text and remaining_text are None, the transform is being called for the full segment text (bot-output message). When they are provided, the transform is being called for a progress event and must return a BotOutputTransformResult with accumulated_text and remaining_text set, enabling word-level transforms on the client side.
The legacy 2-parameter signature (text, agg_type) -> str is deprecated but still works. Transforms using it will emit a DeprecationWarning at registration time.
How often audio levels should be sent if enabled.
function_call_report_level
Dict[str, RTVIFunctionCallReportLevel]
default: {"*": RTVIFunctionCallReportLevel.NONE}
Controls what information is exposed in function call lifecycle events (llm-function-call-started, llm-function-call-in-progress, llm-function-call-stopped). Maps function names to security levels, where "*" sets the default for unlisted functions.
Levels:
- DISABLED: No events emitted for this function
- NONE: Events with tool_call_id only (most secure when events are needed)
- NAME: Adds function name to events
- FULL: Adds function name, arguments, and results
Frame Translation
The observer maps Pipecat’s internal frames to RTVI protocol messages:

| Pipeline Frame | RTVI Message |
|---|---|
| Speech Events | |
| UserStartedSpeakingFrame | RTVIUserStartedSpeakingMessage |
| UserStoppedSpeakingFrame | RTVIUserStoppedSpeakingMessage |
| VADUserStartedSpeakingFrame | VADUserStartedSpeakingMessage |
| VADUserStoppedSpeakingFrame | VADUserStoppedSpeakingMessage |
| BotStartedSpeakingFrame | RTVIBotStartedSpeakingMessage |
| BotStoppedSpeakingFrame | RTVIBotStoppedSpeakingMessage |
| User Mute | |
| UserMuteStartedFrame | RTVIUserMuteStartedMessage |
| UserMuteStoppedFrame | RTVIUserMuteStoppedMessage |
| Transcription | |
| TranscriptionFrame | RTVIUserTranscriptionMessage(final=true) |
| InterimTranscriptionFrame | RTVIUserTranscriptionMessage(final=false) |
| Bot Output | |
| AggregatedTextFrame | RTVIBotOutputMessage |
| LLM Processing | |
| LLMFullResponseStartFrame | RTVIBotLLMStartedMessage |
| LLMFullResponseEndFrame | RTVIBotLLMStoppedMessage |
| LLMTextFrame | RTVIBotLLMTextMessage |
| TTS Events | |
| TTSStartedFrame | RTVIBotTTSStartedMessage |
| TTSStoppedFrame | RTVIBotTTSStoppedMessage |
| TTSTextFrame | RTVIBotTTSTextMessage |
| Function Calls | |
| FunctionCallsStartedFrame | llm-function-call-started |
| FunctionCallInProgressFrame | llm-function-call-in-progress |
| FunctionCallResultFrame | llm-function-call-stopped |
| Context/Metrics | |
| LLMContextFrame | RTVIUserLLMTextMessage |
| MetricsFrame | RTVIMetricsMessage |
| RTVIServerMessageFrame | RTVIServerMessage |
| User Interface | |
| RTVIUICommandFrame | UICommandMessage (ui-command) |
| RTVIUIJobGroupFrame | UIJobGroupMessage (ui-job-group) |

The User Interface frames originate from the PipelineWorker when a UIWorker sends a command or runs a job group; the observer translates them into the outbound ui-command / ui-job-group messages the client renders.
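The translation pattern can be illustrated with the two transcription rows, where two frame types map to one message that differs only in its final flag. This is a rough sketch under stated assumptions: the dataclasses are stand-ins rather than Pipecat's real frame classes, `translate` and its `transcription_enabled` argument are hypothetical, and the returned dict merely names the message class from the table instead of constructing a real RTVI message.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class TranscriptionFrame:
    """Stand-in for Pipecat's final transcription frame (illustrative only)."""
    text: str


@dataclass
class InterimTranscriptionFrame:
    """Stand-in for Pipecat's interim transcription frame (illustrative only)."""
    text: str


def translate(frame: Any, transcription_enabled: bool = True) -> Optional[Dict[str, Any]]:
    """Describe the RTVI message for a frame, or return None to send nothing."""
    if not transcription_enabled:
        return None  # the corresponding message category is switched off
    if isinstance(frame, TranscriptionFrame):
        return {"message": "RTVIUserTranscriptionMessage", "final": True, "text": frame.text}
    if isinstance(frame, InterimTranscriptionFrame):
        return {"message": "RTVIUserTranscriptionMessage", "final": False, "text": frame.text}
    return None  # frames without a mapping are ignored in this sketch
```

The same shape (check the enable flag, match the frame type, emit the mapped message) would extend to the other rows of the table.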