The RTVIObserver translates Pipecat’s internal pipeline events into standardized RTVI protocol messages. It monitors frame flow through the pipeline and generates corresponding client messages based on event types.

Purpose

The RTVIObserver converts internal pipeline frames into client-compatible RTVI messages. Any application that uses RTVI as its client protocol needs it to communicate events such as speech start/stop, user transcripts, bot output, metrics, and server messages.

Automatic Setup

RTVIObserver is automatically created and attached when you create a PipelineWorker. No manual setup is required for standard usage. To customize the observer’s behavior, pass RTVIObserverParams to the worker:
from pipecat.processors.frameworks.rtvi import RTVIObserverParams

worker = PipelineWorker(
    pipeline,
    rtvi_observer_params=RTVIObserverParams(
        bot_llm_enabled=False,
        metrics_enabled=False,
    ),
)

Configuration

RTVIObserverParams accepts the following fields:
bot_output_enabled
default:"True"
Indicates if bot output messages should be sent.
bot_llm_enabled
default:"True"
Indicates if the bot’s LLM messages should be sent.
bot_tts_enabled
default:"True"
Indicates if the bot’s TTS messages should be sent.
bot_speaking_enabled
default:"True"
Indicates if the bot’s started/stopped speaking messages should be sent.
bot_audio_level_enabled
default:"False"
Indicates if the bot’s audio level messages should be sent.
user_llm_enabled
default:"True"
Indicates if the user’s LLM input messages should be sent.
user_speaking_enabled
default:"True"
Indicates if the user’s started/stopped speaking messages should be sent.
vad_user_speaking_enabled
default:"False"
Indicates if raw VAD user started/stopped speaking messages should be sent. These reflect the VAD signal directly, independent of turn finalization (unlike user_speaking_enabled, which a turn strategy may gate or defer).
user_mute_enabled
default:"True"
Indicates if user mute started/stopped messages (user-mute-started, user-mute-stopped) should be sent.
user_transcription_enabled
default:"True"
Indicates if the user’s transcription messages should be sent.
user_audio_level_enabled
default:"False"
Indicates if the user’s audio level messages should be sent.
metrics_enabled
default:"True"
Indicates if metrics messages should be sent.
system_logs_enabled
default:"False"
Indicates if system logs should be sent.
skip_aggregator_types
default:"None"
List of aggregation types whose text should not be sent as TTS or bot output messages.
If you use this to keep sensitive information away from the client, also set bot_llm_enabled to False so the same text does not leak through LLM messages.
bot_output_transforms
default:"None"
A list of tuples to transform text before sending it to the client. Each tuple should be of the form (aggregation_type, transform_function).
The preferred transform signature is:
async def my_transform(
    text: str,
    agg_type: AggregationType | str,
    accumulated_text: str | None = None,
    remaining_text: str | None = None,
) -> BotOutputTransformResult:
    ...
When accumulated_text and remaining_text are None, the transform is being called for the full segment text (bot-output message). When they are provided, the transform is being called for a progress event and must return a BotOutputTransformResult with accumulated_text and remaining_text set, enabling word-level transforms on the client side.

The legacy 2-parameter signature (text, agg_type) -> str is deprecated but still works. Transforms using it will emit a DeprecationWarning at registration time.

Example:
from pipecat.processors.frameworks.rtvi import (
    BotOutputTransformResult,
    RTVIObserver,
    RTVIObserverParams,
)

async def redact_sensitive(
    text: str,
    agg_type: str,
    accumulated_text: str | None = None,
    remaining_text: str | None = None,
) -> BotOutputTransformResult:
    # Mask everything except the last four characters.
    transformed = "XXXX-XXXX-XXXX-" + text[-4:]
    if accumulated_text is not None and remaining_text is not None:
        # Progress event: split the masked text at the same relative position.
        ratio = len(accumulated_text) / max(len(text), 1)
        split = int(ratio * len(transformed))
        return BotOutputTransformResult(
            text=transformed,
            accumulated_text=transformed[:split],
            remaining_text=transformed[split:],
        )
    # Full segment text (bot-output message).
    return BotOutputTransformResult(text=transformed)

bot_output_transforms = [
    ("credit_card", redact_sensitive),  # Only for 'credit_card' type
    # For all types, make uppercase (legacy 2-parameter signature, deprecated)
    ("*", lambda text, agg_type: text.upper()),
]

# `rtvi` is assumed to be defined elsewhere in your application.
observer = RTVIObserver(
    rtvi,
    params=RTVIObserverParams(bot_output_transforms=bot_output_transforms),
)
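The two call modes described above (full segment versus progress event) can be exercised without a running pipeline. The following is a minimal sketch, not Pipecat code: TransformResult is a hypothetical stand-in for BotOutputTransformResult that carries only the three fields named above, and the sample card number is made up. It uses integer arithmetic for the split point so the result does not depend on floating-point rounding.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransformResult:
    """Hypothetical stand-in for BotOutputTransformResult (illustration only)."""
    text: str
    accumulated_text: Optional[str] = None
    remaining_text: Optional[str] = None


async def redact(text, agg_type, accumulated_text=None, remaining_text=None):
    transformed = "XXXX-XXXX-XXXX-" + text[-4:]
    if accumulated_text is not None and remaining_text is not None:
        # Progress event: split the masked text at the same relative position.
        split = len(accumulated_text) * len(transformed) // max(len(text), 1)
        return TransformResult(transformed, transformed[:split], transformed[split:])
    # Full segment: only `text` is set.
    return TransformResult(transformed)


card = "4111-1111-1111-1234"  # made-up sample value
full = asyncio.run(redact(card, "credit_card"))
progress = asyncio.run(redact(card, "credit_card", card[:9], card[9:]))

print(full.text)
print(progress.accumulated_text, "|", progress.remaining_text)
```

In the progress call, the accumulated and remaining parts always concatenate back to the transformed text, which is what lets a client highlight progress without seeing the original value.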
audio_level_period_secs
default:"0.15"
How often audio levels should be sent if enabled.
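To illustrate what a send period means in practice, here is a minimal throttling sketch. It is an assumption about the general technique, not a description of how Pipecat implements it; LevelThrottle and should_send are hypothetical names.

```python
class LevelThrottle:
    """Hypothetical helper: lets a value through at most once per period."""

    def __init__(self, period_secs: float = 0.15):
        self.period_secs = period_secs
        self._last_sent = None

    def should_send(self, now: float) -> bool:
        # Send the first sample, then only after a full period has elapsed.
        if self._last_sent is None or now - self._last_sent >= self.period_secs:
            self._last_sent = now
            return True
        return False


throttle = LevelThrottle(period_secs=0.15)
# Audio level samples arriving every 100 ms (timestamps in seconds).
timestamps = [0.0, 0.1, 0.2, 0.3, 0.4]
sent = [t for t in timestamps if throttle.should_send(t)]
print(sent)
```

With a 0.15 s period and samples every 100 ms, only every other sample is forwarded, so a shorter period gives smoother client meters at the cost of more messages.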
function_call_report_level
Dict[str, RTVIFunctionCallReportLevel]
default:"{\"*\": RTVIFunctionCallReportLevel.NONE}"
Controls what information is exposed in function call lifecycle events (llm-function-call-started, llm-function-call-in-progress, llm-function-call-stopped). Maps function names to security levels, where "*" sets the default for unlisted functions.
Levels:
  • DISABLED: No events emitted for this function
  • NONE: Events with tool_call_id only (most secure when events are needed)
  • NAME: Adds function name to events
  • FULL: Adds function name, arguments, and results
from pipecat.processors.frameworks.rtvi import (
    RTVIFunctionCallReportLevel,
    RTVIObserverParams,
)

worker = PipelineWorker(
    pipeline,
    rtvi_observer_params=RTVIObserverParams(
        function_call_report_level={
            "*": RTVIFunctionCallReportLevel.NONE,
            "get_weather": RTVIFunctionCallReportLevel.FULL,
        },
    ),
)
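The effect of each level can be sketched as a filter over the event payload. This is an illustrative model only: ReportLevel is a hypothetical stand-in for RTVIFunctionCallReportLevel, build_event is not a Pipecat function, and the payload keys function_name, arguments, and result are assumed names chosen for the example.

```python
from enum import Enum


class ReportLevel(Enum):
    """Hypothetical stand-in for RTVIFunctionCallReportLevel."""
    DISABLED = "disabled"
    NONE = "none"
    NAME = "name"
    FULL = "full"


def build_event(levels, name, tool_call_id, arguments=None, result=None):
    """Return the payload allowed for `name`, or None if events are disabled."""
    # Fall back to the "*" entry for functions that are not listed.
    level = levels.get(name, levels.get("*", ReportLevel.NONE))
    if level is ReportLevel.DISABLED:
        return None
    event = {"tool_call_id": tool_call_id}
    if level in (ReportLevel.NAME, ReportLevel.FULL):
        event["function_name"] = name
    if level is ReportLevel.FULL:
        event["arguments"] = arguments
        event["result"] = result
    return event


levels = {"*": ReportLevel.NONE, "get_weather": ReportLevel.FULL}
weather = build_event(levels, "get_weather", "call-1", {"city": "Oslo"}, "sunny")
other = build_event(levels, "lookup_account", "call-2", {"id": 7}, "ok")
print(weather)
print(other)
```

Keeping "*" at NONE and raising individual functions to NAME or FULL means a newly added function exposes nothing beyond its tool_call_id until you opt it in.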

Frame Translation

The observer maps Pipecat’s internal frames to RTVI protocol messages:
Pipeline Frame | RTVI Message

Speech Events
UserStartedSpeakingFrame | RTVIUserStartedSpeakingMessage
UserStoppedSpeakingFrame | RTVIUserStoppedSpeakingMessage
VADUserStartedSpeakingFrame | VADUserStartedSpeakingMessage
VADUserStoppedSpeakingFrame | VADUserStoppedSpeakingMessage
BotStartedSpeakingFrame | RTVIBotStartedSpeakingMessage
BotStoppedSpeakingFrame | RTVIBotStoppedSpeakingMessage

User Mute
UserMuteStartedFrame | RTVIUserMuteStartedMessage
UserMuteStoppedFrame | RTVIUserMuteStoppedMessage

Transcription
TranscriptionFrame | RTVIUserTranscriptionMessage(final=true)
InterimTranscriptionFrame | RTVIUserTranscriptionMessage(final=false)

Bot Output
AggregatedTextFrame | RTVIBotOutputMessage

LLM Processing
LLMFullResponseStartFrame | RTVIBotLLMStartedMessage
LLMFullResponseEndFrame | RTVIBotLLMStoppedMessage
LLMTextFrame | RTVIBotLLMTextMessage

TTS Events
TTSStartedFrame | RTVIBotTTSStartedMessage
TTSStoppedFrame | RTVIBotTTSStoppedMessage
TTSTextFrame | RTVIBotTTSTextMessage

Function Calls
FunctionCallsStartedFrame | llm-function-call-started
FunctionCallInProgressFrame | llm-function-call-in-progress
FunctionCallResultFrame | llm-function-call-stopped

Context/Metrics
LLMContextFrame | RTVIUserLLMTextMessage
MetricsFrame | RTVIMetricsMessage
RTVIServerMessageFrame | RTVIServerMessage

User Interface
RTVIUICommandFrame | UICommandMessage (ui-command)
RTVIUIJobGroupFrame | UIJobGroupMessage (ui-job-group)
The User Interface frames are emitted by PipelineWorker when a UIWorker sends a command or runs a job group; the observer translates them into the outbound ui-command / ui-job-group messages the client renders.
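The mapping in the table above amounts to a dispatch on frame type. The sketch below models that idea with stand-in classes so it runs without Pipecat installed. The dataclasses, the translate function, and the message dict shape are all hypothetical; the type strings user-mute-started and user-mute-stopped come from this page, while user-transcription is an assumed name used for illustration.

```python
from dataclasses import dataclass


# Stand-in frame types for illustration; the real classes live in Pipecat.
@dataclass
class UserMuteStartedFrame:
    pass


@dataclass
class UserMuteStoppedFrame:
    pass


@dataclass
class TranscriptionFrame:
    text: str


@dataclass
class InterimTranscriptionFrame:
    text: str


@dataclass
class AudioFrame:  # has no entry in the table below, so nothing is sent
    data: bytes


def _transcription(final):
    # Final and interim transcripts share one message type, distinguished by `final`.
    return lambda f: {"type": "user-transcription", "data": {"text": f.text, "final": final}}


HANDLERS = {
    UserMuteStartedFrame: lambda f: {"type": "user-mute-started"},
    UserMuteStoppedFrame: lambda f: {"type": "user-mute-stopped"},
    TranscriptionFrame: _transcription(True),
    InterimTranscriptionFrame: _transcription(False),
}


def translate(frame):
    """Return the client message for a frame, or None when there is no mapping."""
    handler = HANDLERS.get(type(frame))
    return handler(frame) if handler else None


print(translate(InterimTranscriptionFrame("hel")))
print(translate(TranscriptionFrame("hello")))
print(translate(AudioFrame(b"\x00")))
```

A table-driven dispatch like this keeps unmapped frames silent by default, which matches the observer's role of watching frame flow and emitting a message only for frame types it knows about.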