RTVIObserver - Pipecat

The RTVIObserver translates Pipecat’s internal pipeline events into standardized RTVI protocol messages. It monitors frame flow through the pipeline and generates corresponding client messages based on event types.

Purpose

The RTVIObserver converts internal pipeline frames into client-compatible RTVI messages. Any application that uses RTVI as its client protocol needs it to communicate events such as speech start/stop, user transcripts, bot output, metrics, and server messages.

Automatic Setup

RTVIObserver is automatically created and attached when you create a PipelineWorker. No manual setup is required for standard usage. To customize the observer’s behavior, pass RTVIObserverParams to the worker:

from pipecat.processors.frameworks.rtvi import RTVIObserverParams

worker = PipelineWorker(
    pipeline,
    rtvi_observer_params=RTVIObserverParams(
        bot_llm_enabled=False,
        metrics_enabled=False,
    ),
)

Configuration

RTVIObserverParams accepts the following fields:

bot_output_enabled

default:"True"

Indicates if bot output messages should be sent.

bot_llm_enabled

default:"True"

Indicates if the bot’s LLM messages should be sent.

bot_tts_enabled

default:"True"

Indicates if the bot’s TTS messages should be sent.

bot_speaking_enabled

default:"True"

Indicates if the bot’s started/stopped speaking messages should be sent.

bot_audio_level_enabled

default:"False"

Indicates if bot’s audio level messages should be sent.

user_llm_enabled

default:"True"

Indicates if the user’s LLM input messages should be sent.

user_speaking_enabled

default:"True"

Indicates if the user’s started/stopped speaking messages should be sent.

vad_user_speaking_enabled

default:"False"

Indicates if raw VAD user started/stopped speaking messages should be sent. These reflect the VAD signal directly, independent of turn finalization (unlike user_speaking_enabled, which a turn strategy may gate or defer).

user_mute_enabled

default:"True"

Indicates if user mute started/stopped messages (user-mute-started, user-mute-stopped) should be sent.

user_transcription_enabled

default:"True"

Indicates if user’s transcription messages should be sent.

user_audio_level_enabled

default:"False"

Indicates if user’s audio level messages should be sent.

metrics_enabled

default:"True"

Indicates if metrics messages should be sent.

system_logs_enabled

default:"False"

Indicates if system logs should be sent.

skip_aggregator_types

default:"None"

List of aggregation types to skip sending as tts/output messages.

If using this to avoid sending secure information, be sure to also disable bot_llm_enabled to avoid leaking through LLM messages.
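As a sketch of the note above, the two settings can be combined in one RTVIObserverParams. The "credit_card" aggregation type is only an illustrative value; substitute the types your pipeline actually emits.

```python
from pipecat.processors.frameworks.rtvi import RTVIObserverParams

# Skip "credit_card" segments in tts/output messages and also turn off
# LLM messages, so the same text cannot reach the client by that route.
params = RTVIObserverParams(
    skip_aggregator_types=["credit_card"],  # illustrative aggregation type
    bot_llm_enabled=False,
)
```

The resulting object can be passed as rtvi_observer_params in the same way as in the Automatic Setup example.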

bot_output_transforms

default:"None"

A list of tuples that transform text before it is sent to the client. Each tuple has the form (aggregation_type, transform_function). The preferred transform signature is:

async def my_transform(
    text: str,
    agg_type: AggregationType | str,
    accumulated_text: str | None = None,
    remaining_text: str | None = None,
) -> BotOutputTransformResult:
    ...

When accumulated_text and remaining_text are None, the transform is being called for the full segment text (bot-output message). When they are provided, the transform is being called for a progress event and must return a BotOutputTransformResult with accumulated_text and remaining_text set, enabling word-level transforms on the client side.

The legacy 2-parameter signature (text, agg_type) -> str is deprecated but still works. Transforms using it will emit a DeprecationWarning at registration time.

Example:

from pipecat.processors.frameworks.rtvi import (
    BotOutputTransformResult,
    RTVIObserver,
    RTVIObserverParams,
)

async def redact_sensitive(
    text: str,
    agg_type: str,
    accumulated_text: str | None = None,
    remaining_text: str | None = None,
) -> BotOutputTransformResult:
    # Mask everything except the last four characters.
    transformed = "XXXX-XXXX-XXXX-" + text[-4:]
    if accumulated_text is not None and remaining_text is not None:
        # Progress event: split the masked text at the same relative position.
        ratio = len(accumulated_text) / max(len(text), 1)
        split = int(ratio * len(transformed))
        return BotOutputTransformResult(
            text=transformed,
            accumulated_text=transformed[:split],
            remaining_text=transformed[split:],
        )
    # Full-segment call (bot-output message).
    return BotOutputTransformResult(text=transformed)

bot_output_transforms = [
    ("credit_card", redact_sensitive),  # Only for 'credit_card' type
    # Legacy 2-parameter form: deprecated, emits a DeprecationWarning at registration
    ("*", lambda text, agg_type: text.upper()),  # For all types, make uppercase
]

# `rtvi` is assumed to be the RTVI processor already created for this pipeline
observer = RTVIObserver(
    rtvi,
    params=RTVIObserverParams(bot_output_transforms=bot_output_transforms),
)
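To make the two call modes concrete, the following self-contained sketch exercises a transform of the preferred shape outside of Pipecat. TransformResult is a local stand-in for BotOutputTransformResult (an assumption, not the library class), and mask_card and the sample card string are hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransformResult:
    """Local stand-in for BotOutputTransformResult (not the library class)."""
    text: str
    accumulated_text: Optional[str] = None
    remaining_text: Optional[str] = None


async def mask_card(
    text: str,
    agg_type: str,
    accumulated_text: Optional[str] = None,
    remaining_text: Optional[str] = None,
) -> TransformResult:
    # Keep only the last four characters visible.
    masked = "XXXX-XXXX-XXXX-" + text[-4:]
    if accumulated_text is None or remaining_text is None:
        # Full-segment call: only the transformed text is returned.
        return TransformResult(text=masked)
    # Progress call: split the masked text at the same relative position
    # that the accumulated text occupies in the original text.
    ratio = len(accumulated_text) / max(len(text), 1)
    split = int(ratio * len(masked))
    return TransformResult(masked, masked[:split], masked[split:])


card = "4111111111111234"  # hypothetical 16-character input
full = asyncio.run(mask_card(card, "credit_card"))
progress = asyncio.run(mask_card(card, "credit_card", card[:8], card[8:]))

print(full.text)
print(progress.accumulated_text, "|", progress.remaining_text)
```

In the progress call, the accumulated and remaining parts always concatenate back to the transformed text, which is what lets a client highlight words as they are spoken.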

audio_level_period_secs

default:"0.15"

How often audio levels should be sent if enabled.

function_call_report_level

Dict[str, RTVIFunctionCallReportLevel]

default: {"*": RTVIFunctionCallReportLevel.NONE}

Controls what information is exposed in function call lifecycle events (llm-function-call-started, llm-function-call-in-progress, llm-function-call-stopped). Maps function names to security levels, where "*" sets the default for unlisted functions.

Levels:

DISABLED: No events emitted for this function
NONE: Events with tool_call_id only (most secure when events are needed)
NAME: Adds function name to events
FULL: Adds function name, arguments, and results

from pipecat.processors.frameworks.rtvi import (
    RTVIFunctionCallReportLevel,
    RTVIObserverParams,
)

worker = PipelineWorker(
    pipeline,
    rtvi_observer_params=RTVIObserverParams(
        function_call_report_level={
            "*": RTVIFunctionCallReportLevel.NONE,
            "get_weather": RTVIFunctionCallReportLevel.FULL,
        },
    ),
)
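The lookup and filtering rules described above can be illustrated with a small standalone sketch. It does not use Pipecat: ReportLevel is a stand-in for RTVIFunctionCallReportLevel, and resolve_level, build_event, and the payload keys are hypothetical names chosen for illustration. Falling back to NONE when no "*" entry exists is also an assumption, based on the documented default.

```python
from enum import Enum
from typing import Any, Dict, Optional


class ReportLevel(Enum):
    """Stand-in for RTVIFunctionCallReportLevel (not the library enum)."""
    DISABLED = "disabled"
    NONE = "none"
    NAME = "name"
    FULL = "full"


def resolve_level(levels: Dict[str, ReportLevel], function_name: str) -> ReportLevel:
    # An exact function name wins; otherwise use the "*" entry.
    # Falling back to NONE without a "*" entry is an assumption.
    return levels.get(function_name, levels.get("*", ReportLevel.NONE))


def build_event(
    level: ReportLevel,
    tool_call_id: str,
    function_name: str,
    arguments: Dict[str, Any],
) -> Optional[Dict[str, Any]]:
    if level is ReportLevel.DISABLED:
        return None  # no event is emitted at all
    event: Dict[str, Any] = {"tool_call_id": tool_call_id}
    if level in (ReportLevel.NAME, ReportLevel.FULL):
        event["function_name"] = function_name
    if level is ReportLevel.FULL:
        event["arguments"] = arguments
    return event


levels = {"*": ReportLevel.NONE, "get_weather": ReportLevel.FULL}
weather_event = build_event(
    resolve_level(levels, "get_weather"), "call-1", "get_weather", {"city": "Paris"}
)
other_event = build_event(
    resolve_level(levels, "lookup_account"), "call-2", "lookup_account", {"id": 7}
)
print(weather_event)
print(other_event)
```

With this configuration only get_weather exposes its name and arguments; every other function reports just its tool_call_id.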

Frame Translation

The observer maps Pipecat’s internal frames to RTVI protocol messages:

Pipeline Frame	RTVI Message
Speech Events
`UserStartedSpeakingFrame`	`RTVIUserStartedSpeakingMessage`
`UserStoppedSpeakingFrame`	`RTVIUserStoppedSpeakingMessage`
`VADUserStartedSpeakingFrame`	`VADUserStartedSpeakingMessage`
`VADUserStoppedSpeakingFrame`	`VADUserStoppedSpeakingMessage`
`BotStartedSpeakingFrame`	`RTVIBotStartedSpeakingMessage`
`BotStoppedSpeakingFrame`	`RTVIBotStoppedSpeakingMessage`
User Mute
`UserMuteStartedFrame`	`RTVIUserMuteStartedMessage`
`UserMuteStoppedFrame`	`RTVIUserMuteStoppedMessage`
Transcription
`TranscriptionFrame`	`RTVIUserTranscriptionMessage(final=true)`
`InterimTranscriptionFrame`	`RTVIUserTranscriptionMessage(final=false)`
Bot Output
`AggregatedTextFrame`	`RTVIBotOutputMessage`
LLM Processing
`LLMFullResponseStartFrame`	`RTVIBotLLMStartedMessage`
`LLMFullResponseEndFrame`	`RTVIBotLLMStoppedMessage`
`LLMTextFrame`	`RTVIBotLLMTextMessage`
TTS Events
`TTSStartedFrame`	`RTVIBotTTSStartedMessage`
`TTSStoppedFrame`	`RTVIBotTTSStoppedMessage`
`TTSTextFrame`	`RTVIBotTTSTextMessage`
Function Calls
`FunctionCallsStartedFrame`	`llm-function-call-started`
`FunctionCallInProgressFrame`	`llm-function-call-in-progress`
`FunctionCallResultFrame`	`llm-function-call-stopped`
Context/Metrics
`LLMContextFrame`	`RTVIUserLLMTextMessage`
`MetricsFrame`	`RTVIMetricsMessage`
`RTVIServerMessageFrame`	`RTVIServerMessage`
User Interface
`RTVIUICommandFrame`	`UICommandMessage` (`ui-command`)
`RTVIUIJobGroupFrame`	`UIJobGroupMessage` (`ui-job-group`)

The User Interface frames are emitted by PipelineWorker when a UIWorker sends a command or runs a job group; the observer translates them into the outbound ui-command / ui-job-group messages the client renders.
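Conceptually, the table behaves like a lookup from frame type to message type, gated by the flags in Configuration. The sketch below is a simplified standalone illustration, not the observer's actual implementation: the frame classes are empty stand-ins, translate is a hypothetical helper, and the pairing of each frame with a particular flag is an assumption inferred from the flag descriptions.

```python
from typing import Dict, Optional, Tuple


# Empty stand-ins for a few Pipecat frame classes (not the library classes).
class UserStartedSpeakingFrame: ...
class BotStartedSpeakingFrame: ...
class LLMTextFrame: ...
class MetricsFrame: ...
class UnmappedFrame: ...  # hypothetical frame with no RTVI counterpart


# Frame class name -> (RTVI message name, assumed controlling flag).
FRAME_TABLE: Dict[str, Tuple[str, str]] = {
    "UserStartedSpeakingFrame": ("RTVIUserStartedSpeakingMessage", "user_speaking_enabled"),
    "BotStartedSpeakingFrame": ("RTVIBotStartedSpeakingMessage", "bot_speaking_enabled"),
    "LLMTextFrame": ("RTVIBotLLMTextMessage", "bot_llm_enabled"),
    "MetricsFrame": ("RTVIMetricsMessage", "metrics_enabled"),
}


def translate(frame: object, flags: Dict[str, bool]) -> Optional[str]:
    """Return the RTVI message name for a frame, or None if nothing is sent."""
    entry = FRAME_TABLE.get(type(frame).__name__)
    if entry is None:
        return None  # frame has no RTVI message
    message_name, flag_name = entry
    # The four flags used in this table all default to True.
    return message_name if flags.get(flag_name, True) else None


flags = {"metrics_enabled": False}
print(translate(LLMTextFrame(), flags))
print(translate(MetricsFrame(), flags))
print(translate(UnmappedFrame(), flags))
```

The real observer also builds message payloads from frame contents (for example, the final flag on transcriptions), which this lookup deliberately leaves out.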
