External Turn Management

Overview

In some scenarios, turn detection happens externally, either through a dedicated processor or an external service. Pipecat provides ExternalUserTurnStrategies, a user turn strategy that defers turn handling to these external sources. External turn management might be needed when:

Multiple context aggregators: Parallel pipelines with multiple LLMs need a single, shared source of turn events
External services with turn detection: Services like Deepgram Flux or Speechmatics provide their own turn detection

In both cases, you need to configure your context aggregators with ExternalUserTurnStrategies to defer turn handling to the external source.

External Services

Some speech-to-text services provide built-in turn detection. When using these services, configure your context aggregator with ExternalUserTurnStrategies to let the service handle turn management:

from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies

# Configure aggregator to use external turn strategies
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=ExternalUserTurnStrategies()
    ),
)

When the STT service is driving turn detection, a VAD in the transport (such as SileroVADAnalyzer) is optional. It’s not needed for core turn management, but including one enables useful STT metrics. Drop it if you don’t care about those metrics.

Realtime (Speech-to-Speech) Services

Realtime (speech-to-speech) LLM services — OpenAI Realtime, Azure Realtime, Grok/xAI Realtime, Inworld, Gemini Live, AWS Nova Sonic, and Ultravox — consume user audio directly and manage their own conversation flow. For these, pass realtime_service_mode=True to LLMContextAggregatorPair:

from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair

user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    realtime_service_mode=True,
)

Setting realtime_service_mode=True adapts the pair’s behavior in three ways:

Context writes are decoupled from UserStoppedSpeakingFrame. Instead, the assistant response start triggers the user message write. This keeps the context correct even when the realtime service provides no turn frames and local turn detection (VAD) is disabled.
UserStoppedSpeakingFrame can fire without waiting for transcripts. When local turn detection drives the conversation, this frame triggers the assistant response — letting it fire earlier reduces latency. The pair flips wait_for_transcript=False on the stop strategies that support it.
Default turn strategies are replaced with external strategies when the service emits its own turn frames. Services that emit UserStartedSpeakingFrame / UserStoppedSpeakingFrame (OpenAI Realtime, Azure, Grok, Inworld) drive on_user_turn_started / on_user_turn_stopped from those server-emitted frames. Services that don’t emit them — either because they never do (Gemini Live, Nova Sonic, Ultravox) or because server-side turn detection was disabled at runtime (e.g. OpenAI Realtime with turn_detection=False) — keep the defaults so locally-driven turn detection (e.g. local VAD) can fire the events. Passing custom user_turn_strategies opts out of this swap.

In realtime mode, subscribe to on_user_turn_message_added to receive the finalized user message. on_user_turn_stopped still fires but its UserTurnStoppedMessage.content is None, since the message isn’t finalized until the assistant response starts.

UserTurnProcessor

UserTurnProcessor is a frame processor for managing user turn lifecycle when you need a single source of turn events shared across multiple context aggregators. It emits UserStartedSpeakingFrame and UserStoppedSpeakingFrame frames and handles interruptions.

UserTurnProcessor only manages user turn start and end events. It does not handle transcription aggregation, that remains the responsibility of the context aggregators.

Constructor Parameters

user_turn_strategies

UserTurnStrategies

default:"UserTurnStrategies()"

Configured strategies for starting and stopping user turns. See User Turn Strategies for available options.

user_turn_stop_timeout

float

default:"5.0"

Timeout in seconds to automatically stop a user turn if no stop strategy triggers.

user_idle_timeout

float

default:"0"

Timeout in seconds for detecting user idle state. The processor will emit an on_user_turn_idle event when the user has been idle (not speaking) for this duration after the bot finishes speaking. Set to 0 to disable idle detection. See Detecting Idle Users for details.

Event Handlers

UserTurnProcessor provides event handlers for turn lifecycle events:

@user_turn_processor.event_handler("on_user_turn_started")
async def on_user_turn_started(processor, strategy):
    # Called when a user turn starts
    pass

@user_turn_processor.event_handler("on_user_turn_stopped")
async def on_user_turn_stopped(processor, strategy):
    # Called when a user turn stops
    pass

@user_turn_processor.event_handler("on_user_turn_stop_timeout")
async def on_user_turn_stop_timeout(processor):
    # Called if no stop strategy triggers before timeout
    pass

Usage with Parallel Pipelines

When using parallel pipelines with multiple context aggregators, place UserTurnProcessor before the parallel pipeline and configure each context aggregator with ExternalUserTurnStrategies:

from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_processor import UserTurnProcessor
from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies, UserTurnStrategies

# Create the external user turn processor with your preferred strategies
user_turn_processor = UserTurnProcessor(
    user_turn_strategies=UserTurnStrategies(
        stop=[
            TurnAnalyzerUserTurnStopStrategy(
                turn_analyzer=LocalSmartTurnAnalyzerV3()
            )
        ]
    ),
)

# Create contexts for each LLM
openai_context = LLMContext(openai_messages)
groq_context = LLMContext(groq_messages)

# Configure aggregators to use external turn strategies
openai_context_aggregator = LLMContextAggregatorPair(
    openai_context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=ExternalUserTurnStrategies()
    ),
)
groq_context_aggregator = LLMContextAggregatorPair(
    groq_context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=ExternalUserTurnStrategies()
    ),
)

# Build the pipeline with UserTurnProcessor before the parallel branches
pipeline = Pipeline(
    [
        transport.input(),
        stt,
        user_turn_processor,  # Handles turn management for all branches
        ParallelPipeline(
            [
                openai_context_aggregator.user(),
                openai_llm,
                transport.output(),
                openai_context_aggregator.assistant(),
            ],
            [
                groq_context_aggregator.user(),
                groq_llm,
                groq_context_aggregator.assistant(),
            ],
        ),
    ]
)

User Turn Strategies - Configure turn detection strategies
Parallel Pipeline - Run multiple pipeline branches concurrently
Turn Events - Handle turn lifecycle events

​Overview

​External Services

​Realtime (Speech-to-Speech) Services

​UserTurnProcessor

​Constructor Parameters

​Event Handlers

​Usage with Parallel Pipelines

​Related

Overview

External Services

Realtime (Speech-to-Speech) Services

UserTurnProcessor

Constructor Parameters

Event Handlers

Usage with Parallel Pipelines

Related