Overview
In some scenarios, turn detection happens externally, either through a dedicated processor or an external service. Pipecat provides ExternalUserTurnStrategies, a user turn strategy that defers turn handling to these external sources.
External turn management might be needed when:
- Multiple context aggregators: Parallel pipelines with multiple LLMs need a single, shared source of turn events
- External services with turn detection: Services like Deepgram Flux or Speechmatics provide their own turn detection
In both cases, use ExternalUserTurnStrategies to defer turn handling to the external source.
External Services
Some speech-to-text services provide built-in turn detection. When using these services, configure your context aggregator with ExternalUserTurnStrategies to let the service handle turn management.
When the STT service is driving turn detection, a VAD in the transport (such as SileroVADAnalyzer) is optional. It’s not needed for core turn management, but including one enables useful STT metrics. Drop it if you don’t care about those metrics.
Realtime (Speech-to-Speech) Services
Realtime (speech-to-speech) LLM services (OpenAI Realtime, Azure Realtime, Grok/xAI Realtime, Inworld, Gemini Live, AWS Nova Sonic, and Ultravox) consume user audio directly and manage their own conversation flow. For these, pass realtime_service_mode=True to LLMContextAggregatorPair.
realtime_service_mode=True adapts the pair’s behavior in three ways:
- Context writes are decoupled from UserStoppedSpeakingFrame. Instead, the assistant response start triggers the user message write. This keeps the context correct even when the realtime service provides no turn frames and local turn detection (VAD) is disabled.
- UserStoppedSpeakingFrame can fire without waiting for transcripts. When local turn detection drives the conversation, this frame triggers the assistant response, so letting it fire earlier reduces latency. The pair sets wait_for_transcript=False on the stop strategies that support it.
- Default turn strategies are replaced with external strategies when the service emits its own turn frames. Services that emit UserStartedSpeakingFrame/UserStoppedSpeakingFrame (OpenAI Realtime, Azure, Grok, Inworld) drive on_user_turn_started/on_user_turn_stopped from those server-emitted frames. Services that don’t emit them, either because they never do (Gemini Live, Nova Sonic, Ultravox) or because server-side turn detection was disabled at runtime (e.g. OpenAI Realtime with turn_detection=False), keep the defaults so locally driven turn detection (e.g. local VAD) can fire the events. Passing custom user_turn_strategies opts out of this swap.
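
To make the behaviors above concrete, here is a minimal plain-Python model. It is not Pipecat code and does not import Pipecat: `pick_turn_strategies`, `RealtimeContextModel`, and the string labels are hypothetical names used only for illustration. The sketch assumes the strategy swap reduces to a simple decision, and that the user message is written to the context only when the assistant response starts.

```python
from typing import List, Optional, Tuple


def pick_turn_strategies(
    service_emits_turn_frames: bool, custom: Optional[str] = None
) -> str:
    """Model of the strategy swap (labels are illustrative, not Pipecat values)."""
    if custom is not None:
        return custom  # custom strategies opt out of the swap
    if service_emits_turn_frames:
        return "external"  # follow the server-emitted turn frames
    return "default"  # keep locally driven turn detection


class RealtimeContextModel:
    """Model of the decoupled context write in realtime mode."""

    def __init__(self) -> None:
        self.context: List[str] = []
        self.events: List[Tuple[str, Optional[str]]] = []
        self._pending: List[str] = []

    def transcript(self, text: str) -> None:
        # Transcripts are buffered; they may arrive after the user stops.
        self._pending.append(text)

    def user_stopped_speaking(self) -> None:
        # The turn-stopped event fires right away, with no content yet.
        self.events.append(("on_user_turn_stopped", None))

    def assistant_response_started(self) -> None:
        # The user message is finalized only when the assistant starts.
        if self._pending:
            message = " ".join(self._pending)
            self._pending.clear()
            self.context.append(message)
            self.events.append(("on_user_turn_message_added", message))
```

In this model, calling `user_stopped_speaking()` before any transcript arrives leaves the context empty; the message appears only after `assistant_response_started()`.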
In realtime mode, subscribe to on_user_turn_message_added to receive the finalized user message. on_user_turn_stopped still fires, but its UserTurnStoppedMessage.content is None, since the message isn’t finalized until the assistant response starts.
UserTurnProcessor
UserTurnProcessor is a frame processor for managing user turn lifecycle when you need a single source of turn events shared across multiple context aggregators. It emits UserStartedSpeakingFrame and UserStoppedSpeakingFrame frames and handles interruptions.
UserTurnProcessor only manages user turn start and end events. It does not handle transcription aggregation; that remains the responsibility of the context aggregators.
Constructor Parameters
- Configured strategies for starting and stopping user turns. See User Turn Strategies for available options.
- Timeout in seconds to automatically stop a user turn if no stop strategy triggers.
- Timeout in seconds for detecting user idle state. The processor emits an on_user_turn_idle event when the user has been idle (not speaking) for this duration after the bot finishes speaking. Set to 0 to disable idle detection. See Detecting Idle Users for details.
Event Handlers
UserTurnProcessor provides event handlers for turn lifecycle events, including the on_user_turn_idle event described above.
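
As a rough illustration of how the stop timeout and the idle timeout could map to events, the following plain-Python model passes timestamps in explicitly instead of running real timers. It is not Pipecat code: `TurnTimerModel`, its methods, and the `turn_stopped_by_timeout` label are hypothetical. It also assumes the stop timeout is measured from the start of the turn and that each event fires once, neither of which is confirmed here.

```python
from typing import List, Optional


class TurnTimerModel:
    """Toy model of the two timeouts; times are passed in explicitly."""

    def __init__(self, stop_timeout: float, idle_timeout: float) -> None:
        self.stop_timeout = stop_timeout
        self.idle_timeout = idle_timeout  # 0 disables idle detection
        self._turn_started_at: Optional[float] = None
        self._bot_finished_at: Optional[float] = None

    def user_turn_started(self, now: float) -> None:
        # Assumption: the stop timeout is measured from the turn start.
        self._turn_started_at = now
        self._bot_finished_at = None  # the user is speaking, so not idle

    def stop_strategy_triggered(self) -> None:
        # The turn ended normally, so the stop timeout no longer applies.
        self._turn_started_at = None

    def bot_finished_speaking(self, now: float) -> None:
        self._bot_finished_at = now

    def poll(self, now: float) -> List[str]:
        """Return the events that are due at time `now`."""
        events: List[str] = []
        if (
            self._turn_started_at is not None
            and now - self._turn_started_at >= self.stop_timeout
        ):
            events.append("turn_stopped_by_timeout")  # hypothetical label
            self._turn_started_at = None
        if (
            self.idle_timeout > 0
            and self._bot_finished_at is not None
            and now - self._bot_finished_at >= self.idle_timeout
        ):
            events.append("on_user_turn_idle")
            self._bot_finished_at = None
        return events
```

Passing timestamps to `poll` keeps the model deterministic; a real processor would use timers driven by the event loop.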
Usage with Parallel Pipelines
When using parallel pipelines with multiple context aggregators, place UserTurnProcessor before the parallel pipeline and configure each context aggregator with ExternalUserTurnStrategies.
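
The idea behind this layout can be sketched in plain Python without Pipecat. In the model below, `SharedTurnSource` and `BranchAggregator` are hypothetical stand-ins for a single upstream turn processor and for context aggregators that use external turn strategies; the frame kinds are plain strings chosen for illustration. The point is that every branch sees the same turn boundaries, while each branch still aggregates transcriptions itself.

```python
from typing import List


class BranchAggregator:
    """Toy stand-in for a context aggregator using external turn strategies."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.context: List[str] = []
        self._buffer: List[str] = []

    def on_frame(self, kind: str, text: str = "") -> None:
        # The branch never decides turn boundaries; it only reacts to them.
        if kind == "user_started_speaking":
            self._buffer.clear()
        elif kind == "transcription":
            self._buffer.append(text)
        elif kind == "user_stopped_speaking" and self._buffer:
            self.context.append(" ".join(self._buffer))
            self._buffer.clear()


class SharedTurnSource:
    """Toy stand-in for one upstream source of turn events."""

    def __init__(self, branches: List[BranchAggregator]) -> None:
        self.branches = branches

    def push(self, kind: str, text: str = "") -> None:
        # Every branch receives the same frame in the same order.
        for branch in self.branches:
            branch.on_frame(kind, text)


branches = [BranchAggregator("llm_a"), BranchAggregator("llm_b")]
source = SharedTurnSource(branches)
for kind, text in [
    ("user_started_speaking", ""),
    ("transcription", "book a table"),
    ("transcription", "for two"),
    ("user_stopped_speaking", ""),
]:
    source.push(kind, text)
```

Because turn start and stop come from one place, the two branch contexts cannot drift apart on where a user turn begins or ends.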
Related
- User Turn Strategies - Configure turn detection strategies
- Parallel Pipeline - Run multiple pipeline branches concurrently
- Turn Events - Handle turn lifecycle events