Overview
Context summarization automatically compresses older conversation history when token or message limits are reached. It is configured via LLMAutoContextSummarizationConfig (auto-trigger thresholds) and LLMContextSummaryConfig (summary generation params), and managed by LLMContextSummarizer.
For a walkthrough of how to enable and customize context summarization, see the Context Summarization guide.
LLMAutoContextSummarizationConfig
- max_context_tokens: Maximum context size in estimated tokens before triggering summarization. Tokens are estimated using the heuristic of 1 token per 4 characters.
- max_unsummarized_messages: Maximum number of new messages before triggering summarization, even if the token limit has not been reached.
- summary_config: Configuration for how summaries are generated. See LLMContextSummaryConfig below.
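The trigger conditions above can be sketched in plain Python. The 1-token-per-4-characters heuristic is from this reference; the function names and default values here are illustrative, not Pipecat APIs:

```python
# Illustrative sketch of the auto-trigger logic; function names and the
# default thresholds are hypothetical, not Pipecat's actual implementation.
def estimate_tokens(messages: list[str]) -> int:
    """Estimate token count at roughly 1 token per 4 characters."""
    return sum(len(m) for m in messages) // 4

def should_summarize(messages: list[str], new_message_count: int,
                     max_context_tokens: int = 8000,
                     max_unsummarized_messages: int = 20) -> bool:
    # Trigger on either threshold: estimated token count or new-message count.
    return (estimate_tokens(messages) > max_context_tokens
            or new_message_count > max_unsummarized_messages)

print(should_summarize(["x" * 40000], new_message_count=1))  # → True (10000 est. tokens)
```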
LLMContextSummaryConfig
Used as summary_config inside LLMAutoContextSummarizationConfig, or passed directly to LLMSummarizeContextFrame for on-demand summarization.
- Target token count for the generated summary. Passed to the LLM as max_tokens. Auto-adjusted to 80% of max_context_tokens if it exceeds that value.
- Number of recent messages to preserve uncompressed after each summarization.
- Custom system prompt for the LLM when generating summaries. When None, a built-in default prompt is used.
- Template for formatting the summary when injected into context. Must contain {summary} as a placeholder. Allows wrapping summaries in custom delimiters (e.g., XML tags) so system prompts can distinguish summaries from live conversation.
- Dedicated LLM service for generating summaries. When set, summarization requests are sent to this service instead of the pipeline's primary LLM. Useful for routing summarization to a cheaper or faster model. When None, the pipeline LLM handles summarization.
- Maximum time in seconds to wait for the LLM to generate a summary. If exceeded, summarization is aborted and future summarization attempts are unblocked. Set to None to disable the timeout.
LLMSummarizeContextFrame
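The {summary} placeholder requirement for the summary template can be demonstrated with plain string formatting. The XML-style template string below is an example, not Pipecat's built-in default:

```python
# Illustrative only: shows the {summary} placeholder contract for the summary
# template. The template string is an example, not Pipecat's default.
summary_template = "<conversation_summary>{summary}</conversation_summary>"

# A template without the placeholder cannot receive the generated summary.
assert "{summary}" in summary_template

injected = summary_template.format(
    summary="User asked about pricing; agent quoted the monthly plan."
)
print(injected)
```

Wrapping the summary in a distinctive delimiter like this lets a system prompt say, for example, "text inside <conversation_summary> tags is a compressed recap, not a live user turn."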
Per-request override for summary generation settings (prompt, token budget, messages to keep). When None, the summarizer's default LLMContextSummaryConfig is used.
Manual summarization works even when enable_auto_context_summarization is False, because the summarizer is always created internally to handle manually pushed frames.
If a summarization is already in progress, the manual request is ignored.
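The "ignored while in progress" behavior can be modeled with a simple guard flag. This is a minimal mock to illustrate the described behavior, not Pipecat code:

```python
# Minimal mock (not Pipecat code) of the "ignore manual requests while a
# summarization is already in progress" behavior described above.
class MockSummarizer:
    def __init__(self) -> None:
        self._in_progress = False
        self.runs = 0

    def request_summary(self) -> bool:
        """Return True if the request was accepted, False if ignored."""
        if self._in_progress:
            return False  # a summarization is already running; drop this request
        self._in_progress = True
        self.runs += 1
        return True

    def finish(self) -> None:
        self._in_progress = False

s = MockSummarizer()
print(s.request_summary())  # True: accepted
print(s.request_summary())  # False: ignored while the first is in progress
s.finish()
print(s.request_summary())  # True: accepted again after completion
```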
LLMContextSummarizer
Created internally by LLMAssistantAggregator when enable_auto_context_summarization=True. Access it via assistant_aggregator._summarizer.
Event Handlers
| Event | Parameters | Description |
|---|---|---|
| on_summary_applied | event: SummaryAppliedEvent | Emitted after a summary has been successfully applied to the context. |
on_summary_applied
SummaryAppliedEvent
- Number of messages in context before summarization.
- Number of messages in context after summarization.
- Number of messages that were compressed into the summary.
- Number of messages preserved uncompressed (system message plus recent messages).
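The relationship between these counts can be sketched with simple arithmetic, assuming the summary is injected as a single message and the kept set is the system message plus the configured number of recent messages. The function below is illustrative, not Pipecat code:

```python
# Illustrative arithmetic (not Pipecat code) for the SummaryAppliedEvent counts,
# assuming the post-summarization context is [system] + [summary] + recent messages.
def summary_counts(total_messages: int, recent_to_keep: int) -> dict[str, int]:
    kept = 1 + recent_to_keep           # system message plus recent messages
    summarized = total_messages - kept  # everything else is compressed
    after = kept + 1                    # kept messages plus the summary itself
    return {"before": total_messages, "after": after,
            "summarized": summarized, "kept": kept}

print(summary_counts(total_messages=50, recent_to_keep=4))
# → {'before': 50, 'after': 6, 'summarized': 45, 'kept': 5}
```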
Deprecated: LLMContextSummarizationConfig
LLMContextSummarizationConfig was split: its auto-trigger thresholds (max_context_tokens, max_unsummarized_messages) moved into LLMAutoContextSummarizationConfig, and its summary generation params moved into LLMContextSummaryConfig.
LLMAssistantAggregatorParams fields were renamed:
- enable_context_summarization → enable_auto_context_summarization
- context_summarization_config → auto_context_summarization_config
Using the deprecated names still works but emits a DeprecationWarning.
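A common pattern for such renames is to forward the old keyword to the new one while warning. This sketch is hypothetical and not Pipecat's actual implementation:

```python
import warnings

# Hypothetical sketch (not Pipecat's actual implementation) of forwarding the
# deprecated parameter names to the new ones with a DeprecationWarning.
_RENAMED = {
    "enable_context_summarization": "enable_auto_context_summarization",
    "context_summarization_config": "auto_context_summarization_config",
}

def normalize_params(**kwargs):
    out = {}
    for name, value in kwargs.items():
        if name in _RENAMED:
            new_name = _RENAMED[name]
            warnings.warn(f"{name} is deprecated; use {new_name}",
                          DeprecationWarning, stacklevel=2)
            name = new_name
        out[name] = value
    return out

print(normalize_params(enable_context_summarization=True))
# → {'enable_auto_context_summarization': True}
```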