Streaming responses enable real-time, incremental LLM output: the model transmits pieces of its output as they become available instead of waiting for the complete response. This reduces perceived latency and improves interactivity for conversational AI, speech-to-text, and content-generation tools.
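The incremental pattern above can be sketched with an async generator: tokens are yielded as soon as they are available, and the consumer updates the UI on each piece instead of blocking on the full response. The function names here are illustrative, not platform APIs.

```javascript
// Minimal sketch (hypothetical helper names): an async generator that
// yields tokens as they "arrive", so a consumer can render incrementally
// instead of waiting for the complete response.
async function* streamTokens(tokens, delayMs = 0) {
  for (const token of tokens) {
    if (delayMs) await new Promise((resolve) => setTimeout(resolve, delayMs));
    yield token; // each piece is delivered as soon as it is available
  }
}

// Consume the stream, invoking a callback with the text-so-far on every
// token — e.g., to update a chat bubble in place.
async function renderIncrementally(tokenStream, onToken) {
  let text = "";
  for await (const token of tokenStream) {
    text += token;
    onToken(text);
  }
  return text;
}
```

With a non-streaming response, `onToken` would fire once with the full text; streaming fires it on every fragment, which is what makes the response feel immediate.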
Voice: Supports the Voice Gateway channel for GenAI features including Agent Node, with Deepgram as the supported TTS engine.
Chat: Delivers real-time LLM response streaming for Web/Mobile SDK channels. Agent Node and Prompt Node stream responses token-by-token.
Models: Integrates with OpenAI and Azure OpenAI out of the box. Custom prompts support other streaming-capable LLMs.
Node-level activation: Streaming activates when a streaming prompt is selected at the node level, even if the feature-level prompt doesn’t use streaming.
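For context on what the OpenAI/Azure OpenAI integration streams, the API delivers Server-Sent Events where each `data:` line carries a JSON chunk whose `choices[0].delta.content` holds the next text fragment, terminated by `data: [DONE]`. The parser below is a sketch of consuming that wire format; the sample payloads are illustrative.

```javascript
// Parse a single SSE line from an OpenAI/Azure OpenAI streaming response.
// Returns null for non-data lines, {done: true} for the terminator, and
// otherwise the text fragment carried in choices[0].delta.content.
function parseSseChunk(line) {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length);
  if (payload === "[DONE]") return { done: true };
  const json = JSON.parse(payload);
  return { done: false, content: json.choices?.[0]?.delta?.content ?? "" };
}

// Accumulate all fragments from a list of SSE lines into the full reply.
function accumulate(lines) {
  let text = "";
  for (const line of lines) {
    const chunk = parseSseChunk(line);
    if (!chunk || chunk.done) continue;
    text += chunk.content;
  }
  return text;
}
```

Note that the first chunk typically carries only `delta.role` with no content, which is why the parser falls back to an empty string.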
Note: Agent Node Tool Calling and Streaming
V1 Custom JavaScript Prompts: Supports tool calling and streaming as separate capabilities only—not simultaneously.
V2 Custom JavaScript Prompts: Supports both tool calling and streaming together using the OpenAI/Azure OpenAI response format.
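A sketch of why the V2 format can combine the two capabilities: in the OpenAI/Azure OpenAI chunk format, a streamed tool call arrives as a series of deltas keyed by `index`, with `function.arguments` split across chunks. Merging the deltas reconstructs the complete call. The payload shapes below follow that format but are illustrative, not platform-specific.

```javascript
// Merge streamed tool-call deltas (OpenAI/Azure OpenAI chunk format) into
// complete tool calls. Each delta may carry a partial id, name, or a
// fragment of the JSON arguments string for the call at a given index.
function mergeToolCallDeltas(deltas) {
  const calls = [];
  for (const delta of deltas) {
    for (const tc of delta.tool_calls ?? []) {
      const call = (calls[tc.index] ??= { id: "", name: "", arguments: "" });
      if (tc.id) call.id = tc.id;
      if (tc.function?.name) call.name = tc.function.name;
      if (tc.function?.arguments) call.arguments += tc.function.arguments;
    }
  }
  return calls;
}
```

Because `arguments` is only valid JSON once all fragments have arrived, a tool call can be dispatched only after its deltas are fully merged, even though the surrounding text streams token by token.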
The Platform provides default streaming prompts for the Agent Node and Prompt Node for OpenAI models. Go to Generative AI Tools > GenAI Features and select the default streaming prompt for the required feature.
Output quality variance is minimal (≤2.5%), so task reliability is preserved. These benchmarks were conducted under specific scenarios, and performance varies by environment; conduct your own testing before enabling streaming in production.
Requires the complete response, which conflicts with incremental delivery.
No guardrails: Content moderation requires evaluating the full response, which is incompatible with token-by-token streaming.
Voice compatibility: Depends on TTS engine support for bi-directional streaming (e.g., Deepgram).
No BotKit interception: Real-time delivery is incompatible with message interception.
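The guardrails limitation above implies a trade-off you can make deliberately: if moderation must see the full response, buffer the stream first and release the text only after the check passes, giving up streaming latency for safety. This is a sketch of that fallback pattern; `moderate` is a hypothetical stand-in for whatever check you use.

```javascript
// Buffer an entire token stream, run a full-context moderation check, and
// only then return the text. Trades streaming's latency benefit for the
// ability to apply guardrails that need the complete response.
async function bufferThenModerate(tokenStream, moderate) {
  let text = "";
  for await (const token of tokenStream) text += token;
  if (!(await moderate(text))) {
    throw new Error("Response blocked by moderation");
  }
  return text;
}
```

The same buffering approach applies to BotKit-style interception: anything that must inspect or rewrite the whole message has to sit after the buffer, not inside the token stream.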
Streaming quality depends heavily on prompt design. LLMs are subject to hallucination; ensure prompts are accurate and aligned with the desired output before using streaming in production.