Voice Interactions
Enable voice-based conversations with your agentic apps.

Overview
The platform supports voice interactions through integration with AI for Service, enabling users to speak naturally with your agents. Two modes are available: real-time voice for natural conversations, and ASR/TTS for text-based processing with voice input and output.

Voice Modes
Real-Time Voice
Natural voice conversations using multimodal language models.

- Simultaneous voice input/output
- Natural conversational flow
- Lower latency for back-and-forth
- Requires compatible multimodal model
ASR/TTS (Speech-to-Text/Text-to-Speech)
Hybrid approach where speech is converted to text, processed, and converted back to speech.

- Works with any text-based model
- More flexible model choices
- TTS streaming reduces perceived latency
- Better for complex responses
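The hybrid flow above can be sketched as a simple three-stage pipeline. Note that `transcribe`, `generate_reply`, and `synthesize` below are hypothetical stand-ins for the platform's ASR, model, and TTS calls, not real API names:

```python
def transcribe(audio: bytes) -> str:
    """Stand-in for the platform's ASR call (hypothetical)."""
    return "what is my order status"

def generate_reply(text: str) -> str:
    """Stand-in for the text-based agent model (hypothetical)."""
    return f"Here is what I found about: {text}"

def synthesize(text: str) -> bytes:
    """Stand-in for the TTS call (hypothetical)."""
    return text.encode("utf-8")

def voice_turn(audio_in: bytes) -> bytes:
    # ASR: speech -> text
    user_text = transcribe(audio_in)
    # Any text-based model can process the transcript
    reply_text = generate_reply(user_text)
    # TTS: text -> speech
    return synthesize(reply_text)
```

Because the middle stage only sees text, any text-based model can be dropped in, which is what gives this mode its flexibility.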
Configuration
Enable Real-Time Voice
- Configure in AI for Service Automation Node
- Enable in Platform’s Agentic App settings
- Select a real-time voice compatible model
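Taken together, the steps above might look like the following configuration sketch. The field names here are illustrative assumptions, not the platform's actual setting names, so map them to the corresponding fields in the AI for Service Automation Node and your Agentic App settings:

```python
# Hypothetical configuration sketch for real-time voice (field names assumed)
realtime_voice_config = {
    # Step 1: enable voice in the AI for Service Automation Node
    "automation_node": {"voice_enabled": True},
    # Step 2: enable real-time voice in the Agentic App settings
    "agentic_app": {"real_time_voice": True},
    # Step 3: select a model that supports real-time multimodal voice
    "model": "example-realtime-voice-model",
}
```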
Enable ASR/TTS
- Disable real-time voice in AI for Service
- Enable TTS Streaming for progressive delivery
- Configure voice settings
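A corresponding sketch for ASR/TTS mode follows; as above, the field names are assumptions for illustration, not documented settings:

```python
# Hypothetical configuration sketch for ASR/TTS mode (field names assumed)
asr_tts_config = {
    # Real-time voice must be disabled in AI for Service for this mode
    "automation_node": {"real_time_voice": False},
    # TTS streaming delivers speech progressively, lowering perceived latency
    "tts": {"streaming": True, "voice": "example-voice"},
    # Any text-based model works in this mode
    "model": "example-text-model",
}
```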
TTS Streaming
Reduce perceived latency by streaming the model's text output to TTS progressively, so speech playback can begin before the full response is generated.

Voice-Specific Considerations
Response Design
Optimize responses for voice: keep them brief, lead with the key information, and avoid anything that depends on visual layout.

Handling Interruptions
Configure how the agent behaves when the user speaks over a response, for example by stopping playback and treating the new utterance as the next turn.

Multi-Turn Conversations
Maintain context across voice turns so that follow-up questions and references to earlier answers resolve correctly.

Limitations
Real-Time Voice
- Requires specific multimodal models
- Higher compute costs
- Wait-time experience features don’t apply
ASR/TTS
- Transcription errors are possible
- Additional latency from conversion
- May miss voice tone/emotion
Best Practices
Design for Ears, Not Eyes
- Shorter responses work better
- Avoid visual formatting (tables, code blocks)
- Use conversational markers (“First…”, “Next…”)
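One way to apply these guidelines is through the agent's instructions. A minimal sketch follows; the wording is illustrative, not a platform requirement:

```python
# Illustrative voice-style instructions to prepend to an agent's prompt
VOICE_STYLE_INSTRUCTIONS = (
    "You are answering over voice. "
    "Keep responses to two or three short sentences. "
    "Never use tables, code blocks, or other visual formatting. "
    "Use spoken markers such as 'First' and 'Next' to structure steps."
)
```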
Handle Voice Errors Gracefully
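Transcription can fail or mishear, so avoid acting on uncertain input. A common pattern is to ask the user to repeat when nothing was captured and to confirm low-confidence transcripts before acting. A minimal sketch, assuming a hypothetical confidence score returned by ASR:

```python
CONFIDENCE_THRESHOLD = 0.7  # illustrative value; tune per deployment

def handle_transcript(text: str, confidence: float) -> str:
    """Decide whether to act on a transcript or ask the user to clarify.

    Returns a clarification prompt, or an empty string to proceed.
    """
    if not text.strip():
        # Nothing usable was transcribed
        return "Sorry, I didn't catch that. Could you say it again?"
    if confidence < CONFIDENCE_THRESHOLD:
        # Confirm rather than acting on a likely misheard request
        return f"Did you say: {text}?"
    return ""
```

Keeping the clarification prompts short matters here too: they are spoken back to the user just like any other response.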
Test with Real Speech
- Test with various accents
- Try background noise scenarios
- Validate with actual users
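A regression harness over recorded utterances can cover these cases systematically. The sample matrix and `transcribe` below are hypothetical stand-ins; in practice you would replace them with real recordings and the platform's ASR call:

```python
# Hypothetical test matrix: recorded utterances keyed by (accent, noise condition)
SAMPLES = {
    ("en-GB", "quiet"): b"audio-1",
    ("en-IN", "cafe_noise"): b"audio-2",
    ("en-US", "street_noise"): b"audio-3",
}

def transcribe(audio: bytes) -> str:
    """Stand-in for the platform's ASR call (hypothetical)."""
    return "expected utterance"

def run_speech_suite(expected: str) -> list:
    """Return the (accent, noise) cases whose transcript misses the expected text."""
    failures = []
    for case, audio in SAMPLES.items():
        if expected.lower() not in transcribe(audio).lower():
            failures.append(case)
    return failures
```

Automated checks like this catch regressions across accents and noise conditions, but they complement rather than replace validation with actual users.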