NLP Preprocessing
Before intent detection, every utterance is preprocessed:

| Step | Description |
|---|---|
| Tokenization | Split utterance into sentences, then words. TreeBank Tokenizer for English. |
| toLower() | Convert to lowercase (skipped for German, where capitalization is meaningful). Applies to the ML and KG engines only. |
| Stop word removal | Remove low-signal words. Language-specific list; optional, disabled by default. |
| Stemming | Reduce to stem (e.g., “Running” → “run”). Output may not be a real word. |
| Lemmatization | Reduce to base dictionary form (e.g., “housing” → “house”). |
| N-grams | Combine co-occurring words for context (e.g., “New York City” as a tri-gram). |
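The steps above can be sketched end to end. This is a minimal illustration, not the platform's implementation: the stop-word list and suffix rules are toy placeholders, and the tokenizer is a simple regex stand-in for the TreeBank tokenizer.

```python
import re

# Toy stop-word list and suffix rules, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "to", "in"}
SUFFIXES = ("ing", "ed", "s")  # crude stemmer, not a real Porter stemmer

def tokenize(utterance: str) -> list[str]:
    # Split into word tokens (a stand-in for the TreeBank tokenizer).
    return re.findall(r"[A-Za-z']+", utterance)

def remove_stop_words(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t not in STOP_WORDS]

def stem(token: str) -> str:
    # Naive suffix stripping; output may not be a dictionary word.
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def ngrams(tokens: list[str], n: int) -> list[str]:
    # Combine co-occurring words, e.g. "new york city" as a tri-gram.
    return [" ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]

def preprocess(utterance: str, drop_stop_words: bool = False) -> list[str]:
    tokens = [t.lower() for t in tokenize(utterance)]  # toLower()
    if drop_stop_words:  # optional step, disabled by default as in the table
        tokens = remove_stop_words(tokens)
    return [stem(t) for t in tokens]

# preprocess("Booking flights to New York")
# → ["book", "flight", "to", "new", "york"]
```

Each function maps to one row of the table; in practice the platform runs the equivalent steps internally and they are not user-replaceable.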
Scoping Your Assistant
Before training, define your assistant’s scope:

- Define the problem — what the assistant must accomplish; align with business analysts (BAs) and developers.
- List intents — identify key results for each; focus on user needs.
- Sketch example conversations — user utterances and responses; include edge cases and follow-ups.
- Brainstorm alternate utterances — include idioms and slang for each intent.
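The output of this scoping exercise can be captured as a simple structure before any training begins. The assistant, intent names, and utterances below are hypothetical examples, not platform schema:

```python
# Hypothetical scope for a travel assistant; all names are illustrative.
assistant_scope = {
    "problem": "Help travelers book and manage flights",
    "intents": {
        "book_flight": {
            "key_result": "A confirmed flight booking",
            "examples": [
                "Book me a flight to Berlin",
                "I need to fly out tomorrow",  # idiomatic alternate
            ],
            "edge_cases": ["What if no seats are available?"],
        },
        "cancel_booking": {
            "key_result": "Booking cancelled and refund initiated",
            "examples": ["Cancel my trip", "Scrap that flight"],  # slang alternate
        },
    },
}

def coverage_report(scope: dict) -> dict:
    # Sanity check: every intent should carry alternate utterances.
    return {name: len(i["examples"]) for name, i in scope["intents"].items()}
```

A quick report like this makes it obvious which intents still lack alternate utterances before you invest in training.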
Choosing an Engine
| Engine | Best For |
|---|---|
| ML | Large corpus; diverse utterances; flexible and auto-learning. Recommended as the primary training method. |
| KG | Query-type intents; document-based answers; many intents with limited alternate utterances. |
| FM | Idiomatic/command-like sentences; acceptable tolerance for false positives. |
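The decision table can be read as a heuristic. The function below is an illustrative sketch of that reasoning, not the platform's actual ranking logic:

```python
def suggest_engine(utterances_per_intent: int, query_like: bool, command_like: bool) -> str:
    # Illustrative heuristic mirroring the table above, not real platform logic.
    if command_like:
        return "FM"  # idiomatic or command-like sentences
    if query_like and utterances_per_intent < 5:
        return "KG"  # query-type intents with few alternate utterances
    return "ML"      # recommended primary method for large, diverse corpora
```

In practice the engines run together and their results are reconciled, so this choice guides where you invest training effort rather than which engine executes.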
NLP Configuration in the Platform
Go to Automation > Natural Language:

| Section | Purpose |
|---|---|
| Training | Add ML utterances, synonyms, concepts, patterns. |
| Engine Tuning | Set recognition confidence levels, thresholds. |
| Advanced Settings | Auto-training settings, negative intent patterns. |
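Conceptually, these three sections amount to a configuration like the following. The field names and threshold values are hypothetical stand-ins, not the platform's actual setting names:

```python
# Hypothetical configuration snapshot; keys and values are illustrative only.
nl_config = {
    "training": {
        "ml_utterances": ["cancel my flight", "scrap that booking"],
        "synonyms": {"flight": ["plane", "trip"]},
    },
    "engine_tuning": {
        "definitive_threshold": 0.9,  # scores above this count as definitive matches
        "possible_threshold": 0.3,    # scores below this count as no match
    },
    "advanced": {
        "auto_training": True,
        "negative_patterns": ["* not * cancel *"],
    },
}

def classify_confidence(score: float, tuning: dict) -> str:
    # Map a raw engine score to a recognition bucket using the thresholds above.
    if score >= tuning["definitive_threshold"]:
        return "definitive"
    if score >= tuning["possible_threshold"]:
        return "possible"
    return "none"
```

Tightening the thresholds trades recall for precision: a higher definitive threshold means fewer direct matches and more disambiguation prompts.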
NLP Version 3
Version 3 of the NLP engine brings:- Improved Traits Engine accuracy.
- Transformer and KAEN models for English; Transformer for other languages.
- Enables Zero-shot and Few-shot ML models.
- As of January 21, 2024, all existing VAs are on Version 3.