NLP training helps your assistant identify user intent accurately. The platform uses multiple engines (Machine Learning (ML), Fundamental Meaning (FM), Knowledge Graph (KG), Traits, and Ranking & Resolver), each suited to different scenarios.

NLP Preprocessing

Before intent detection, every utterance is preprocessed:

| Step | Description |
| --- | --- |
| Tokenization | Split utterance into sentences, then words. TreeBank Tokenizer for English. |
| toLower() | Convert to lowercase (not for German). ML and KG engines only. |
| Stop word removal | Remove low-signal words. Language-specific list; optional, disabled by default. |
| Stemming | Reduce to stem (e.g., “Running” → “run”). Output may not be a real word. |
| Lemmatization | Reduce to base dictionary form (e.g., “housing” → “house”). |
| N-grams | Combine co-occurring words for context (e.g., “New York City” as a tri-gram). |
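
The steps above can be approximated with a few lines of Python. This is a minimal sketch using only the standard library, not the platform's implementation: the regex tokenizer stands in for the TreeBank Tokenizer, the stop word list is a tiny hypothetical one, and stemming and lemmatization are left out because they need a dedicated library. The function names (`tokenize`, `ngrams`, `preprocess`) are illustrative.

```python
import re

# Tiny illustrative stop word list (an assumption); the platform
# maintains its own language-specific lists.
STOP_WORDS = {"a", "an", "the", "to", "in", "is", "of", "i", "my"}


def tokenize(utterance):
    """Split an utterance into sentences, then each sentence into words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", utterance.strip()) if s]
    return [re.findall(r"[A-Za-z0-9']+", s) for s in sentences]


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(utterance, lowercase=True, remove_stop_words=False):
    """Tokenize, optionally lowercase, optionally drop stop words.

    Stop word removal is off by default, mirroring the table above.
    """
    processed = []
    for tokens in tokenize(utterance):
        if lowercase:
            tokens = [t.lower() for t in tokens]
        if remove_stop_words:
            tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
        processed.append(tokens)
    return processed


if __name__ == "__main__":
    sentences = preprocess("Book a flight to New York City. Thanks!",
                           remove_stop_words=True)
    for tokens in sentences:
        print(tokens, ngrams(tokens, 3))
```

Keeping each step behind its own flag makes it easy to model per-engine differences, such as skipping lowercasing for German.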

Scoping Your Assistant

Before training, define your assistant’s scope:
  1. Define the problem: decide what the assistant must accomplish, and align with business analysts (BAs) and developers.
  2. List intents: identify the key result for each, focusing on user needs.
  3. Sketch example conversations: write user utterances and responses, including edge cases and follow-ups.
  4. Brainstorm alternate utterances: include idioms and slang for each intent.
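
One way to keep the scoping output organized is a small catalog of intents, their key results, and their alternate utterances. The sketch below is a hypothetical working aid, not a platform format: the `Intent` class, the sample intents, and the `minimum` threshold of three utterances are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """One scoped intent: its name, key result, and sample utterances."""
    name: str
    key_result: str
    utterances: list = field(default_factory=list)


def under_covered(intents, minimum=3):
    """Return names of intents with fewer than `minimum` distinct utterances.

    Utterances are compared case-insensitively after trimming whitespace.
    The default threshold is arbitrary, not a platform requirement.
    """
    return [
        intent.name
        for intent in intents
        if len({u.strip().lower() for u in intent.utterances}) < minimum
    ]


# Hypothetical scope for a travel assistant.
scope = [
    Intent("book_flight", "A confirmed booking",
           ["Book a flight", "I need to fly to Boston",
            "Get me on a plane tomorrow"]),
    Intent("check_status", "Current flight status",
           ["Is my flight on time?"]),
]

if __name__ == "__main__":
    print(under_covered(scope))
```

Intents flagged here are candidates for another round of brainstorming before training begins.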

Choosing an Engine

| Engine | Best For |
| --- | --- |
| ML | Large corpus; diverse utterances; flexible and auto-learning. Recommended as the primary training method. |
| KG | Query-type intents; document-based answers; many intents with limited alternate utterances. |
| FM | Idiomatic/command-like sentences; acceptable tolerance for false positives. |

NLP Configuration in the Platform

Go to Automation > Natural Language:

| Section | Purpose |
| --- | --- |
| Training | Add ML utterances, synonyms, concepts, patterns. |
| Engine Tuning | Set recognition confidence levels, thresholds. |
| Advanced Settings | Auto-training settings, negative intent patterns. |

NLP Version 3 (default for new VAs from v10.0):
  • Improved Traits Engine accuracy.
  • Transformer and KAEN models for English; Transformer for other languages.
  • Enables Zero-shot and Few-shot ML models.
  • As of January 21, 2024, all existing VAs are on Version 3.
For per-engine training and configuration: