# NLP Training Overview

NLP training helps your assistant identify user intent accurately. The platform uses several engines: Machine Learning (ML), Fundamental Meaning (FM), Knowledge Graph (KG), Traits, and Ranking & Resolver. Each is suited to different scenarios.

***

## NLP Preprocessing

Before intent detection, every utterance is preprocessed:

| Step                  | Description                                                                     |
| --------------------- | ------------------------------------------------------------------------------- |
| **Tokenization**      | Split utterance into sentences, then words. TreeBank Tokenizer for English.     |
| **toLower()**         | Convert text to lowercase (skipped for German). Applies to the ML and KG engines only. |
| **Stop word removal** | Remove low-signal words. Language-specific list; optional, disabled by default. |
| **Stemming**          | Reduce to stem (e.g., "Running" → "run"). Output may not be a real word.        |
| **Lemmatization**     | Reduce to base dictionary form (e.g., "housing" → "house").                     |
| **N-grams**           | Combine co-occurring words for context (e.g., "New York City" as a tri-gram).   |
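
The steps in the table can be illustrated with a minimal Python sketch. This is not the platform's implementation: the regex tokenizer stands in for the TreeBank tokenizer, `STOP_WORDS` is a made-up list, and `naive_stem` is a hypothetical suffix stripper rather than a real stemming algorithm. Lemmatization is left out because it needs a dictionary lookup.

```python
import re

# Hypothetical stop word list for illustration only;
# the platform uses its own language-specific lists.
STOP_WORDS = {"a", "an", "the", "to", "is", "in", "of"}


def tokenize(utterance):
    """Split an utterance into sentences, then each sentence into words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", utterance.strip()) if s]
    return [re.findall(r"[A-Za-z0-9']+", s) for s in sentences]


def naive_stem(word):
    """Strip a few common suffixes; the result may not be a real word."""
    for suffix in ("ning", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(utterance, lowercase=True, remove_stop_words=False, max_n=3):
    """Run the sketched steps on each sentence of the utterance."""
    result = []
    for words in tokenize(utterance):
        if lowercase:
            words = [w.lower() for w in words]
        if remove_stop_words:  # optional, off by default as in the table
            words = [w for w in words if w.lower() not in STOP_WORDS]
        stems = [naive_stem(w) for w in words]
        grams = [g for n in range(2, max_n + 1) for g in ngrams(words, n)]
        result.append({"tokens": words, "stems": stems, "ngrams": grams})
    return result


if __name__ == "__main__":
    for sentence in preprocess("I am Running to New York City."):
        print(sentence["tokens"])
        print(sentence["stems"])
        print(sentence["ngrams"])
```

With the defaults, "Running" is lowercased and stemmed to "run", and "new york city" appears among the tri-grams.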

***

## Scoping Your Assistant

Before training, define your assistant's scope:

1. **Define the problem**: decide what the assistant must accomplish, and align on it with business analysts (BAs) and developers.
2. **List intents**: identify the key result for each intent; focus on user needs.
3. **Sketch example conversations**: write out user utterances and assistant responses; include edge cases and follow-ups.
4. **Brainstorm alternate utterances**: include idioms and slang for each intent.
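
One way to keep the output of these steps organized before training is a small intent inventory. The sketch below is only an illustration: `IntentSpec`, `under_covered`, and the `min_utterances` threshold of 5 are hypothetical names and values, not platform objects or recommended limits.

```python
from dataclasses import dataclass, field


@dataclass
class IntentSpec:
    """One scoped intent: its name, key result, and brainstormed utterances."""
    name: str
    key_result: str
    utterances: list = field(default_factory=list)


def under_covered(intents, min_utterances=5):
    """Return names of intents with fewer distinct utterances than the
    (arbitrary, illustrative) threshold. Case and outer spaces are ignored."""
    flagged = []
    for intent in intents:
        distinct = {u.strip().lower() for u in intent.utterances}
        if len(distinct) < min_utterances:
            flagged.append(intent.name)
    return flagged


if __name__ == "__main__":
    inventory = [
        IntentSpec(
            name="book_flight",
            key_result="A flight is booked for the user",
            utterances=["Book a flight", "book a flight", "I need a plane ticket"],
        ),
    ]
    print(under_covered(inventory))
```

Intents flagged by a check like this are candidates for another round of brainstorming alternate utterances.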

***

## Choosing an Engine

| Engine | Best For                                                                                                  |
| ------ | --------------------------------------------------------------------------------------------------------- |
| **ML** | Large corpus; diverse utterances; flexible and auto-learning. Recommended as the primary training method. |
| **KG** | Query-type intents; document-based answers; many intents with limited alternate utterances.               |
| **FM** | Idiomatic/command-like sentences; acceptable tolerance for false positives.                               |
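
The table can be read as a rough checklist. The sketch below encodes it as a hypothetical helper, `candidate_engines`; the function and its parameter names are assumptions made for illustration, and the platform does not expose such a call. ML is always listed first because the table recommends it as the primary training method.

```python
def candidate_engines(
    *,
    query_type_intents=False,
    many_intents_few_utterances=False,
    command_like_phrasing=False,
    tolerates_false_positives=False,
):
    """Return engines worth considering, following the table's criteria."""
    engines = ["ML"]  # recommended as the primary training method
    if query_type_intents or many_intents_few_utterances:
        engines.append("KG")
    # FM fits command-like phrasing only if false positives are acceptable.
    if command_like_phrasing and tolerates_false_positives:
        engines.append("FM")
    return engines


if __name__ == "__main__":
    print(candidate_engines(query_type_intents=True))
    print(candidate_engines(command_like_phrasing=True,
                            tolerates_false_positives=True))
```

Treat the result as a starting point for discussion rather than a rule; engines are commonly combined in practice.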

***

## NLP Configuration in the Platform

Go to **Automation > Natural Language**:

| Section               | Purpose                                           |
| --------------------- | ------------------------------------------------- |
| **Training**          | Add ML utterances, synonyms, concepts, patterns.  |
| **Engine Tuning**     | Set recognition confidence levels, thresholds.    |
| **Advanced Settings** | Auto-training settings, negative intent patterns. |

**NLP Version 3** (the default for new virtual assistants (VAs) from v10.0):

* Improved Traits Engine accuracy.
* Transformer and KAEN models for English; Transformer for other languages.
* Enables Zero-shot and Few-shot ML models.
* As of January 21, 2024, all existing VAs are on Version 3.

For per-engine training and configuration:

* [Machine Learning Engine](/ai-for-service/automation/natural-language/training/machine-learning-engine#machine-learning-engine)
* [Fundamental Meaning Engine](/ai-for-service/automation/natural-language/training/fundamental-meaning#fundamental-meaning-engine)
* [Traits](/ai-for-service/automation/natural-language/training/traits#traits)
* [Ranking and Resolver](/ai-for-service/automation/natural-language/training/ranking-and-resolver#ranking-and-resolver-engine)
* [Knowledge Graph Training](/ai-for-service/automation/knowledge-ai/knowledge-graph-training#knowledge-graph-training)
