This document covers best practices, how-to instructions, and frequently asked questions for building and managing Search AI solutions.
  • Best Practices: data ingestion, chunking strategy, LLM selection, and token management
  • How-To Topics: custom embedding model integration, channel-based response customization
  • Frequently Asked Questions: application training, multilingual support, user feedback handling
  • Quick Reference: checklists and summary tables for all key topics

How-To Topics

  • Integrate Custom Embedding Model: connect your own embedding model to control how Search AI vectorizes ingested content
  • Image-Based Document Extraction: extract and search content from image-based documents
  • Customizing Model Responses by Channel Type: update Search AI responses based on the channel of interaction
  • Batch Processing of Embedding Generation: send batch requests to the embedding models for vector generation
  • Use userContext in Gen AI Prompts: use userContext shared via the Public API in generative AI prompts
  • Use Conversation Context in Search: use conversational context when searching via the Public API
  • Ingest from Spreadsheets: handle content from XLSX and CSV files
  • Enable Answer Streaming: configure Search AI to respond in real time

Best Practices

This section provides essential recommendations for building effective Search AI solutions across three critical areas: data ingestion, chunking strategy, and LLM selection.

1. Data Ingestion

The quality of your data directly impacts search performance. Ensure you ingest the right content in the right format.

Supported Data Sources

Search AI accepts content from:
  • Files: PDF, DOCX, PPTX, TXT, XLSX
  • Websites: Web pages and HTML content
  • Connectors: Third-party applications

File Best Practices

Document Quality
  • Use digitally created documents rather than scanned or handwritten files
  • Maintain consistent layouts across pages
  • Ensure documents are unencrypted and password-free
  • Keep content concise and well-structured for better search accuracy
  • Remove unnecessary headers, footers, and metadata
Layout and Formatting
  • Single-column documents work best
  • Multi-column layouts may reduce accuracy and need tuning
  • Use clear section headers and logical organization
  • Avoid switching between single and multi-column formats in the same document
Content Restrictions
Avoid the following to prevent data loss:
  • Compressed PDFs (can cause distortion)
  • Multi-page tables (hard to process)
  • Scanned or heavily formatted files
  • Inconsistent formatting
Images and Tables
Search AI extracts text by default. For better results:
  • Provide text descriptions for key information in images
  • Add contextual summaries before or after tables and images
  • Use meaningful titles for images and tables
  • Update extraction strategies for documents with significant visual content

Website Best Practices

Structure
  • Follow schema.org standards for metadata
  • If not using schema.org, apply consistent heading logic (h1, h2, p tags)
  • Standard HTML tags provide the best results
  • Custom CSS structures may need fine-tuning
Non-Standard Content
  • Override default processing using Document Workbench for custom layouts
  • Define custom extraction rules for non-standard structures

Connector Integration Best Practices

Relevance and Filtering
  • Ingest only relevant data for your use case
  • Use advanced filters to select valuable content
  • Avoid pulling entire datasets to prevent noise
Performance
  • Limit ingestion frequency to avoid system overload
  • Monitor logs and adjust based on performance
  • Use incremental updates instead of full re-ingestion

2. Chunking Strategy

Chunking breaks content into smaller pieces for better search and retrieval. The default chunk size is 1,000 tokens; choose a different size based on your content type and use case.

When to Use Smaller Chunks (300-500 tokens)

  • Precise question answering: When answers are in short text segments
  • Technical documentation: Dense content with tightly packed concepts
  • Multiple topics: Documents covering various subjects requiring targeted retrieval
  • Limited context windows: LLMs with smaller capacity
  • Memory efficiency: Optimizing storage and processing
  • Cost sensitivity: Managing token usage

When to Use Larger Chunks (1000+ tokens)

  • Reasoning tasks: When context and relationships between concepts matter
  • Narrative content: Stories, case studies, or arguments that need coherence
  • Contextual dependency: Information requiring surrounding text
  • Cross-paragraph references: Content with internal references
  • Multi-step procedures: Processes that must be followed in sequence
  • Conceptual understanding: When grasping themes is more important than specific facts
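
The trade-off is easy to prototype. Below is a minimal chunking sketch, using whitespace tokens as a naive stand-in for model tokens (an assumption for illustration; production pipelines should count tokens with the embedding model's own tokenizer):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size, overlapping chunks.

    Whitespace tokens stand in for model tokens here; real pipelines
    should count tokens with the embedding model's own tokenizer.
    """
    tokens = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks

document_text = " ".join(f"token{i}" for i in range(3000))  # stand-in document
precise = chunk_text(document_text, chunk_size=400, overlap=50)       # targeted Q&A
narrative = chunk_text(document_text, chunk_size=1200, overlap=150)   # coherent context
print(len(precise), len(narrative))  # more, smaller chunks vs. fewer, larger ones
```

The overlap between adjacent chunks helps preserve context that would otherwise be cut off at chunk boundaries.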

3. LLM Selection and Configuration

Your choice of LLM shapes performance, accuracy, cost, and user experience.

Model Selection

Cost-Sensitive Applications
  • Recommended: GPT-4o mini
  • Best for: High query volume, straightforward retrieval, budget constraints
  • Trade-offs: May struggle with complex reasoning but offers faster responses
Performance-Prioritizing Applications
  • Recommended: GPT-4o or similar high-performance models
  • Best for: Customer-facing apps, complex documents, technical/medical/legal content
  • Trade-offs: Higher costs but better accuracy and coherence
Model Recommendations by Content Type
  • Simple FAQs, knowledge bases: GPT-3.5 or GPT-4o mini (cost-effective for explicit information)
  • Technical docs, processes: GPT-4 Turbo (better handling of technical concepts)
  • Legal, scientific, complex content: GPT-4o (superior reasoning for interpretation)
  • Specialized knowledge: fine-tuned custom model (industry-specific accuracy)

Context Window Considerations

The context window is the amount of text a model can process in one call. This is critical for RAG applications.
Matching Window Size to Chunks
  • Small chunks (300-500 tokens): 8k-16k context windows are sufficient
    • Can accommodate 15-40 chunks
    • Example: GPT-3.5 Turbo (16k)
  • Medium chunks (1k-2k tokens): 16k-32k context windows recommended
    • Can accommodate 8-15 chunks
  • Large chunks (3k-5k tokens): 32k-128k context windows essential
    • Examples: GPT-4o (128k), Claude 3 Opus (200k)

Token Management

Your context window accommodates three parts:
  1. System and User Prompts (500-1000 tokens) — Instructions, format specs, custom domain instructions
  2. Retrieved Chunks (70-90% of total usage) — Varies by chunk size, number of chunks, and search settings
  3. Model Response (500-2000 tokens) — Controlled through output length instructions
Recommended Max Tokens for Chunks by Context Window
  • 4k context window: 2,000-2,500 max tokens for chunks (reserves space for prompts and responses)
  • 8k: 5,000-6,000 (balances chunks with prompt space)
  • 16k: 12,000-13,000 (maximizes information while preventing overflow)
  • 32k: 25,000-27,000 (uses larger windows with a safety margin)
  • 64k+: 50,000+ (leverages expansive context)
Example Token Calculations
Example 1: 16k Context Window
  • System prompt: 500 tokens
  • Chunks: 12 × 1000 = 12,000 tokens
  • Response: 1,500 tokens
  • Total: 14,000 tokens ✓ Fits in 16k window
Example 2: Problematic Configuration
  • System prompt: 800 tokens
  • Chunks: 20 × 800 = 16,000 tokens
  • Response: 1,200 tokens
  • Total: 18,000 tokens ✗ Exceeds 16k window → Error
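
Checks like these are easy to automate before sending a request. A minimal sketch that reproduces the two calculations above (treating a 16k window as 16,000 tokens for simplicity; real counts come from the model's tokenizer):

```python
def fits_context_window(window: int, prompt_tokens: int, chunk_tokens: int,
                        num_chunks: int, response_tokens: int) -> bool:
    """Check that prompt + retrieved chunks + response fit in the window."""
    total = prompt_tokens + chunk_tokens * num_chunks + response_tokens
    print(f"total = {total:,} tokens (window = {window:,})")
    return total <= window

# Example 1: 500 + 12 * 1000 + 1500 = 14,000 tokens -> fits
fits_context_window(16_000, 500, 1000, 12, 1500)   # True

# Example 2: 800 + 20 * 800 + 1200 = 18,000 tokens -> overflows, request errors
fits_context_window(16_000, 800, 800, 20, 1200)    # False
```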

Cost Optimization

  • Conservative (5k chunks): ~7k input, 1k output tokens; about $0.28 per query with GPT-4o, or $280 per week at 1,000 queries
  • Moderate (10k chunks): ~12k input, 1.5k output tokens; about $0.45 per query, or $450 per week
  • Expansive (20k chunks): ~22k input, 2k output tokens; about $0.75 per query, or $750 per week
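
These projections reduce to simple arithmetic over token counts. A sketch with per-1k-token rates back-solved from the figures above purely for illustration (they are not official GPT-4o pricing; substitute your provider's current rates):

```python
# Illustrative per-1k-token rates back-solved from the figures above;
# NOT official pricing. Substitute your provider's current rates.
INPUT_RATE = 0.02    # USD per 1k input tokens (assumption)
OUTPUT_RATE = 0.14   # USD per 1k output tokens (assumption)

def query_cost(input_tokens: int, output_tokens: int) -> float:
    """Per-query cost from token counts and per-1k rates."""
    return input_tokens / 1000 * INPUT_RATE + output_tokens / 1000 * OUTPUT_RATE

WEEKLY_QUERIES = 1000
for name, inp, out in [("Conservative", 7_000, 1_000),
                       ("Moderate", 12_000, 1_500),
                       ("Expansive", 22_000, 2_000)]:
    c = query_cost(inp, out)
    print(f"{name}: ${c:.2f}/query, ${c * WEEKLY_QUERIES:,.0f}/week")
```
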
Optimization Tips
  • For verbose responses, reduce “Max tokens for Chunks”
  • For complex prompts, account for their increased token usage
  • Monitor usage patterns and adjust accordingly
  • Use lower temperature settings (0.0-0.3) for factual responses

Custom LLM Implementation

When using custom or third-party LLMs:
  • Search AI doesn’t auto-detect context window limits for custom LLMs
  • Set maximum input token limits manually
  • Configure “Max tokens for chunks” according to context window
  • Test regularly to prevent overflow errors
  • Adjust temperature settings for your use case (lower is better for factual content)
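
Because Search AI cannot infer these limits for you, it also helps to enforce them in any custom integration code. A hedged sketch of a pre-flight guard that trims retrieved chunks to a manually configured budget; the constant names and the whitespace token count are illustrative assumptions, not Search AI settings:

```python
# Illustrative pre-flight guard for a custom LLM; the constants and the
# whitespace token count are assumptions, not Search AI settings.
MAX_INPUT_TOKENS = 16_000      # set manually from the model's documented window
RESERVED_FOR_RESPONSE = 1_500  # headroom for the model's answer

def count_tokens(text: str) -> int:
    """Rough whitespace count; use the model's own tokenizer in practice."""
    return len(text.split())

def trim_chunks(prompt: str, chunks: list[str]) -> list[str]:
    """Drop lowest-ranked chunks until the request fits the manual limit."""
    budget = MAX_INPUT_TOKENS - RESERVED_FOR_RESPONSE - count_tokens(prompt)
    kept: list[str] = []
    used = 0
    for chunk in chunks:               # assumed ordered by relevance
        cost = count_tokens(chunk)
        if used + cost > budget:
            break                      # next chunk would overflow; stop here
        kept.append(chunk)
        used += cost
    return kept
```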

Frequently Asked Questions

  • Application Training: manual vs. automatic training, full vs. incremental training
  • Multilingual Support: language support in Search AI
  • User Feedback Handling: enable and analyze user feedback

Application Training

Training prepares ingested content for search by applying configurations, extracting chunks, and generating embeddings. Training is required whenever content or configuration changes.

Training Types

  • Full Training: complete training of the application; covers all content, regardless of changes
  • Incremental Training: training only for changed content; covers additions, deletions, and modifications

Automatic Training

The application automatically trains when new content is ingested through file uploads, web crawls, or connectors.
  • Initial Ingestion: when a content source is configured for the first time (for example, setting up a web crawler for a web domain), training starts automatically and all ingested content is used to train the application.
  • Incremental Updates: when an update to an existing content source is detected (for example, by recrawling a configured web source or resyncing an existing connector), auto-training runs only on the changed content (additions, deletions, or modifications).
When auto-training initiates, a banner appears at the top of the application interface.

Manual Training

Use the Train button on the Extract or Vector Configuration page to force retraining.
When Manual Training is Required:
  • Config Updates: changes to extraction strategies or vector configuration
  • Content Deletion: removing files or content sources
Training Scope Based on Change Type:
  • Before the extraction stage: reprocesses from extraction and chunking onward (e.g., connector config, schema, extraction strategy)
  • After the extraction stage: skips re-chunking and updates enrichment and vector generation only (e.g., embedding model, embedding fields)
Example: If a new extraction strategy is introduced for a particular document, the application is trained only for that document without affecting the chunks related to other content. If, however, the embedding model is updated or embedding fields are modified, new embeddings are generated for all the content.

Training Logs

View detailed training logs with document-level visibility:
  1. Navigate to the Extract page
  2. Click the dropdown with the Train option
  3. Select View Training Logs
Log Information:
  • Training Type: Full or Incremental
  • Trigger Time: when training was initiated
  • Successful Docs: number of documents processed successfully
  • Failed Docs: number of documents with errors
  • Overall Status: training is marked as failed if any document fails
Click individual records to view details grouped by extraction strategy.

Important Notes

  • Manual chunk edits are overwritten during retraining for affected content only.
  • Manually resynchronizing a connector may require a manual training trigger (known issue).

Multilingual Support

Search AI supports multilingual capabilities, enabling users to interact in their preferred language.

Core Capabilities

  • Content Management: add and manage content in multiple languages
  • Query Processing: submit queries in supported languages
  • Response Generation: receive answers in the same language as the query

Key Highlights

  • 100+ Languages Supported for indexing, querying, and answer generation.
  • Works with any language supported by your chosen LLM and vector generation model, using the Text Extraction strategy and Vector Retrieval method.
  • No additional configuration required.

Language Support Requirements

Multilingual support works with any language supported by your configured LLM and vector generation model when using:
  • Text Extraction strategy
  • Vector Retrieval method
No additional configuration is required for basic multilingual support.

Widely Supported Languages

Search AI supports languages commonly handled by advanced LLMs and embedding models like BGE-M3. Refer to your LLM or vector generation model’s official documentation for a comprehensive list.

Language-Specific Extraction Capabilities

  • Text Extraction: all languages
  • Layout-Aware Extraction: English, Ukrainian
  • Image-Based Document Extraction: English, Spanish, Italian, German, French
  • Advanced HTML Extraction: English, Ukrainian, German
  • Markdown Extraction: English, Ukrainian, Spanish, Russian, German, Hungarian, Chinese

Language-Specific Vector Generation Support

Vector generation model support varies by language. Use the following models for optimal performance:
  • English: MPNet, E5, BGE-M3, LaBSE
  • Non-English Languages: BGE-M3 and LaBSE
BGE-M3 supports a wide range of languages. Its training data includes many commonly spoken languages; however, performance may be lower for low-resource or underrepresented languages.
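
A quick way to gauge embedding quality in a target language is a cross-lingual similarity probe. A minimal sketch, assuming the BAAI/bge-m3 checkpoint loads through the sentence-transformers library (FlagEmbedding is the model's native interface):

```python
from sentence_transformers import SentenceTransformer, util

# Assumes the BAAI/bge-m3 checkpoint loads via sentence-transformers;
# FlagEmbedding is the model's native interface.
model = SentenceTransformer("BAAI/bge-m3")

query = "¿Cómo restablezco mi contraseña?"   # Spanish query
passages = [
    "To reset your password, open Settings and choose 'Forgot password'.",
    "Our offices are closed on public holidays.",
]

query_emb = model.encode(query, normalize_embeddings=True)
passage_embs = model.encode(passages, normalize_embeddings=True)

# The semantically matching (password) passage should score highest,
# even though query and passages are in different languages.
print(util.cos_sim(query_emb, passage_embs))
```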

Language-Specific Retrieval Strategy Support

  • English: Vector Retrieval and Hybrid Retrieval
  • Non-English: Vector Retrieval

Supported Answer Generation Models

Answer generation quality depends on the language capabilities of the underlying LLM. Please refer to the official list of supported languages from the LLM provider.

Recommendations

To optimize multilingual performance:
  • Choose the right LLM - Select models with strong support for your target languages. Refer to the official list of languages supported by the LLM.
  • Customize prompts - Create language-specific prompts to improve answer quality and relevance.
  • Test performance - Evaluate different LLMs for your specific use case in your target language.
  • Monitor quality - Regularly assess answer quality across languages and adjust configurations as needed.

Best Practices

  1. Verify model compatibility — Ensure your LLM and embedding model support target languages
  2. Test across languages — Validate answer quality in each supported language
  3. Consider re-ranker selection — Choose based on primary language requirements
  4. Monitor performance — Low-resource languages may have reduced accuracy

User Feedback Handling

The feedback mechanism allows end users to rate response quality, helping evaluate and improve answer delivery.

How Feedback Works

Users express satisfaction through thumbs up/down actions captured via:
  • Web SDK
  • Public API

Enabling Feedback

  1. Navigate to Answer Generation
  2. Enable Feedback Configuration

Capturing Feedback

Via Web SDK: When enabled, thumbs up/down icons appear with each answer in the SDK interface.
Via API: Use the Feedback API to capture feedback programmatically.
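
For the API route, the sketch below shows the general shape of a programmatic feedback call. The host, endpoint path, and payload fields are illustrative assumptions, not the documented Feedback API contract; consult the Feedback API reference for the actual schema. Only searchRequestId is taken from this document (see Implementation Notes below):

```python
import requests

# The host, endpoint path, and payload fields below are illustrative
# assumptions for this sketch, NOT the documented Feedback API schema.
BASE_URL = "https://your-search-ai-host.example.com"   # hypothetical host
API_KEY = "YOUR_API_KEY"                               # auth scheme may differ

def send_feedback(search_request_id: str, positive: bool, comment: str = "") -> None:
    """Record a thumbs up/down rating for a previously returned answer."""
    payload = {
        # searchRequestId links feedback to the original query
        # (see Implementation Notes below).
        "searchRequestId": search_request_id,
        "feedback": "positive" if positive else "negative",
        "comment": comment,
    }
    response = requests.post(
        f"{BASE_URL}/searchai/feedback",               # hypothetical path
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    response.raise_for_status()

send_feedback("req-12345", positive=True, comment="Accurate and concise answer")
```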

Viewing Feedback Data

Feedback appears in Analytics > Search AI > Answer Insights.
Feedback Display:
  • Majority positive: green indicator with the positive count
  • Majority negative: red indicator with the negative count
  • Mixed feedback: highlights the majority sentiment
Example: 20 feedback entries with 16 positive and 4 negative display as a green indicator with a count of 16.

Detailed Feedback Analysis

  1. Click a query in Answer Insights to view the Answer Summary page
  2. See all answers users received for that query with associated feedback
  3. Click View Details for any answer to see user comments

Implementation Notes

When using SearchAINode:
  • Ensure searchRequestId is included in the channel response
  • Automatic when SearchAINode response is presented directly
  • Must be explicitly included if the response is saved to context and rendered with a custom template.

Feedback vs. Feedback Surveys

  • Search AI Feedback: captures answer relevance and accuracy ratings
  • Platform Feedback Surveys: general survey capabilities in the AI for Service platform
These are separate mechanisms — the Search AI feedback mechanism is specifically designed for answer quality evaluation.

Quick Reference

Best Practices Checklist

Data Ingestion
  • Use digitally created, unencrypted documents
  • Maintain consistent single-column layouts
  • Add text descriptions for images and tables
  • Use filters to ingest only relevant data
  • Implement incremental updates
Chunking
  • Choose chunk size based on content type and use case
  • Use 300-500 tokens for precise retrieval
  • Use 1000+ tokens for complex narratives
LLM Configuration
  • Select model based on budget vs. performance needs
  • Ensure context window matches your chunk strategy
  • Configure “Max tokens for Chunks” appropriately
  • Monitor token usage and costs
  • Test configurations to avoid context overflow

Training Summary

  • When does auto-training occur? On content ingestion (uploads, crawls, connector syncs)
  • When is manual training needed? After config changes or content deletion
  • Where to trigger manual training? On the Extract or Vector Configuration page
  • Where to view training logs? Extract page > Train dropdown > View Training Logs

Multilingual Summary

  • What languages are supported? Any language supported by the configured LLM and embedding model
  • Is configuration required? No additional setup for basic support
  • Which re-ranker for multilingual? BGE Re-Ranker (bge-reranker-v2-m3)
  • Which re-ranker for English only? Cross Encoder (ms-marco-MiniLM)

Feedback Summary

  • How to enable feedback? Configuration > Answer Generation > Enable Feedback Configuration
  • Where to view feedback? Analytics > Search AI > Answer Insights
  • How to capture via API? Use the Feedback API
  • What does feedback show? Thumbs up/down counts with the majority sentiment highlighted