> ## Documentation Index
> Fetch the complete documentation index at: https://koreai.mintlify.app/llms.txt
> Use this file to discover all available pages before exploring further.

# Open-Source Models

Deploy and manage open-source models on Agent Platform's managed infrastructure.

***

## Overview

Agent Platform supports 30+ curated open-source models and allows importing any text generation model from Hugging Face. Models are deployed on Kore-hosted infrastructure with optional optimization for improved performance.

**Model Sources**:

| Source          | Description                                     |
| --------------- | ----------------------------------------------- |
| Platform-Hosted | Curated list of 30+ models ready for deployment |
| Hugging Face    | Import any compatible text generation model     |
| Local Import    | Upload model files (.zip) from your machine     |

<Note>Open-source models do not support tool calling and cannot be used with Agentic Apps. Use external commercial models for agentic workflows.</Note>

For the complete list of supported models, see [Supported Models](/agent-platform/models/supported-models).

## View Models & Deployments

* View Models: Go to **Models** → **Open-source models** and click a model to view its deployments.

* Deployments List: Each model can have multiple deployments. Select a deployment to access its settings.

## Deploy a Platform-Hosted Model

1. Go to **Models** → **Open-source models** → **Deploy a model**
2. Select **Platform-hosted** and choose a model from the dropdown
3. Add a **Description** and **Tags**, then click **Next**
4. Select an [optimization technique](#optimization-techniques) (optional)
5. Configure [deployment parameters](#deployment-parameters)
6. Select **Hardware** for deployment
7. Review and accept terms, then click **Deploy**

After deployment, the model is available across Agent Platform and via API endpoint.

## Deploy from Hugging Face

Import and deploy any compatible model from Hugging Face.

<Note>Models must be compatible with Transformers library version ≤4.43.1.</Note>

1. Go to **Models** → **Open-source models** → **Deploy a model**
2. Select **Hugging Face**
3. Enter **Deployment name** and **Description**
4. Select your **Hugging Face connection** (or use public mode)
5. Enter the **Hugging Face model name**
6. Configure [deployment parameters](#deployment-parameters)
7. Select **Hardware** and click **Deploy**

To connect your Hugging Face account, see [Enable Hugging Face](/agent-platform/administration/integrations#hugging-face).

## Import a Local Model

Upload model files from your local machine for deployment.

### Import a Base Model

1. Go to **Models** → **Open-source models** → **Import model**
2. Select the **Base Model** tab
3. Upload your model file (.zip format)
4. Click **Import**

### Import an Adapter Model

Adapter models require a compatible base model.

1. Go to **Models** → **Open-source models** → **Import model**
2. Select the **Adapter Model** tab
3. Choose a **Base model** from the Platform-hosted list
4. Upload your adapter model file (.zip format)
5. Click **Import**

### Import Status

| Status          | Description                                   |
| --------------- | --------------------------------------------- |
| Importing       | File is uploading                             |
| Validating      | File is being validated                       |
| Import Failed   | Error occurred (view details to troubleshoot) |
| Ready to Deploy | Model is ready for deployment                 |

After import, the model appears in the Open-source models list and can be deployed like any other model.

## Optimization Techniques

Optimize Platform-hosted models before deployment for improved performance.

| Technique             | Best For                                    | Quantization  |
| --------------------- | ------------------------------------------- | ------------- |
| **Skip Optimization** | Maximum compatibility                       | None          |
| **CTranslate2 (CT2)** | Small-medium models, low latency, CPU/GPU   | int8\_float16 |
| **vLLM**              | Large models, high throughput, GPU clusters | AWQ (4-bit)   |

### CTranslate2

Efficient inference for translation and NLP tasks with optimized kernels.

**Advantages**: CPU and GPU support, int8 quantization, multi-threading, PyTorch/TensorFlow compatibility.

### vLLM

High-performance inference for very large language models.

**Advantages**: Advanced memory management, model/data parallelism, mixed-precision, AWQ quantization.

### When to Choose

| Criteria               | CTranslate2     | vLLM                           |
| ---------------------- | --------------- | ------------------------------ |
| Model size             | Small to medium | Large (billions of parameters) |
| Latency priority       | ✓               |                                |
| Throughput priority    |                 | ✓                              |
| Resource-constrained   | ✓               |                                |
| Distributed deployment |                 | ✓                              |

## Deployment Parameters

Configure these parameters when deploying a model:

### Inference Parameters

| Parameter            | Description                          |
| -------------------- | ------------------------------------ |
| Temperature          | Sampling temperature (randomness)    |
| Maximum Length       | Max tokens to generate               |
| Top P                | Nucleus sampling probability mass    |
| Top K                | Number of highest probability tokens |
| Stop Sequences       | Strings that stop generation         |
| Inference Batch Size | Batch size for concurrent requests   |

### Scaling Parameters

| Parameter        | Description                         |
| ---------------- | ----------------------------------- |
| Min Replicas     | Minimum model replicas deployed     |
| Max Replicas     | Maximum replicas for auto-scaling   |
| Scale-up Delay   | Seconds to wait before scaling up   |
| Scale-down Delay | Seconds to wait before scaling down |

### Hardware

Select GPU hardware based on model size and performance requirements.

## Manage Deployed Models

After deployment, access model settings via the three-dot menu or by clicking the deployment.

### Model Endpoint

View the generated API endpoint in three formats (cURL, Python, Node.js). Use this to invoke your model externally.

**Structured Output**: Kore-hosted models support JSON schema responses via the `response_format` parameter (v2/chat/completions endpoint only). See [Supported Models for Structured Output](/agent-platform/models/supported-models#structured-output).

### Deployment History

Track all deployment versions with:

* Deployment name and timestamp
* Duration and status
* Who deployed and when
* Un-deployment details (if applicable)

Version numbers auto-increment (for example, Model\_v1, Model\_v2).

### API Keys

Generate API keys for external access to your deployed model.

1. Go to **API Keys** tab
2. Click **Create a new API key**
3. Enter a name and click **Generate key**
4. Copy the key (shown only once)

API keys are scoped to the specific deployment.

### Configurations

| Setting          | Description                               |
| ---------------- | ----------------------------------------- |
| Description      | Edit model description                    |
| Tags             | Add searchable tags                       |
| Endpoint Timeout | 30–180 seconds (default: 60)              |
| Undeploy         | Stop the model (removes active instances) |
| Delete           | Remove undeployed model and all data      |

<Note> Timeout precedence: Tool > Node timeout > Model timeout.</Note>

## Re-deploy a Model

To update parameters or hardware after initial deployment:

1. Select the deployed model
2. Click **Deploy model**
3. Modify settings as needed
4. Click **Deploy**

A new version is automatically created.

## Limitations

* Open-source models do **not** support tool calling
* Cannot be used with Agentic Apps or Agents
* Hugging Face models require Transformers ≤4.43.1
* Structured output requires v2/chat/completions endpoint and vLLM/no optimization
