Deploy and manage open-source models on Agent Platform’s managed infrastructure.
Overview
Agent Platform supports 30+ curated open-source models and allows importing any text generation model from Hugging Face. Models are deployed on Kore-hosted infrastructure with optional optimization for improved performance.
Model Sources:
| Source | Description |
|---|---|
| Platform-Hosted | Curated list of 30+ models ready for deployment |
| Hugging Face | Import any compatible text generation model |
| Local Import | Upload model files (.zip) from your machine |
Open-source models do not support tool calling and cannot be used with Agentic Apps. Use external commercial models for agentic workflows.
For the complete list of supported models, see Supported Models.
View Models & Deployments
- View Models: Go to Models → Open-source models and click a model to view its deployments.
- Deployments List: Each model can have multiple deployments. Select a deployment to access its settings.
Deploy a Platform-Hosted Model
- Go to Models → Open-source models → Deploy a model
- Select Platform-hosted and choose a model from the dropdown
- Add a Description and Tags, then click Next
- Select an optimization technique (optional)
- Configure deployment parameters
- Select Hardware for deployment
- Review and accept terms, then click Deploy
After deployment, the model is available across Agent Platform and via an API endpoint.
Deploy from Hugging Face
Import and deploy any compatible model from Hugging Face.
Note: Models must be compatible with Transformers library version ≤4.43.1.
- Go to Models → Open-source models → Deploy a model
- Select Hugging Face
- Enter Deployment name and Description
- Select your Hugging Face connection (or use public mode)
- Enter the Hugging Face model name
- Configure deployment parameters
- Select Hardware and click Deploy
To connect your Hugging Face account, see Enable Hugging Face.
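If you want to sanity-check compatibility before importing, here is a minimal local sketch. The model name is illustrative, not a platform requirement; substitute the model you plan to import.

```python
# Minimal local compatibility check before importing from Hugging Face.
# The model name below is illustrative; substitute the one you plan to import.
import transformers
from packaging.version import Version
from transformers import AutoModelForCausalLM, AutoTokenizer

assert Version(transformers.__version__) <= Version("4.43.1"), (
    f"Platform supports transformers <= 4.43.1; found {transformers.__version__}"
)

model_name = "mistralai/Mistral-7B-Instruct-v0.2"
AutoTokenizer.from_pretrained(model_name)
AutoModelForCausalLM.from_pretrained(model_name)
print(f"{model_name} loads under transformers {transformers.__version__}")
```

If the model loads without errors under a supported Transformers version, the import is likely to pass validation.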
Import a Local Model
Upload model files from your local machine for deployment.
Import a Base Model
- Go to Models → Open-source models → Import model
- Select the Base Model tab
- Upload your model file (.zip format)
- Click Import
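As a rough sketch of how you might package a base model for upload: save the model and tokenizer to a directory, then zip it. The exact archive layout expected by the importer is an assumption here; verify against the import validation messages.

```python
# Illustrative packaging step for a base model upload.
# Archive layout is an assumption; check the platform's import requirements.
import shutil
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small example model
out_dir = "my_model"

AutoTokenizer.from_pretrained(model_name).save_pretrained(out_dir)
AutoModelForCausalLM.from_pretrained(model_name).save_pretrained(out_dir)

# Creates my_model.zip containing the config, weights, and tokenizer files.
shutil.make_archive("my_model", "zip", out_dir)
```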
Import an Adapter Model
Adapter models require a compatible base model.
- Go to Models → Open-source models → Import model
- Select the Adapter Model tab
- Choose a Base model from the Platform-hosted list
- Upload your adapter model file (.zip format)
- Click Import
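A minimal sketch of producing an adapter archive, assuming a PEFT/LoRA-style adapter. The base model used for training must match the Platform-hosted base model you select in step 3.

```python
# Illustrative adapter packaging, assuming a PEFT/LoRA-style adapter.
# The base model here (gpt2) is an example; it must correspond to the
# Platform-hosted base model chosen during import.
import shutil
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
adapter = get_peft_model(base, LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"]))

# ... fine-tune the adapter here ...

adapter.save_pretrained("my_adapter")  # writes adapter_config.json + adapter weights
shutil.make_archive("my_adapter", "zip", "my_adapter")
```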
Import Status
| Status | Description |
|---|---|
| Importing | File is uploading |
| Validating | File is being validated |
| Import Failed | Error occurred (view details to troubleshoot) |
| Ready to Deploy | Model is ready for deployment |
After import, the model appears in the Open-source models list and can be deployed like any other model.
Optimization Techniques
Optimize Platform-hosted models before deployment for improved performance.
| Technique | Best For | Quantization |
|---|---|---|
| Skip Optimization | Maximum compatibility | None |
| CTranslate2 (CT2) | Small-medium models, low latency, CPU/GPU | int8_float16 |
| vLLM | Large models, high throughput, GPU clusters | AWQ (4-bit) |
CTranslate2
Efficient inference for translation and NLP tasks with optimized kernels.
Advantages: CPU and GPU support, int8 quantization, multi-threading, PyTorch/TensorFlow compatibility.
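The platform applies CT2 conversion for you at deploy time. For context only, an equivalent standalone conversion with the same int8_float16 quantization looks like this (model name illustrative):

```python
# Standalone CTranslate2 conversion with the int8_float16 quantization
# the platform uses. Requires: pip install ctranslate2 transformers
from ctranslate2.converters import TransformersConverter

converter = TransformersConverter("facebook/opt-125m")  # example model
converter.convert("opt-125m-ct2", quantization="int8_float16")
```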
vLLM
High-performance inference for very large language models.
Advantages: Advanced memory management, model/data parallelism, mixed-precision, AWQ quantization.
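Again for context only, a standalone vLLM run with a 4-bit AWQ checkpoint (model name illustrative; the checkpoint must already be AWQ-quantized):

```python
# Offline vLLM inference with a 4-bit AWQ-quantized checkpoint, mirroring
# the platform's vLLM optimization. Requires: pip install vllm
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Llama-2-7B-Chat-AWQ", quantization="awq")
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)

outputs = llm.generate(["Explain nucleus sampling in one sentence."], params)
print(outputs[0].outputs[0].text)
```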
When to Choose
| Criteria | CTranslate2 | vLLM |
|---|---|---|
| Model size | Small to medium | Large (billions of parameters) |
| Latency priority | ✓ | |
| Throughput priority | | ✓ |
| Resource-constrained | ✓ | |
| Distributed deployment | | ✓ |
Deployment Parameters
Configure these parameters when deploying a model:
Inference Parameters
| Parameter | Description |
|---|---|
| Temperature | Sampling temperature (randomness) |
| Maximum Length | Max tokens to generate |
| Top P | Nucleus sampling probability mass |
| Top K | Number of highest probability tokens |
| Stop Sequences | Strings that stop generation |
| Inference Batch Size | Batch size for concurrent requests |
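As a rough illustration, these parameters typically surface as standard sampling fields in a chat-completions-style request body. The field names below follow OpenAI-style conventions and are assumptions; copy the exact shape from your deployment's generated endpoint snippet.

```python
# Hypothetical request payload showing how the inference parameters above
# commonly map to OpenAI-style field names. Names are assumptions; use the
# generated cURL/Python snippet from the Model Endpoint tab as the source of truth.
payload = {
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
    "temperature": 0.7,   # Temperature: sampling randomness
    "max_tokens": 256,    # Maximum Length: tokens to generate
    "top_p": 0.9,         # Top P: nucleus sampling probability mass
    "top_k": 40,          # Top K: highest-probability tokens considered
    "stop": ["\n\n"],     # Stop Sequences
}
```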
Scaling Parameters
| Parameter | Description |
|---|---|
| Min Replicas | Minimum model replicas deployed |
| Max Replicas | Maximum replicas for auto-scaling |
| Scale-up Delay | Seconds to wait before scaling up |
| Scale-down Delay | Seconds to wait before scaling down |
Hardware
Select GPU hardware based on model size and performance requirements.
Manage Deployed Models
After deployment, access model settings via the three-dot menu or by clicking the deployment.
Model Endpoint
View the generated API endpoint in three formats (cURL, Python, Node.js). Use this to invoke your model externally.
Structured Output: Kore-hosted models support JSON schema responses via the response_format parameter (v2/chat/completions endpoint only). See Supported Models for Structured Output.
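A minimal sketch of an external call with structured output follows. The base URL, auth scheme, and response_format shape are placeholders; copy the canonical values from the Model Endpoint tab and use a key from the API Keys tab.

```python
# Minimal sketch of invoking a deployed model externally with a JSON schema
# response. URL, auth header, and payload shape are placeholders; copy the
# canonical values from the Model Endpoint tab.
import requests

API_KEY = "YOUR_DEPLOYMENT_API_KEY"  # generated on the API Keys tab
URL = "https://<your-platform-host>/v2/chat/completions"  # from Model Endpoint tab

resp = requests.post(
    URL,
    headers={"Authorization": f"Bearer {API_KEY}"},  # auth scheme is an assumption
    json={
        "messages": [{"role": "user", "content": "Return my order status as JSON."}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "order_status",
                "schema": {
                    "type": "object",
                    "properties": {"status": {"type": "string"}},
                    "required": ["status"],
                },
            },
        },
    },
    timeout=60,
)
print(resp.json())
```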
Deployment History
Track all deployment versions with:
- Deployment name and timestamp
- Duration and status
- Who performed the deployment
- Un-deployment details (if applicable)
Version numbers auto-increment (e.g., Model_v1, Model_v2).
API Keys
Generate API keys for external access to your deployed model.
- Go to API Keys tab
- Click Create a new API key
- Enter a name and click Generate key
- Copy the key (shown only once)
API keys are scoped to the specific deployment.
Configurations
| Setting | Description |
|---|---|
| Description | Edit model description |
| Tags | Add searchable tags |
| Endpoint Timeout | 30–180 seconds (default: 60) |
| Undeploy | Stop the model (removes active instances) |
| Delete | Remove undeployed model and all data |
Timeout precedence: Tool timeout > Node timeout > Model timeout.
Re-deploy a Model
To update parameters or hardware after initial deployment:
- Select the deployed model
- Click Deploy model
- Modify settings as needed
- Click Deploy
A new version is created automatically.
Limitations
- Open-source models do not support tool calling
- Cannot be used with Agentic Apps or Agents
- Hugging Face models require Transformers ≤4.43.1
- Structured output requires the v2/chat/completions endpoint and either vLLM optimization or no optimization