Deploy and manage open-source models on Agent Platform’s managed infrastructure.
Overview
Agent Platform supports 30+ curated open-source models and allows importing any text generation model from Hugging Face. Models are deployed on Kore-hosted infrastructure with optional optimization for improved performance.
Model Sources:
| Source | Description |
|---|---|
| Platform-Hosted | Curated list of 30+ models ready for deployment |
| Hugging Face | Import any compatible text generation model |
| Local Import | Upload model files (.zip) from your machine |
Open-source models do not support tool calling and cannot be used with Agentic Apps. Use external commercial models for agentic workflows.
For the complete list of supported models, see Supported Models.
View Models & Deployments
- View Models: Go to Models → Open-source models and click a model to view its deployments.
- Deployments List: Each model can have multiple deployments. Select a deployment to access its settings.
Deploy a Platform-Hosted Model
- Go to Models → Open-source models → Deploy a model
- Select Platform-hosted and choose a model from the dropdown
- Add a Description and Tags, then click Next
- Select an optimization technique (optional)
- Configure deployment parameters
- Select Hardware for deployment
- Review and accept terms, then click Deploy
After deployment, the model is available across Agent Platform and via an API endpoint.
Deploy from Hugging Face
Import and deploy any compatible model from Hugging Face.
Note: Models must be compatible with Transformers library version ≤4.43.1.
- Go to Models → Open-source models → Deploy a model
- Select Hugging Face
- Enter Deployment name and Description
- Select your Hugging Face connection (or use public mode)
- Enter the Hugging Face model name
- Configure deployment parameters
- Select Hardware and click Deploy
To connect your Hugging Face account, see Enable Hugging Face.
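If you want to sanity-check compatibility before importing, here is a minimal local sketch. The model name is illustrative, not a platform requirement; substitute the model you plan to import.

```python
# Minimal local compatibility check before importing from Hugging Face.
# The model name below is illustrative; substitute the one you plan to import.
import transformers
from packaging.version import Version
from transformers import AutoModelForCausalLM, AutoTokenizer

assert Version(transformers.__version__) <= Version("4.43.1"), (
    f"Platform supports transformers <= 4.43.1; found {transformers.__version__}"
)

model_name = "mistralai/Mistral-7B-Instruct-v0.2"
AutoTokenizer.from_pretrained(model_name)
AutoModelForCausalLM.from_pretrained(model_name)
print(f"{model_name} loads under transformers {transformers.__version__}")
```

If the model loads without errors under a supported Transformers version, the import is likely to pass validation.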
Import a Local Model
Upload model files from your local machine for deployment.
Import a Base Model
- Go to Models → Open-source models → Import model
- Select the Base Model tab
- Upload your model file (.zip format)
- Click Import
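As a rough sketch of how you might package a base model for upload: save the model and tokenizer to a directory, then zip it. The exact archive layout expected by the importer is an assumption here; verify against the import validation messages.

```python
# Illustrative packaging step for a base model upload.
# Archive layout is an assumption; check the platform's import requirements.
import shutil
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small example model
out_dir = "my_model"

AutoTokenizer.from_pretrained(model_name).save_pretrained(out_dir)
AutoModelForCausalLM.from_pretrained(model_name).save_pretrained(out_dir)

# Creates my_model.zip containing the config, weights, and tokenizer files.
shutil.make_archive("my_model", "zip", out_dir)
```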
Import an Adapter Model
Adapter models require a compatible base model.
- Go to Models → Open-source models → Import model
- Select the Adapter Model tab
- Choose a Base model from the Platform-hosted list
- Upload your adapter model file (.zip format)
- Click Import
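A minimal sketch of producing an adapter archive, assuming a PEFT/LoRA-style adapter. The base model used for training must match the Platform-hosted base model you select in step 3.

```python
# Illustrative adapter packaging, assuming a PEFT/LoRA-style adapter.
# The base model here (gpt2) is an example; it must correspond to the
# Platform-hosted base model chosen during import.
import shutil
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
adapter = get_peft_model(base, LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"]))

# ... fine-tune the adapter here ...

adapter.save_pretrained("my_adapter")  # writes adapter_config.json + adapter weights
shutil.make_archive("my_adapter", "zip", "my_adapter")
```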
Import Status
| Status | Description |
|---|---|
| Importing | File is uploading |
| Validating | File is being validated |
| Import Failed | Error occurred (view details to troubleshoot) |
| Ready to Deploy | Model is ready for deployment |
After import, the model appears in the Open-source models list and can be deployed like any other model.
Optimization Techniques
Optimize Platform-hosted models before deployment for improved performance.
| Technique | Best For | Quantization |
|---|---|---|
| Skip Optimization | Maximum compatibility | None |
| CTranslate2 (CT2) | Small-medium models, low latency, CPU/GPU | int8_float16 |
| vLLM | Large models, high throughput, GPU clusters | AWQ (4-bit) |
CTranslate2
Efficient inference for translation and NLP tasks with optimized kernels.
Advantages: CPU and GPU support, int8 quantization, multi-threading, PyTorch/TensorFlow compatibility.
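The platform applies CT2 conversion for you at deploy time. For context only, an equivalent standalone conversion with the same int8_float16 quantization looks like this (model name illustrative):

```python
# Standalone CTranslate2 conversion with the int8_float16 quantization
# the platform uses. Requires: pip install ctranslate2 transformers
from ctranslate2.converters import TransformersConverter

converter = TransformersConverter("facebook/opt-125m")  # example model
converter.convert("opt-125m-ct2", quantization="int8_float16")
```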
vLLM
High-performance inference for very large language models.
Advantages: Advanced memory management, model/data parallelism, mixed-precision, AWQ quantization.
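Again for context only, a standalone vLLM run with a 4-bit AWQ checkpoint (model name illustrative; the checkpoint must already be AWQ-quantized):

```python
# Offline vLLM inference with a 4-bit AWQ-quantized checkpoint, mirroring
# the platform's vLLM optimization. Requires: pip install vllm
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Llama-2-7B-Chat-AWQ", quantization="awq")
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)

outputs = llm.generate(["Explain nucleus sampling in one sentence."], params)
print(outputs[0].outputs[0].text)
```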
When to Choose
| Criteria | CTranslate2 | vLLM |
|---|---|---|
| Model size | Small to medium | Large (billions of parameters) |
| Latency priority | ✓ | |
| Throughput priority | | ✓ |
| Resource-constrained | ✓ | |
| Distributed deployment | | ✓ |
Deployment Parameters
Configure these parameters when deploying a model:
Inference Parameters
| Parameter | Description |
|---|---|
| Temperature | Sampling temperature (randomness) |
| Maximum Length | Max tokens to generate |
| Top P | Nucleus sampling probability mass |
| Top K | Number of highest probability tokens |
| Stop Sequences | Strings that stop generation |
| Inference Batch Size | Batch size for concurrent requests |
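As a rough illustration, these parameters typically surface as standard sampling fields in a chat-completions-style request body. The field names below follow OpenAI-style conventions and are assumptions; copy the exact shape from your deployment's generated endpoint snippet.

```python
# Hypothetical request payload showing how the inference parameters above
# commonly map to OpenAI-style field names. Names are assumptions; use the
# generated cURL/Python snippet from the Model Endpoint tab as the source of truth.
payload = {
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
    "temperature": 0.7,   # Temperature: sampling randomness
    "max_tokens": 256,    # Maximum Length: tokens to generate
    "top_p": 0.9,         # Top P: nucleus sampling probability mass
    "top_k": 40,          # Top K: highest-probability tokens considered
    "stop": ["\n\n"],     # Stop Sequences
}
```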
Scaling Parameters
| Parameter | Description |
|---|---|
| Min Replicas | Minimum model replicas deployed |
| Max Replicas | Maximum replicas for auto-scaling |
| Scale-up Delay | Seconds to wait before scaling up |
| Scale-down Delay | Seconds to wait before scaling down |
Hardware
Select GPU hardware based on model size and performance requirements.
Manage Deployed Models
After deployment, access model settings via the three-dot menu or by clicking the deployment.
Model Endpoint
View the generated API endpoint in three formats (cURL, Python, Node.js). Use this to invoke your model externally.
Structured Output: Kore-hosted models support JSON schema responses via the response_format parameter (v2/chat/completions endpoint only). See Supported Models for Structured Output.
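A minimal sketch of an external call with structured output follows. The base URL, auth scheme, and response_format shape are placeholders; copy the canonical values from the Model Endpoint tab and use a key from the API Keys tab.

```python
# Minimal sketch of invoking a deployed model externally with a JSON schema
# response. URL, auth header, and payload shape are placeholders; copy the
# canonical values from the Model Endpoint tab.
import requests

API_KEY = "YOUR_DEPLOYMENT_API_KEY"  # generated on the API Keys tab
URL = "https://<your-platform-host>/v2/chat/completions"  # from Model Endpoint tab

resp = requests.post(
    URL,
    headers={"Authorization": f"Bearer {API_KEY}"},  # auth scheme is an assumption
    json={
        "messages": [{"role": "user", "content": "Return my order status as JSON."}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "order_status",
                "schema": {
                    "type": "object",
                    "properties": {"status": {"type": "string"}},
                    "required": ["status"],
                },
            },
        },
    },
    timeout=60,
)
print(resp.json())
```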
Deployment History
Track all deployment versions with:
- Deployment name and timestamp
- Duration and status
- Who performed the deployment
- Un-deployment details (if applicable)
Version numbers auto-increment (e.g., Model_v1, Model_v2).
API Keys
Generate API keys for external access to your deployed model.
- Go to API Keys tab
- Click Create a new API key
- Enter a name and click Generate key
- Copy the key (shown only once)
API keys are scoped to the specific deployment.
Configurations
| Setting | Description |
|---|---|
| Description | Edit model description |
| Tags | Add searchable tags |
| Endpoint Timeout | 30–180 seconds (default: 60) |
| Undeploy | Stop the model (removes active instances) |
| Delete | Remove undeployed model and all data |
Timeout precedence: Tool timeout > Node timeout > Model timeout.
Re-deploy a Model
To update parameters or hardware after initial deployment:
- Select the deployed model
- Click Deploy model
- Modify settings as needed
- Click Deploy
A new version is created automatically.
Limitations
- Open-source models do not support tool calling
- Cannot be used with Agentic Apps or Agents
- Hugging Face models require Transformers ≤4.43.1
- Structured output requires the v2/chat/completions endpoint and either vLLM optimization or no optimization