Deploy and manage open-source models on Agent Platform’s managed infrastructure.

Overview

Agent Platform supports 30+ curated open-source models and allows importing any text generation model from Hugging Face. Models are deployed on Kore-hosted infrastructure with optional optimization for improved performance.
Model Sources:
  • Platform-Hosted: Curated list of 30+ models ready for deployment
  • Hugging Face: Import any compatible text generation model
  • Local Import: Upload model files (.zip) from your machine
Open-source models do not support tool calling and cannot be used with Agentic Apps. Use external commercial models for agentic workflows.
For the complete list of supported models, see Supported Models.

View Models & Deployments

  • View Models: Go to Models > Open-source models and click a model to view its deployments.
  • Deployments List: Each model can have multiple deployments. Select a deployment to access its settings.

Deploy a Platform-Hosted Model

  1. Go to Models > Open-source models > Deploy a model
  2. Select Platform-hosted and choose a model from the dropdown
  3. Add a Description and Tags, then click Next
  4. Select an optimization technique (optional)
  5. Configure deployment parameters
  6. Select Hardware for deployment
  7. Review and accept terms, then click Deploy
After deployment, the model is available across Agent Platform and via API endpoint.

Deploy from Hugging Face

Import and deploy any compatible model from Hugging Face.
Note: Models must be compatible with Transformers library version ≤4.43.1.
  1. Go to Models > Open-source models > Deploy a model
  2. Select Hugging Face
  3. Enter Deployment name and Description
  4. Select your Hugging Face connection (or use public mode)
  5. Enter the Hugging Face model name
  6. Configure deployment parameters
  7. Select Hardware and click Deploy
To connect your Hugging Face account, see Enable Hugging Face.
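Before importing, you can sanity-check locally that a model loads under the required Transformers version. A minimal sketch, assuming a standard causal LM checkpoint (the model ID is an example, not a requirement):
```python
# Verify the installed Transformers version meets the platform requirement
# (<= 4.43.1), then confirm the model loads. Model ID is an example.
from packaging.version import Version
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

assert Version(transformers.__version__) <= Version("4.43.1"), transformers.__version__

model_id = "mistralai/Mistral-7B-v0.1"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
print("Loads cleanly under Transformers", transformers.__version__)
```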

Import a Local Model

Upload model files from your local machine for deployment.

Import a Base Model

  1. Go to Models > Open-source models > Import model
  2. Select the Base Model tab
  3. Upload your model file (.zip format)
  4. Click Import
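A minimal sketch of producing such a .zip from a Hugging Face checkpoint. This assumes the archive should contain the standard model files (config, tokenizer, weights) at its root; the exact expected layout is not documented here:
```python
# Save a model locally, then package the directory as a .zip for import.
# Assumption: the archive holds standard Hugging Face files at its root.
import shutil
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # example model
save_dir = "my-base-model"
AutoModelForCausalLM.from_pretrained(model_id).save_pretrained(save_dir)
AutoTokenizer.from_pretrained(model_id).save_pretrained(save_dir)
shutil.make_archive("my-base-model", "zip", save_dir)  # -> my-base-model.zip
```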

Import an Adapter Model

Adapter models require a compatible base model.
  1. Go to Models > Open-source models > Import model
  2. Select the Adapter Model tab
  3. Choose a Base model from the Platform-hosted list
  4. Upload your adapter model file (.zip format)
  5. Click Import
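If the adapter was trained with the PEFT library, exporting it for upload might look like the following sketch; the base model ID and adapter path are illustrative placeholders:
```python
# Export a LoRA/PEFT adapter directory and zip it for import.
# "gpt2" and "my-adapter-ckpt" are placeholders, not platform requirements.
import shutil
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("gpt2")
adapter = PeftModel.from_pretrained(base, "my-adapter-ckpt")
adapter.save_pretrained("my-adapter")  # writes adapter_config.json + weights
shutil.make_archive("my-adapter", "zip", "my-adapter")  # -> my-adapter.zip
```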

Import Status

  • Importing: File is uploading
  • Validating: File is being validated
  • Import Failed: Error occurred (view details to troubleshoot)
  • Ready to Deploy: Model is ready for deployment
After import, the model appears in the Open-source models list and can be deployed like any other model.

Optimization Techniques

Optimize Platform-hosted models before deployment for improved performance.
  • Skip Optimization: Best for maximum compatibility; quantization: none
  • CTranslate2 (CT2): Best for small-to-medium models, low latency, CPU/GPU; quantization: int8_float16
  • vLLM: Best for large models, high throughput, GPU clusters; quantization: AWQ (4-bit)

CTranslate2

Efficient inference for translation and NLP tasks with optimized kernels. Advantages: CPU and GPU support, int8 quantization, multi-threading, PyTorch/TensorFlow compatibility.
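The platform applies this optimization for you at deploy time. For context, a standalone CTranslate2 conversion with the same int8_float16 quantization looks roughly like this (the model ID is an example):
```python
# Convert a Transformers checkpoint to CTranslate2 format with
# int8_float16 quantization (int8 weights, float16 compute).
import ctranslate2.converters

converter = ctranslate2.converters.TransformersConverter("facebook/opt-1.3b")
converter.convert("opt-1.3b-ct2", quantization="int8_float16")
```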

vLLM

High-performance inference for very large language models. Advantages: Advanced memory management, model/data parallelism, mixed-precision, AWQ quantization.
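Again for context only (the platform handles this internally), serving an AWQ-quantized checkpoint with vLLM directly looks roughly like this; the model ID is an example:
```python
# Load a 4-bit AWQ checkpoint and run a single generation with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Llama-2-7B-AWQ", quantization="awq")  # example model
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=64)
outputs = llm.generate(["Explain paged attention in one sentence."], params)
print(outputs[0].outputs[0].text)
```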

When to Choose

  • Model size: CTranslate2 for small to medium; vLLM for large (billions of parameters)
  • Latency priority: CTranslate2
  • Throughput priority: vLLM
  • Resource-constrained environments: CTranslate2
  • Distributed deployment: vLLM

Deployment Parameters

Configure these parameters when deploying a model:

Inference Parameters

  • Temperature: Sampling temperature (randomness)
  • Maximum Length: Max tokens to generate
  • Top P: Nucleus sampling probability mass
  • Top K: Number of highest-probability tokens considered
  • Stop Sequences: Strings that stop generation
  • Inference Batch Size: Batch size for concurrent requests
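As an illustration of how these parameters typically surface in a request body; the field names below follow common OpenAI-style conventions and are not confirmed platform names:
```python
# Hypothetical request body mapping the inference parameters above.
payload = {
    "messages": [{"role": "user", "content": "Summarize our refund policy."}],
    "temperature": 0.2,   # Temperature: lower = more deterministic
    "max_tokens": 256,    # Maximum Length
    "top_p": 0.9,         # Top P: nucleus sampling mass
    "top_k": 40,          # Top K: candidate pool size
    "stop": ["\n\n"],     # Stop Sequences
}
```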

Scaling Parameters

  • Min Replicas: Minimum model replicas deployed
  • Max Replicas: Maximum replicas for auto-scaling
  • Scale-up Delay: Seconds to wait before scaling up
  • Scale-down Delay: Seconds to wait before scaling down
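A hypothetical sketch of how these values interact: with the settings below, load must stay elevated for 30 seconds before a replica is added, and must stay low for 300 seconds before one is removed (field names are illustrative, not platform API fields):
```python
# Illustrative scaling configuration; field names are hypothetical.
scaling = {
    "min_replicas": 1,        # always keep one replica warm
    "max_replicas": 4,        # cap auto-scaling at four replicas
    "scale_up_delay": 30,     # seconds of sustained load before adding a replica
    "scale_down_delay": 300,  # seconds of low load before removing a replica
}
```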

Hardware

Select GPU hardware based on model size and performance requirements.

Manage Deployed Models

After deployment, access model settings via the three-dot menu or by clicking the deployment.

Model Endpoint

View the generated API endpoint in three formats (cURL, Python, Node.js) and use it to invoke your model externally.
Structured Output: Kore-hosted models support JSON schema responses via the response_format parameter (v2/chat/completions endpoint only). See Supported Models for Structured Output.
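A minimal sketch of an external call with a structured-output schema. The URL path, header name, and body fields are assumptions based on the OpenAI-style v2/chat/completions endpoint mentioned above; copy the exact values from the endpoint panel:
```python
# Hypothetical external call; replace <your-host> and <your-api-key> with
# the values shown in the Model Endpoint panel and API Keys tab.
import requests

resp = requests.post(
    "https://<your-host>/api/v2/chat/completions",
    headers={"Authorization": "Bearer <your-api-key>"},
    json={
        "messages": [{"role": "user", "content": "List three EU capitals."}],
        "response_format": {  # structured output via JSON schema
            "type": "json_schema",
            "json_schema": {
                "name": "capitals",
                "schema": {
                    "type": "object",
                    "properties": {
                        "capitals": {"type": "array", "items": {"type": "string"}}
                    },
                    "required": ["capitals"],
                },
            },
        },
    },
    timeout=60,
)
print(resp.json())
```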

Deployment History

Track all deployment versions with:
  • Deployment name and timestamp
  • Duration and status
  • Who deployed and when
  • Un-deployment details (if applicable)
Version numbers auto-increment (e.g., Model_v1, Model_v2).

API Keys

Generate API keys for external access to your deployed model.
  1. Go to API Keys tab
  2. Click Create a new API key
  3. Enter a name and click Generate key
  4. Copy the key (shown only once)
API keys are scoped to the specific deployment.

Configurations

  • Description: Edit the model description
  • Tags: Add searchable tags
  • Endpoint Timeout: 30–180 seconds (default: 60)
  • Undeploy: Stop the model (removes active instances)
  • Delete: Remove an undeployed model and all its data
Timeout precedence: Tool timeout > Node timeout > Model timeout.

Re-deploy a Model

To update parameters or hardware after initial deployment:
  1. Select the deployed model
  2. Click Deploy model
  3. Modify settings as needed
  4. Click Deploy
A new version is created automatically.

Limitations

  • Open-source models do not support tool calling
  • Cannot be used with Agentic Apps or Agents
  • Hugging Face models require Transformers ≤4.43.1
  • Structured output requires v2/chat/completions endpoint and vLLM/no optimization