Changelog
July 31, 2024

Introducing the Inference Tab and GPU metric-based Autoscaling

Shankar Radhakrishnan
1 min read

You can now one-click deploy models in your own AWS infrastructure and autoscale GPU-enabled apps based on GPU and VRAM utilization.


Inference Tab

We’ve created a new hub where you can one-click launch inference workloads in your own AWS infrastructure. Models including Deepgram, Llama-3-8B, and Mistral-7B are currently supported.

We run each inference workload on the GPUs best suited to its model, using industry-standard inference engines to deliver high throughput and low latency.
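Once a model is deployed, you call it over HTTP like any other service. Below is a minimal sketch assuming the deployed endpoint exposes an OpenAI-compatible chat completions route, which is typical for vLLM-style serving engines; the URL, model name, and prompt are placeholders, not Porter-specific values:

```python
import requests

# Placeholder endpoint for a model launched from the Inference tab;
# substitute the URL shown in your Porter dashboard.
ENDPOINT = "https://llama-3-8b.example.com/v1/chat/completions"

payload = {
    "model": "llama-3-8b",  # model name as exposed by the serving engine
    "messages": [
        {"role": "user", "content": "Summarize what GPU autoscaling does."}
    ],
    "max_tokens": 128,
}

resp = requests.post(ENDPOINT, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```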

GPU metric-based Autoscaling

You can now configure GPU-enabled applications to autoscale based on GPU and VRAM utilization. This feature lives in the Resources tab of your application, and the application must be bound to a specific GPU node group for these metrics to be available.
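For intuition about the two signals involved, here is a minimal sketch of reading them on a GPU node with NVIDIA's NVML bindings (the nvidia-ml-py package). This illustrates what the metrics measure, not how Porter collects them:

```python
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU on the node

# GPU utilization: percent of the sample period during which a kernel was running.
util = pynvml.nvmlDeviceGetUtilizationRates(handle)

# VRAM utilization: share of device memory currently allocated.
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
vram_pct = 100.0 * mem.used / mem.total

print(f"GPU utilization:  {util.gpu}%")
print(f"VRAM utilization: {vram_pct:.1f}% "
      f"({mem.used / 2**30:.1f} GiB of {mem.total / 2**30:.1f} GiB)")

pynvml.nvmlShutdown()
```

An autoscaler compares readings like these against the thresholds you set and adds or removes replicas on the bound GPU node group accordingly.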