Changelog
July 31, 2024
Introducing the Inference Tab and GPU metric-based Autoscaling

You can now one-click deploy models in your own AWS infrastructure and autoscale GPU-enabled apps based on GPU and VRAM utilization.
Inference Tab
We’ve created a new hub where you can one-click launch inference workloads in your own AWS infrastructure. Currently supported models include Deepgram, Llama-3-8B, and Mistral-7B.
We run each inference workload on GPUs well suited to the model, using industry-standard inference engines tuned for high throughput and low latency.
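Once deployed, a model is reachable over HTTP. As a rough sketch only (the endpoint URL, auth scheme, and model name below are placeholders, and the exact payload depends on your deployment; engines such as vLLM commonly expose an OpenAI-compatible API), a request might look like:

```python
import requests

# Hypothetical values: substitute the endpoint and credentials from
# your own deployment. The payload shape assumes an OpenAI-compatible
# chat-completions API, which many inference engines expose.
ENDPOINT = "https://inference.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama-3-8b",
        "messages": [
            {"role": "user", "content": "Summarize GPU autoscaling in one sentence."}
        ],
        "max_tokens": 128,
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```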
GPU metric-based Autoscaling
GPU-enabled applications can now autoscale based on GPU and VRAM utilization. You can enable this from the Resources tab of your application; note that the application must be bound to a specific GPU node group.
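For intuition, this class of autoscaling keys off two signals per GPU: compute utilization and VRAM usage. The following is a minimal illustrative sketch, not how the platform implements scaling, that reads both metrics through NVIDIA's NVML bindings (`pynvml`) and applies made-up scale-up thresholds:

```python
import pynvml

# Illustrative thresholds only; the platform lets you configure the
# actual targets from the Resources tab.
SCALE_UP_GPU_UTIL = 80.0   # percent of GPU compute utilization
SCALE_UP_VRAM_UTIL = 85.0  # percent of VRAM in use

def should_scale_up() -> bool:
    """Return True if any local GPU exceeds either scale-up threshold."""
    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            gpu_util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            vram_util = 100.0 * mem.used / mem.total
            if gpu_util >= SCALE_UP_GPU_UTIL or vram_util >= SCALE_UP_VRAM_UTIL:
                return True
        return False
    finally:
        pynvml.nvmlShutdown()

if __name__ == "__main__":
    print("scale up:", should_scale_up())
```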