NVIDIA A100, L40S, and T4 GPUs for AI training, inference, rendering, and scientific computing. Pay by the hour. Scale on demand.
From cost-effective inference to large-scale distributed training.
- T4: inference, small models, embeddings
- L40S: fine-tuning, mid-size training, rendering
- A100: large model training, multi-GPU workloads
- 8x A100: LLM training, massive datasets
Serve fine-tuned models as API endpoints. Auto-scale on request rate. T4 for small models, L40S for 13B+ parameter models, A100 for 70B+.
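A sketch of what a minimal serving container for such an endpoint might look like; the framework, model path, and route below are illustrative, not a RaidFrame-provided interface:

```python
# Minimal inference endpoint sketch (FastAPI and the model path are
# assumptions, not a RaidFrame-specific API).
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
# Load once at startup; the checkpoint path is hypothetical.
generator = pipeline("text-generation", model="/models/my-finetuned-llm", device=0)

class Prompt(BaseModel):
    text: str
    max_new_tokens: int = 128

@app.post("/generate")
def generate(prompt: Prompt):
    out = generator(prompt.text, max_new_tokens=prompt.max_new_tokens)
    return {"completion": out[0]["generated_text"]}
```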
Full fine-tuning or LoRA on your own data. Multi-GPU distributed training with NCCL and InfiniBand interconnect.
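A minimal LoRA sketch using Hugging Face peft (the library, model path, and target module names are assumptions; target modules vary by architecture):

```python
# LoRA fine-tuning sketch with peft; the base model path is hypothetical.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "/data/base-model", torch_dtype=torch.float16, device_map="auto"
)
config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; names differ by model
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # a tiny fraction of the full model
# ...train with your usual loop or transformers.Trainer...
```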
Generate embeddings at scale for semantic search and retrieval-augmented generation. Batch processing with queue-based scaling.
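A batch embedding sketch with sentence-transformers; the model name is illustrative, and in practice you would stream your corpus in chunks:

```python
# Batch embedding sketch; all-MiniLM-L6-v2 produces 384-dim vectors.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")
texts = ["how do I reset my password?",
         "billing cycle starts on the 1st",
         "GPUs scale to zero when idle"]  # stand-in for your corpus
embeddings = model.encode(texts, batch_size=256, normalize_embeddings=True)
print(embeddings.shape)  # (3, 384): ready for a vector index
```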
Stable Diffusion, video transcoding, 3D rendering. GPU-accelerated media pipelines with persistent storage for outputs.
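A Stable Diffusion render sketch; the model ID and the /mnt/outputs mount point are assumptions about your setup:

```python
# Text-to-image render with outputs written to a persistent volume.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("a watercolor skyline at dusk", num_inference_steps=30).images[0]
image.save("/mnt/outputs/skyline.png")  # persists across instance restarts
```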
Pay by the hour with no commitment, or reserve instances at a discount.
| GPU | VRAM | vCPU | RAM | Storage | On-Demand | Reserved |
|---|---|---|---|---|---|---|
| T4 | 16 GB | 4 vCPU | 16 GB | 100 GB NVMe | $0.50/hr | $0.30/hr |
| L40S | 48 GB | 8 vCPU | 64 GB | 200 GB NVMe | $1.50/hr | $0.95/hr |
| A100 40GB | 40 GB | 12 vCPU | 128 GB | 500 GB NVMe | $2.50/hr | $1.60/hr |
| A100 80GB | 80 GB | 16 vCPU | 256 GB | 1 TB NVMe | $3.50/hr | $2.25/hr |
| 8x A100 80GB | 640 GB total | 128 vCPU | 2 TB | 8 TB NVMe | $28/hr | $18/hr |
Reserved pricing requires a 1-month or 3-month commitment. Spot instances are available at a 50-70% discount for interruptible workloads.
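As a back-of-envelope example, a 20-hour run on a single A100 80GB at the rates above (spot shown at the midpoint of the quoted discount range):

```python
# Cost comparison for a hypothetical 20-hour A100 80GB run, using the table above.
hours = 20
on_demand = 3.50 * hours               # $70.00
reserved  = 2.25 * hours               # $45.00 (requires a 1- or 3-month term)
spot      = 3.50 * (1 - 0.60) * hours  # $28.00 at a 60% discount (midpoint of 50-70%)
print(f"on-demand ${on_demand:.2f}, reserved ${reserved:.2f}, spot ~${spot:.2f}")
```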
GPU workers scale based on inference queue depth. Spin up when jobs arrive, scale to zero when idle. Only pay for actual compute.
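A sketch of the worker side of that pattern, assuming a Redis-backed job queue; the queue technology and key name are assumptions, not a RaidFrame API:

```python
# Queue-draining GPU worker sketch. Redis and the "inference-jobs" key
# are assumptions; swap in whatever queue backs your deployment.
import json
import redis

r = redis.Redis(host="queue.internal", port=6379)

def handle(job: dict) -> None:
    ...  # run inference on the GPU and store the result

while True:
    # Block until a job arrives; the platform can scale workers on queue
    # depth and reclaim this instance once the queue stays empty.
    _, payload = r.blpop("inference-jobs")
    handle(json.loads(payload))
```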
Distributed training across up to 8x A100 GPUs with NCCL and InfiniBand interconnect. Maximum throughput for large model training.
High-speed NVMe volumes for training data, model checkpoints, and datasets. Data persists between sessions and across restarts.
PyTorch, TensorFlow, JAX, ONNX Runtime — bring your own Docker image with any CUDA-compatible framework. Pre-built base images available.
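A quick sanity check you might run inside your container to confirm the driver and framework see the GPU (PyTorch shown; any framework has an equivalent):

```python
# GPU visibility check inside a custom container (illustrative).
import torch

assert torch.cuda.is_available(), "no CUDA device visible to the container"
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA A100-SXM4-80GB"
print(torch.version.cuda)             # CUDA version the framework was built against
```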
Real-time GPU utilization, VRAM usage, temperature, and power draw. Alerts on memory pressure or thermal throttling.
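To read the same signals from inside your own code, NVML exposes them directly; a minimal sketch with the pynvml bindings:

```python
# Reading GPU utilization, VRAM, temperature, and power via NVML (pynvml).
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

util = pynvml.nvmlDeviceGetUtilizationRates(handle)
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
power = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000  # milliwatts -> watts

print(f"gpu {util.gpu}% | vram {mem.used / 2**30:.1f}/{mem.total / 2**30:.0f} GiB "
      f"| {temp} C | {power:.0f} W")
pynvml.nvmlShutdown()
```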
GPU instances in US East, US West, and EU West. A100 clusters available in US East. Contact sales for dedicated capacity.
Most GPU instances are available within 60 seconds. High-demand configurations may take a few minutes during peak hours.
Deploy trained models as API endpoints with auto-scaling. Built-in load balancing across GPU instances for inference workloads.
50-70% discount for workloads that can tolerate interruption. 30-second warning before reclaim. Ideal for batch training with checkpoints.
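One way to use that window, sketched with a SIGTERM handler; SIGTERM as the reclaim signal and the checkpoint path are assumptions about your setup:

```python
# Checkpoint-on-reclaim sketch: save state when the instance gets its
# 30-second warning. The model and data below are stand-ins for your own.
import signal
import torch

model = torch.nn.Linear(64, 64)                  # stand-in for your model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
reclaim_requested = False

def on_reclaim(signum, frame):
    global reclaim_requested
    reclaim_requested = True

signal.signal(signal.SIGTERM, on_reclaim)        # assumed reclaim signal

for step in range(10_000):
    x = torch.randn(32, 64)                      # stand-in for your data loader
    loss = model(x).pow(2).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    if reclaim_requested or step % 500 == 0:
        torch.save({"step": step,
                    "model": model.state_dict(),
                    "optim": optimizer.state_dict()},
                   "/mnt/checkpoints/latest.pt")  # hypothetical persistent mount
        if reclaim_requested:
            break
```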
No CUDA or driver setup is needed. Use NVIDIA's official CUDA Docker images as your base; RaidFrame instances have GPU drivers pre-installed. Just bring your application code.
Most instances are available within 60 seconds. A100 80GB and 8x clusters may take a few minutes during peak demand.
Discounts are available: reserved instances at 35-40% off for 1-month and 3-month terms, and spot instances at 50-70% off for interruptible workloads.
Distributed training across up to 8x A100 GPUs with NCCL and InfiniBand interconnect. Use torchrun, deepspeed, or any distributed framework.
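A minimal DDP skeleton of the kind torchrun launches; the model and data are stand-ins for your own:

```python
# Minimal DDP setup for multi-GPU training; launch with e.g.
#   torchrun --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")     # NCCL across the node's GPUs
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(1024, 1024).cuda()  # stand-in for your model
model = DDP(model, device_ids=[local_rank])

# ...training loop: DDP averages gradients across ranks on backward()...
dist.destroy_process_group()
```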
GPU workers can scale to zero when idle and spin up on demand as jobs arrive in the queue, so you only pay while GPUs are actually running.
NVMe volumes persist across restarts and scaling events, so training data, checkpoints, and model weights are safe.
Request GPU quota when you sign up. Available in US East, US West, and EU West.