Product — GPU

GPU cloud instances.

NVIDIA A100, L40S, and T4 GPUs for AI training, inference, rendering, and scientific computing. Pay by the hour. Scale on demand.

Max VRAM (A100): 80 GB
Provisioning time: <60s
Multi-GPU clusters: 8x
Starting price (T4): $0.50/hr
Hardware

Choose your GPU.

From cost-effective inference to large-scale distributed training.

NVIDIA T4

VRAM: 16 GB
Architecture: Turing

Inference, small models, embeddings

$0.50/hr

NVIDIA L40S

VRAM: 48 GB
Architecture: Ada Lovelace

Fine-tuning, mid-size training, rendering

$1.50/hr

NVIDIA A100 40GB (Popular)

VRAM: 40 GB
Architecture: Ampere

Large model training, multi-GPU

$2.50/hr

NVIDIA A100 80GB

VRAM: 80 GB
Architecture: Ampere

LLM training, massive datasets

$3.50/hr
Use cases

What to run on GPU.

raidframe.yaml
services:
  inference:
    type: web
    port: 8080
    resources:
      gpu: t4
      cpu: 4
      memory: 16GB
    scaling:
      min: 1
      max: 10
      target_rps: 50
  trainer:
    type: worker
    resources:
      gpu: a100-80g
      gpu_count: 8
    command: torchrun --nproc_per_node=8 train.py
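
One file defines both halves of the workload: a web service that auto-scales between 1 and 10 T4 replicas on request rate, and a worker that launches an 8-GPU distributed training run.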

LLM Inference

T4 / L40S

Serve fine-tuned models as API endpoints. Auto-scale on request rate. T4 for small models, L40S for 13B+, A100 for 70B+.

Model Training

A100 80GB

Full fine-tuning or LoRA on your own data. Multi-GPU distributed training with NCCL and InfiniBand interconnect.

Embeddings & RAG

T4

Generate embeddings at scale for semantic search and retrieval-augmented generation. Batch processing with queue-based scaling.
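
A minimal sketch of a batch embedding worker in the same raidframe.yaml format. The service name, the embed_batch.py entrypoint, and the target_queue_depth key are illustrative assumptions; the sample above only documents target_rps as a scaling target.

services:
  embedder:
    type: worker
    resources:
      gpu: t4                    # T4 is the recommended GPU for embedding workloads
      cpu: 4
      memory: 16GB
    scaling:
      min: 0                     # scale to zero when the queue is empty
      max: 5
      target_queue_depth: 100    # hypothetical key, not shown in the sample above
    command: python embed_batch.py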

Image & Video

L40S

Stable Diffusion, video transcoding, 3D rendering. GPU-accelerated media pipelines with persistent storage for outputs.

Pricing

Transparent GPU pricing.

Pay by the hour. No commitments. Reserved instances available at a discount.

GPU          | VRAM         | vCPU | RAM    | Storage     | On-Demand | Reserved
T4           | 16 GB        | 4    | 16 GB  | 100 GB NVMe | $0.50/hr  | $0.30/hr
L40S         | 48 GB        | 8    | 64 GB  | 200 GB NVMe | $1.50/hr  | $0.95/hr
A100 40GB    | 40 GB        | 12   | 128 GB | 500 GB NVMe | $2.50/hr  | $1.60/hr
A100 80GB    | 80 GB        | 16   | 256 GB | 1 TB NVMe   | $3.50/hr  | $2.25/hr
8x A100 80GB | 640 GB total | 128  | 2 TB   | 8 TB NVMe   | $28/hr    | $18/hr

Reserved pricing requires a 1-month or 3-month commitment. Spot instances are available at a 50-70% discount for interruptible workloads.
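
As a worked example from the table above: a T4 reserved for a full month (about 720 hours at $0.30/hr) comes to roughly $216, versus roughly $360 on demand, and at spot rates the 8x A100 80GB cluster would run about $8.40-$14/hr instead of $28/hr.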

Capabilities

Built for ML workloads.

Auto-scaling on queue depth

GPU workers scale based on inference queue depth. Spin up when jobs arrive, scale to zero when idle. Only pay for actual compute.

Multi-GPU training

Distributed training across up to 8x A100 GPUs with NCCL and InfiniBand interconnect. Maximum throughput for large model training.

Persistent NVMe storage

High-speed NVMe volumes for training data, model checkpoints, and datasets. Data persists between sessions and across restarts.

Any framework

PyTorch, TensorFlow, JAX, ONNX Runtime — bring your own Docker image with any CUDA-compatible framework. Pre-built base images available.

GPU monitoring

Real-time GPU utilization, VRAM usage, temperature, and power draw. Alerts on memory pressure or thermal throttling.

Available regions

GPU instances in US East, US West, and EU West. A100 clusters available in US East. Contact sales for dedicated capacity.

Fast provisioning

Most GPU instances available within 60 seconds. High-demand configs may take a few minutes during peak hours.

Model serving

Deploy trained models as API endpoints with auto-scaling. Built-in load balancing across GPU instances for inference workloads.

Spot instances

50-70% discount for workloads that can tolerate interruption. 30-second warning before reclaim. Ideal for batch training with checkpoints.

Frequently asked questions

Do I need to install CUDA drivers?

No. Use NVIDIA's official CUDA Docker images as your base. RaidFrame instances have GPU drivers pre-installed. Just bring your application code.
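
For example, a Dockerfile that starts from one of NVIDIA's official CUDA runtime images (an nvidia/cuda runtime tag on Docker Hub) gets the CUDA toolkit and libraries with no driver installation in the image itself.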

How quickly are GPUs provisioned?

Most instances are available within 60 seconds. A100 80GB instances and 8x clusters may take a few minutes during peak demand.

Can I reserve GPU capacity?

Yes. Reserved instances offer a 35-40% discount on 1-month and 3-month terms, and spot instances offer a 50-70% discount for interruptible workloads.

What about multi-GPU training?

Distributed training across up to 8x A100 GPUs with NCCL and InfiniBand interconnect. Use torchrun, deepspeed, or any distributed framework.
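
As a sketch, the trainer service from the raidframe.yaml sample above could swap torchrun for the DeepSpeed launcher; everything except the command line is unchanged, and train.py stands in for your own entrypoint.

services:
  trainer:
    type: worker
    resources:
      gpu: a100-80g
      gpu_count: 8
    command: deepspeed --num_gpus 8 train.py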

Can I scale GPUs to zero?

Yes. GPU workers can scale to zero when idle and spin up on demand when jobs arrive in the queue. You only pay when GPUs are actually running.

Is data persistent between sessions?

Yes. NVMe volumes persist across restarts and scaling events. Training data, checkpoints, and model weights are safe.

Get GPU access.

Request GPU quota at sign-up. Available in US East, US West, and EU West.