Product — GPU

GPU cloud instances.

NVIDIA A100, L40S, and T4 GPUs for AI training, inference, rendering, and scientific computing. Pay by the hour. Scale on demand.

Max VRAM (A100): 80 GB
Provisioning time: <60s
Multi-GPU clusters: 8x
Starting price (T4): $0.50/hr
Hardware

Choose your GPU.

From cost-effective inference to large-scale distributed training.

NVIDIA T4

VRAM: 16 GB
Architecture: Turing

Inference, small models, embeddings

$0.50/hr

NVIDIA L40S

VRAM: 48 GB
Architecture: Ada Lovelace

Fine-tuning, mid-size training, rendering

$1.50/hr

NVIDIA A100 40GB (Popular)

VRAM: 40 GB
Architecture: Ampere

Large model training, multi-GPU

$2.50/hr

NVIDIA A100 80GB

VRAM: 80 GB
Architecture: Ampere

LLM training, massive datasets

$3.50/hr
Use cases

What to run on GPU.

raidframe.yaml
services:
  inference:
    type: web
    port: 8080
    resources:
      gpu: t4
      cpu: 4
      memory: 16GB
    scaling:
      min: 1
      max: 10
      target_rps: 50
  trainer:
    type: worker
    resources:
      gpu: a100-80g
      gpu_count: 8
    command: torchrun --nproc_per_node=8 train.py
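
One file defines both halves of the workload: a web service that auto-scales between 1 and 10 T4 replicas on request rate, and a worker that launches an 8-GPU distributed training run.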

LLM Inference

T4 / L40S

Serve fine-tuned models as API endpoints. Auto-scale on request rate. T4 for small models, L40S for 13B+, A100 for 70B+.

Model Training

A100 80GB

Full fine-tuning or LoRA on your own data. Multi-GPU distributed training with NCCL and InfiniBand interconnect.

Embeddings & RAG

T4

Generate embeddings at scale for semantic search and retrieval-augmented generation. Batch processing with queue-based scaling.
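
A minimal sketch of a batch embedding worker in the same raidframe.yaml format. The service name, the embed_batch.py entrypoint, and the target_queue_depth key are illustrative assumptions; the sample above only documents target_rps as a scaling target.

services:
  embedder:
    type: worker
    resources:
      gpu: t4                    # T4 is the recommended GPU for embedding workloads
      cpu: 4
      memory: 16GB
    scaling:
      min: 0                     # scale to zero when the queue is empty
      max: 5
      target_queue_depth: 100    # hypothetical key, not shown in the sample above
    command: python embed_batch.py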

Image & Video

L40S

Stable Diffusion, video transcoding, 3D rendering. GPU-accelerated media pipelines with persistent storage for outputs.

Pricing

Transparent GPU pricing.

Pay by the hour. No commitments. Reserved instances available at a discount.

GPU          | VRAM         | vCPU | RAM    | Storage     | On-Demand | Reserved
T4           | 16 GB        | 4    | 16 GB  | 100 GB NVMe | $0.50/hr  | $0.30/hr
L40S         | 48 GB        | 8    | 64 GB  | 200 GB NVMe | $1.50/hr  | $0.95/hr
A100 40GB    | 40 GB        | 12   | 128 GB | 500 GB NVMe | $2.50/hr  | $1.60/hr
A100 80GB    | 80 GB        | 16   | 256 GB | 1 TB NVMe   | $3.50/hr  | $2.25/hr
8x A100 80GB | 640 GB total | 128  | 2 TB   | 8 TB NVMe   | $28/hr    | $18/hr

Reserved pricing requires a 1-month or 3-month commitment. Spot instances are available at a 50-70% discount for interruptible workloads.
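
As a worked example from the table above: a T4 reserved for a full month (about 720 hours at $0.30/hr) comes to roughly $216, versus roughly $360 on demand, and at spot rates the 8x A100 80GB cluster would run about $8.40-$14/hr instead of $28/hr.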

Capabilities

Built for ML workloads.

Auto-scaling on queue depth

GPU workers scale based on inference queue depth. Spin up when jobs arrive, scale to zero when idle. Only pay for actual compute.

Multi-GPU training

Distributed training across up to 8x A100 GPUs with NCCL and InfiniBand interconnect. Maximum throughput for large model training.

Persistent NVMe storage

High-speed NVMe volumes for training data, model checkpoints, and datasets. Data persists between sessions and across restarts.

Any framework

PyTorch, TensorFlow, JAX, ONNX Runtime — bring your own Docker image with any CUDA-compatible framework. Pre-built base images available.

GPU monitoring

Real-time GPU utilization, VRAM usage, temperature, and power draw. Alerts on memory pressure or thermal throttling.

Available regions

GPU instances in US East, US West, and EU West. A100 clusters available in US East. Contact sales for dedicated capacity.

Fast provisioning

Most GPU instances available within 60 seconds. High-demand configs may take a few minutes during peak hours.

Model serving

Deploy trained models as API endpoints with auto-scaling. Built-in load balancing across GPU instances for inference workloads.

Spot instances

50-70% discount for workloads that can tolerate interruption. 30-second warning before reclaim. Ideal for batch training with checkpoints.

Frequently asked questions

Do I need to install CUDA drivers?

No. Use NVIDIA's official CUDA Docker images as your base. RaidFrame instances have GPU drivers pre-installed. Just bring your application code.
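
For example, a Dockerfile that starts from one of NVIDIA's official CUDA runtime images (an nvidia/cuda runtime tag on Docker Hub) gets the CUDA toolkit and libraries with no driver installation in the image itself.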

How quickly are GPUs provisioned?

Most instances are available within 60 seconds. A100 80GB instances and 8x clusters may take a few minutes during peak demand.

Can I reserve GPU capacity?

Yes. Reserved instances offer a 35-40% discount on 1-month and 3-month terms, and spot instances offer a 50-70% discount for interruptible workloads.

What about multi-GPU training?

Distributed training across up to 8x A100 GPUs with NCCL and InfiniBand interconnect. Use torchrun, deepspeed, or any distributed framework.
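
As a sketch, the trainer service from the raidframe.yaml sample above could swap torchrun for the DeepSpeed launcher; everything except the command line is unchanged, and train.py stands in for your own entrypoint.

services:
  trainer:
    type: worker
    resources:
      gpu: a100-80g
      gpu_count: 8
    command: deepspeed --num_gpus 8 train.py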

Can I scale GPUs to zero?

Yes. GPU workers can scale to zero when idle and spin up on demand when jobs arrive in the queue. You only pay when GPUs are actually running.

Is data persistent between sessions?

Yes. NVMe volumes persist across restarts and scaling events. Training data, checkpoints, and model weights are safe.

Get GPU access.

Request GPU quota at sign-up. Available in US East, US West, and EU West.