Auto-Scaling

RaidFrame scales services horizontally and automatically, based on CPU, memory, request rate, queue depth, custom metrics, or a schedule.

How It Works

RaidFrame monitors your service metrics and automatically adjusts the number of running instances. New instances spin up in under 10 seconds. Scale-down happens only after a configurable cooldown (5 minutes by default), which prevents flapping, that is, rapid back-and-forth scaling around a threshold.
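
The effect of the cooldown can be pictured with a minimal sketch. It assumes the cooldown is measured from the most recent scaling event and that scale-ups are never delayed; RaidFrame's actual controller logic is not described here, and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

def apply_cooldown(current, desired, last_scaled_at, now,
                   cooldown=timedelta(minutes=5)):
    """Return the instance count to run after applying the scale-down cooldown.

    Assumption (not confirmed by the docs): scale-ups apply immediately,
    scale-downs wait until `cooldown` has elapsed since the last scaling event.
    """
    if desired >= current:
        return desired   # scale up (or hold) without delay
    if now - last_scaled_at < cooldown:
        return current   # still cooling down: keep the current count
    return desired       # cooldown elapsed: allow the scale-down

last_event = datetime(2024, 1, 1, 14, 45)
print(apply_cooldown(6, 4, last_event, last_event + timedelta(minutes=2)))   # 6
print(apply_cooldown(6, 4, last_event, last_event + timedelta(minutes=45)))  # 4
print(apply_cooldown(6, 8, last_event, last_event + timedelta(minutes=1)))   # 8
```

Without the middle check, a metric hovering near its threshold would add and remove instances on every evaluation.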

Scaling Triggers

CPU-Based (Default)

Scale out when average CPU utilization across instances exceeds a percentage threshold:

services:
  api:
    scaling:
      min: 2
      max: 20
      target_cpu: 70

Memory-Based

Scale when average memory usage across instances exceeds a percentage threshold:

scaling:
  target_memory: 80

Request-Based

Scale based on requests per second per instance:

scaling:
  target_rps: 500

Queue-Based (Workers)

Scale workers based on queue depth:

services:
  processor:
    type: worker
    scaling:
      min: 1
      max: 50
      target_queue_depth: 100

When the queue holds 500 items and target_queue_depth is 100, RaidFrame scales to 5 instances (500 / 100), always staying within the configured min and max.
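
The arithmetic above can be sketched as follows. Rounding up when the division is not exact is an assumption (the documented example divides evenly), and the function name is hypothetical.

```python
import math

def desired_workers(queue_depth, target_queue_depth,
                    min_instances=1, max_instances=50):
    """Workers needed so each handles at most `target_queue_depth` items,
    clamped to the configured min/max."""
    needed = math.ceil(queue_depth / target_queue_depth)  # assumed: round up
    return max(min_instances, min(max_instances, needed))

print(desired_workers(500, 100))     # 5, the example from the text
print(desired_workers(0, 100))       # 1, never below min
print(desired_workers(10_000, 100))  # 50, capped at max
```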

Custom Metrics

Push your own metrics and scale on them:

rf metrics push api.active_connections 1500

Then reference the metric in the scaling config:

scaling:
  custom_metric: api.active_connections
  target_value: 200
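
How target_value relates to the pushed number is not spelled out above. One plausible reading, sketched here purely as an assumption, is that the pushed value is a service-wide total and target_value is the desired amount per instance. The min and max of 2 and 20 are borrowed from the earlier api example, and the function name is hypothetical.

```python
import math

def instances_for_metric(metric_total, target_value, min_instances, max_instances):
    """Assumed reading: spread a service-wide total so that each instance
    carries at most `target_value`, within the min/max bounds."""
    needed = math.ceil(metric_total / target_value)
    return max(min_instances, min(max_instances, needed))

for pushed in (300, 1500, 6000):
    count = instances_for_metric(pushed, 200, min_instances=2, max_instances=20)
    print(pushed, "->", count)
# 300 -> 2
# 1500 -> 8
# 6000 -> 20
```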

Scheduled Scaling

Pre-scale for known traffic patterns:

scaling:
  min: 2
  max: 20
  schedule:
    - cron: "0 9 * * MON-FRI"
      min: 10
      comment: "Business hours"
    - cron: "0 18 * * MON-FRI"
      min: 2
      comment: "After hours"
    - cron: "0 0 25 12 *"
      min: 30
      comment: "Christmas sale"
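
To reason about which minimum is in effect at a given moment, the sketch below evaluates the schedule above. It assumes the most recently fired entry wins and that times are naive local datetimes; neither is confirmed here. The tiny cron matcher is hypothetical and handles only `*`, single values, and ranges such as MON-FRI.

```python
from datetime import datetime, timedelta

DAYS = {"SUN": 0, "MON": 1, "TUE": 2, "WED": 3, "THU": 4, "FRI": 5, "SAT": 6}

SCHEDULE = [
    {"cron": "0 9 * * MON-FRI", "min": 10},   # Business hours
    {"cron": "0 18 * * MON-FRI", "min": 2},   # After hours
    {"cron": "0 0 25 12 *", "min": 30},       # Christmas sale
]

def to_int(token):
    """Convert a cron token (number or day name) to an integer."""
    return DAYS[token] if token in DAYS else int(token)

def field_matches(spec, value):
    """Match one cron field: '*', a single value, or an 'A-B' range."""
    if spec == "*":
        return True
    low, _, high = spec.partition("-")
    low = to_int(low)
    high = to_int(high) if high else low
    return low <= value <= high

def cron_matches(expr, t):
    """True if the five-field cron expression fires at minute `t`."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (t.weekday() + 1) % 7  # Python: Monday=0, cron: Sunday=0
    return (field_matches(minute, t.minute) and field_matches(hour, t.hour)
            and field_matches(dom, t.day) and field_matches(month, t.month)
            and field_matches(dow, cron_dow))

def scheduled_min(schedule, base_min, now, lookback_days=8):
    """Walk back minute by minute to the most recent firing and return its min.
    If two entries fire in the same minute, the first one listed wins."""
    t = now.replace(second=0, microsecond=0)
    for _ in range(lookback_days * 24 * 60):
        for entry in schedule:
            if cron_matches(entry["cron"], t):
                return entry["min"]
        t -= timedelta(minutes=1)
    return base_min  # nothing fired within the lookback window

print(scheduled_min(SCHEDULE, 2, datetime(2024, 3, 5, 10, 30)))  # Tuesday morning: 10
print(scheduled_min(SCHEDULE, 2, datetime(2024, 3, 9, 12, 0)))   # Saturday: 2
print(scheduled_min(SCHEDULE, 2, datetime(2024, 12, 25, 3, 0)))  # Christmas night: 30
```

Under this assumed latest-firing-wins reading, a 25 December that falls on a weekday would drop from 30 back to 10 at 09:00, when the business-hours entry fires. Whether RaidFrame resolves overlapping entries this way is not stated here, so it is worth checking before relying on a holiday override.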

Scale to Zero

Idle services can scale to zero to save costs. The first request triggers a cold start:

scaling:
  min: 0
  max: 10
  scale_to_zero:
    idle_timeout: 15m
    cold_start_target: 3s

RaidFrame keeps a warm standby image to minimize cold start time.
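
The idle check behind scale-to-zero might look like the sketch below. It assumes idleness is measured from the last request received and that durations use the `15m` / `3s` / `2h` style seen on this page; the helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta

UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}

def parse_duration(text):
    """Parse values like '15m', '3s', or '2h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{UNITS[match.group(2)]: int(match.group(1))})

def can_scale_to_zero(last_request_at, now, idle_timeout="15m"):
    """Assumed rule: eligible once no request has arrived for `idle_timeout`."""
    return now - last_request_at >= parse_duration(idle_timeout)

last_request = datetime(2024, 1, 1, 12, 0)
print(can_scale_to_zero(last_request, datetime(2024, 1, 1, 12, 10)))  # False
print(can_scale_to_zero(last_request, datetime(2024, 1, 1, 12, 15)))  # True
```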

CLI Commands

# View current scaling status
rf services info api

# Manually scale
rf services scale api --min 5 --max 30

# Temporarily override auto-scaling
rf services scale api --fixed 10 --duration 2h

# View scaling events
rf metrics --service api --type scaling

Example output:

SCALING EVENTS (last 24h)
─────────────────────────────
14:23  api  2 → 4 instances  (cpu: 82%)
14:45  api  4 → 6 instances  (cpu: 76%)
15:30  api  6 → 4 instances  (cooldown, cpu: 35%)
18:00  api  4 → 2 instances  (scheduled: after hours)

Scaling Configuration

Field               Type      Default   Description
min                 int       1         Minimum instances (0 enables scale-to-zero)
max                 int       10        Maximum instances
target_cpu          int       70        CPU % threshold
target_memory       int       -         Memory % threshold
target_rps          int       -         Requests/sec per instance
target_queue_depth  int       -         Queue items per worker
cooldown            duration  5m        Wait before scale-down
scale_up_speed      string    fast      fast (10s) or gradual (60s)
scale_down_speed    string    gradual   fast (30s) or gradual (5m)