Automatic horizontal scaling based on CPU, memory, requests per second, queue depth, custom metrics, or a schedule.

How It Works

RaidFrame monitors your service metrics and automatically adjusts the number of running instances between `min` and `max`. New instances spin up in under 10 seconds. Scale-down happens only after a configurable cooldown (`cooldown`, 5 minutes by default) to prevent flapping.
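
The cooldown behavior can be pictured with a small sketch. It assumes that scale-up is applied immediately and that scale-down is allowed only once the cooldown has elapsed since the last scaling change. The function name and the exact timing rule are illustrative assumptions, not RaidFrame's documented implementation.

```python
from datetime import datetime, timedelta


def next_instance_count(current, desired, last_change, now,
                        cooldown=timedelta(minutes=5)):
    """Return the instance count to run after applying a cooldown rule.

    Assumption: scale-up is immediate; scale-down waits until `cooldown`
    has passed since the last scaling change.
    """
    if desired > current:
        return desired
    if desired < current and now - last_change >= cooldown:
        return desired
    return current


last_change = datetime(2025, 1, 8, 14, 45)
# Two minutes after the last change, a lower target is ignored.
held = next_instance_count(6, 4, last_change, datetime(2025, 1, 8, 14, 47))
# Once the cooldown has elapsed, the lower target is applied.
lowered = next_instance_count(6, 4, last_change, datetime(2025, 1, 8, 15, 30))
print(held, lowered)  # 6 4
```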

Scaling Triggers

CPU-Based (Default)

Scale when average CPU across instances exceeds a threshold:

services:
  api:
    scaling:
      min: 2
      max: 20
      target_cpu: 70

Memory-Based

Scale when memory usage exceeds a percentage threshold:

scaling:
  target_memory: 80

Request-Based

Scale based on requests per second per instance:

scaling:
  target_rps: 500

Queue-Based (Workers)

Scale workers based on queue depth:

services:
  processor:
    type: worker
    scaling:
      min: 1
      max: 50
      target_queue_depth: 100

When the queue has 500 items and target_queue_depth is 100, RaidFrame scales to 5 instances.
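
The arithmetic behind that example can be sketched as follows. The division matches the 500 / 100 = 5 case above; rounding up for partial batches and clamping to `min` and `max` are assumptions about how the platform behaves, and `desired_workers` is a hypothetical helper name.

```python
import math


def desired_workers(queue_depth, target_queue_depth,
                    min_instances=1, max_instances=50):
    """Estimate the worker count for a given queue depth.

    Assumptions: partial batches round up, and the result is clamped
    to the configured min/max bounds.
    """
    if target_queue_depth <= 0:
        raise ValueError("target_queue_depth must be positive")
    wanted = math.ceil(queue_depth / target_queue_depth)
    return max(min_instances, min(max_instances, wanted))


print(desired_workers(500, 100))     # 5
print(desired_workers(10_000, 100))  # 50 (capped at max)
```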

Custom Metrics

Push your own metrics and scale on them:

rf metrics push api.active_connections 1500

scaling:
  custom_metric: api.active_connections
  target_value: 200
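
If the metric is produced inside an application, one option is to wrap the `rf metrics push` command shown above. This is a minimal sketch: `build_push_command` and `push_metric` are hypothetical helper names, and it assumes the `rf` CLI is installed and authenticated on the host. The source does not say whether the pushed value is a total or a per-instance figure, so the sketch covers only the push itself.

```python
import shutil
import subprocess


def build_push_command(metric, value):
    """Build the argv list for `rf metrics push <metric> <value>`."""
    return ["rf", "metrics", "push", metric, str(value)]


def push_metric(metric, value):
    """Run the push command; assumes the rf CLI is available on PATH."""
    if shutil.which("rf") is None:
        raise RuntimeError("rf CLI not found on PATH")
    subprocess.run(build_push_command(metric, value), check=True)


command = build_push_command("api.active_connections", 1500)
print(" ".join(command))  # rf metrics push api.active_connections 1500
```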

Scheduled Scaling

Pre-scale for known traffic patterns:

scaling:
  min: 2
  max: 20
  schedule:
    - cron: "0 9 * * MON-FRI"
      min: 10
      comment: "Business hours"
    - cron: "0 18 * * MON-FRI"
      min: 2
      comment: "After hours"
    - cron: "0 0 25 12 *"
      min: 30
      comment: "Christmas sale"
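
To reason about which `min` applies at a given moment, the schedule can be modeled with a short sketch. It assumes that the most recently fired cron entry wins and that times are evaluated in a single timezone; neither is confirmed by the text above. The matcher is deliberately small: it supports only `*`, single values, and `A-B` ranges, with day-of-week given as names.

```python
from datetime import datetime, timedelta

DAY_NAMES = {"MON": 0, "TUE": 1, "WED": 2, "THU": 3, "FRI": 4, "SAT": 5, "SUN": 6}


def field_matches(field, value, names=None):
    """Match one cron field: '*', a single value, or an 'A-B' range."""
    if field == "*":
        return True

    def to_int(token):
        return names[token] if names and token in names else int(token)

    if "-" in field:
        low, high = (to_int(token) for token in field.split("-"))
        return low <= value <= high
    return to_int(field) == value


def cron_matches(expr, moment):
    """Check a 5-field cron expression against a datetime (minute precision)."""
    minute, hour, dom, month, dow = expr.split()
    return (field_matches(minute, moment.minute)
            and field_matches(hour, moment.hour)
            and field_matches(dom, moment.day)
            and field_matches(month, moment.month)
            and field_matches(dow, moment.weekday(), DAY_NAMES))


def effective_min(schedule, default_min, now, lookback_days=7):
    """Walk back minute by minute to the most recently fired entry."""
    moment = now.replace(second=0, microsecond=0)
    for _ in range(lookback_days * 24 * 60):
        for entry in schedule:
            if cron_matches(entry["cron"], moment):
                return entry["min"]
        moment -= timedelta(minutes=1)
    return default_min


schedule = [
    {"cron": "0 9 * * MON-FRI", "min": 10},
    {"cron": "0 18 * * MON-FRI", "min": 2},
    {"cron": "0 0 25 12 *", "min": 30},
]
print(effective_min(schedule, 2, datetime(2025, 1, 8, 10, 30)))  # 10 (Wednesday morning)
print(effective_min(schedule, 2, datetime(2025, 1, 11, 12, 0)))  # 2 (Saturday)
```

Under this assumed "latest entry wins" rule, the 25 December entry would be superseded at 09:00 by the weekday entry whenever that date falls on a weekday, so it is worth checking how overlapping entries interact before relying on them.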

Scale to Zero

Idle services can scale to zero to save costs. The first request triggers a cold start:

scaling:
  min: 0
  max: 10
  scale_to_zero:
    idle_timeout: 15m
    cold_start_target: 3s

RaidFrame keeps a warm standby image to minimize cold start time.
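
The idle check implied by `idle_timeout` can be sketched as below. It assumes idleness is measured from the last received request and that durations use the `s`, `m`, and `h` suffixes seen in this page; `parse_duration` and `is_idle` are hypothetical helpers, not part of RaidFrame.

```python
import re
from datetime import datetime, timedelta

UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(text):
    """Parse durations such as '3s', '15m', or '2h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{UNITS[unit]: int(amount)})


def is_idle(last_request_at, now, idle_timeout="15m"):
    """True once no request has arrived for at least `idle_timeout`."""
    return now - last_request_at >= parse_duration(idle_timeout)


last_request = datetime(2025, 1, 8, 12, 0)
print(is_idle(last_request, datetime(2025, 1, 8, 12, 14)))  # False
print(is_idle(last_request, datetime(2025, 1, 8, 12, 15)))  # True
```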

CLI Commands

# View current scaling status
rf services info api

# Manually set the scaling bounds
rf services scale api --min 5 --max 30

# Temporarily override auto-scaling
rf services scale api --fixed 10 --duration 2h

# View scaling events
rf metrics --service api --type scaling

Example output:

SCALING EVENTS (last 24h)
─────────────────────────────
14:23  api  2 → 4 instances  (cpu: 82%)
14:45  api  4 → 6 instances  (cpu: 76%)
15:30  api  6 → 4 instances  (cooldown, cpu: 35%)
18:00  api  4 → 2 instances  (scheduled: after hours)
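
For scripting against this output, the event lines can be parsed with a regular expression. This is a sketch that assumes the text layout shown above is stable; the CLI may offer other output formats that are not covered here, and `ScalingEvent` and `parse_events` are hypothetical names.

```python
import re
from typing import NamedTuple


class ScalingEvent(NamedTuple):
    time: str
    service: str
    before: int
    after: int
    reason: str


# Matches lines such as: 14:23  api  2 → 4 instances  (cpu: 82%)
EVENT_LINE = re.compile(
    r"^(\d{2}:\d{2})\s+(\S+)\s+(\d+)\s*→\s*(\d+) instances\s+\((.+)\)$"
)


def parse_events(output):
    """Extract scaling events, skipping the header and separator lines."""
    events = []
    for line in output.splitlines():
        match = EVENT_LINE.match(line.strip())
        if match:
            time, service, before, after, reason = match.groups()
            events.append(ScalingEvent(time, service, int(before), int(after), reason))
    return events


sample = """SCALING EVENTS (last 24h)
─────────────────────────────
14:23  api  2 → 4 instances  (cpu: 82%)
14:45  api  4 → 6 instances  (cpu: 76%)
15:30  api  6 → 4 instances  (cooldown, cpu: 35%)
18:00  api  4 → 2 instances  (scheduled: after hours)
"""
events = parse_events(sample)
scale_ups = [event for event in events if event.after > event.before]
print(len(events), len(scale_ups))  # 4 2
```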

Scaling Configuration

Field	Type	Default	Description
`min`	int	1	Minimum instances (0 enables scale-to-zero)
`max`	int	10	Maximum instances
`target_cpu`	int	70	Average CPU % across instances that triggers scaling
`target_memory`	int	-	Memory % threshold
`target_rps`	int	-	Requests/sec per instance
`target_queue_depth`	int	-	Queue items per worker
`cooldown`	duration	5m	Wait before scale-down
`scale_up_speed`	string	fast	`fast` (10s) or `gradual` (60s)
`scale_down_speed`	string	gradual	`fast` (30s) or `gradual` (5m)
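
The defaults in the table can be mirrored in a small pre-deploy check. The field names and defaults come from the table; the validation rules (non-negative `min`, `max` at least `min`, percentages between 1 and 100) are assumptions about what a sensible configuration looks like, not documented platform constraints.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalingConfig:
    # Defaults copied from the configuration table.
    min: int = 1
    max: int = 10
    target_cpu: int = 70
    target_memory: Optional[int] = None
    target_rps: Optional[int] = None
    target_queue_depth: Optional[int] = None
    cooldown: str = "5m"
    scale_up_speed: str = "fast"
    scale_down_speed: str = "gradual"

    def problems(self):
        """Return a list of problems found; an empty list means none."""
        found = []
        if self.min < 0:
            found.append("min must be 0 or greater")
        if self.max < 1 or self.max < self.min:
            found.append("max must be at least 1 and not below min")
        for name in ("target_cpu", "target_memory"):
            value = getattr(self, name)
            if value is not None and not 1 <= value <= 100:
                found.append(f"{name} must be a percentage between 1 and 100")
        for name in ("scale_up_speed", "scale_down_speed"):
            if getattr(self, name) not in ("fast", "gradual"):
                found.append(f"{name} must be 'fast' or 'gradual'")
        return found


print(ScalingConfig(min=2, max=20).problems())  # []
print(ScalingConfig(min=5, max=2).problems())   # ['max must be at least 1 and not below min']
```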
