Automatic horizontal scaling based on CPU, memory, request rate, queue depth, custom metrics, or a schedule.
RaidFrame monitors your service metrics and automatically adjusts the number of running instances. New instances spin up in under 10 seconds. Scale-down happens after a configurable cooldown to prevent flapping.
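To illustrate how a cooldown prevents flapping, here is a minimal Python sketch. It is not RaidFrame's implementation: the `ScalerState` type, the `next_instance_count` function, and the choice to measure the cooldown from the last scaling event are all assumptions made for illustration. The 300-second default mirrors the documented `cooldown` default of 5m.

```python
from dataclasses import dataclass


@dataclass
class ScalerState:
    """Hypothetical bookkeeping for one service."""
    instances: int      # instances currently running
    last_change: float  # time of the last scaling event, in seconds


def next_instance_count(state: ScalerState, desired: int, now: float,
                        cooldown_s: float = 300.0) -> int:
    """Scale up immediately; scale down only once the cooldown has elapsed."""
    if desired > state.instances:
        return desired
    if desired < state.instances and now - state.last_change >= cooldown_s:
        return desired
    # Either nothing to do, or a scale-down is still inside the cooldown window.
    return state.instances


state = ScalerState(instances=6, last_change=0.0)
print(next_instance_count(state, desired=4, now=120.0))  # still in cooldown
print(next_instance_count(state, desired=4, now=300.0))  # cooldown elapsed
```

Under these assumptions, a brief dip in load shortly after a scale-up does not remove instances that may be needed again moments later.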
Scale when average CPU across instances exceeds a threshold:
services:
  api:
    scaling:
      min: 2
      max: 20
      target_cpu: 70
Scale when average memory usage across instances exceeds a threshold:
scaling:
  target_memory: 80
Scale based on requests per second per instance:
scaling:
  target_rps: 500
Scale workers based on queue depth:
services:
  processor:
    type: worker
    scaling:
      min: 1
      max: 50
      target_queue_depth: 100
When the queue has 500 items and target_queue_depth is 100, RaidFrame scales to 5 instances.
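The arithmetic behind that example can be sketched in a few lines of Python. The function name `instances_for_queue` is hypothetical, and both rounding up and clamping the result to `min` and `max` are assumptions about how partial workloads and limits are handled.

```python
import math


def instances_for_queue(queue_depth: int, target_queue_depth: int,
                        min_instances: int, max_instances: int) -> int:
    """Workers needed so each one handles about target_queue_depth items."""
    if target_queue_depth <= 0:
        raise ValueError("target_queue_depth must be positive")
    wanted = math.ceil(queue_depth / target_queue_depth)  # assumed: round up
    # Assumed: the result never leaves the configured min/max range.
    return max(min_instances, min(max_instances, wanted))


# The example from the text: 500 queued items, target of 100 per worker.
print(instances_for_queue(500, 100, min_instances=1, max_instances=50))
```

With the `processor` configuration shown earlier, an empty queue would still leave one worker running, and a very deep queue would stop at 50.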
Push your own metrics and scale on them:
rf metrics push api.active_connections 1500
scaling:
  custom_metric: api.active_connections
  target_value: 200
Pre-scale for known traffic patterns:
scaling:
  min: 2
  max: 20
  schedule:
    - cron: "0 9 * * MON-FRI"
      min: 10
      comment: "Business hours"
    - cron: "0 18 * * MON-FRI"
      min: 2
      comment: "After hours"
    - cron: "0 0 25 12 *"
      min: 30
      comment: "Christmas sale"
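To make the cron expressions above concrete, the following Python sketch checks which schedule entry fires at a given minute. It is a deliberately simplified matcher, not a full cron implementation and not RaidFrame's scheduler: it supports only `*`, single values, and `a-b` ranges, and the helper names `cron_matches` and `fired_min` are hypothetical. Returning the first matching entry when several fire at once is also an assumption.

```python
from datetime import datetime

DAY_NAMES = {"SUN": 0, "MON": 1, "TUE": 2, "WED": 3, "THU": 4, "FRI": 5, "SAT": 6}


def _field_matches(field: str, value: int, names=None) -> bool:
    """Match one cron field: '*', a single value, or an inclusive range 'a-b'."""
    if field == "*":
        return True

    def to_int(token: str) -> int:
        return names[token] if names and token in names else int(token)

    if "-" in field:
        low, high = (to_int(t) for t in field.split("-"))
        return low <= value <= high
    return to_int(field) == value


def cron_matches(expr: str, when: datetime) -> bool:
    """True if the five-field cron expression fires at the minute of `when`."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    return (_field_matches(minute, when.minute)
            and _field_matches(hour, when.hour)
            and _field_matches(dom, when.day)
            and _field_matches(month, when.month)
            and _field_matches(dow, cron_dow, DAY_NAMES))


SCHEDULE = [
    {"cron": "0 9 * * MON-FRI", "min": 10},
    {"cron": "0 18 * * MON-FRI", "min": 2},
    {"cron": "0 0 25 12 *", "min": 30},
]


def fired_min(when: datetime):
    """Return the `min` of the first entry firing at `when`, else None."""
    for entry in SCHEDULE:
        if cron_matches(entry["cron"], when):
            return entry["min"]
    return None


print(fired_min(datetime(2024, 1, 8, 9, 0)))  # a Monday at 09:00
```

Each entry changes the instance floor at the moment it fires, so the business-hours floor of 10 stays in effect until the 18:00 entry lowers it back to 2.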
Idle services can scale to zero to save costs. The first request triggers a cold start:
scaling:
  min: 0
  max: 10
  scale_to_zero:
    idle_timeout: 15m
    cold_start_target: 3s
RaidFrame keeps a warm standby image to minimize cold start time.
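The idle check implied by `idle_timeout` can be sketched as follows. This is an illustration under stated assumptions, not RaidFrame's code: the helpers `parse_duration` and `should_scale_to_zero` are hypothetical, and the duration grammar (an integer followed by `s`, `m`, or `h`) is inferred from values such as `15m`, `3s`, and `2h` that appear in this document.

```python
import re
from datetime import timedelta

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(text: str) -> timedelta:
    """Parse strings like '15m', '3s', or '2h' (assumed grammar)."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: amount})


def should_scale_to_zero(idle_for: timedelta, idle_timeout: str = "15m") -> bool:
    """True once the service has been idle for at least idle_timeout."""
    return idle_for >= parse_duration(idle_timeout)


print(should_scale_to_zero(timedelta(minutes=20)))  # idle longer than 15m
print(should_scale_to_zero(timedelta(minutes=5)))   # still within the timeout
```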
# View current scaling status
rf services info api
# Manually scale
rf services scale api --min 5 --max 30
# Temporarily override auto-scaling
rf services scale api --fixed 10 --duration 2h
# View scaling events
rf metrics --service api --type scaling
SCALING EVENTS (last 24h)
─────────────────────────────
14:23 api 2 → 4 instances (cpu: 82%)
14:45 api 4 → 6 instances (cpu: 76%)
15:30 api 6 → 4 instances (cooldown, cpu: 35%)
18:00 api 4 → 2 instances (scheduled: after hours)
| Field | Type | Default | Description |
|---|---|---|---|
| min | int | 1 | Minimum instances (0 enables scale-to-zero) |
| max | int | 10 | Maximum instances |
| target_cpu | int | 70 | CPU % threshold |
| target_memory | int | - | Memory % threshold |
| target_rps | int | - | Requests/sec per instance |
| target_queue_depth | int | - | Queue items per worker |
| cooldown | duration | 5m | Wait before scale-down |
| scale_up_speed | string | fast | fast (10s) or gradual (60s) |
| scale_down_speed | string | gradual | fast (30s) or gradual (5m) |