Canary & Gradual Rollouts

Progressive delivery with automatic rollback when error rates or latency spike.

Canary Deployments

Route a small percentage of traffic to a new version before full rollout:

rf deploy --canary 5%
Building...        ████████████████████ 100% (32s)
Canary deployed:   5% of traffic → v44
                   95% of traffic → v43 (current)

Monitoring for 10 minutes...
  Error rate: 0.1% (threshold: 5%)
  P99 latency: 92ms (threshold: 500ms)

Promote canary? [Y/n]
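
One way to picture the percentage split above is deterministic bucketing. This is a minimal sketch, not RaidFrame's actual routing logic: the `pick_version` helper and the choice to key on a user or request ID are assumptions made for illustration. Hashing the key keeps a given user on the same version for as long as the percentage is unchanged, and raising the percentage only ever adds users to the canary.

```python
import hashlib


def pick_version(key: str, canary_percent: float,
                 canary: str = "v44", stable: str = "v43") -> str:
    """Map a stable key (e.g. a user ID) to a version.

    Hypothetical helper: the same key always lands in the same bucket,
    so a user does not flip between versions mid-session.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and reduce to a bucket in [0, 10000).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # canary_percent=5 means buckets 0..499 go to the canary.
    return canary if bucket < canary_percent * 100 else stable


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(20_000)]
    share = sum(pick_version(k, 5) == "v44" for k in keys) / len(keys)
    print(f"canary share: {share:.1%}")  # close to 5%, rarely exactly 5%
```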

Manual Canary Control

# Start canary at 5%
rf deploy --canary 5%

# Increase traffic
rf canary promote --percent 25
rf canary promote --percent 50

# View canary status
rf canary status

# Full rollout
rf canary promote --percent 100

# Abort and rollback
rf canary abort

Gradual Rollouts

Automated progressive delivery with defined stages:

rf deploy --rollout "5%:10m, 25%:10m, 50%:10m, 100%"
Gradual rollout started (v44)
  Stage 1: 5% for 10 minutes     ✓ (error rate: 0.08%)
  Stage 2: 25% for 10 minutes    ✓ (error rate: 0.12%)
  Stage 3: 50% for 10 minutes    ⏳ monitoring...
  Stage 4: 100%                   pending

If the error rate or latency exceeds its threshold at any stage, the rollout stops and traffic is automatically rolled back to the previous version.
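
To make the stage logic concrete, here is a rough sketch of how a stage string like the one passed to `--rollout` could be parsed and walked. It is an illustration only: `parse_rollout`, `run_rollout`, and the `check` callback are hypothetical names, and RaidFrame's real scheduler is not described in this document.

```python
import re
from typing import Callable, List, Optional, Tuple

# (traffic percent, hold time in seconds, or None for a stage with no hold)
Stage = Tuple[int, Optional[int]]

_UNITS = {"s": 1, "m": 60, "h": 3600}
_STAGE_RE = re.compile(r"^(\d+)%(?::(\d+)([smh]))?$")


def parse_rollout(spec: str) -> List[Stage]:
    """Parse a string like "5%:10m, 25%:10m, 100%" into stages."""
    stages: List[Stage] = []
    for part in spec.split(","):
        m = _STAGE_RE.match(part.strip())
        if m is None:
            raise ValueError(f"bad stage: {part.strip()!r}")
        percent = int(m.group(1))
        if not 0 < percent <= 100:
            raise ValueError(f"percent out of range: {percent}")
        hold = int(m.group(2)) * _UNITS[m.group(3)] if m.group(2) else None
        stages.append((percent, hold))
    return stages


def run_rollout(stages: List[Stage], check: Callable[[int], bool]) -> str:
    """Walk the stages; `check(percent)` returns True if metrics are healthy.

    Returns "promoted" if every stage passes, "rolled_back" otherwise.
    A real controller would hold each stage for its duration while sampling
    metrics; this sketch calls `check` once per stage and does not sleep.
    """
    for percent, _hold in stages:
        if not check(percent):
            return "rolled_back"
    return "promoted"


if __name__ == "__main__":
    stages = parse_rollout("5%:10m, 25%:10m, 50%:10m, 100%")
    print(stages)  # [(5, 600), (25, 600), (50, 600), (100, None)]
    print(run_rollout(stages, check=lambda percent: True))  # promoted
```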

Rollout Configuration

deploy:
  production:
    rollout:
      stages:
        - percent: 5
          duration: 10m
        - percent: 25
          duration: 10m
        - percent: 50
          duration: 15m
        - percent: 100
      abort_conditions:
        error_rate: "> 5%"
        p99_latency: "> 1000ms"
        crash_rate: "> 1%"
      on_abort: rollback
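
The `abort_conditions` entries read as simple comparisons against live metrics. The sketch below shows one way such strings could be evaluated. The parser, the `should_abort` name, and the metrics dictionary are assumptions for illustration, and only the `>` form with `%` or `ms` units shown above is handled.

```python
import re
from typing import Dict, List

_COND_RE = re.compile(r"^>\s*(\d+(?:\.\d+)?)\s*(%|ms)$")


def parse_threshold(expr: str) -> float:
    """Turn "> 5%" or "> 1000ms" into its numeric limit (5.0 or 1000.0)."""
    m = _COND_RE.match(expr.strip())
    if m is None:
        raise ValueError(f"unsupported condition: {expr!r}")
    return float(m.group(1))


def should_abort(conditions: Dict[str, str], metrics: Dict[str, float]) -> List[str]:
    """Return the name of every condition the current metrics violate.

    Metrics are expected in the same unit as the condition
    (percent for rates, milliseconds for latency). Metrics that are
    missing from the sample are skipped rather than treated as failures.
    """
    return [
        name
        for name, expr in conditions.items()
        if name in metrics and metrics[name] > parse_threshold(expr)
    ]


if __name__ == "__main__":
    conditions = {"error_rate": "> 5%", "p99_latency": "> 1000ms", "crash_rate": "> 1%"}
    healthy = {"error_rate": 0.12, "p99_latency": 110.0, "crash_rate": 0.0}
    spiking = {"error_rate": 7.5, "p99_latency": 1400.0, "crash_rate": 0.2}
    print(should_abort(conditions, healthy))  # []
    print(should_abort(conditions, spiking))  # ['error_rate', 'p99_latency']
```

An empty list means the stage may continue. Any non-empty result would trigger the `on_abort` action.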

Traffic Splitting

Route traffic based on request attributes, not just percentages:

deploy:
  traffic_split:
    v44:
      percent: 10
      match:
        headers:
          x-beta: "true"       # Beta users
        cookies:
          beta_optin: "1"      # Opted-in users
        geo: ["US", "CA"]      # Specific countries
    v43:
      percent: 90
      default: true
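
The matching step in a config like this can be sketched as follows. Two things here are assumptions, since the document does not specify them: a request is treated as a match if any single rule (header, cookie, or country) matches, and the interaction with `percent` is left out entirely. The `Request`, `MatchRules`, and `route` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Request:
    # Header names are assumed to be lowercased already.
    headers: Dict[str, str] = field(default_factory=dict)
    cookies: Dict[str, str] = field(default_factory=dict)
    country: str = ""


@dataclass
class MatchRules:
    headers: Dict[str, str] = field(default_factory=dict)
    cookies: Dict[str, str] = field(default_factory=dict)
    geo: List[str] = field(default_factory=list)

    def matches(self, req: Request) -> bool:
        """ASSUMPTION: a request matches if ANY single rule matches."""
        header_hit = any(req.headers.get(k) == v for k, v in self.headers.items())
        cookie_hit = any(req.cookies.get(k) == v for k, v in self.cookies.items())
        geo_hit = req.country in self.geo
        return header_hit or cookie_hit or geo_hit


def route(req: Request, rules: MatchRules,
          target: str = "v44", default: str = "v43") -> str:
    """Send matching requests to `target` and everything else to `default`.

    The `percent` values from the config are deliberately not modeled here.
    """
    return target if rules.matches(req) else default


if __name__ == "__main__":
    rules = MatchRules(
        headers={"x-beta": "true"},
        cookies={"beta_optin": "1"},
        geo=["US", "CA"],
    )
    print(route(Request(headers={"x-beta": "true"}, country="DE"), rules))  # v44
    print(route(Request(country="DE"), rules))                              # v43
```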

A/B Testing

Split traffic between two versions and compare metrics:

rf deploy --ab-test v43:50 v44:50 --duration 24h --metric conversion_rate
A/B Test Results (24h)
──────────────────────
                v43 (control)   v44 (variant)
Requests:       12,340          12,287
Error rate:     0.12%           0.09%
P50 latency:    24ms            22ms
Conversion:     3.2%            3.8%  (+18.7%)

Statistical significance: 94.2%
Recommendation: v44 is performing better. Promote? [Y/n]
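
For readers who want to sanity-check numbers like these, the sketch below computes a relative lift and a confidence figure from a pooled two-proportion z-test. This is one common approach, not necessarily the method RaidFrame uses, and it is not expected to reproduce the significance value shown above. The function names and the sample counts in the demo are made up for illustration.

```python
import math


def relative_lift(control_rate: float, variant_rate: float) -> float:
    """Relative change of the variant over the control (0.1875 means +18.75%)."""
    return (variant_rate - control_rate) / control_rate


def two_proportion_confidence(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided confidence (1 - p-value) from a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # no variance at all, so no evidence of a difference
    z = abs(p_b - p_a) / se
    # Phi(z) = (1 + erf(z / sqrt(2))) / 2 and p-value = 2 * (1 - Phi(z)),
    # so the confidence 1 - p-value simplifies to erf(z / sqrt(2)).
    return math.erf(z / math.sqrt(2))


if __name__ == "__main__":
    # Rounded rates from the table above.
    print(f"{relative_lift(0.032, 0.038):+.2%}")  # +18.75%
    # Hypothetical counts, not taken from the table.
    print(f"{two_proportion_confidence(100, 1000, 150, 1000):.1%}")
```

The rounded rates in the table give a lift of about +18.75%, in line with the reported figure.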

Blue/Green Deployments

Run two full environments and switch traffic instantly:

# Deploy to green (inactive)
rf deploy --target green

# Verify green is healthy
rf services info api --target green

# Switch traffic from blue to green
rf deploy swap

# If something breaks, swap back
rf deploy swap

Output of the first swap:

Traffic swap: blue → green
  ✓ All traffic now routed to green (v44)
  Blue (v43) kept alive for 30 minutes for quick swap-back

Monitoring During Rollouts

During any progressive deployment, RaidFrame provides a real-time comparison of the canary and stable versions:

rf canary status
CANARY STATUS (v44 at 25%)
─────────────────────────
                Canary (v44)    Stable (v43)
Instances:      1               3
Traffic:        25%             75%
Error rate:     0.11%           0.14%
P50:            18ms            22ms
P99:            110ms           135ms
CPU:            34%             38%
Memory:         480MB           512MB

✓ Canary is performing equal or better across all metrics
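
The "equal or better" verdict can be thought of as a per-metric comparison. Below is a minimal sketch, assuming every metric is lower-is-better and that an optional relative tolerance is acceptable. The `regressions` helper and the `tolerance` parameter are hypothetical and do not describe how RaidFrame actually reaches its verdict.

```python
from typing import Dict, List


def regressions(canary: Dict[str, float], stable: Dict[str, float],
                tolerance: float = 0.0) -> List[str]:
    """List the metrics where the canary is worse than stable.

    All metrics are treated as lower-is-better. `tolerance` is relative
    slack: 0.05 lets the canary be up to 5% worse before it is flagged.
    Metrics missing from `stable` are skipped.
    """
    return [
        name
        for name, value in canary.items()
        if name in stable and value > stable[name] * (1 + tolerance)
    ]


if __name__ == "__main__":
    # Values copied from the status table above.
    canary = {"error_rate": 0.11, "p50_ms": 18, "p99_ms": 110, "cpu": 34, "memory_mb": 480}
    stable = {"error_rate": 0.14, "p50_ms": 22, "p99_ms": 135, "cpu": 38, "memory_mb": 512}
    print(regressions(canary, stable))  # []
```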