Zero Downtime Deployments: How to Ship Without Taking Your App Offline

A complete guide to zero-downtime deployment strategies — rolling deploys, blue/green, canary releases, database migrations, and the gotchas that cause 2am incidents.

RaidFrame Team

September 28, 2025 · 5 min read

"We'll do the deploy at 2am when traffic is low."

If you're scheduling maintenance windows for deployments, your deployment process is broken. Every modern application should deploy during business hours, multiple times per day, with zero user impact.

Here's how.

Rolling deployments

The most common zero-downtime strategy. New instances roll in while old instances roll out.

Time 0: [v1] [v1] [v1] [v1]    ← all running v1
Time 1: [v2] [v1] [v1] [v1]    ← first v2 instance ready
Time 2: [v2] [v2] [v1] [v1]    ← second v2 ready, first v1 draining
Time 3: [v2] [v2] [v2] [v1]    ← third v2 ready
Time 4: [v2] [v2] [v2] [v2]    ← complete, all v1 terminated

During the transition, both v1 and v2 serve traffic. This means your application must handle:

  1. Backward-compatible APIs — v2 must accept requests that v1 could handle
  2. Shared session state — sessions can't be stored in-memory (use Redis)
  3. Graceful shutdown — v1 instances must finish in-flight requests before terminating
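
As a minimal sketch of the first requirement, assume v2 of a hypothetical order endpoint adds a currency field that v1-era clients never send. The function name parseOrderRequest, the field names, and the USD default are illustrative assumptions, not part of any real API.

```javascript
// Hypothetical v2 request parser that still accepts v1-shaped bodies.
// v1-era clients send { amount }; v2 clients send { amount, currency }.
function parseOrderRequest(body) {
  if (typeof body !== "object" || body === null) {
    throw new TypeError("request body must be an object");
  }
  if (typeof body.amount !== "number" || !Number.isFinite(body.amount)) {
    throw new TypeError("amount must be a finite number");
  }
  return {
    amount: body.amount,
    // The new field is optional; fall back to the behavior v1 implied.
    currency: typeof body.currency === "string" ? body.currency : "USD",
  };
}
```

Because the new field is optional, a request shaped for v1 gets the same result whether it lands on a v1 or a v2 instance mid-rollout.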

Graceful shutdown

When the platform sends SIGTERM to your app, don't exit immediately. Finish what you're doing first.

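// Assumes Express and node-postgres (pg); adapt to your own framework and driver
const express = require("express");
const { Pool } = require("pg");

const app = express();
const pool = new Pool(); // connection settings come from the PG* environment variables
let isShuttingDown = false;

// Health check returns unhealthy during shutdown
app.get("/health", (req, res) => {
  if (isShuttingDown) return res.status(503).send("shutting down");
  res.status(200).send("ok");
});

const server = app.listen(process.env.PORT || 3000);

process.on("SIGTERM", () => {
  isShuttingDown = true;

  // Force exit after 30 seconds; unref() so this timer alone doesn't keep the process alive
  setTimeout(() => process.exit(1), 30000).unref();

  // Stop accepting new connections; the callback runs once in-flight requests finish
  server.close(() => {
    // pool.end() is asynchronous, so wait for it before exiting
    pool.end().then(() => process.exit(0), () => process.exit(1));
  });
});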

Connection draining

The load balancer stops sending NEW requests to the old instance but lets EXISTING requests finish. Configure a drain timeout (30-60 seconds is typical).
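
To make the drain behavior concrete, here is a small sketch of the bookkeeping involved: refuse new work, let in-flight work finish, and give up after a timeout. The class and method names are hypothetical; a real load balancer or platform does this for you.

```javascript
// Minimal in-flight request tracker (all names are illustrative).
class RequestDrainer {
  constructor() {
    this.inFlight = 0;
    this.draining = false;
    this.waiters = [];
  }

  // Call when a request arrives. Returns false once draining has begun,
  // so the caller can refuse the request (for example with a 503).
  tryStart() {
    if (this.draining) return false;
    this.inFlight += 1;
    return true;
  }

  // Call when a request finishes, whether it succeeded or failed.
  finish() {
    this.inFlight -= 1;
    if (this.draining && this.inFlight === 0) {
      this.waiters.forEach((notify) => notify("drained"));
      this.waiters = [];
    }
  }

  // Stop admitting requests and wait for in-flight work, up to timeoutMs.
  // Resolves to "drained" or "timeout".
  drain(timeoutMs) {
    this.draining = true;
    if (this.inFlight === 0) return Promise.resolve("drained");
    return new Promise((resolve) => {
      const timer = setTimeout(() => resolve("timeout"), timeoutMs);
      this.waiters.push((result) => {
        clearTimeout(timer);
        resolve(result);
      });
    });
  }
}
```

A "timeout" result is the case the drain timeout exists for: a request that never finishes should not block the rollout forever.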

Blue/green deployments

Run two identical environments. "Blue" is live, "green" is staging the new version.

Before:  Traffic → [Blue: v1]     [Green: idle]
Deploy:  Traffic → [Blue: v1]     [Green: v2 deploying]
Switch:  Traffic → [Green: v2]    [Blue: v1 standby]

Advantages:

  • Fast switchover (a load balancer change is near-instant; a DNS change only takes effect as cached records expire)
  • Instant rollback (switch back to blue)
  • No mixed-version traffic

Disadvantages:

  • 2x infrastructure cost during deployment
  • Database schema changes are tricky (both versions share the database)
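
The switch and rollback can be pictured as a single pointer that decides which environment is live. The sketch below is illustrative only; BlueGreenRouter and its methods are hypothetical names, and in practice the pointer lives in your load balancer or DNS configuration.

```javascript
// Illustrative blue/green router: one pointer decides which environment is live.
class BlueGreenRouter {
  constructor(environments, live) {
    this.environments = environments; // e.g. { blue: "v1", green: "v2" }
    this.live = live;
    this.previous = null;
  }

  // Point all traffic at the target environment in one step.
  switchTo(target) {
    if (!Object.prototype.hasOwnProperty.call(this.environments, target)) {
      throw new Error(`unknown environment: ${target}`);
    }
    this.previous = this.live;
    this.live = target;
  }

  // Roll back by restoring the environment that was live before the switch.
  rollback() {
    if (this.previous === null) throw new Error("nothing to roll back to");
    [this.live, this.previous] = [this.previous, this.live];
  }

  // Every request is served by whichever environment is currently live.
  route() {
    return this.environments[this.live];
  }
}
```

Since every request goes through the same pointer, no request ever sees a mix of versions, which is the property that distinguishes blue/green from a rolling deploy.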

Canary deployments

Deploy to a small percentage of traffic, monitor, then gradually increase.

Step 1:  5% → v2,   95% → v1    (monitor for 10 min)
Step 2:  25% → v2,  75% → v1    (monitor for 10 min)
Step 3:  50% → v2,  50% → v1    (monitor for 10 min)
Step 4:  100% → v2               (complete)

If error rates or latency spike at any step, automatically roll back.

Best for: High-traffic services where a bad deploy affects millions of users. The canary catches issues before they reach everyone.
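
The stepped rollout above can be sketched as a loop that raises the v2 share, checks health, and sends all traffic back to v1 on the first failure. The callbacks setTrafficPercent and isHealthy are hypothetical stand-ins for your load balancer API and your monitoring checks.

```javascript
// Traffic steps from the schedule above: percentage of requests sent to v2.
const CANARY_STEPS = [5, 25, 50, 100];

// Pick a version for one request. `random` is injectable so the split is testable.
function chooseVersion(percentV2, random = Math.random) {
  return random() * 100 < percentV2 ? "v2" : "v1";
}

// Walk through the steps. `setTrafficPercent(percent)` would reconfigure the
// load balancer; `isHealthy(percent)` would watch error rate and latency for
// the monitoring window. Both are assumptions made for this sketch.
async function runCanary(steps, setTrafficPercent, isHealthy) {
  for (const percent of steps) {
    await setTrafficPercent(percent);
    if (!(await isHealthy(percent))) {
      await setTrafficPercent(0); // roll back: all traffic returns to v1
      return { status: "rolled_back", failedAt: percent };
    }
  }
  return { status: "complete" };
}
```

Random per-request selection is the simplest split; if a user must stay on one version for a whole session, route on a stable key such as a user ID instead.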

Database migrations (the hard part)

Database migrations are one of the most common causes of deployment downtime. You can't just ALTER TABLE in production and hope for the best.

The expand-contract pattern

Step 1: Expand — add new columns/tables without removing old ones

-- Deploy 1: Add new column, nullable
ALTER TABLE users ADD COLUMN display_name varchar(255);

Step 2: Migrate — backfill data, update application to use new column

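-- Background job: copy data in small batches so no single statement holds locks for long.
-- Assumes an id primary key; rerun until it updates 0 rows.
UPDATE users SET display_name = name
WHERE id IN (SELECT id FROM users WHERE display_name IS NULL AND name IS NOT NULL LIMIT 1000);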

Step 3: Contract — remove old column after all code uses the new one

-- Deploy 3: Drop old column (weeks later, after verification)
ALTER TABLE users DROP COLUMN name;

Each step is its own deployment. The database is always compatible with both the old and new application code.
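
On the application side, the migrate step usually means writing both columns and reading the new one with a fallback. The sketch below uses the name and display_name columns from the example; the helper names buildUserWrite and readDisplayName are hypothetical.

```javascript
// Build the column values to write while both `name` and `display_name` exist.
// Writing both keeps old code (reads `name`) and new code (reads `display_name`) correct.
function buildUserWrite(displayName) {
  return { name: displayName, display_name: displayName };
}

// Read path for the transition: prefer the new column, fall back to the old
// one for rows the backfill has not reached yet.
function readDisplayName(row) {
  return row.display_name ?? row.name ?? null;
}
```

Once the backfill is complete and no deployed code reads the old column, the fallback and the dual write can be removed ahead of the contract step.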

Dangerous migrations

Never do these in a single deploy:

  • Rename a column (old code can't find it)
  • Change a column type (existing data may not convert)
  • Add a NOT NULL column without a default (existing rows fail)
  • Drop a column that old code still reads

Always do these instead:

  • Add a new column, migrate data, drop old column (3 deploys)
  • Add a new table, dual-write, backfill, switch reads, drop old table
  • Use feature flags to control which code path runs
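
The backfill step in these patterns is typically run as a loop over small batches instead of one large statement. A minimal sketch, assuming two hypothetical callbacks: fetchBatch returns up to a given number of rows that still need migrating, and applyBatch writes the new values for them.

```javascript
// Backfill in small batches so no single statement holds locks for long.
// `fetchBatch(limit)` and `applyBatch(rows)` are hypothetical callbacks that
// would wrap real queries; `pauseMs` gives the database room between batches.
async function backfillInBatches(fetchBatch, applyBatch, { batchSize = 1000, pauseMs = 0 } = {}) {
  let total = 0;
  for (;;) {
    const rows = await fetchBatch(batchSize);
    if (rows.length === 0) return total; // nothing left to migrate
    await applyBatch(rows);
    total += rows.length;
    if (pauseMs > 0) await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
}
```

The loop only terminates if applyBatch actually removes rows from the set that fetchBatch returns, so the two callbacks need to agree on what "still needs migrating" means.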

Lock-free migrations

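Many ALTER TABLE operations take an exclusive lock on the table, and any that rewrite the table hold that lock for the whole rewrite. On a table with millions of rows, this can block reads and writes for minutes.

In PostgreSQL, use CREATE INDEX CONCURRENTLY to build indexes without blocking writes, and pg_repack to rebuild tables online with only brief locks:

-- Bad: blocks writes to the table while the index builds
CREATE INDEX idx_users_email ON users(email);
 
-- Good: builds without blocking writes (cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);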

Feature flags

Feature flags let you deploy code without activating it. Ship the feature to production behind a flag, then enable it when ready.

if (featureFlags.isEnabled("new-checkout-flow", user)) {
  return renderNewCheckout();
} else {
  return renderOldCheckout();
}

This decouples deployment from release. You can deploy 10 times a day and release features on a completely separate schedule.
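
For illustration, here is one way an isEnabled check like the one above could work, including a percentage rollout. This is a sketch under assumptions: the FeatureFlags class, the enabled and percent fields, and the user.id property are all hypothetical, and real flag systems usually load definitions from a config service or database.

```javascript
// Minimal in-memory flag store (illustrative only).
class FeatureFlags {
  constructor(flags) {
    this.flags = flags; // e.g. { "new-checkout-flow": { enabled: true, percent: 10 } }
  }

  // Deterministic 0-99 bucket so a given user always gets the same answer.
  static bucket(flagName, userId) {
    let hash = 0;
    for (const ch of `${flagName}:${userId}`) {
      hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep within unsigned 32 bits
    }
    return hash % 100;
  }

  isEnabled(flagName, user) {
    const flag = this.flags[flagName];
    if (!flag || !flag.enabled) return false; // unknown flags default to off
    const percent = flag.percent ?? 100; // no percent means everyone
    return FeatureFlags.bucket(flagName, user.id) < percent;
  }
}
```

Hashing the flag name together with the user ID means different flags roll out to different slices of users, so the same people are not always first in line.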

Monitoring during deployment

Watch these metrics during every deploy:

  • Error rate: should not increase. If it does, roll back.
  • Response time (p95): should not increase significantly.
  • CPU/memory: the new version shouldn't use dramatically more resources.
  • Business metrics: conversion rate, signup rate, revenue. If they drop, investigate.

Set up automatic rollback triggers:

  • Error rate > 5% → roll back
  • p95 latency > 2x baseline → roll back
  • Health check failures > 0 → stop rollout
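
These triggers translate directly into a small decision function. The sketch below assumes hypothetical metric snapshots with errorRate (0 to 1), p95Ms, and healthCheckFailures fields; how you collect them depends on your monitoring stack.

```javascript
// Thresholds taken from the list above.
const MAX_ERROR_RATE = 0.05;  // 5%
const MAX_P95_MULTIPLIER = 2; // 2x baseline

// `current` and `baseline` are hypothetical metric snapshots:
// { errorRate: 0..1, p95Ms: number, healthCheckFailures: number }
function evaluateDeploy(current, baseline) {
  if (current.errorRate > MAX_ERROR_RATE) {
    return { action: "rollback", reason: "error rate above 5%" };
  }
  if (current.p95Ms > MAX_P95_MULTIPLIER * baseline.p95Ms) {
    return { action: "rollback", reason: "p95 latency above 2x baseline" };
  }
  if (current.healthCheckFailures > 0) {
    return { action: "stop_rollout", reason: "health check failures" };
  }
  return { action: "continue", reason: "within thresholds" };
}
```

Running a check like this on a timer during the rollout removes the need for someone to watch dashboards and decide under pressure.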

Zero-downtime deploys on RaidFrame

Every deploy on RaidFrame is zero-downtime by default:

rf deploy

The platform handles:

  • Rolling deployment with health checks
  • Connection draining (30s, configurable)
  • Automatic rollback on health check failure
  • Preview environments for every PR
  • One-command rollback to any previous version

No maintenance windows. No 2am deploys. Ship whenever you want.
