How to Build and Scale a SaaS on Cloud Infrastructure
End-to-end guide to building a SaaS from first user to 10K customers. Stack decisions, infrastructure milestones, and scaling patterns that actually work.
RaidFrame Team
March 5, 2026 · 5 min read
TL;DR — Building a SaaS has three infrastructure phases: build (just ship it), grow (add caching, queues, and monitoring), and scale (multi-region, auto-scaling, compliance). Most teams over-engineer phase 1 and under-invest in phase 2. Here's what to do at each stage.
Phase 1: Build (0-100 users)
Goal: ship and get feedback
Your infrastructure should take less than an hour to set up. If you're spending more time on DevOps than product, you're doing it wrong.
The stack:
```yaml
# raidframe.yaml
services:
  web:
    type: web
    build:
      dockerfile: Dockerfile
    port: 3000
    scaling:
      min: 1
      max: 1
databases:
  main:
    engine: postgres
```

One service. One database. Deploy and move on.
What to skip:
- Redis (your database handles 100 users fine)
- Background workers (process everything synchronously)
- CDN (your app isn't bandwidth-constrained)
- Monitoring beyond uptime checks
- Multi-region (pick one region close to your target market)
- Microservices (absolutely not)
What NOT to skip:
- Automated backups (RaidFrame does this automatically)
- SSL (automatic on RaidFrame)
- Environment variables for secrets (never hardcode)
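Keeping secrets in environment variables is cheap to do right from day one. A minimal fail-fast loader in Node — the variable names here are illustrative, not part of any RaidFrame API, and you'd set the actual values through your platform's secret management rather than committing them:

```javascript
// Minimal fail-fast loader for secrets kept in environment variables.
// Variable names are examples; never commit the values themselves.
function requireEnv(name) {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Crash at boot, not mid-request, if a secret is missing:
// const databaseUrl = requireEnv("DATABASE_URL");
// const stripeKey = requireEnv("STRIPE_SECRET_KEY");
```

Failing at startup beats discovering a missing secret on the first request that needs it.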
Monthly cost: $0-7
Try RaidFrame free
Deploy your first app in 60 seconds. No credit card required.
Phase 2: Grow (100-5,000 users)
Goal: performance and reliability
You have paying customers. Downtime costs money. Slow pages lose conversions.
Add caching:
```bash
rf add redis
```

```javascript
const cached = await redis.get(`user:${id}`);
if (cached) return JSON.parse(cached);
const user = await db.query("SELECT * FROM users WHERE id = $1", [id]);
await redis.set(`user:${id}`, JSON.stringify(user), "EX", 300);
return user;
```

Add background jobs:
```bash
rf cron add "0 9 * * *" "node scripts/daily-digest.js" --name digest
```

Move slow operations out of the request path:
- Email sending → queue
- Image processing → queue
- Report generation → cron
- Data sync → cron
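All four items share one pattern: the request handler records that work needs to happen, and a separate process does it later. A toy in-memory sketch of that handoff — a real deployment would use a durable managed queue, since an in-process array loses jobs on restart:

```javascript
// Toy illustration of moving work out of the request path.
// In production, replace the array with a durable queue.
const jobs = [];

// Called from the request handler: record the job and return immediately.
function enqueueEmail(to, subject) {
  jobs.push({ to, subject, enqueuedAt: Date.now() });
}

// Runs in worker.js: drain pending jobs outside any request.
function drainJobs(send) {
  let processed = 0;
  while (jobs.length > 0) {
    send(jobs.shift()); // e.g. call your email provider here
    processed++;
  }
  return processed;
}
```

The request stays fast because it only does the cheap part (recording the job); the slow part happens in the worker on its own schedule.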
Add monitoring:
```bash
rf alerts create --name "Errors" --metric error_rate --service web --threshold "> 2%" --window 5m --notify slack
rf alerts create --name "Slow" --metric response_time_p99 --service web --threshold "> 1000ms" --window 5m --notify slack
```

Scale the database:

```bash
rf db upgrade main --plan pro
```

Add a staging environment:

```bash
rf env create staging --from production
```

Test deployments on staging before pushing them to production. Preview environments for PRs give you per-feature testing.
Updated stack:
```yaml
services:
  web:
    type: web
    scaling:
      min: 2
      max: 5
  worker:
    type: worker
    command: node worker.js
databases:
  main:
    engine: postgres
    plan: pro
  cache:
    engine: redis
```

Monthly cost: $25-80
Phase 3: Scale (5,000-50,000+ users)
Goal: handle load, meet compliance, go global
Auto-scaling:
```yaml
services:
  web:
    scaling:
      min: 4
      max: 40
      target_cpu: 70
```

Multi-region:

```bash
rf regions add eu-west-1
rf db replicas add main --region eu-west-1
```

Users in Europe get sub-50ms responses instead of 200ms.
Database read replicas:
Route read queries to replicas to reduce load on the primary:
```javascript
const { Pool } = require("pg");

const readDb = new Pool({ connectionString: process.env.DATABASE_REPLICA_URL });
const writeDb = new Pool({ connectionString: process.env.DATABASE_URL });

// Reads go to the replica
app.get("/api/products", async (req, res) => {
  const products = await readDb.query("SELECT * FROM products");
  res.json(products.rows);
});

// Writes go to the primary
app.post("/api/orders", async (req, res) => {
  const order = await writeDb.query("INSERT INTO orders...");
  res.json(order.rows[0]);
});
```

Compliance:

```bash
rf compliance enable soc2
rf compliance enable hipaa  # if healthcare
```

Full-text search:

```bash
rf add search
```

Don't use ILIKE queries on Postgres at scale. Move to managed search for product search, user search, and content search.
Updated stack:
```yaml
services:
  web:
    type: web
    scaling:
      min: 4
      max: 40
  api:
    type: web
    scaling:
      min: 2
      max: 20
  worker:
    type: worker
    scaling:
      min: 2
      max: 10
databases:
  main:
    engine: postgres
    plan: performance
    read_replicas:
      - region: eu-west-1
  cache:
    engine: redis
    plan: pro
search:
  products:
    type: search
queues:
  tasks:
    type: queue
```

Monthly cost: $200-800
Common mistakes at each phase
| Phase | Mistake | Fix |
|---|---|---|
| Build | Over-engineering (microservices, k8s) | Monolith + single DB |
| Build | No backups | Use managed DB (automatic) |
| Grow | No caching | Add Redis for hot paths |
| Grow | No staging environment | Clone production config |
| Grow | Ignoring slow queries | Run `rf db insights` weekly |
| Scale | No auto-scaling | Configure min/max with CPU target |
| Scale | Single region | Add regions where users are |
| Scale | No read replicas | Route reads to replicas |
FAQ
When should I add a second service?
When you have background work that slows down API responses. Extract it as a worker. This is usually the first split.
When do I need multi-region?
When a significant share of your users are on other continents and they complain about latency. Check your analytics — if 30%+ of traffic comes from another continent, add that region.
How much should infrastructure cost relative to revenue?
5-15% of revenue is healthy. Under 5% means you're probably under-investing. Over 20% means you're over-provisioned or on the wrong platform.
Should I use Kubernetes?
Probably not. Kubernetes makes sense at 20+ services with dedicated DevOps staff. Below that, a managed platform like RaidFrame handles everything Kubernetes does with zero operational overhead.
When should I worry about compliance?
As soon as you handle health data (HIPAA), payment cards (PCI), or EU user data (GDPR). Don't wait for a customer to ask — by then it's a scramble.
Ship faster with RaidFrame
Auto-scaling compute, managed databases, global CDN, and zero-config CI/CD. Free tier included.