Observability

Centralized logs, real-time metrics, distributed tracing, and alerts.

Logs

Every service's stdout and stderr are captured, timestamped, and stored. No logging library or agent required.

Stream Live Logs

rf logs
14:23:01 [web]    GET /api/users 200 12ms
14:23:02 [web]    GET /api/users/5 200 8ms
14:23:05 [api]    POST /api/orders 201 45ms
14:23:06 [worker] Processed job email_send (j_abc123) in 230ms

Filter Logs

# By service
rf logs --service api

# By severity
rf logs --level error

# By time range
rf logs --since 2h
rf logs --since "2026-03-16T12:00:00Z" --until "2026-03-16T14:00:00Z"

# Search text
rf logs --search "timeout"
rf logs --search "user_id=u_123"

# Combine filters
rf logs --service api --level error --since 1h --search "database"

# JSON output (for piping)
rf logs --json | jq '.message'

Structured Logging

Output JSON from your app and RaidFrame parses it automatically:

console.log(JSON.stringify({
  level: "info",
  message: "Order created",
  order_id: "o_123",
  user_id: "u_456",
  amount: 99.99,
}));

Structured fields become searchable and filterable in the dashboard and CLI.
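A small wrapper keeps every entry on a single line and attaches the same core fields each time. This is a sketch, not part of the RaidFrame SDK. The `log` helper and the `timestamp` field name are hypothetical, and the sketch assumes RaidFrame parses each line of stdout or stderr as a separate JSON entry.

```javascript
// Hypothetical helper, not part of the RaidFrame SDK.
// Writes one JSON object per line. JSON.stringify escapes embedded newlines,
// so multi-line messages still produce a single log line.
function log(level, message, fields = {}) {
  const line = JSON.stringify({
    ...fields,                            // caller's structured fields first...
    level,                                // ...so these core keys cannot be overwritten
    message,
    timestamp: new Date().toISOString(), // assumed field name
  });
  // Both streams are captured; errors go to stderr by convention
  if (level === "error") console.error(line);
  else console.log(line);
  return line;
}

log("info", "Order created", { order_id: "o_123", user_id: "u_456", amount: 99.99 });
log("error", "Payment declined\ncard expired", { order_id: "o_123" });
```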

Log Retention

Plan         Retention
Starter      3 days
Pro          30 days
Enterprise   90 days

Export to your own S3-compatible storage for longer retention:

rf logs export --since 30d --output s3://my-bucket/logs/

Metrics

Built-in metrics for every service with no instrumentation required.

View Metrics

rf metrics
SERVICE     CPU    MEMORY   REQUESTS   P50     P99     ERRORS
web         34%    512MB    142/s      12ms    89ms    0.1%
api         67%    1.2GB    89/s       25ms    210ms   0.3%
worker      12%    256MB    —          —       —       0%

# Detailed metrics for a service
rf metrics --service api --period 24h

# Specific metric
rf metrics --service api --metric response_time_p99 --period 7d

Available Metrics

Metric              Description
cpu_percent         CPU utilization
memory_mb           Memory usage (MB)
memory_percent      Memory utilization
disk_mb             Disk usage (MB)
request_count       Requests per second
response_time_p50   Median response time
response_time_p95   95th percentile response time
response_time_p99   99th percentile response time
error_rate          Percentage of 5xx responses
instance_count      Running instances
network_in_mb       Inbound bandwidth
network_out_mb      Outbound bandwidth
queue_depth         Messages pending in the queue
db_connections      Active database connections
db_query_time_ms    Average query time (ms)
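The `response_time_p50`, `response_time_p95`, and `response_time_p99` metrics are percentiles. For example, p99 is the response time that 99% of requests come in at or below. The sketch below computes percentiles with the nearest-rank method over a window of samples. RaidFrame's own aggregation (window length, interpolation) is not documented here, so this only illustrates the concept and does not show how the platform computes the values.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples, p) {
  if (samples.length === 0) return undefined;
  const sorted = [...samples].sort((a, b) => a - b); // numeric sort, not the default string sort
  const rank = Math.ceil((p / 100) * sorted.length);  // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Response times (ms) for ten requests, one of them a slow outlier
const responseTimesMs = [8, 9, 10, 11, 12, 12, 13, 14, 15, 210];

const p50 = percentile(responseTimesMs, 50);
const p99 = percentile(responseTimesMs, 99);
console.log(`p50=${p50}ms p99=${p99}ms`); // p50=12ms p99=210ms
```

A single slow request out of ten leaves p50 at 12 ms but pushes p99 to 210 ms. This is why p99 is the better signal for tail latency.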

Custom Metrics

Push custom metrics from your application:

import { Metrics } from "@raidframe/sdk";

const metrics = new Metrics();

// Counter
metrics.increment("api.orders.created");

// Gauge
metrics.gauge("api.active_connections", 42);

// Histogram
metrics.histogram("api.checkout.duration_ms", 340);

Custom metrics appear alongside built-in metrics in the dashboard and can be used as auto-scaling triggers.
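The three calls differ in how repeated values combine. The in-memory class below is a hypothetical illustration of those semantics and is not the `@raidframe/sdk` implementation. A counter accumulates, a gauge keeps only the latest reading, and a histogram keeps every observation so that percentiles can be computed later.

```javascript
// Hypothetical in-memory model of the three metric types (not the RaidFrame SDK).
class MetricsModel {
  constructor() {
    this.counters = new Map();   // name -> running total
    this.gauges = new Map();     // name -> last reported value
    this.histograms = new Map(); // name -> array of observations
  }

  increment(name, by = 1) {
    // Each call adds `by` to the running total
    this.counters.set(name, (this.counters.get(name) ?? 0) + by);
  }

  gauge(name, value) {
    // Point-in-time reading: a new value replaces the previous one
    this.gauges.set(name, value);
  }

  histogram(name, value) {
    // Keep the whole distribution, not just the last value
    if (!this.histograms.has(name)) this.histograms.set(name, []);
    this.histograms.get(name).push(value);
  }
}

const m = new MetricsModel();
m.increment("api.orders.created");
m.increment("api.orders.created");
m.gauge("api.active_connections", 42);
m.gauge("api.active_connections", 37);
m.histogram("api.checkout.duration_ms", 340);
m.histogram("api.checkout.duration_ms", 120);

console.log(m.counters.get("api.orders.created"));        // 2
console.log(m.gauges.get("api.active_connections"));      // 37
console.log(m.histograms.get("api.checkout.duration_ms")); // [ 340, 120 ]
```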

Prometheus Endpoint

Export metrics in Prometheus format for Grafana or other tools:

rf metrics prometheus-endpoint
✓ Prometheus endpoint enabled
  Scrape URL: https://metrics-abc123.raidframe.net/metrics
  Token: rf_metrics_****
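Grafana or Prometheus itself can scrape this URL directly. To inspect what the endpoint returns from a script, the sketch below fetches the URL and parses the Prometheus text exposition format. It assumes the token is sent as a bearer token, which the output above does not state. The metric names in the example are hypothetical. The parser is deliberately simplified and does not decode escaped label values.

```javascript
// Minimal parser for Prometheus text exposition lines such as:
//   http_requests_total{service="api",status="200"} 1027
// Simplified: label values are kept as written (escape sequences are not decoded).
function parseValue(text) {
  if (text === "+Inf") return Infinity;
  if (text === "-Inf") return -Infinity;
  return Number(text); // "NaN" parses to NaN as intended
}

function parseExposition(text) {
  const samples = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and HELP/TYPE comments
    // metric name, optional {labels}, value, optional timestamp
    const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\S+)?$/.exec(line);
    if (!match) continue;
    const [, name, labelText = "", value] = match;
    const labels = {};
    for (const [, key, val] of labelText.matchAll(/([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"/g)) {
      labels[key] = val;
    }
    samples.push({ name, labels, value: parseValue(value) });
  }
  return samples;
}

// Assumption: the token goes in the Authorization header as a bearer token
async function scrape(url, token) {
  const res = await fetch(url, { headers: { Authorization: `Bearer ${token}` } });
  if (!res.ok) throw new Error(`Scrape failed with HTTP ${res.status}`);
  return parseExposition(await res.text());
}
```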

Distributed Tracing

OpenTelemetry-compatible distributed tracing across all services. See how a request flows through your system.

Setup

Tracing is enabled automatically. RaidFrame injects trace context headers (traceparent, tracestate) into all incoming requests.

For richer traces, use the OpenTelemetry SDK:

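import express from "express";
import { trace, SpanStatusCode } from "@opentelemetry/api";
import { db } from "./db"; // placeholder: your database client, assumed to resolve to an array of rows

const app = express();
const tracer = trace.getTracer("my-service");

app.get("/api/orders", async (req, res) => {
  const span = tracer.startSpan("get-orders");
  try {
    const orders = await db.query("SELECT * FROM orders");
    span.setAttribute("order_count", orders.length);
    res.json(orders);
  } catch (err) {
    // Mark the span as failed so it shows up as an error in the trace view
    span.recordException(err);
    span.setStatus({ code: SpanStatusCode.ERROR });
    res.status(500).json({ error: "Failed to load orders" });
  } finally {
    span.end(); // always end the span, even when the query throws
  }
});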
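The `traceparent` header that RaidFrame injects follows the W3C Trace Context format. It has four parts: a version, a 32-hex-digit trace ID, a 16-hex-digit parent span ID, and trace flags. When a service calls another service, it keeps the request in one trace by sending the same trace ID with its own span ID as the new parent. The OpenTelemetry SDK handles this for you. The sketch below only makes the step visible. Its function names are hypothetical, and it handles version `00` headers only.

```javascript
// W3C Trace Context, version 00: traceparent = 00-<trace-id>-<parent-id>-<trace-flags>
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;
const ALL_ZEROS = /^0+$/;

function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec((header ?? "").trim());
  if (!match) return null;
  const [, traceId, parentId, flags] = match;
  // All-zero trace and parent IDs are invalid per the spec
  if (ALL_ZEROS.test(traceId) || ALL_ZEROS.test(parentId)) return null;
  // Bit 0 of trace-flags is the "sampled" flag
  return { traceId, parentId, flags, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

// Hypothetical helper producing 16 hex chars (8 bytes). Math.random is for
// illustration only; a real tracer uses a stronger random source.
function randomSpanId() {
  let id = "";
  while (id.length < 16) id += Math.floor(Math.random() * 16).toString(16);
  return ALL_ZEROS.test(id) ? randomSpanId() : id;
}

// Header for an outgoing call: same trace ID and flags, this service's span as parent
function childTraceparent(incoming, spanId = randomSpanId()) {
  const parent = parseTraceparent(incoming);
  if (!parent) return null; // invalid or missing: a tracer would start a new trace instead
  return `00-${parent.traceId}-${spanId}-${parent.flags}`;
}

const incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
console.log(parseTraceparent(incoming).sampled); // true
console.log(childTraceparent(incoming, "b7ad6b7169203331"));
// 00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01
```

To propagate by hand on an outgoing request, set its `traceparent` header to `childTraceparent(req.headers.traceparent)` and forward `tracestate` unchanged.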

View Traces

rf traces --service api --since 1h --min-duration 500ms
TRACE ID           DURATION  SERVICES        STATUS
tr_a8f3b2c1d4e5   890ms     web → api → db  ✓
tr_b9c4d3e5f6a7   1.2s      web → api → db  ✗ (db timeout)
tr_c0d5e4f6a7b8   340ms     web → api       ✓

# View a specific trace
rf traces show tr_b9c4d3e5f6a7
tr_b9c4d3e5f6a7 — 1.2s — ERROR
├── [web] GET /api/orders (45ms)
│   └── [api] fetchOrders (1.15s)
│       ├── [db] SELECT * FROM orders WHERE... (1.1s) ⚠ SLOW
│       └── [cache] Redis GET orders:cache (2ms) MISS
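A trace view like this one is built from a flat list of spans, where each span points at its parent span ID. The sketch below rebuilds the tree and flags a span as slow when its self time crosses a threshold. Self time is the span's duration minus that of its direct children. This rule isolates the slow query instead of flagging every span above it. The span fields, durations, and self-time rule are assumptions for illustration. They are not RaidFrame's trace format or its SLOW heuristic, and the self-time calculation assumes child spans do not overlap.

```javascript
// Hypothetical flat span list, as a tracing backend might store it
const spans = [
  { id: "s1", parentId: null, service: "web", name: "GET /api/orders", durationMs: 1200 },
  { id: "s2", parentId: "s1", service: "api", name: "fetchOrders", durationMs: 1150 },
  { id: "s3", parentId: "s2", service: "db", name: "SELECT * FROM orders", durationMs: 1100 },
  { id: "s4", parentId: "s2", service: "cache", name: "GET orders:cache", durationMs: 2 },
];

function renderTrace(spans, slowSelfMs = 500) {
  // Index children by parent span ID; spans without a parent are roots (key null)
  const children = new Map();
  for (const span of spans) {
    const key = span.parentId ?? null;
    if (!children.has(key)) children.set(key, []);
    children.get(key).push(span);
  }

  const lines = [];
  const walk = (span, prefix, branch, childIndent) => {
    const kids = children.get(span.id) ?? [];
    // Self time = own duration minus direct children (assumes children ran sequentially)
    const childMs = kids.reduce((sum, kid) => sum + kid.durationMs, 0);
    const selfMs = Math.max(span.durationMs - childMs, 0);
    const flag = selfMs >= slowSelfMs ? " SLOW" : "";
    lines.push(`${prefix}${branch}[${span.service}] ${span.name} (${span.durationMs}ms)${flag}`);
    kids.forEach((kid, i) => {
      const last = i === kids.length - 1;
      walk(kid, prefix + childIndent, last ? "└── " : "├── ", last ? "    " : "│   ");
    });
  };
  for (const root of children.get(null) ?? []) walk(root, "", "", "");
  return lines;
}

console.log(renderTrace(spans).join("\n"));
```

With the sample spans, only the `[db]` span is marked SLOW. The web and api spans spend almost all of their time waiting on their children.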

Error Tracking

Errors are automatically grouped, deduplicated, and linked to the deployment that introduced them.

rf errors --service api --since 24h
ERROR                                    COUNT  FIRST SEEN    LAST SEEN     DEPLOY
TypeError: Cannot read property 'id'     142    2h ago        5m ago        v42
PrismaClientKnownRequestError: timeout   23     6h ago        1h ago        v41

# View error details
rf errors show "TypeError: Cannot read property 'id'"

The detail view shows the stack trace, affected endpoints, sample requests, and a link to the diff of the deployment that introduced the error.
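Grouping usually works by reducing each error to a fingerprint. Repeated occurrences then collapse into one row with a count, first-seen and last-seen times, and the deploy where the error first appeared. RaidFrame's fingerprinting rules are not documented here. The sketch below is a hypothetical version that keys on three things: the error name, the message with numbers masked, and the top stack frame.

```javascript
// Hypothetical fingerprint: error name + message with numbers masked + top stack frame.
// Not RaidFrame's actual algorithm.
function fingerprint(error) {
  const message = String(error.message ?? "").replace(/\d+/g, "<n>"); // "5000ms" and "5003ms" match
  const topFrame = String(error.stack ?? "")
    .split("\n")
    .map((line) => line.trim())
    .find((line) => line.startsWith("at ")) ?? "";
  return `${error.name}|${message}|${topFrame}`;
}

// events: [{ error, at, deploy }] where `at` is a timestamp and `deploy` the running version
function groupErrors(events) {
  const groups = new Map();
  for (const { error, at, deploy } of events) {
    const key = fingerprint(error);
    const group = groups.get(key);
    if (!group) {
      groups.set(key, { title: `${error.name}: ${error.message}`, count: 1, firstSeen: at, lastSeen: at, deploy });
      continue;
    }
    group.count += 1;
    group.lastSeen = Math.max(group.lastSeen, at);
    if (at < group.firstSeen) {
      // The earliest occurrence identifies the deploy that introduced the error
      group.firstSeen = at;
      group.deploy = deploy;
      group.title = `${error.name}: ${error.message}`;
    }
  }
  return [...groups.values()].sort((a, b) => b.count - a.count);
}

const frame = "    at runQuery (/app/db.js:42:11)";
const events = [
  { error: { name: "Error", message: "Query timed out after 5003ms", stack: `Error\n${frame}` }, at: 200, deploy: "v42" },
  { error: { name: "Error", message: "Query timed out after 5000ms", stack: `Error\n${frame}` }, at: 100, deploy: "v41" },
  { error: { name: "TypeError", message: "Cannot read properties of undefined (reading 'id')", stack: "TypeError\n    at getUser (/app/users.js:7:3)" }, at: 150, deploy: "v42" },
];

for (const g of groupErrors(events)) console.log(g.count, g.deploy, g.title);
```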

Status Page

Generate a public status page for your users:

rf status-page enable --domain status.myapp.com
✓ Status page live at https://status.myapp.com
  Monitors: web (HTTP), api (HTTP), pg-main (TCP)
  Auto-updated from health checks

The status page shows real-time service health, uptime percentages, and incident history. No third-party service needed.
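An uptime percentage is the share of health checks that passed over a window. The sketch below assumes evenly spaced checks recorded as booleans. How RaidFrame samples checks and treats partial outages is not specified here.

```javascript
// Uptime over a window of evenly spaced health checks (true = healthy)
function uptimePercent(checks) {
  if (checks.length === 0) return null; // no data is neither 0% nor 100%
  const healthy = checks.filter(Boolean).length;
  return (healthy / checks.length) * 100;
}

// 30 days of checks at 1-minute intervals = 43,200 checks; the first 43 failed
const checks = Array.from({ length: 43200 }, (_, i) => i >= 43);
console.log(uptimePercent(checks).toFixed(3)); // 99.900
```

At one check per minute, a 30-day window has 43,200 checks. A 99.9% uptime target therefore allows roughly 43 failed checks, or about 43 minutes of downtime.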