Deployment
Kubernetes
rtpbridge is designed for k8s deployment with graceful shutdown support.
Graceful Shutdown
On SIGINT or SIGTERM:
- Stop accepting new WebSocket connections
- Stop accepting new
session.createrequests (returnsSHUTTING_DOWNerror) - Wait for all active sessions to complete naturally
- Force shutdown after
shutdown_max_wait_secs(default: 300 = 5 minutes)
During graceful shutdown, existing orphaned sessions remain attachable until their disconnect_timeout_secs expires or the server exits. Only new session.create calls are rejected with SHUTTING_DOWN. WebSocket connections already established continue to function normally.
Set terminationGracePeriodSeconds in your pod spec to match or exceed shutdown_max_wait_secs.
Example Pod Spec
apiVersion: v1
kind: Pod
spec:
terminationGracePeriodSeconds: 300
containers:
- name: rtpbridge
image: ghcr.io/zyno-io/rtpbridge:latest
args:
- --listen=0.0.0.0:9100
- --media-ip=$(POD_IP)
- --config=/etc/rtpbridge/config.toml
ports:
- containerPort: 9100
name: control
protocol: TCP
- containerPort: 30000
name: rtp-start
protocol: UDP
env:
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
resources:
requests:
cpu: 250m
memory: 128Mi
limits:
cpu: "2"
memory: 512Mi
volumeMounts:
- name: config
mountPath: /etc/rtpbridge
volumes:
- name: config
configMap:
name: rtpbridge-configResource requirements depend on workload — transcoding (especially Opus) is CPU-intensive. Monitor actual usage and adjust accordingly.
Port Ranges
For plain RTP endpoints, you need to expose the configured rtp_port_range (default: 30000-39999). WebRTC endpoints use ICE and can work with any available ports.
Health Checks
rtpbridge exposes a simple HTTP health endpoint:
GET /health → 200 {"status":"ok"}Use this for k8s liveness and readiness probes:
livenessProbe:
httpGet:
path: /health
port: 9100
initialDelaySeconds: 5
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: 9100
initialDelaySeconds: 2
periodSeconds: 5Prometheus Metrics
rtpbridge exposes metrics at GET /metrics on the control plane port in OpenMetrics text format (Prometheus 0.0.4 compatible).
Scrape Config
scrape_configs:
- job_name: rtpbridge
scrape_interval: 15s
metrics_path: /metrics
static_configs:
- targets: ["rtpbridge:9100"]In Kubernetes with service discovery:
scrape_configs:
- job_name: rtpbridge
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_app]
regex: rtpbridge
action: keep
- source_labels: [__meta_kubernetes_pod_ip]
target_label: __address__
replacement: "$1:9100"Key Metrics
All metrics use the rtpbridge_ prefix:
| Metric | Type | Description |
|---|---|---|
rtpbridge_sessions_total | Counter | Total sessions created |
rtpbridge_sessions_active | Gauge | Currently active sessions |
rtpbridge_endpoints_total | Counter | Total endpoints created |
rtpbridge_endpoints_active | Gauge | Currently active endpoints |
rtpbridge_packets_routed_total | Counter | Total packets routed |
rtpbridge_srtp_errors_total | Counter | SRTP auth/replay errors |
rtpbridge_transcode_errors_total | Counter | Transcode failures |
rtpbridge_packets_recorded_total | Counter | Total packets recorded to PCAP |
rtpbridge_recordings_active | Gauge | Currently active recordings |
rtpbridge_dtmf_events_total | Counter | DTMF events detected |
rtpbridge_events_dropped_total | Counter | Events dropped (backpressure) |
Suggested Alerts
groups:
- name: rtpbridge
rules:
- alert: RtpbridgeDown
expr: up{job="rtpbridge"} == 0
for: 1m
- alert: HighSrtpErrors
expr: rate(rtpbridge_srtp_errors_total[5m]) > 1
for: 5m
- alert: EventsDropped
expr: rate(rtpbridge_events_dropped_total[5m]) > 0
for: 5mReverse Proxy / TLS
The control plane serves plain WebSocket and HTTP. To add TLS, place a reverse proxy in front of rtpbridge.
nginx
upstream rtpbridge {
server 127.0.0.1:9100;
}
server {
listen 443 ssl;
server_name rtpbridge.example.com;
ssl_certificate /etc/ssl/certs/rtpbridge.crt;
ssl_certificate_key /etc/ssl/private/rtpbridge.key;
location / {
proxy_pass http://rtpbridge;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_read_timeout 86400s;
}
}Set proxy_read_timeout high enough to cover long-lived WebSocket sessions. The media plane (RTP/UDP) is not proxied — it must be directly reachable.
Security
The WebSocket control plane does not implement authentication or encryption. It is expected to be secured at the infrastructure level — for example, via an mTLS service mesh (Istio, Linkerd), a reverse proxy with client certificate verification, or network policies that restrict access to trusted services only.
Do not expose the control plane port directly to untrusted networks.
Systemd
Example unit file for bare-metal or VM deployments:
[Unit]
Description=rtpbridge media bridge
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
ExecStart=/usr/local/bin/rtpbridge --config=/etc/rtpbridge/config.toml
Restart=on-failure
RestartSec=5
User=rtpbridge
Group=rtpbridge
LimitNOFILE=65536
TimeoutStopSec=310
# Security hardening
ProtectSystem=strict
ProtectHome=yes
ReadWritePaths=/var/lib/rtpbridge/recordings /var/lib/rtpbridge/cache
NoNewPrivileges=yes
PrivateTmp=yes
[Install]
WantedBy=multi-user.targetKey settings:
- TimeoutStopSec should exceed
shutdown_max_wait_secs(default 300s) to allow graceful shutdown. - LimitNOFILE should accommodate the expected number of UDP sockets (2 per RTP endpoint + 1 per WebRTC endpoint + overhead).
- ReadWritePaths must include the recording directory and file cache directory.
Note: The default
cache_diris/tmp/rtpbridge-cache, which is ephemeral and may be cleared on reboot. For production systemd deployments, setcache_dir = "/var/lib/rtpbridge/cache"in your config file to match theReadWritePathsabove.
Operational Guidance
File Descriptor Sizing
Each RTP endpoint uses 2 UDP sockets (RTP + RTCP). WebRTC endpoints use 1. File playback endpoints use 0. Size your file descriptor limit accordingly:
fd_estimate = (max_sessions × avg_endpoints_per_session × 2) + 200For example, 5000 sessions with 2 endpoints each: (5000 × 2 × 2) + 200 = 20,200. Set LimitNOFILE in your systemd unit to at least this value.
Event Backpressure
The control plane uses bounded channels (event_channel_size, default 256) for events. If a WebSocket client reads events slower than they are produced, excess events are dropped and an events.dropped event is sent with the count. Critical events (like session.idle_timeout and recording.stopped) use a separate priority channel (critical_event_channel_size, default 64) to avoid being dropped during bursts of normal events.
If you observe frequent events.dropped events, increase event_channel_size or ensure your client processes events promptly.
Key Prometheus Metrics
The /metrics endpoint exposes Prometheus-format metrics. Key metrics for alerting:
| Metric | Type | Alert Condition |
|---|---|---|
rtpbridge_sessions_active | Gauge | Approaching max_sessions |
rtpbridge_endpoints_active | Gauge | Unusually high per-session count |
rtpbridge_recordings_active | Gauge | Approaching per-session limit |
rtpbridge_sessions_total | Counter | Rate drop indicates service issues |
Load Balancing
WebSocket Sticky Sessions
The WebSocket control plane requires sticky sessions (session affinity). A client must reach the same rtpbridge instance for the duration of its WebSocket connection. Without sticky sessions, session.attach cannot reconnect to an orphaned session on a different node.
For HTTP/1.1 WebSocket upgrade, most load balancers maintain the connection naturally. However, if the client reconnects (e.g., after a network blip), ensure the reconnection reaches the same instance:
- nginx: Use
ip_hashorsticky cookieupstream directive. - HAProxy: Use
stick-tablewithstick on srcor cookie-based persistence. - AWS ALB: Enable sticky sessions with application cookie or duration-based stickiness.
- Kubernetes Ingress: Annotate with
nginx.ingress.kubernetes.io/affinity: cookie.
UDP Media Plane
RTP/UDP media packets must reach the rtpbridge instance directly. Load balancers should not proxy UDP media traffic. Clients learn the media IP and ports from SDP negotiation and send directly.
If running multiple instances behind a load balancer, each instance must advertise its own --media-ip so SDP answers contain the correct reachable address.
Network Topology & NAT
rtpbridge uses ICE-lite for WebRTC endpoints: the server is always the controlled agent and advertises only host candidates. No STUN or TURN servers are used or needed on the server side.
This means:
media_ipmust be directly reachable by all peers. Set it to the public/external IP address if peers connect over the internet.- For NAT environments: Set
media_ipto the public IP and ensure the entirertp_port_range(default 30000-39999) is port-forwarded (UDP) to the rtpbridge host. - WebRTC peers must be able to reach
media_ipdirectly. Since rtpbridge only offers host candidates (no server-reflexive or relay candidates), peers behind symmetric NATs may fail to connect unless they use a TURN server on the peer side. - Plain RTP endpoints use even/odd port pairs from
rtp_port_rangefor RTP/RTCP. Ensure this range is open in your firewall. - Firewall rules: Open
rtp_port_range(UDP) for media traffic and the control port (default 9100 TCP) for WebSocket/HTTP.
Docker
# Pin Rust version to match rust-version in Cargo.toml
FROM rust:1.88-bookworm AS builder
RUN apt-get update && apt-get install -y --no-install-recommends libopus-dev && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY Cargo.toml Cargo.lock ./
# Create stubs so Cargo.toml parses (benches/ excluded by .dockerignore)
RUN mkdir -p src benches \
&& echo 'fn main() {}' > src/main.rs \
&& echo 'fn main() {}' > benches/codec_bench.rs \
&& echo 'fn main() {}' > benches/srtp_bench.rs \
&& echo 'fn main() {}' > benches/routing_bench.rs \
&& echo 'fn main() {}' > benches/pcap_bench.rs \
&& cargo build --release \
&& rm -rf src
COPY . .
RUN touch src/main.rs && cargo build --release
FROM debian:bookworm-20250317-slim
RUN apt-get update && apt-get install -y --no-install-recommends libopus0 libssl3 ca-certificates && rm -rf /var/lib/apt/lists/* \
&& useradd -r -s /sbin/nologin rtpbridge \
&& mkdir -p /var/lib/rtpbridge/recordings /var/lib/rtpbridge/media /var/lib/rtpbridge/cache \
&& chown -R rtpbridge:rtpbridge /var/lib/rtpbridge
COPY --from=builder /app/target/release/rtpbridge /usr/local/bin/rtpbridge
EXPOSE 9100
USER rtpbridge
ENTRYPOINT ["rtpbridge"]For container health checks, use your orchestrator's native mechanism (e.g., Kubernetes livenessProbe) pointed at GET /health on the control port.