Engineering

How We Serve Real-Time Dashboard Updates at Scale

March 28, 2025

Real-time dashboards sound simple until you actually build one. "Just re-query on a timer" works fine for one user. For hundreds of dashboards each querying multiple data sources every 30 seconds, it becomes a very fast way to kill a database. Here's the architecture we landed on after two rewrites.

Attempt 1: Polling per client

The first version was embarrassingly simple: each browser tab had a setInterval that fired an API call every 30 seconds per chart. Ten charts on a dashboard, 30-second interval, 50 concurrent users — that's 1,000 database queries per minute from refresh alone, before anyone asked a question. We hit the database limit within two weeks of the beta.
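The pattern (and the arithmetic behind it) looks roughly like this. The names here are illustrative, not our actual client code:

```typescript
// Sketch of attempt 1: one independent timer per chart, so query volume
// scales with charts × users regardless of how many results are distinct.

const REFRESH_MS = 30_000;

function startPolling(
  chartIds: string[],
  fetchChart: (id: string) => Promise<unknown>
): ReturnType<typeof setInterval>[] {
  return chartIds.map((id) =>
    // One timer per chart: 10 charts means 10 request streams per tab.
    setInterval(() => {
      void fetchChart(id); // every tick hits the API, and ultimately the database
    }, REFRESH_MS)
  );
}

// Back-of-envelope: 10 charts × 50 users, one query each per 30 s
// => 500 queries per interval => 1,000 queries per minute.
const queriesPerMinute = 10 * 50 * (60_000 / REFRESH_MS);
```

The database never sees that two tabs are asking the same question, which is exactly what the next attempt fixed.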

Attempt 2: Server-sent events with query deduplication

We moved to SSE and added a query deduplication layer. If 50 users were watching the same dashboard, they'd all subscribe to the same SSE stream, and we'd run the underlying query once, fanning out the result. Deduplication was keyed on a hash of (dashboard_id, component_id, applied_filters).
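A minimal sketch of that dedup key (the hash function and field names here are assumptions, not our production code): the important detail is normalizing the filter set so logically-equal filters hash identically.

```typescript
import { createHash } from "node:crypto";

interface Subscription {
  dashboardId: string;
  componentId: string;
  appliedFilters: Record<string, string>;
}

// Subscriptions with the same (dashboard, component, filters) tuple share
// one underlying query; the key identifies that shared query.
function dedupKey(sub: Subscription): string {
  // Sort filter keys so {a, b} and {b, a} produce the same key.
  const filters = Object.keys(sub.appliedFilters)
    .sort()
    .map((k) => `${k}=${sub.appliedFilters[k]}`)
    .join("&");
  return createHash("sha256")
    .update(`${sub.dashboardId}:${sub.componentId}:${filters}`)
    .digest("hex");
}
```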

This cut query volume by ~60% in practice. The problem was SSE connection limits: over HTTP/1.1, browsers cap concurrent connections per origin at six, and each SSE stream holds one of them open, so dashboards with more than six charts started queuing. Not great.

Attempt 3: WebSocket fan-out with a result cache

The current architecture uses a single WebSocket connection per browser tab (no per-chart connections) and a server-side fan-out model. Here's how it works:

  1. Browser opens one WebSocket connection per tab and sends a subscription message listing all component IDs on the current dashboard.
  2. The server maintains a subscription map: component_id → Set&lt;WebSocket&gt;.
  3. A separate refresh scheduler runs outside the request path. It batches components that are due for refresh, deduplicates by (connection_id, query_hash), and dispatches queries to a worker pool.
  4. When a query result comes back, the scheduler writes it to a short-lived Redis cache (TTL: 2× the refresh interval) and fans it out to every subscribed WebSocket.
  5. Reconnecting clients get the cached result immediately instead of waiting for the next refresh cycle.

Numbers after the rewrite

  • –78% database queries at peak load
  • < 800 ms P95 update latency
  • 3,200 concurrent dashboard subscribers

What we'd do differently

The refresh scheduler is still a single-process bottleneck. It works at our current scale, but we know it needs to become a distributed scheduler (probably using a Redis sorted set as a priority queue) before we get to 10× current load. We're planning that work for Q3.
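To make the sorted-set idea concrete, here's the scheduling logic sketched with an in-memory structure. In the planned version this becomes Redis `ZADD` to schedule and `ZRANGEBYSCORE` + `ZREM` to claim a due batch, with the score as the next-refresh timestamp; everything below is a hypothetical illustration, not the shipped design:

```typescript
interface ScheduledRefresh {
  componentId: string;
  dueAt: number; // epoch millis; this is the sorted-set score
}

class RefreshQueue {
  private entries: ScheduledRefresh[] = [];

  // Redis equivalent: ZADD refresh_queue <dueAt> <componentId>
  // (re-adding a member just updates its score, so rescheduling is idempotent)
  schedule(componentId: string, dueAt: number): void {
    this.entries = this.entries.filter((e) => e.componentId !== componentId);
    this.entries.push({ componentId, dueAt });
    this.entries.sort((a, b) => a.dueAt - b.dueAt);
  }

  // Redis equivalent: ZRANGEBYSCORE refresh_queue 0 <now>, then ZREM the batch.
  // Returns every component whose refresh is due, earliest first.
  popDue(now: number): string[] {
    const due = this.entries.filter((e) => e.dueAt <= now).map((e) => e.componentId);
    this.entries = this.entries.filter((e) => e.dueAt > now);
    return due;
  }
}
```

The distribution win is that multiple scheduler processes can claim batches from the same shared set instead of one process owning the whole timer wheel.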

We also wish we'd invested in the query result cache earlier. The first two attempts wasted a lot of database reads on users who reconnected mid-session and had to wait for a full refresh cycle. The cache makes reconnects invisible.

We're hiring engineers who want to work on problems like this. If distributed systems and real-time data infrastructure are your thing, check out our open roles.