
How We Reduced Query Latency by 80% for a Series B Startup

February 4, 2026  ·  11 min read

[Image: Before/after query latency comparison chart showing P95 improvement]

The customer was a B2B analytics product — a platform that let its own customers explore their usage data through an embedded dashboard. The setup made query latency doubly important: slow queries didn't just frustrate their internal team, they frustrated their customers' customers. Every additional second of load time on their embedded charts was visible in their churn numbers.

When they came to us, P95 query latency was sitting at 9.4 seconds. Median was 2.1 seconds, which sounds acceptable until you remember that P95 means one query in every twenty: real user sessions, not statistical outliers. Their NPS scores for the analytics feature were dragging down the whole product's ratings.

Here's what we found, and what we changed.

Step 1: Profiling the actual slow queries

The first thing we did was pull three weeks of query logs and categorize them by execution time. This sounds obvious, but the team hadn't done it systematically — they knew things were slow, but they didn't know which queries were responsible for the tail latency.

The distribution was striking: 73% of queries ran in under 800ms, while the remaining 27% accounted for 89% of total query execution time. Within that 27%, one family of queries was consistently the worst performer: multi-tenant joins that matched user event tables against account-level metadata. These joins are common in multi-tenant analytics products, but expensive when left unoptimized.
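To make that concrete, here's a minimal sketch of the kind of breakdown we ran. It assumes the query logs export to a CSV with one row per execution, a normalized query fingerprint, and a duration column; the file name and column names are hypothetical.

```python
import pandas as pd

# Hypothetical export: one row per query execution, with a "fingerprint"
# that groups structurally identical queries together.
logs = pd.read_csv("query_logs.csv")  # columns: fingerprint, duration_ms

total_time = logs["duration_ms"].sum()
fast = logs["duration_ms"] < 800

# Share of executions that are fast vs. share of total execution time
# consumed by the slow tail.
print(f"fast queries: {fast.mean():.0%} of executions")
print(f"slow tail: {logs.loc[~fast, 'duration_ms'].sum() / total_time:.0%} of total time")

# Rank fingerprints by total time consumed, not by average latency:
# a moderately slow query that runs constantly can outrank a rare crawler.
worst = (
    logs.groupby("fingerprint")["duration_ms"]
    .agg(total_ms="sum", p95_ms=lambda s: s.quantile(0.95), runs="count")
    .sort_values("total_ms", ascending=False)
    .head(10)
)
print(worst)
```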

Knowing this shaped everything else. Instead of a general optimization effort, we had a specific target.

Step 2: Data model restructuring

The event tables were normalized, which made sense for a transactional database but was wrong for an analytics workload. Every query that needed account context had to join the events table (hundreds of millions of rows) against the accounts table (a few hundred thousand rows) using a UUID foreign key. The optimizer was doing this join on every query execution.

We denormalized. Account attributes (plan tier, industry, company size) were materialized directly into the event table as additional columns. This added storage, about 18% more data on disk, but eliminated the join entirely. The join-heavy queries that had been taking 8-12 seconds dropped to 1.2-1.8 seconds with no other changes.
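To illustrate the shape of the change (with invented table and column names, sketched in DuckDB rather than the customer's actual warehouse): the before query pays for the join on every execution, while the after query scans a single denormalized table.

```python
import duckdb

con = duckdb.connect()

# Stand-in schemas: a large event table keyed to narrow account metadata.
con.sql("CREATE TABLE accounts (account_id UUID, plan_tier TEXT, industry TEXT, company_size TEXT)")
con.sql("CREATE TABLE events (event_id UUID, account_id UUID, event_type TEXT, occurred_at TIMESTAMP)")

# Before: every account-scoped rollup joins the huge events table
# against accounts on a UUID key.
con.sql("""
    SELECT a.plan_tier, count(*) AS event_count
    FROM events e
    JOIN accounts a USING (account_id)
    GROUP BY a.plan_tier
""").show()

# After: account attributes are copied onto each event row at ingest
# time, so the same rollup becomes a single-table scan.
con.sql("""
    CREATE TABLE events_denorm AS
    SELECT e.*, a.plan_tier, a.industry, a.company_size
    FROM events e JOIN accounts a USING (account_id)
""")
con.sql("""
    SELECT plan_tier, count(*) AS event_count
    FROM events_denorm
    GROUP BY plan_tier
""").show()
```

The ingestion pipeline takes on the job the query planner used to do: it resolves account attributes once, at write time, instead of on every read.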

Denormalization is often treated as a code smell. In OLAP workloads, it's usually the right call. The tradeoff — higher storage cost, more complex data ingestion — is almost always worth it when the alternative is multi-second joins on hot query paths.

Step 3: Partition strategy overhaul

The event table was partitioned by day. That's a reasonable default, but it was wrong for this workload. Most customer queries were time-filtered AND account-filtered: "show me events for account X over the last 30 days." The day partitioning helped with the time filter but did nothing for the account filter, so the query still scanned every account's data in each of the 30 daily partitions it touched.

We switched to a composite partition key: account ID as the primary partition dimension, day as the secondary. Now queries filtering by account ID only scanned that account's partition, not the full dataset. For their largest customers (accounts with dense event histories), this was the single biggest improvement — P95 for large-account queries dropped from 11 seconds to 2.3 seconds.
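The customer's engine handles partitioning natively, but the idea translates to any columnar layout. Here's a self-contained sketch using Hive-style Parquet partitions in DuckDB, with hypothetical account IDs and dates: filters on the partition columns prune whole directories before any data is read.

```python
import duckdb

con = duckdb.connect()

# Minimal stand-in for the denormalized event table from the last step.
con.sql("""
    CREATE TABLE events_denorm AS
    SELECT * FROM (VALUES
        ('acct_42', 'page_view', TIMESTAMP '2026-01-15 09:30:00'),
        ('acct_42', 'export',    TIMESTAMP '2026-01-20 14:05:00'),
        ('acct_99', 'page_view', TIMESTAMP '2026-01-18 11:00:00')
    ) AS t(account_id, event_type, occurred_at)
""")

# Composite partitioning: account first, day second.
con.sql("""
    COPY (SELECT *, CAST(occurred_at AS DATE) AS day FROM events_denorm)
    TO 'events_partitioned' (FORMAT PARQUET, PARTITION_BY (account_id, day))
""")

# A typical dashboard filter now prunes to one account's directories and
# then to the daily leaves, instead of scanning every account's data.
con.sql("""
    SELECT event_type, count(*) AS n
    FROM read_parquet('events_partitioned/**/*.parquet', hive_partitioning = true)
    WHERE account_id = 'acct_42'   -- hypothetical account
      AND day >= '2026-01-01'      -- ISO date strings compare correctly
    GROUP BY event_type
""").show()
```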

Step 4: Pre-aggregation for common patterns

After profiling, we identified five query patterns that accounted for 60% of all dashboard loads: daily active users, feature usage by tier, event funnel completion, session duration distribution, and error rate by endpoint. These were run thousands of times per day, often with the same parameters.

We built pre-aggregated materialized views that updated in near real time as new events arrived. Dashboard queries for these patterns now read from the materialized view rather than scanning raw events. Latency for these queries went from an average of 3.4 seconds to 94 milliseconds.
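We can't reproduce the view engine here, but the refresh pattern is worth sketching. A minimal version, assuming a daily-active-users rollup with invented table and column names: recompute only the account-day slices touched by new events, since distinct counts can't be summed incrementally across batches without double counting.

```python
import duckdb

con = duckdb.connect()

# Raw events, a rollup table standing in for the materialized view, and
# a watermark recording how far the rollup has been refreshed.
con.sql("CREATE TABLE events (account_id TEXT, user_id TEXT, occurred_at TIMESTAMP)")
con.sql("CREATE TABLE dau_rollup (account_id TEXT, day DATE, active_users BIGINT)")
con.sql("CREATE TABLE watermark AS SELECT TIMESTAMP '1970-01-01' AS refreshed_to")

def refresh_dau(con: duckdb.DuckDBPyConnection) -> None:
    wm = con.sql("SELECT refreshed_to FROM watermark").fetchone()[0]
    # Which (account, day) slices did events newer than the watermark touch?
    con.execute("""
        CREATE OR REPLACE TEMP TABLE touched AS
        SELECT DISTINCT account_id, CAST(occurred_at AS DATE) AS day
        FROM events WHERE occurred_at > ?
    """, [wm])
    # Rebuild exactly those slices from raw events; the rest of the
    # rollup is left untouched.
    con.execute("""
        DELETE FROM dau_rollup
        WHERE EXISTS (SELECT 1 FROM touched t
                      WHERE t.account_id = dau_rollup.account_id
                        AND t.day = dau_rollup.day)
    """)
    con.execute("""
        INSERT INTO dau_rollup
        SELECT e.account_id, CAST(e.occurred_at AS DATE), count(DISTINCT e.user_id)
        FROM events e
        JOIN touched t ON t.account_id = e.account_id
                      AND t.day = CAST(e.occurred_at AS DATE)
        GROUP BY 1, 2
    """)
    con.execute("""
        UPDATE watermark
        SET refreshed_to = coalesce((SELECT max(occurred_at) FROM events), refreshed_to)
    """)

# Dashboards then read the small rollup instead of scanning raw events:
#   SELECT day, active_users FROM dau_rollup WHERE account_id = ? ORDER BY day
```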

The maintenance tradeoff is real: materialized views need refresh logic, schema changes have to propagate to every view, and staleness becomes a failure mode you have to monitor. But for read-heavy dashboards with predictable query patterns, it's a substantial gain for a modest complexity cost.

The results

After six weeks of work — profiling, schema changes, re-ingestion, query tuning — the numbers looked like this: median query latency dropped from 2.1 seconds to 0.4 seconds. P95 dropped from 9.4 seconds to 1.8 seconds. That's an 81% reduction at the 95th percentile.

The customer saw NPS for their analytics feature increase by 22 points over the following quarter. They attributed a measurable reduction in analytics-related churn directly to the latency improvements.

The lesson isn't that these specific techniques always work — it's that the work started with measurement, not assumptions. The fixes that mattered most weren't the ones we expected going in. Profiling first, then optimizing, is the only way to avoid spending weeks on the wrong problem.

CoreCast AI provides built-in query profiling, materialized view management, and automatic partition optimization — so you don't have to do this manually.
