SQL and B-trees as an API

After spending years optimizing database performance in high-load applications, I've developed a somewhat controversial opinion: SQL query planners, while incredibly sophisticated, might be fundamentally misaligned with the needs of modern, performance-critical applications. Let me explain why, and propose an alternative approach that could help developers write more predictable, scalable database queries by design.

The Problem: SQL's Deceptive Flexibility

As someone whose job security has partly relied on fixing poorly performing SQL queries, I've witnessed firsthand how deceptively easy it is to introduce performance problems in database operations. SQL's flexibility is both its greatest strength and its biggest weakness when it comes to building scalable applications.

Consider these common scenarios:

  • Adding a seemingly innocent WHERE clause
  • Changing an ORDER BY statement
  • Joining with another table "just to get one more field"

Any of these changes can completely invalidate assumptions about index usage and query performance. What's worse, these performance implications often only become apparent under load or as your data grows.

-- Looks innocent enough, right?
SELECT * FROM users 
WHERE last_login > DATE_SUB(NOW(), INTERVAL 30 DAY)
  AND status = 'active'
ORDER BY created_at DESC
LIMIT 100;

The above query might perform well with 10,000 users. But what happens when you hit 1 million users? The query planner's decisions about index usage might completely change, potentially leading to full table scans or expensive sorting operations.
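Before trusting a query like this, you'd want to pin down its access path yourself. Here's a sketch of what that looks like (the index name is an assumption for illustration):

-- A composite index that serves the WHERE clause...
ALTER TABLE users ADD INDEX idx_status_last_login (status, last_login);

-- ...but note the mismatch: the ORDER BY created_at is not covered by
-- this index, so once the planner's cost estimates shift, you may get
-- a scan plus a filesort. Re-checking the plan at each data size is
-- the only safeguard:
EXPLAIN SELECT * FROM users
WHERE last_login > DATE_SUB(NOW(), INTERVAL 30 DAY)
  AND status = 'active'
ORDER BY created_at DESC
LIMIT 100;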

The Root Cause: Divergent Optimization Goals

The fundamental issue lies in how traditional relational databases like MySQL, PostgreSQL, and SQLite are designed. They separate two crucial aspects that should be tightly coupled:

  • The query definition (what data you want)
  • The access path (how to efficiently retrieve that data)

This design is almost counter-productive: you can write any query you want, but you are also responsible for making sure it's properly indexed. Over time these two aspects tend to diverge: queries evolve, while indexes remain static.
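A sketch of how that drift typically plays out (table and index names are invented for illustration):

-- Version 1: the query and its index match exactly.
CREATE INDEX idx_orders_customer ON orders (customer_id);
SELECT * FROM orders WHERE customer_id = 42;

-- Version 2, six months later: one extra predicate, same old index.
-- The index still narrows by customer_id, but every matching row now
-- has to be fetched and filtered on shipped_at. Nothing in the schema
-- forced anyone to revisit the index when the query changed.
SELECT * FROM orders WHERE customer_id = 42 AND shipped_at IS NULL;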

Compromises with InnoDB

I've always been inspired by Cassandra's approach [1] to this problem. In Cassandra, if you want to change your query pattern (like adding a new WHERE clause), you must change the table structure itself. While this might seem restrictive, it creates a powerful guarantee: your data model explicitly supports your access patterns.
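As a sketch of what that looks like in practice (CQL, with an invented and deliberately simplified table): the partition and clustering keys are the access path, so a query the key doesn't support is rejected outright rather than quietly degraded:

-- Cassandra (CQL): the table is the access pattern.
CREATE TABLE users_by_status (
  status text,
  created_at timestamp,
  user_id uuid,
  PRIMARY KEY ((status), created_at)
) WITH CLUSTERING ORDER BY (created_at DESC);

-- Supported: matches the partition key and clustering order.
SELECT * FROM users_by_status WHERE status = 'active' LIMIT 100;

-- Rejected by default: filtering on a column outside the key errors
-- without ALLOW FILTERING. To support it, you change the data model.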

For several years, I've contemplated the idea of a "strict mode" for MySQL that would only allow queries with exactly matching indexes. However, experience has shown that even a matching index doesn't guarantee optimal performance. Here's a counterintuitive example: consider a primary key lookup, supposedly one of the most efficient operations in a database:

DELETE FROM users WHERE id IN (1, 2, 3, 4, 5);

You might expect InnoDB to perform five point lookups using the primary key index. However, if the table is relatively small, InnoDB might choose to perform a full table scan instead! [2] The query planner makes this seemingly bizarre choice because it estimates that reading the entire table sequentially from disk would be faster than performing multiple random I/O operations to traverse the B-tree index. From the application's point of view, which plan you get is effectively non-deterministic.
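One way to see which way the planner went is to ask it directly (MySQL shown; the exact output varies by version):

-- EXPLAIN works on DELETE statements too.
EXPLAIN DELETE FROM users WHERE id IN (1, 2, 3, 4, 5);
-- type=range: point/range lookups on the PRIMARY key, as expected.
-- type=ALL:   the full table scan described above.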

This can lead to some truly unexpected behavior: multiple transactions querying different primary keys (which should be completely independent operations) might cause conflicts and deadlocks. This is particularly surprising since these transactions are working with completely different rows and theoretically shouldn't interfere with each other at all.
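A minimal sketch of how that surprise materializes, assuming the planner has picked a scan for both statements:

-- Session A
BEGIN;
DELETE FROM users WHERE id IN (1, 2, 3);

-- Session B, concurrently
BEGIN;
DELETE FROM users WHERE id IN (4, 5, 6);
-- Under a full scan, each session examines (and locks) rows it never
-- meant to touch, so B can block on, or deadlock with, A even though
-- their id sets are disjoint.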

The core issue here isn't that InnoDB optimizes for p50 latency - scanning the table may genuinely be faster on average. The problem is that this optimization introduces contention and unpredictability that can severely impact p99 performance in high-concurrency scenarios.

A Proposal: B-tree-First API Design

After years of dealing with these issues, I've come to believe we need a different approach. What if, instead of exposing SQL as the primary interface, we exposed B-trees as the fundamental API primitive?

B-trees support these operations efficiently:

  • Exact key lookups
  • Range scans
  • Prefix searches
  • MIN/MAX retrieval

By constraining developers to these operations, we could guarantee:

  • Predictable query complexity and execution time
  • Explicit index requirements for every query pattern
  • No surprise full table scans or temporary tables

# Hypothetical B-tree-first API
user_index = DB.index('users_by_status_created')
active_users = user_index.range_scan(
  start: ['active', MIN_TIMESTAMP],
  end: ['active', MAX_TIMESTAMP],
  reverse: true,
  limit: 100
)
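
For comparison, the B-tree this hypothetical API exposes corresponds to something like the following index (a sketch; the name and columns are assumed from the example above):

-- The underlying structure the API would hand you directly
CREATE INDEX users_by_status_created ON users (status, created_at);

Notice what the API makes explicit that SQL hides: the earlier query's last_login predicate has no place in this index, so you'd have to either widen the key or accept filtering the fetched rows yourself.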

Additionally, it's critical that such an API supports partial indexes – indexes that only include rows matching certain conditions [3].
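In PostgreSQL, for instance, that looks like this (index name assumed):

-- Only 'active' rows are in the index at all, so a scan over it
-- can never touch rows with any other status.
CREATE INDEX users_active_by_created ON users (created_at)
WHERE status = 'active';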

Wrap up

This approach would provide several guarantees that are hard to achieve with traditional SQL:

  • O(log n) point lookups and O(log n + k) range scans, with no hidden worst cases
  • No unexpected query plan changes as data grows
  • Clear resource utilization patterns
  • Better cache utilization due to predictable access patterns

While implementing this in MySQL would be a significant undertaking, I'm curious about experiences from other database communities, particularly PostgreSQL users. It's possible that PostgreSQL's query planner is sophisticated enough that these issues are less prevalent.

I'd love to hear your thoughts:

  • Have you encountered similar challenges with query planners?
  • How do you ensure predictable performance in your database operations?
  • What other approaches have you seen for solving these problems?

Footnotes

  1. Cassandra: Data Modeling Concepts

  2. MySQL Bug #112523

  3. Partial Indexes in Postgres

Written in November 2024.
Kir Shatrov
