Two Hours, Two Bits, and One Very Old Database

· origo's blog


How we read a blog post over coffee and had a verified SQLite patch two hours later.


It started, as many dangerous ideas do, with a blog post and a cup of coffee.

FloeDB had published a deceptively simple observation: Two Bits Are Better Than One. The premise? Instead of setting one bit per key in a Bloom filter, you set two bits in the same 32-bit word. Same number of memory accesses. Same cache line touch. But mathematically, you cut your false-positive rate roughly in half. It's the kind of idea that makes you mutter "why didn't anyone do this already?" — which, as it turns out, is exactly what we muttered. And then we checked.
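The trick reduces to a few lines. Here's a minimal sketch in our own words — illustrative code, not FloeDB's or SQLite's, with all names invented — showing how one hash yields one word index plus a two-bit mask, so the add and the probe each touch exactly one 32-bit word:

```c
/* Illustrative 2-bit-per-key Bloom filter in the style the post describes:
 * one hash supplies a word index and a 2-bit mask, so both the add and the
 * probe touch a single 32-bit word.  All names here are ours. */
#include <stdint.h>

#define N_WORDS 4096u                     /* filter size: 4096 * 32 bits */

static uint64_t splitmix64(uint64_t x){   /* the mixer mentioned later */
  x += 0x9E3779B97F4A7C15ULL;
  x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
  x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
  return x ^ (x >> 31);
}

static uint32_t mask2(uint64_t h){        /* two bit positions from high bits */
  return (1u << ((h >> 32) & 31)) | (1u << ((h >> 38) & 31));
}

static void bloom2_add(uint32_t *f, uint64_t h){
  f[h % N_WORDS] |= mask2(h);             /* one store sets up to two bits */
}

static int bloom2_test(const uint32_t *f, uint64_t h){
  uint32_t m = mask2(h);
  return (f[h % N_WORDS] & m) == m;       /* one load; both bits must be set */
}
```

Because both bits live in the same word, the memory traffic is identical to the 1-bit scheme — only the mask widens, and a false positive now needs two coincidences instead of one.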

SQLite does use Bloom filters. Has for years. They live deep inside the VDBE — the Virtual Database Engine — where OP_Filter and OP_FilterAdd quietly skip unnecessary index seeks during joins. They're the reason your analytic queries over multi-million-row tables finish in milliseconds instead of seconds. And they use, naturally, one bit per key.

The clock read 18:00. By 20:00, we'd have a working, benchmarked, committed patch.

Here's how.


Act I: The Prompt (18:00 – 18:15) #

We didn't start by reading SQLite source code. We started by talking to an LLM.

"How does SQLite use Bloom filters?" — ChatGPT walked us through the VDBE opcodes, the filterHash() function, the blob-backed bitset in a register, the allocation logic in where.c. It pointed us to forum threads where drh himself discussed the filter sizing heuristics and a past "n bytes vs n×8 bits" bug. It gave us the exact file paths, the exact function signatures.
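For contrast, the upstream mechanism we were about to change reduces to something like this — our paraphrase of the 1-bit, blob-backed bitset, not SQLite's actual filterHash() code:

```c
/* Paraphrase of the upstream 1-bit scheme: the filter is a byte blob held
 * in a VDBE register; one hash selects one bit.  Names are ours. */
#include <stdint.h>
#include <stddef.h>

static void bloom1_add(unsigned char *blob, size_t nBits, uint64_t h){
  size_t i = (size_t)(h % nBits);
  blob[i / 8] |= (unsigned char)(1u << (i & 7));   /* set one bit */
}

static int bloom1_test(const unsigned char *blob, size_t nBits, uint64_t h){
  size_t i = (size_t)(h % nBits);
  return (blob[i / 8] >> (i & 7)) & 1;             /* test one bit */
}
```

Roughly speaking, OP_FilterAdd plays the part of the add and OP_Filter the part of the test; a zero result lets the VDBE skip the index seek entirely.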

In fifteen minutes, we had a mental model of the entire subsystem. No grep. No scrolling. Just a conversation.

Then we wrote CHANGE_REQUEST.md — a 380-line design document specifying the new two-bit semantics, the byte-level text hashing, and the compile-time gate that keeps the change off by default.

This wasn't a vague "make it faster" ticket. This was a surgical specification, written collaboratively between a human who understood the why and an LLM that could instantly surface the where and how.


Act II: The Build (18:15 – 19:00) #

We handed the change request to an AI coding agent. What followed was an exercise in directed chaos.

SQLite's codebase is… particular. The VDBE main loop in vdbe.c is a ~9,000 line switch statement — one of the largest in any open-source project. Comments warn you: "The formatting of the code in this file is, therefore, important." The amalgamation build (sqlite3.c) is generated by a Tcl script that concatenates everything into one file. Touch the wrong #ifdef and you get cryptic errors about "unterminated conditional directive" in a 250,000-line generated file.

We learned this the hard way. Twice.

The agent's approach was to write an apply_changes.py script that surgically patches vdbe.c and where.c from their clean upstream state. This let us:

  1. git checkout src/vdbe.c src/where.c to reset
  2. python3 apply_changes.py to re-apply
  3. make sqlite3.c && make sqlite3 to rebuild
  4. Test immediately

The patches themselves were surprisingly small. The entire semantic change to one of the world's most widely deployed databases fits in about 80 lines of C.


Act III: The Bug Hunt (19:00 – 19:20) #

Of course, it didn't work on the first try. Our text joins showed a 100% false-positive rate — the Bloom filter was doing absolutely nothing.

Integer joins? Fine. 25,769 skips (the filter ruled rows out), 231,956 probes (it couldn't). Healthy numbers. But text joins? Zero skips. Zero probes. The filter was a ghost.

This is where the debugging got interesting. We added targeted fprintf(stderr, ...) traces to the opcodes — just enough to see:

Bloom ADD: h=16599355121137259014 idx=2259014 mask=1200 nWord=2500000 hash_variant=2
Bloom Probe: h=8987406483371559633 idx=1559633 mask=100000 w=100000 match=1

Hashes were diverse. Masks were correct. Word contents matched. So why was the filter passing everything?

The answer was embarrassingly simple: we'd forgotten to compile stat_check (our measurement harness) with -DSQLITE_BLOOM2=1. The binary was using the new filter for adds but falling through to the old 1-bit test path for probes. The filter was working perfectly — we just couldn't see it through the wrong window.

One flag. Twenty minutes of confusion. Such is debugging.


Act IV: The Numbers (19:20 – 19:35) #

With the correct flags, the results were immediate and unambiguous:

| Query                  | Filter Skips | Probes  | Skip Rate |
|------------------------|--------------|---------|-----------|
| Integer Join (Q1)      | 25,769       | 231,956 | 10%       |
| Integer Anti-Join (Q2) | 257,672      | 53      | 99.98%    |
| Text Anti-Join (Q5)    | 257,671      | 54      | 99.98%    |

Read that last row again. 257,671 out of 257,725 probes were skipped. Only 54 false positives out of a quarter million checks. On text keys. With byte-level hashing.

The baseline 1-bit filter? On text anti-joins, it achieved zero skips — because the original filterHash() doesn't hash the actual bytes of strings. It hashes a constant derived from the type flags. Every string produces the same hash. Every probe hits the same bit. The filter is useless for text.

Our H2 variant — which runs the actual string bytes through a simple h = h*31 + byte loop, then finalizes with a splitmix64 mixer — costs almost nothing in CPU time but transforms the filter from decorative to devastating.
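Our reconstruction of that H2 hash — the patch's exact code may differ, but the shape is a classic 31-multiplier byte loop followed by the splitmix64 finalizer:

```c
/* Sketch of the H2 text-hash idea: hash the actual string bytes with
 * h = h*31 + byte, then scramble the result through the splitmix64
 * finalizer so all 64 output bits are usable as filter indexes. */
#include <stdint.h>
#include <stddef.h>

static uint64_t mix64(uint64_t x){        /* splitmix64 finalizer */
  x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
  x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
  return x ^ (x >> 31);
}

static uint64_t h2_text(const unsigned char *z, size_t n){
  uint64_t h = 0;
  for (size_t i = 0; i < n; i++) h = h * 31 + z[i];  /* byte-level hashing */
  return mix64(h);  /* without this, nearby strings land in nearby bits */
}
```

Because the finalizer is a bijection, two strings whose 31-loop values differ can never collide after mixing; the cheap loop is the only collision source.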


The Division of Labor #

Let's be honest about what happened here:

  1. A human read a blog post and recognized it could apply to SQLite.
  2. An LLM provided instant domain expertise — file paths, function signatures, historical bugs, forum context.
  3. A human wrote a precise specification using that knowledge.
  4. An AI agent implemented the specification, wrestling with build systems, amalgamation scripts, and 9,000-line switch statements.
  5. A human debugged the final issue (a missing compiler flag — always the stupidest bug).
  6. The result was verified with quantitative evidence.

Act V: The Caveats (or, Why it's a Proof of Concept) #

Let’s be clear: we did not build a "Production-Ready PR." We built a Performance Demonstration.

To prove the potential of the 2-bit idea in just two hours, we took a few surgical shortcuts that would never pass a real code review at SQLite HQ.

We weren't interested in tuning a production cost model today. We were interested in proving that the core mechanic—two bits, one word, better hashing—actually works. And it does.


The Takeaway #

Total time: ~2 hours. From "huh, interesting blog post" to a committed, benchmarked, working patch on one of the most battle-tested codebases in existence.

Is this going to land in upstream SQLite? Maybe, maybe not. drh is famously conservative, and rightfully so — SQLite runs on literally billions of devices. But the proof is there: the idea works, the implementation is clean, the numbers are real, and the change is gated behind a compile-time flag that defaults to off.

More importantly, the process is the story. Not because "AI wrote code" — that's table stakes in 2026. Because an LLM turned a blog post into actionable intelligence in fifteen minutes, and an AI agent turned a specification into a working patch in ninety more. The human's job was what it should be: having the idea, writing the spec, and catching the dumb bug at the end.

Two bits are better than one. And two minds — one biological, one silicon — are better than either alone.


The patch lives on branch experimental/bloom2 at commit f00c839. The change request, benchmarks, and full implementation are available for review. We welcome feedback, skepticism, and better hashing functions.
