How we read a blog post at breakfast and had a verified SQLite patch before lunch.
It started, as many dangerous ideas do, with a blog post and a cup of coffee.
FloeDB had published a deceptively simple observation: Two Bits Are Better Than One. The premise? Instead of setting one bit per key in a Bloom filter, you set two bits in the same 32-bit word. Same number of memory accesses. Same cache line touch. But mathematically, you cut your false-positive rate roughly in half. It's the kind of idea that makes you mutter "why didn't anyone do this already?" — which, as it turns out, is exactly what we muttered. And then we checked.
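The mechanic is easy to sketch. The helper names and power-of-two sizing below are our own illustration (not FloeDB's or SQLite's code): derive one word index and two bit positions from a single 64-bit hash, OR both bits in on insert, and demand both on probe.

```c
#include <stdint.h>

/* Illustrative sketch of the two-bits-in-one-word Bloom filter.
 * nWord must be a power of two here (for the cheap mask); real
 * implementations size the filter differently. */
static void bloom2_add(uint32_t *filter, uint32_t nWord, uint64_t h){
  uint32_t idx = (uint32_t)h & (nWord - 1);     /* which 32-bit word */
  uint32_t b1  = (uint32_t)(h >> 32) & 31;      /* first bit in word */
  uint32_t b2  = (uint32_t)(h >> 37) & 31;      /* second bit        */
  filter[idx] |= (1u << b1) | (1u << b2);       /* one store         */
}

static int bloom2_maybe_contains(const uint32_t *filter, uint32_t nWord,
                                 uint64_t h){
  uint32_t idx  = (uint32_t)h & (nWord - 1);
  uint32_t mask = (1u << ((uint32_t)(h >> 32) & 31))
                | (1u << ((uint32_t)(h >> 37) & 31));
  return (filter[idx] & mask) == mask;          /* both bits, one load */
}
```

Both paths touch exactly one word, so the memory-access cost matches the 1-bit scheme; the probe simply requires two bits instead of one.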
SQLite does use Bloom filters. Has for years. They live deep inside the VDBE — the Virtual Database Engine — where OP_Filter and OP_FilterAdd quietly skip unnecessary index seeks during joins. They're the reason your analytic queries over multi-million-row tables finish in milliseconds instead of seconds. And they use, naturally, one bit per key.
The clock read 18:00. By 20:00, we'd have a working, benchmarked, committed patch.
Here's how.
Act I: The Prompt (18:00 – 18:15) #
We didn't start by reading SQLite source code. We started by talking to an LLM.
"How does SQLite use Bloom filters?" — ChatGPT walked us through the VDBE opcodes, the filterHash() function, the blob-backed bitset in a register, the allocation logic in where.c. It pointed us to forum threads where drh himself discussed the filter sizing heuristics and a past "n bytes vs n×8 bits" bug. It gave us the exact file paths, the exact function signatures.
In fifteen minutes, we had a mental model of the entire subsystem. No grep. No scrolling. Just a conversation.
Then we wrote CHANGE_REQUEST.md — a 380-line design document that specified:
- The exact data layout (interpret the blob as 32-bit words)
- The bit derivation (`b1 = (h >> 32) & 31`, `b2 = (h >> 37) & 31`)
- Four hashing variants for an ablation study
- Eight implementation tasks
- Acceptance criteria
- Upstream PR guidance
This wasn't a vague "make it faster" ticket. This was a surgical specification, written collaboratively between a human who understood the why and an LLM that could instantly surface the where and how.
Act II: The Build (18:15 – 19:00) #
We handed the change request to an AI coding agent. What followed was an exercise in directed chaos.
SQLite's codebase is… particular. The VDBE main loop in vdbe.c is a ~9,000-line switch statement — one of the largest in any open-source project. Comments warn you: "The formatting of the code in this file is, therefore, important." The amalgamation build (sqlite3.c) is generated by a Tcl script that concatenates everything into one file. Touch the wrong #ifdef and you get cryptic errors about "unterminated conditional directive" in a 250,000-line generated file.
We learned this the hard way. Twice.
The agent's approach was to write an apply_changes.py script that surgically patches vdbe.c and where.c from their clean upstream state. This let us:
- `git checkout src/vdbe.c src/where.c` to reset
- `python3 apply_changes.py` to re-apply
- `make sqlite3.c && make sqlite3` to rebuild
- Test immediately
The patches themselves were surprisingly small:
- 26 lines of helper functions (`bloomMix`, `bloomHashBytes`, `bloomGetWord`, `bloomPutWord`)
- A new `filterHash()` with four compile-time hashing variants
- ~20 lines each for modified `OP_FilterAdd` and `OP_Filter` opcodes, gated behind `#ifdef SQLITE_BLOOM2`
- 1 line in `where.c` to word-align the filter allocation
That's it. The entire semantic change to one of the world's most widely deployed databases fits in about 80 lines of C.
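For flavor, here is roughly what the word-access helpers have to do. This is our reconstruction, not the patch itself: the filter lives in a blob of bytes in a register, so `bloomGetWord`/`bloomPutWord` must read and write 32-bit words without assuming alignment.

```c
#include <stdint.h>

/* Our reconstruction of blob-backed word access; the actual patch's
 * helpers may differ. Byte-at-a-time, little-endian access sidesteps
 * alignment faults and endianness surprises. */
static uint32_t bloomGetWord(const unsigned char *zBlob, uint32_t iWord){
  const unsigned char *p = zBlob + 4*iWord;
  return (uint32_t)p[0]
       | ((uint32_t)p[1] << 8)
       | ((uint32_t)p[2] << 16)
       | ((uint32_t)p[3] << 24);
}

static void bloomPutWord(unsigned char *zBlob, uint32_t iWord, uint32_t w){
  unsigned char *p = zBlob + 4*iWord;
  p[0] = (unsigned char)(w & 0xff);
  p[1] = (unsigned char)((w >> 8) & 0xff);
  p[2] = (unsigned char)((w >> 16) & 0xff);
  p[3] = (unsigned char)((w >> 24) & 0xff);
}
```

Treating the blob as whole words is also why the one-line change in where.c matters: the allocation has to be rounded up to a multiple of four bytes so the last word is always addressable.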
Act III: The Bug Hunt (19:00 – 19:20) #
Of course, it didn't work on the first try. Our text joins showed a 100% false-positive rate — the Bloom filter was doing absolutely nothing.
Integer joins? Fine. 25,769 filter hits (skips), 231,956 misses (probes). Healthy numbers. But text joins? Zero hits. Zero misses. The filter was a ghost.
This is where the debugging got interesting. We added targeted fprintf(stderr, ...) traces to the opcodes — just enough to see:
```
Bloom ADD:   h=16599355121137259014 idx=2259014 mask=1200 nWord=2500000 hash_variant=2
Bloom Probe: h=8987406483371559633 idx=1559633 mask=100000 w=100000 match=1
```
Hashes were diverse. Masks were correct. Word contents matched. So why was the filter passing everything?
The answer was embarrassingly simple: we'd forgotten to compile stat_check (our measurement harness) with -DSQLITE_BLOOM2=1. The binary was using the new filter for adds but falling through to the old 1-bit test path for probes. The filter was working perfectly — we just couldn't see it through the wrong window.
One flag. Twenty minutes of confusion. Such is debugging.
Act IV: The Numbers (19:20 – 19:35) #
With the correct flags, the results were immediate and unambiguous:
| Query | Filter Skips | Probes | Skip Rate |
|---|---|---|---|
| Integer Join (Q1) | 25,769 | 231,956 | 10% |
| Integer Anti-Join (Q2) | 257,672 | 53 | 99.98% |
| Text Anti-Join (Q5) | 257,671 | 54 | 99.98% |
Read that last row again. 257,671 out of 257,725 probes were skipped. Only 54 false positives out of a quarter million checks. On text keys. With byte-level hashing.
The baseline 1-bit filter? On text anti-joins, it achieved zero skips — because the original filterHash() doesn't hash the actual bytes of strings. It hashes a constant derived from the type flags. Every string produces the same hash. Every probe hits the same bit. The filter is useless for text.
Our H2 variant — which runs the actual string bytes through a simple h = h*31 + byte loop, then finalizes with a splitmix64 mixer — costs almost nothing in CPU time but transforms the filter from decorative to devastating.
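In sketch form, that variant looks something like this. The function name is ours and the details are an approximation of H2 as described above, using the multiply-by-31 byte loop and the well-known splitmix64 finalizer constants:

```c
#include <stdint.h>

/* Approximation of the H2 text-hashing variant: accumulate the
 * actual string bytes, then run splitmix64's mixing steps so the
 * entropy spreads across all 64 bits (the low bits pick the word,
 * the high bits pick the two bit positions). */
static uint64_t bloomHashText(const unsigned char *z, int n){
  uint64_t h = 0;
  for(int i = 0; i < n; i++){
    h = h*31 + z[i];                            /* classic 31x byte loop */
  }
  h = (h ^ (h >> 30)) * 0xbf58476d1ce4e5b9ULL;  /* splitmix64 finalizer */
  h = (h ^ (h >> 27)) * 0x94d049bb133111ebULL;
  return h ^ (h >> 31);
}
```

The loop alone clusters its entropy in the low bits; the finalizer is what makes the `h >> 32` and `h >> 37` bit picks usable.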
The Division of Labor #
Let's be honest about what happened here:
- A human read a blog post and recognized it could apply to SQLite.
- An LLM provided instant domain expertise — file paths, function signatures, historical bugs, forum context.
- A human wrote a precise specification using that knowledge.
- An AI agent implemented the specification, wrestling with build systems, amalgamation scripts, and 9,000-line switch statements.
- A human debugged the final issue (a missing compiler flag — always the stupidest bug).
- The result was verified with quantitative evidence.
Act V: The Caveats (or, Why it's a Proof of Concept) #
Let’s be clear: we did not build a "Production-Ready PR." We built a Performance Demonstration.
To prove the potential of the 2-bit idea in just two hours, we took a few surgical shortcuts that would never pass a real code review at SQLite HQ:
- **Hardcoded Sizing:** We forced the Bloom filter to a massive 10MB (`sz = 10000000`) to see its theoretical maximum effectiveness, bypassing SQLite's internal sizing heuristics.
- **Forced Activation:** We stripped away cost-model requirements (`WHERE_SELFCULL`) that might normally prevent a Bloom filter from activating, just to ensure we could test it.
- **Diagnostic Noise:** We left heavy debug traces in the opcodes to verify hash distribution in real time.
We weren't interested in tuning a production cost model today. We were interested in proving that the core mechanic—two bits, one word, better hashing—actually works. And it does.
The Takeaway #
Total time: ~2 hours. From "huh, interesting blog post" to a committed, benchmarked, working patch on one of the most battle-tested codebases in existence.
Is this going to land in upstream SQLite? Maybe, maybe not. drh is famously conservative, and rightfully so — SQLite runs on literally billions of devices. But the proof is there: the idea works, the implementation is clean, the numbers are real, and the change is gated behind a compile-time flag that defaults to off.
More importantly, the process is the story. Not because "AI wrote code" — that's table stakes in 2026. Because an LLM turned a blog post into actionable intelligence in fifteen minutes, and an AI agent turned a specification into a working patch in ninety more. The human's job was what it should be: having the idea, writing the spec, and catching the dumb bug at the end.
Two bits are better than one. And two minds — one biological, one silicon — are better than either alone.
The patch lives on branch experimental/bloom2 at commit f00c839. The change request, benchmarks, and full implementation are available for review. We welcome feedback, skepticism, and better hashing functions.