I'm thrilled to share that I’ve joined Yugabyte as a technical Product Manager! This role is particularly meaningful to me as it blends the best of both worlds: hands-on engineering work with strategic customer engagement. It allows me to collaborate across diverse teams—engineering, marketing, solutions architects, and support—providing a comprehensive view of the customer experience and the chance to design impactful features that truly delight our users.
You might wonder why I’m joining a company based on Postgres compatibility, especially after my recent tweet (and LinkedIn post) where I boldly stated:
"The best database out there, by far, is Oracle. It works in the highest percentage of situations. Now, if you think PostgreSQL works for you, you're lucky, because it works for some cases and doesn’t for others. If you're using MySQL and it works, you're even luckier. It's not because MySQL or PostgreSQL are as good as Oracle, it's just that you're lucky."
So why Yugabyte? Yugabyte is taking Postgres to the next level, addressing and re-architecting areas where standard Postgres has traditionally struggled. This opens the door for Yugabyte to compete in the upper echelons of database performance—territory that, until now, has been almost exclusively Oracle’s.
Some of my top priorities for Postgres have been:
Addressing issues with vacuuming, bloat, and transaction wraparound
Enhancing observability and performance instrumentation
Yugabyte is actively tackling these concerns, among others, which I’ll explore in future posts, including improvements in sharding, connection management, and instant cloning (a particular passion of mine from my years at Delphix).
My initial focus at Yugabyte will be on observability, specifically extending wait events and performance instrumentation. I’m thrilled about this, as it aligns perfectly with my past work at Oracle, Embarcadero, Amazon RDS, and Datadog.
For this post, I want to dive into one particular innovation: how Yugabyte removes the need for vacuuming.
Why Vacuuming Is No Longer Needed in Yugabyte
Yugabyte has eliminated the notorious transaction ID (XID) wraparound problem and the vacuuming issues that come with it. Yugabyte uses Log-Structured Merge (LSM) trees instead of traditional heap tables, with SST (Sorted String Table) files as storage. This shift to LSM trees and SST files eliminates the need for vacuuming by efficiently handling data writes and deletions, unlike traditional Postgres storage, which requires vacuuming to reclaim space from deleted or updated rows. Here’s how LSM and SST structures manage this:
Immutable SST Files and Append-Only Writes
In an LSM tree structure, data is written in an append-only manner to an in-memory structure (memtable), which is then flushed to SST files on disk. Once data is written to an SST file, it becomes immutable.
This immutability removes the need to revisit or “vacuum” old files, as new data is simply appended as new SST files. Deletes and updates are written as new entries and reconciled later, when SST files are merged, rather than physically removing the old data immediately. The time window for retaining old data can be configured (think of undo retention in Oracle).
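To make the write path concrete, here is a minimal Python sketch of the idea. It is a toy model, not Yugabyte's actual storage code: the `TinyLSM` class, the `TOMBSTONE` marker, and the `memtable_limit` parameter are all names I made up for illustration.

```python
TOMBSTONE = "<deleted>"  # hypothetical marker meaning "this key was deleted"


class TinyLSM:
    """Toy LSM tree: one mutable memtable plus a list of immutable SST 'files'."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}        # in-memory, mutable
        self.sst_files = []       # oldest first; each file is an immutable tuple
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes only ever touch the memtable, never an existing SST file.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # A delete is just another write: it records a tombstone.
        self.put(key, TOMBSTONE)

    def flush(self):
        # Freeze the memtable into a new sorted, immutable SST file.
        if self.memtable:
            entries = sorted(self.memtable.items(), key=lambda kv: kv[0])
            self.sst_files.append(tuple(entries))
            self.memtable = {}


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)    # memtable is full, flushed as SST file 0
db.put("a", 10)   # update: a new entry, the old one stays in file 0
db.delete("b")    # delete: a tombstone, flushed with the update as SST file 1

for number, sst in enumerate(db.sst_files):
    print(number, sst)
```

Note that file 0 still holds the old values of `a` and `b` after the update and the delete. Nothing was modified in place, which is exactly why there is nothing to vacuum.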
Compaction Instead of Vacuuming
Compaction in LSM trees merges multiple SST files to remove obsolete data and keeps data organized to reduce read amplification.
During compaction:
Deleted records are permanently removed when versions are no longer required.
Updated records are merged so that only the latest version is retained.
This background compaction effectively “cleans up” data without a dedicated vacuuming process, allowing ongoing operations without locking or latching.
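The merge step above can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions: the `compact` function and the `TOMBSTONE` marker are hypothetical, it models a full compaction of all files, and it ignores the configurable retention window for old versions.

```python
TOMBSTONE = "<deleted>"  # hypothetical delete marker


def compact(sst_files):
    """Merge SST files (oldest first) into one new file.

    Keeps only the newest version of each key and drops deleted keys.
    The input files are never modified; a new file is produced instead.
    """
    latest = {}
    for sst in sst_files:            # later (newer) files overwrite older ones
        for key, value in sst:
            latest[key] = value
    merged = sorted(latest.items(), key=lambda kv: kv[0])
    # Safe to drop tombstones here only because every file is being merged,
    # so no older file remains that could bring the deleted key back.
    return tuple((k, v) for k, v in merged if v != TOMBSTONE)


old = (("a", 1), ("b", 2), ("c", 3))
new = (("a", 10), ("b", TOMBSTONE))

merged = compact([old, new])
print(merged)  # (('a', 10), ('c', 3))
```

The old files can be dropped once the merged file is complete, so readers and writers never wait on the cleanup.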
Garbage Collection During Reads
Reads return only the latest version of the data and ignore obsolete entries that still sit in older SST files.
Compactions eventually eliminate these obsolete entries without needing manual vacuuming.
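Here is a small Python sketch of how a read can skip obsolete entries before compaction has removed them. Again, this is a toy under my own assumptions: the `get` function and `TOMBSTONE` marker are hypothetical, and a real engine would use indexes and bloom filters rather than a linear scan.

```python
TOMBSTONE = "<deleted>"  # hypothetical delete marker


def get(key, memtable, sst_files):
    """Return the newest value for key, or None if absent or deleted.

    Search order is newest to oldest: memtable first, then SST files
    from the most recent back. The first hit wins, so older versions
    of the same key are never returned.
    """
    if key in memtable:
        value = memtable[key]
        return None if value == TOMBSTONE else value
    for sst in reversed(sst_files):  # sst_files is ordered oldest first
        for k, v in sst:
            if k == key:
                return None if v == TOMBSTONE else v
    return None


memtable = {"c": 30}
sst_files = [
    (("a", 1), ("b", 2), ("c", 3)),      # older file
    (("a", 10), ("b", TOMBSTONE)),       # newer file
]

print(get("a", memtable, sst_files))  # 10
print(get("b", memtable, sst_files))  # None
print(get("c", memtable, sst_files))  # 30
```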
Reduction of Storage Fragmentation
The sequential, append-only nature of LSM minimizes fragmentation. Compaction further consolidates data, reducing the need for any extra space-reclamation processes.
Summary: How LSM and SST Structures Avoid Vacuuming
Immutability of SST Files: No need for in-place updates, eliminating file cleanup.
Compaction Process: Regularly merges and reclaims obsolete data dynamically.
No Fragmentation Issues: Sequential data organization avoids fragmentation, common in heap tables and B-Tree indexes, which typically require vacuuming.
In essence, the compaction mechanism in LSM trees, combined with the immutable nature of SST files, naturally manages data without vacuuming, making LSM structures highly efficient for write-heavy and frequently updated workloads. The primary disadvantage of LSM trees—space amplification during compaction—is minimized here by sharding. Each LSM tree is per tablet (shard), with compaction staggered across shards.
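A back-of-the-envelope calculation shows why per-tablet compaction helps. The model below is my own simplifying assumption, not a Yugabyte formula: a full compaction temporarily needs about as much extra space as the data it rewrites, because the old SST files stay on disk until the merged file is complete. The function name and parameters are hypothetical.

```python
def peak_extra_space_gb(total_data_gb, num_tablets, concurrent_compactions=1):
    """Rough upper bound on transient disk space used by compaction.

    Assumes data is spread evenly across tablets and that each running
    compaction needs extra space equal to one tablet's data.
    """
    per_tablet_gb = total_data_gb / num_tablets
    running = min(concurrent_compactions, num_tablets)
    return per_tablet_gb * running


print(peak_extra_space_gb(1000, 1))       # 1000.0 (one big LSM tree)
print(peak_extra_space_gb(1000, 100))     # 10.0   (100 tablets, one at a time)
print(peak_extra_space_gb(1000, 100, 4))  # 40.0   (100 tablets, four at a time)
```

Under these assumptions, splitting 1 TB across 100 tablets and staggering compaction cuts the transient overhead from roughly the whole dataset to a few percent of it.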
I only started diving into this technology yesterday, so I look forward to sharing more details and insights as I ramp up.
Appendix
Yugabyte Enterprise Level improvements over Postgres
For more details, here are a couple of great posts by Franck Pachot, detailing Yugabyte’s innovative solutions to address the major issues faced with standard Postgres.
More information on LSM/SST
Youtube:
On Docs:
Hybrid logical clocks: get rid of XID wraparound