Start of main content
How ScyllaDB makes LSM-tree compaction state-of-art by leveraging RUM conjecture and controller theory
LSM-tree storage engines are known for providing very fast write throughput, as they allow new data to be flushed into immutable files called Sorted String Table, a.k.a. SSTable. But there's no such thing as a free lunch. This append-only nature of the LSM tree generates read amplification over time, because a given key can be redundantly stored across multiple immutable files. That's the reason storage engines have to employ a background compaction process, to reduce both read and space amplification. Compaction itself is very complex and it plays an essential role in the system's performance. There are multiple compaction policies, and each suits a particular workload better. For example, leveled policy is better for workloads performing overwrites that are very sensitive to read latency, but that comes at the cost of higher write amplification, meaning that the system cannot keep up with the write rate unless you throw faster disks or more instances at it. On the other hand, there's the tiered policy that favors write performance but at the cost of higher read and space amplification. There can also be a policy which mixes both tiers and levels, to have something in between, efficiency wise. The RUM conjecture, which states that only two amplification factors can be optimized at the expense of a third, is used by engineers to come up with efficient policies for a variety of workloads. A good example is the ScyllaDB incremental compaction approach that breaks large files into fixed-size fragments, to fix the 100% space overhead that affects most implementations of tiered policy out there. ScyllaDB also leverages industrial controller theory to avoid the endless tuning cycle for compaction speed. If compaction is slower than it should be, the compaction backlog increases, resulting in bad read latency and potentially the system running out of disk space. If compaction is running too fast, then resources are being needlessly stolen from foreground requests, affecting both latency and throughput. So nothing is better than a closed-loop control system that measures backlog and adjusts compaction speed accordingly. That frees your DBAs to perform more important tasks.