How NoSQL Databases Write So Quickly
NoSQL databases are popular for ingesting large amounts of data quickly. But why? Most relational databases rely on B-Trees, which are optimized for reads, and are expensive to update. This limits ingestion speed. So how do NoSQL databases achieve better write performance? Databases like CassandraDB and HBase use a structure called a “Log-Structured Merge” tree. To understand how this tree makes writes more efficient, we need to break down how these databases ingest and write data to disk.
When new data is ingested, the data is stored in a memtable - these are in-memory tables that typically use object-key mapping, and are usually implemented as a balanced binary tree. Storing data in memory can get expensive, so when a memtable reaches a certain size, it is flushed to disk in a special format - an immutable sorted string table (SSTable). SSTables are written sequentially to disk, making them very quick to store. The most recent SSTable is added to the LSM tree. Each SSTable represents a small chronological subset of incoming changes. SSTables are immutable, so updates are actually new entries that supersede old entries for a given key. To delete an entry, a tombstone is added to a given key, marking it as deleted. Counterintuitively, deleting a record actually takes up space.
When new data is added to the database (on disk), it is first added to the base level. The base level is a sorted data structure, such as a B-tree, that is optimized for inserting new data quickly. As the base level grows, it is periodically compacted and merged with the next level, creating a new, higher level. This process continues until the highest level is reached. As you can see, B-trees can still be leveraged by NoSQL databases - data is compacted, segmented, and then turned into B-trees in a hierarchy.
The lowest level is called the base level, and it contains the most recent data. The higher levels are called the merge levels, and they contain older data that has been compacted and merged together.
When data is retrieved from the LSM tree, the search starts at the highest level and works its way down to the base level. This allows the search to be more efficient, as the higher levels contain less data and are therefore faster to search.
The advantage of LSM trees is that they are designed to be very efficient at both inserting new data and searching for existing data. They are also highly scalable, as new machines can be added to the cluster to increase capacity, and the data can be distributed across multiple machines.
However, LSM trees are not suitable for all use cases. They are not well suited for workloads that involve frequent updates or deletions, as these operations can be expensive and can cause the tree to become fragmented over time. Additionally, LSM trees can have a higher write amplification, which means that the data written in the disk is more than the data that is actually stored.
Now you have a solid understanding of LSMs, and how some NoSQL databases use them to get crazy fast write performance compared to traditional relational databases. If you have any questions or comments, please reach out. Until next time!