Apache Cassandra Write Path, Compaction and Use Cases in 3 Minutes

Over the last years, Apache Cassandra became one of the most popular NoSQL solutions for big data.

It started back in 2008 as an open-sourced product from Facebook, became an Apache Incubator project in 2009, and graduated to a top-level in 2010.

It is three-minutes-tech series, I’m Alex Sergeenko and today we will see how Cassandra writes data. 

Let’s look at the write path in Cassandra.

There are four major participants involved in the process. They are: a write or a mutation, the commit log, the memtable, and SSTables.

When some mutation has triggered, the changes at first get persisted to the commit log. It is an append-only structure and each node in a cluster uses its own commit log. Commit logs are not replicated but we can tune the consistency and increase durability to the desired level using Replication Factor.

Due to its append-only nature commit logs ensure extremely fast sequential writes even when using spinning HDDs because no random seeks are needed. Commit logs are split into segments which reduces the number of seeks needed to write to a disk. 

Writing to commit log also lowers the chances of losing the data due to sudden crushes.

After a mutation is written to a commit log it gets written to a memtable. That’s an important point – the commit log and the memtable are not written in parallel.

Memtable is an in-memory data structure that is subjected to be periodically written to immutable files called SSTables or sorted-string tables. In a nutshell, a memtable represents a row of storage data. After a mutation gets written to an SSTable the round-trip considers completed and an acknowledgment returns to the client.

Commit logs are faster since they have to write fewer data compared to SSTables flushing. On the startup, Cassandra reads the commit log from the last known “good” position and re-applies these changes to related Memtables.

Long story short, when you write to Cassandra there are only fast sequential append to the commit log and even faster writing to memtable are involved.

These optimizations make Cassandra extremely fast on writes since no time-consuming operations are used.

But what’s happening after the acknowledgment is sent?

Memtables are periodically flushing to SSTables in a sequential manner as was mentioned above. Once SSTable is created based on the data from memtable it becomes immutable.

Generally speaking, Cassandra is not about random IO but about a sequential one. That’s become a significant benefit under heavy workloads.

SSTables are subjects of compaction. And compaction is a process when two or more SSTables merge sorting together and forming a new SSTable as a result. 

If there was an update or delete, the newest value for the field is retained by compaction and written to a new SSTable, while the older versions are discarded.

Pay attention, again we make sequential reads from SSTs during compaction, then make a sequential write of a resulting SST, and finally delete old files.

According to previously mentioned features, Cassandra is ideally suited for time-series and other append-only scenarios with high write rates, such as change-data capture and event sourcing.

In opposite short-lived objects and write-delete scenarios are posed a danger for Cassandra’s performance because such approaches lead to high volumes of so-called tombstones which may over-utilize storage and lead to longer compaction cycles.