The new databases that have emerged during this time have adopted names such as NoSQL and NewSQL, emphasizing that good old SQL databases fell short when it came to meeting the new demands.

Despite their different design choices for particular protocols, these databases have adopted, for the most part, a shared-nothing, distributed computing architecture. The processing power of every computing system is ultimately limited by physical constraints and, in cases such as distributed databases where parallel executions are involved, by the limits inherent to parallel execution. Even so, most of these systems offer the theoretical possibility of unlimited horizontal capacity scaling for both compute and storage. Each node represents a unit of compute and storage that can be added to the system as needed.

However, as Cockroach Labs CEO and co-founder Spencer Kimball explains in the case of CockroachDB, designing one of these new databases from scratch is a herculean task that requires highly knowledgeable and skilled engineers working in coordination and making carefully considered decisions. For databases such as CockroachDB, having a reliable, high-performance way to store and retrieve data from stable storage is essential. Designing a library that provides fast stable storage on top of either a filesystem or raw devices is a very difficult problem because of the large number of edge cases that must be handled correctly.
Technical design. Because one of the most common use cases of the new databases is storing data generated by high-throughput sources, it is important that the storage engine can handle write-intensive workloads while still offering acceptable read performance. RocksDB implements what is known in the database literature as a log-structured merge tree, aka LSM tree.
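As a rough illustration of the idea behind this kind of structure, here is a minimal Python sketch in which every write, including a delete, is appended to a log segment, and a compaction pass later discards deleted keys and superseded versions. It is a toy model under simplifying assumptions, not RocksDB's implementation: the class `ToyLogStore`, its `segment_limit` parameter, and the explicit `compact()` call are all hypothetical names made up for this example, and a real engine runs compaction in the background and keeps its data on disk.

```python
from typing import Dict, List, Optional, Tuple

# Marker appended in place of a value when a key is deleted (a "tombstone").
_TOMBSTONE = object()

Entry = Tuple[str, object]


class ToyLogStore:
    """Toy append-only key-value store with an explicit compaction step."""

    def __init__(self, segment_limit: int = 4) -> None:
        self.segment_limit = segment_limit   # hypothetical: entries per segment
        self.sealed: List[List[Entry]] = []  # full segments, oldest first
        self.active: List[Entry] = []        # segment currently receiving appends

    def _append(self, key: str, value: object) -> None:
        # Every write is an append; existing entries are never modified.
        self.active.append((key, value))
        if len(self.active) >= self.segment_limit:
            self.sealed.append(self.active)
            self.active = []

    def put(self, key: str, value: str) -> None:
        self._append(key, value)

    def delete(self, key: str) -> None:
        # A delete is also an append: it records a tombstone for the key.
        self._append(key, _TOMBSTONE)

    def get(self, key: str) -> Optional[object]:
        # Scan from newest to oldest; the most recent entry for a key wins.
        for segment in [self.active] + self.sealed[::-1]:
            for k, v in reversed(segment):
                if k == key:
                    return None if v is _TOMBSTONE else v
        return None

    def compact(self) -> None:
        # Merge all sealed segments, keeping only the newest version of each
        # key and dropping keys whose newest version is a tombstone.
        latest: Dict[str, object] = {}
        for segment in self.sealed:  # oldest first, so newer entries overwrite
            for k, v in segment:
                latest[k] = v
        live = sorted((k, v) for k, v in latest.items() if v is not _TOMBSTONE)
        self.sealed = [live] if live else []

    def entry_count(self) -> int:
        return len(self.active) + sum(len(s) for s in self.sealed)


store = ToyLogStore(segment_limit=2)
store.put("a", "1")
store.put("b", "2")
store.put("a", "3")   # newer version of "a"; the old one stays in the log
store.delete("b")     # tombstone for "b"
before = store.entry_count()
store.compact()
after = store.entry_count()
```

In this run the log holds four entries before compaction and one afterwards, since the older version of `a` and both entries for `b` are no longer relevant. Writes stay cheap because they never touch existing data, while reads may have to look through several segments until compaction shrinks them.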
Going into the details of LSM trees, and RocksDB’s implementation of them, is beyond the scope of this blog, but suffice it to say that an LSM tree is an indexing structure optimized to handle high-volume write workloads, whether sequential or random. This is accomplished by treating every write as an append operation. A mechanism that goes by the name of compaction runs in the background, transparently for the developer, removing data that is no longer relevant, such as deleted keys or older versions of valid keys.

Performance. Choosing a given technical design for performance reasons needs to be backed by empirical verification. During his time at Facebook, in the context of the MyRocks project, a fork of MySQL that replaces InnoDB with RocksDB as MySQL’s storage engine, Mark Callaghan performed extensive and rigorous performance measurements to compare MySQL performance on InnoDB vs. on RocksDB. Details can be found in his published results.
Not surprisingly, RocksDB regularly comes out as vastly superior in write-intensive benchmarks. Interestingly, while InnoDB was also regularly better than RocksDB in read-intensive benchmarks, this advantage, in relative terms, was not as big as the advantage RocksDB has over InnoDB in write-intensive tasks.

[Figure: example results from an I/O-bound benchmark]

Tunability. RocksDB provides several tunable parameters to extract the best performance on different hardware configurations. While the technical design provides an architectural reason to favor one type of solution over another, achieving optimal performance for particular use cases usually requires the flexibility to tune certain parameters for those use cases.
RocksDB provides a long list of parameters that can be used for this purpose. Samsung’s Praveen Krishnamoorthy gave an extensive presentation at the 2015 annual meetup on how RocksDB can be tuned to accommodate different workloads.

Manageability. In mission-critical solutions such as distributed databases, it is essential to have as much control over, and visibility into, critical components of the system as possible, such as the storage engine in the nodes.
Facebook introduced several important features to RocksDB that are required by enterprise-grade software products, such as dynamic option changes and detailed statistics for all aspects of RocksDB’s internal operations, including compaction.

Production references. The world of enterprise software, particularly when it comes to databases, is extremely risk averse. For totally understandable reasons, namely the risk of monetary losses and reputational damage in case of data loss or data corruption, nobody wants to be a guinea pig in this space.
RocksDB was developed by Facebook with the original motivation of migrating its massive MySQL cluster, which hosts its user production database, from InnoDB to RocksDB. The completed migration resulted in 50% storage savings for Facebook. Having Facebook lead the development and maintenance of RocksDB for the most critical use cases in its multibillion-dollar business is a very important endorsement, particularly for database developers who lack Facebook’s resources to develop and maintain their own storage engines.

Language bindings. RocksDB offers a key-value API, available for C++, C and Java. These are the most widely used programming languages in the distributed database world.
Considering all six of these areas holistically, RocksDB is a very appealing choice for a distributed database developer looking for a fast, production-tested storage engine.