HashKV

Enabling Efficient Updates in Key-value Storage via Hashing

Download:

hashkv-1.0.0.tar.gz (Released in June 2018) (MD5: 1d40fcc34a8beb8fadd2e720448950bc)

Introduction

Persistent key-value (KV) stores mostly build on the Log-Structured Merge (LSM) tree for high write performance, yet the LSM-tree suffers from inherently high I/O amplification. KV separation mitigates I/O amplification by storing only keys in the LSM-tree and values in separate storage. However, the current KV separation design remains inefficient under update-intensive workloads due to its high garbage collection (GC) overhead in value storage. We propose HashKV, which aims for high update performance atop KV separation under update-intensive workloads. HashKV uses hash-based data grouping, which deterministically maps values to storage space so as to make both updates and GC efficient. We further relax the restriction of such deterministic mappings via simple but useful design extensions. We compare HashKV with state-of-the-art KV stores via extensive testbed experiments, and show that HashKV achieves 4.6× the throughput and 53.4% less write traffic compared to the current KV separation design.

Publications

Overview

The prototype is written in C++ and uses third-party libraries, including

Minimum Requirements

Minimal setup to test the prototype:

Installation

On Ubuntu 14.04 LTS (Server), install the required packages:

$ sudo apt-get update
$ sudo apt-get install g++ libboost-system-dev libboost-filesystem-dev libboost-thread-dev libsnappy-dev cmake zlib1g-dev

Download and extract the source code tarball

$ tar zxf hashkv-*.tar.gz

Set the environment variable HASHKV_HOME to the root directory of the hashkv folder

$ cd hashkv
$ export HASHKV_HOME=$(pwd)

Compile HdrHistogram_c (libhdr_histogram.so) under lib/HdrHistogram_c-0.9.4

$ cd ${HASHKV_HOME}/lib/HdrHistogram_c-0.9.4
$ cmake .
$ make

Compile LevelDB (libleveldb.so) under lib/leveldb

$ cd ${HASHKV_HOME}/lib/leveldb
$ make

Compile the prototype and the test program

$ cd ${HASHKV_HOME}
$ make

Testing the Prototype

After compilation, the test program hashkv_test is generated under bin/.

Before running it, add the directories that hold the shared libraries, lib/leveldb/out-shared and lib/HdrHistogram_c-0.9.4/src, to LD_LIBRARY_PATH:

$ export LD_LIBRARY_PATH="$HASHKV_HOME/lib/leveldb/out-shared:$HASHKV_HOME/lib/HdrHistogram_c-0.9.4/src:$LD_LIBRARY_PATH"
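If the test program later fails to start because a shared library cannot be loaded, it may help to confirm that both libraries were actually built. The following is a minimal sketch, not part of the HashKV distribution: check_hashkv_libs is a hypothetical helper, and it assumes the library file names (libleveldb.so, libhdr_histogram.so) and directories given in the steps above.

```shell
# Hypothetical helper, not shipped with HashKV.
# Checks that the two shared libraries built in the installation steps
# exist in the directories that LD_LIBRARY_PATH is expected to cover.
# Usage: check_hashkv_libs [hashkv_root]   (defaults to $HASHKV_HOME)
check_hashkv_libs() {
    home="${1:-${HASHKV_HOME:-.}}"
    status=0
    for lib in \
        "lib/leveldb/out-shared/libleveldb.so" \
        "lib/HdrHistogram_c-0.9.4/src/libhdr_histogram.so"
    do
        if [ -e "$home/$lib" ]; then
            echo "found:   $home/$lib"
        else
            echo "missing: $home/$lib" >&2
            status=1
        fi
    done
    # Non-zero if at least one library is missing.
    return "$status"
}
```

Running check_hashkv_libs with no argument inspects the tree under $HASHKV_HOME; a "missing" line suggests that the corresponding compile step should be repeated.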

Then, switch to folder bin/

$ cd ${HASHKV_HOME}/bin

Create the folder for key storage (LSM-tree), which is named leveldb by default

$ mkdir leveldb

Create the folder for value storage

$ mkdir data_dir

Clean the LSM-tree and value storage folders **

$ rm -f data_dir/* leveldb/*

Choose one of the example configuration files [db_sample_config.ini], each of which corresponds to a different design, and copy it to config.ini for the test program

$ cp [db_sample_config.ini] config.ini

Run the test program of the prototype under the chosen design

$ ./hashkv_test data_dir 100000

** Note that, because the data layout differs between designs, the LSM-tree and value storage folders must be cleared whenever you switch to another design.
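Because forgetting to clear the folders leaves data in the wrong layout, the clean and copy steps above can be wrapped together. This is a rough sketch only: switch_design is a hypothetical helper that is not part of HashKV, and it assumes the default folder names (leveldb, data_dir) and that it is run from bin/.

```shell
# Hypothetical helper, not shipped with HashKV.
# Clears the key and value storage folders, then installs the chosen
# sample configuration as config.ini.
# Usage: switch_design <sample_config.ini>
switch_design() {
    config="$1"
    if [ ! -f "$config" ]; then
        echo "config file not found: $config" >&2
        return 1
    fi
    # Create the storage folders if they do not exist yet.
    mkdir -p leveldb data_dir
    # Remove files written under the previous design.
    rm -f data_dir/* leveldb/*
    # Install the configuration for the new design.
    cp "$config" config.ini
}
```

The argument should be one of the sample files rather than config.ini itself, since cp refuses to copy a file onto itself. After the helper returns, the test program can be run as shown above.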