Enabling Efficient Updates in Key-value Storage via Hashing
hashkv-1.0.0.tar.gz (Released in June 2018) (MD5: 1d40fcc34a8beb8fadd2e720448950bc)
Persistent key-value (KV) stores mostly build on the Log-Structured Merge (LSM) tree for high write performance, yet the LSM-tree suffers from the inherently high I/O amplification. KV separation mitigates I/O amplification by storing only keys in the LSM-tree and values in separate storage. However, the current KV separation design remains inefficient under update-intensive workloads due to its high garbage collection (GC) overhead in value storage. We propose HashKV, which aims for high update performance atop KV separation under update-intensive workloads. HashKV uses hash-based data grouping, which deterministically maps values to storage space so as to make both updates and GC efficient. We further relax the restriction of such deterministic mappings via simple but useful design extensions. We compare HashKV with state-of-the-art KV stores via extensive testbed experiments, and show that HashKV achieves 4.6× throughput and 53.4% less write traffic compared to the current KV separation design.
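The idea of a deterministic mapping can be illustrated with a minimal shell sketch. It is not HashKV's implementation: the function name `segment_for_key`, hashing the key with the POSIX `cksum` CRC, and the segment count of 8 are all assumptions made for illustration. The only point shown is that the same key always yields the same group index.

```shell
# Hypothetical sketch: map a key to one of num_segments storage groups.
# cksum (a POSIX CRC) stands in for whatever hash HashKV really uses.
segment_for_key() {
    key=$1
    num_segments=$2
    # cksum prints "<crc> <byte count>" for stdin; keep only the CRC.
    crc=$(printf '%s' "$key" | cksum | cut -d' ' -f1)
    echo $(( crc % num_segments ))
}

# The mapping depends only on the key, so repeated calls agree.
segment_for_key user42 8
segment_for_key user42 8
```

Under this assumption, locating the group for a key needs no lookup table, only a recomputation of the hash.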
The prototype is written in C++ and uses third-party libraries, including LevelDB, HdrHistogram_c, Boost, and Snappy.
Minimal setup to test the prototype:
On Ubuntu 14.04 LTS (Server), install
g++ (version 4.8.4 or above)
libboost-system-dev, libboost-filesystem-dev, libboost-thread-dev
libsnappy-dev
cmake
zlib1g-dev
$ sudo apt-get update
$ sudo apt-get install g++ libboost-system-dev libboost-filesystem-dev libboost-thread-dev libsnappy-dev cmake zlib1g-dev
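After installation, the compiler version requirement can be checked with a small sketch like the one below. The helper name `version_ge` is hypothetical, and it relies on the `-V` (version sort) option of GNU `sort`. Because some g++ builds print only a shortened version from `-dumpversion`, the sketch compares against 4.8 and does not check the patch level.

```shell
# Succeeds if version $1 is at least version $2 (uses GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only query g++ if it is installed, so the snippet is safe to run anywhere.
if command -v g++ >/dev/null 2>&1; then
    found=$(g++ -dumpversion)
    if version_ge "$found" 4.8; then
        echo "g++ $found is new enough"
    else
        echo "g++ $found is older than 4.8"
    fi
else
    echo "g++ not found"
fi
```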
Download and extract the source code tarball
$ tar zxf hashkv-*.tar.gz
Set the environment variable HASHKV_HOME to the root directory of the hashkv folder
$ cd hashkv
$ export HASHKV_HOME=$(pwd)
Compile HdrHistogram_c (libhdr_histogram.so) under lib/HdrHistogram_c-0.9.4:
$ cd ${HASHKV_HOME}/lib/HdrHistogram_c-0.9.4
$ cmake .
$ make
Compile LevelDB (libleveldb.so) under lib/leveldb:
$ cd ${HASHKV_HOME}/lib/leveldb
$ make
Compile the prototype and the test program.
$ cd ${HASHKV_HOME}
$ make
After compilation, the test program is generated under bin/ and is named hashkv_test.
Before running, add the shared library directories lib/leveldb/out-shared and lib/HdrHistogram_c-0.9.4/src to LD_LIBRARY_PATH:
$ export LD_LIBRARY_PATH="$HASHKV_HOME/lib/leveldb/out-shared:$HASHKV_HOME/lib/HdrHistogram_c-0.9.4/src:$LD_LIBRARY_PATH"
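Before running the test program, it may help to confirm that both libraries were actually built. The sketch below uses a hypothetical helper, `check_shared_libs`. The exact file locations are an assumption inferred from the library names and directories given above.

```shell
# Hypothetical helper: report whether the two shared libraries named above
# exist under the given HashKV root directory. Returns non-zero if any is missing.
check_shared_libs() {
    root=$1
    missing=0
    for lib in lib/leveldb/out-shared/libleveldb.so \
               lib/HdrHistogram_c-0.9.4/src/libhdr_histogram.so; do
        if [ -e "$root/$lib" ]; then
            echo "found   $lib"
        else
            echo "missing $lib"
            missing=1
        fi
    done
    return "$missing"
}

# Only meaningful once HASHKV_HOME has been exported as in the steps above.
if [ -n "${HASHKV_HOME:-}" ]; then
    check_shared_libs "$HASHKV_HOME" || echo "build the libraries first"
fi
```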
Then, switch to folder bin/
$ cd ${HASHKV_HOME}/bin
Create the folder for key storage (LSM-tree), which is named leveldb by default
$ mkdir leveldb
Create the folder for value storage
$ mkdir data_dir
Clean the LSM-tree and value storage folders **
$ rm -f data_dir/* leveldb/*
Choose one of the example configuration files below (denoted [db_sample_config.ini]), each corresponding to a different design, and copy it to config.ini for the test program:
hashkv_sample_config.ini
vlog_sample_config.ini
leveldb_sample_config.ini
$ cp [db_sample_config.ini] config.ini
Run the test program of the prototype under the chosen design
$ ./hashkv_test data_dir 100000
**Note that since the data layout differs between designs, the LSM-tree and the value store folders must be cleared when one switches to another design.
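The clean-then-copy sequence for switching designs can be wrapped in one function, sketched below. The name `switch_design` is hypothetical. The function assumes it is run from inside bin/ and that the sample configuration files listed above are in the current directory.

```shell
# Hypothetical helper, run from inside bin/: clear both stores, then
# activate the sample configuration for the named design.
switch_design() {
    design=$1
    case "$design" in
        hashkv|vlog|leveldb) ;;
        *) echo "unknown design: $design" >&2; return 1 ;;
    esac
    sample="${design}_sample_config.ini"
    [ -f "$sample" ] || { echo "missing $sample" >&2; return 1; }
    # Create the folders if needed, then remove data left by another design.
    mkdir -p data_dir leveldb
    rm -f data_dir/* leveldb/*
    cp "$sample" config.ini
}
```

For example, calling `switch_design vlog` and then rerunning the hashkv_test command above would test the design configured by vlog_sample_config.ini on empty storage folders.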