Applied Distributed Systems Lab (ADSLab)

Department of Computer Science and Engineering, The Chinese University of Hong Kong
 

High-Performance Key-Value Storage


Persistent key-value (KV) stores are an integral part of large-scale storage infrastructures for storing massive structured data. Modern KV workloads become more diversified and comprise a mix of access types including reads, writes, and range queries. Such mixed workloads complicate the design of KV stores to achieve high performance in reads, writes, and range queries, while supporting scalable storage.

We have studied the performance issues of state-of-the-art KV stores. In particular, we revisit and enhance the design of indexing structures (i.e., the data structures used for indexing KV pairs). HashKV1 aims for high update performance under update-intensive workloads. It builds on the Log-Structured Merge (LSM) tree for high write performance, yet its performance is challenged by the high I/O amplification of the LSM-tree due to frequent compaction. We design HashKV based on the existing KV separation technique, atop which we further design hash-based data grouping to mitigate the garbage collection overhead by deterministically mapping values to storage locations. UniKV2 carefully combines hash indexing and the LSM-tree to simultaneously achieve high performance in reads, writes, and range queries. DiffKV3 builds on KV separation to carefully manage the ordering for keys and values, so as to simultaneously achieve high performance in writes, reads, and scans. It manages keys using the conventional LSM-tree with fully-sorted ordering (within each level of the LSM-tree), while managing values with partially-sorted ordering with respect to the fully-sorted ordering of keys in a coordinated way for preserving high scan performance.

HashKV, UniKV, and DiffKV are all designed as local KV stores that run on single machines. We further study distributed KV stores for networked environments. MemEC4 is a distributed in-memory KV store that achieves high data availability and maintains low data redundancy across storage servers via erasure coding. It specifically targets small objects by re-designing indexing structures that allow KV pairs and their metadata to be erasure-coded in entirety. ECHash5 applies erasure coding in decentralized KV stores. It preserves the elasticity of decentralized KV stores via a novel fragmented erasure coding model that eliminates parity updates when servers are added or removed under consistent hashing. Elastic Reed-Solomon codes6 supports I/O-efficient redundancy transitioning in decentralized erasure-coded KV stores.

  1. Yongkun Li, Helen H. W. Chan, Patrick P. C. Lee, and Yinlong Xu.
    "Enabling Efficient Updates in KV Storage via Hashing: Design and Performance Evaluation."
    ACM Transactions on Storage (TOS), 15(3), pp. 20:1-20:29, August 2019.
    (An earlier version appeared in USENIX ATC 2018)
    [pdf] [arXiv] [doi] [software]

  2. Qiang Zhang, Yongkun Li, Patrick P. C. Lee, Yinlong Xu, Qiu Cui, Liu Tang.
    "UniKV: Toward High-Performance and Scalable KV Storage in Mixed Workloads via Unified Indexing."
    Proceedings of the 36th IEEE International Conference on Data Engineering (ICDE 2020), Dallas, TX, USA, April 2020.
    (AR: 129/568 = 22.7%)
    [pdf] [pptx] [software]

  3. Yongkun Li, Zhen Liu, Patrick P. C. Lee, Jiayu Wu, Yinlong Xu, Yi Wu, Liu Tang, Qi Liu, and Qiu Cui.
    "Differentiated Key-Value Storage Management for Balanced I/O Performance."
    Proceedings of the 2021 USENIX Annual Technical Conference (USENIX ATC 2021), July 2021.
    (AR: 64/341 = 18.8%)

  4. Matt M. T. Yiu, Helen H. W. Chan, and Patrick P. C. Lee.
    "Erasure Coding for Small Objects in In-Memory KV Storage."
    Proceedings of the 10th ACM International Systems and Storage Conference (SYSTOR 2017), Haifa, Israel, May 2017.
    (AR: 17/45 = 37.8%)
    [pdf] [pptx] [software]

  5. Liangfeng Cheng, Yuchong Hu, and Patrick P. C. Lee.
    "Coupling Decentralized Key-Value Stores with Erasure Coding."
    Proceedings of the ACM Symposium on Cloud Computing 2019 (SoCC 2019), Santa Cruz, CA, USA, November 2019.
    (AR: 39/157 = 24.8%)
    [pdf] [pptx] [software]

  6. Si Wu, Zhirong Shen, and Patrick P. C. Lee.
    "Enabling I/O-Efficient Redundancy Transitioning in Erasure-Coded KV Stores via Elastic Reed-Solomon Codes."
    Proceedings of the 39th International Symposium on Reliable Distributed Systems (SRDS 2020), Shanghai, China, September 2020.
    (AR: 32/131 = 24.4%)
    Prof. C. V. Ramamoorthy Best Paper Award.
    [pdf] [pptx] [software]


Last updated in June 2021.