Applied Distributed Systems Lab (ADSLab)

Department of Computer Science and Engineering, The Chinese University of Hong Kong
 

Encrypted Deduplication for Outsourced Storage


Outsourcing storage management to the cloud is appealing for enterprises coping with the unprecedented growth of data. We require that outsourced storage satisfy both storage efficiency (i.e., low storage footprints) and data confidentiality (i.e., protection from unauthorized access). To achieve both goals, we explore how to carefully combine encryption and deduplication, which we refer to as encrypted deduplication, for outsourced storage. Deduplication is a well-known technique for eliminating content-level redundancy for storage savings. It stores only one physical copy of duplicate chunks and maps all duplicate chunks to that physical copy via small pointers (erasure coding can be applied to the physical copies after deduplication to add controlled redundancy for fault tolerance). However, conventional symmetric-key encryption is incompatible with deduplication, as it encrypts with distinct keys (e.g., obtained via random key generation). This causes duplicate plaintext chunks to be encrypted into distinct ciphertext chunks, thereby prohibiting deduplication on the ciphertext chunks. Encrypted deduplication seamlessly combines encryption and deduplication by requiring that each plaintext chunk be encrypted with a key derived from the chunk content itself, so that duplicate plaintext chunks are always encrypted into identical ciphertext chunks for deduplication.
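
The contrast between random keys and content-derived keys can be illustrated with a minimal Python sketch. It is an illustration under simplifying assumptions, not the code of any system listed on this page: the SHA-256 counter-mode keystream is a toy stand-in for a real cipher such as AES, and the helper names (`keystream`, `xor_encrypt`, `content_derived_key`) are hypothetical.

```python
import hashlib
import os

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (stand-in for a real cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_encrypt(key: bytes, chunk: bytes) -> bytes:
    """Encrypt (or decrypt) a chunk by XORing it with the keystream."""
    return bytes(a ^ b for a, b in zip(chunk, keystream(key, len(chunk))))

def content_derived_key(chunk: bytes) -> bytes:
    """Derive the key from the chunk content itself."""
    return hashlib.sha256(chunk).digest()

# Two identical plaintext chunks, e.g., from two backups of the same file.
chunk_a = b"same backup block" * 4
chunk_b = b"same backup block" * 4

# Random keys: the ciphertexts differ, so they cannot be deduplicated.
c1 = xor_encrypt(os.urandom(32), chunk_a)
c2 = xor_encrypt(os.urandom(32), chunk_b)

# Content-derived keys: the ciphertexts are identical.
d1 = xor_encrypt(content_derived_key(chunk_a), chunk_a)
d2 = xor_encrypt(content_derived_key(chunk_b), chunk_b)

# Deduplicate ciphertext chunks by fingerprint: one physical copy is kept.
store = {}
for ciphertext in (d1, d2):
    store.setdefault(hashlib.sha256(ciphertext).hexdigest(), ciphertext)
```

In this sketch, `store` ends up holding a single physical copy for the two content-keyed ciphertexts, whereas `c1` and `c2` would occupy two entries.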

We have extensively addressed the fault tolerance, security, and performance issues of encrypted deduplication. In particular, we target backup applications, which carry highly redundant data for deduplication and are major use cases in outsourced storage.

Secure Cloud Storage

We design various encrypted deduplication systems for cloud storage. CDStore [1] is a multi-cloud storage system that unifies deduplication and secret sharing. CDStore builds on a primitive called convergent dispersal, which always encodes duplicate plaintext chunks into identical secret shares for deduplication. REED [2] enables rekeying for encrypted deduplication to support dynamic access control. REED transforms each plaintext chunk into a deterministic all-or-nothing-transform (AONT) package. It then encrypts a small part of the AONT package with a renewable key, while the remaining large part of the AONT package still preserves identical content for deduplication. This achieves lightweight rekeying and maintains storage efficiency. We also propose Metadedup [3], which applies encrypted deduplication to metadata (e.g., file recipes) to achieve further storage savings.

  1. Mingqiang Li, Chuan Qin, and Patrick P. C. Lee.
    "CDStore: Toward Reliable, Secure, and Cost-Efficient Cloud Storage via Convergent Dispersal."
    Proceedings of the USENIX Annual Technical Conference (USENIX ATC 2015), Santa Clara, CA, USA, July 2015.
    (AR: 47/221 = 21.3%)
    [pdf] [pptx] [software]

  2. Chuan Qin, Jingwei Li, and Patrick P. C. Lee.
    "The Design and Implementation of a Rekeying-aware Encrypted Deduplication Storage System."
    ACM Transactions on Storage (TOS), 13(1), 9:1-9:30, March 2017.
    (An earlier version appeared in DSN 2016)
    [pdf] [software] [doi]

  3. Jingwei Li, Suyu Huang, Yanjing Ren, Zuoru Yang, Patrick P. C. Lee, Xiaosong Zhang, and Yao Hao.
    "Enabling Secure and Space-Efficient Metadata Management in Encrypted Deduplication."
    Accepted for publication in IEEE Transactions on Computers (TC).
    (An earlier version appeared in MSST 2019)
    [main pdf] [supplementary pdf] [software] [doi]
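
The rekeying idea described above (a deterministic AONT package whose small part is encrypted with a renewable key) can be sketched in Python as follows. This is a simplified illustration and not the exact construction in the REED paper: the hash-based mask, the 16-byte `STUB_SIZE`, and all function names are assumptions made for the example.

```python
import hashlib

STUB_SIZE = 16  # hypothetical size of the small part protected by the renewable key

def mask(key: bytes, length: int) -> bytes:
    """Toy mask generator: SHA-256 in counter mode."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def aont_package(chunk: bytes) -> bytes:
    """Deterministic AONT: the masking key is the hash of the chunk itself."""
    k = hashlib.sha256(chunk).digest()
    body = xor(chunk, mask(k, len(chunk)))
    tail = xor(k, hashlib.sha256(body).digest())  # hides k behind the whole body
    return body + tail

def aont_recover(package: bytes) -> bytes:
    body, tail = package[:-32], package[-32:]
    k = xor(tail, hashlib.sha256(body).digest())
    return xor(body, mask(k, len(body)))

def split_and_encrypt(package: bytes, file_key: bytes):
    """Return (trimmed package for deduplication, stub encrypted with file_key)."""
    trimmed, stub = package[:-STUB_SIZE], package[-STUB_SIZE:]
    return trimmed, xor(stub, mask(file_key, STUB_SIZE))

def rekey(enc_stub: bytes, old_key: bytes, new_key: bytes) -> bytes:
    """Rekeying touches only the small stub, never the trimmed package."""
    stub = xor(enc_stub, mask(old_key, len(enc_stub)))
    return xor(stub, mask(new_key, len(stub)))

chunk = b"duplicate chunk content " * 8
trimmed_alice, stub_alice = split_and_encrypt(aont_package(chunk), b"alice-key-v1")
trimmed_bob, stub_bob = split_and_encrypt(aont_package(chunk), b"bob-key-v1")

# Alice renews her key; only STUB_SIZE bytes are re-encrypted.
new_stub = rekey(stub_alice, b"alice-key-v1", b"alice-key-v2")
stub_plain = xor(new_stub, mask(b"alice-key-v2", STUB_SIZE))
recovered = aont_recover(trimmed_alice + stub_plain)
```

Here the trimmed packages of the two users are identical and can be deduplicated, while each user's stub depends on that user's own renewable key.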

Defense Against Frequency Analysis

We observe that the deterministic nature of encrypted deduplication (i.e., duplicate plaintext chunks are always encrypted into identical ciphertext chunks) makes it susceptible to the classical frequency analysis attack. We present a study [1] on the information leakage of encrypted deduplication due to frequency analysis, from both attack and defense perspectives. We show that adversaries can leverage the locality property of backup workloads (i.e., chunks tend to co-occur across backups) to aggravate information leakage in encrypted deduplication. To relax the deterministic nature of encrypted deduplication, we propose TED [2], a tunable encrypted deduplication primitive that encrypts a small proportion of duplicate plaintext chunks with different keys in a configurable manner. TED balances the trade-off between storage efficiency and data confidentiality by allowing users to configure a storage blowup factor, under which the information leakage is minimized for any input workload.

  1. Jingwei Li, Patrick P. C. Lee, Chufeng Tan, Chuan Qin, and Xiaosong Zhang.
    "Information Leakage in Encrypted Deduplication via Frequency Analysis: Attacks and Defenses."
    ACM Transactions on Storage (TOS), 16(1), pp. 4:1-4:30, March 2020.
    (An earlier version appeared in DSN 2017; one of three nominations for the Best Paper Award.)
    [pdf] [arXiv] [doi] [software]

  2. Jingwei Li, Zuoru Yang, Yanjing Ren, Patrick P. C. Lee, and Xiaosong Zhang.
    "Balancing Storage Efficiency and Data Confidentiality with Tunable Encrypted Deduplication."
    Proceedings of the 15th European Conference on Computer Systems (EuroSys 2020), Heraklion, Crete, Greece, April 2020.
    (AR: 43/234 = 18.4%)
    [pdf] [pptx] [software]
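
The trade-off between storage efficiency and frequency leakage can be illustrated with a small Python sketch. It is not TED's actual key derivation (see the paper for the design); it assumes a hypothetical rule in which the key for a chunk changes after every `t` occurrences of that chunk, and it uses a hash of the key and chunk as a stand-in for the ciphertext identity.

```python
import hashlib
from collections import Counter

def tunable_key(secret: bytes, chunk: bytes, seen: Counter, t: int) -> bytes:
    """Hypothetical rule: switch to a new key after every t copies of a chunk."""
    fp = hashlib.sha256(chunk).digest()
    seen[fp] += 1
    bucket = (seen[fp] - 1) // t
    return hashlib.sha256(secret + fp + bucket.to_bytes(8, "big")).digest()

secret = b"key-manager-secret"  # hypothetical secret held by a key manager
workload = [b"A"] * 10 + [b"B"] * 3 + [b"C"]  # skewed chunk frequencies

def ciphertext_frequencies(t: int) -> list:
    """Frequencies of distinct ciphertext chunks, as seen by an adversary."""
    seen = Counter()
    ciphertext_ids = Counter()
    for chunk in workload:
        key = tunable_key(secret, chunk, seen, t)
        # Encryption is deterministic given (key, chunk), so this hash
        # identifies the resulting ciphertext chunk.
        ciphertext_ids[hashlib.sha256(key + chunk).hexdigest()] += 1
    return sorted(ciphertext_ids.values(), reverse=True)

deterministic = ciphertext_frequencies(t=10**9)  # one key per chunk content
tuned = ciphertext_frequencies(t=4)              # key changes every 4 copies

print(deterministic)  # [10, 3, 1]
print(tuned)          # [4, 4, 3, 2, 1]
```

With `t=4`, the ciphertext frequency distribution is flatter, which weakens rank-based frequency analysis, at the cost of storing five ciphertext chunks instead of three (a storage blowup of 5/3 in this toy workload).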

SGX for Encrypted Deduplication

Existing encrypted deduplication approaches build on expensive cryptographic primitives that incur substantial performance slowdown. We present SGXDedup [1], which leverages shielded executions based on Intel Software Guard Extensions (SGX) to speed up encrypted deduplication based on server-aided message-locked encryption (MLE), while preserving security via SGX. SGXDedup implements a suite of secure interfaces to execute MLE key generation and proof-of-ownership operations in SGX enclaves. It also incorporates various designs to support secure and efficient enclave operations.

  1. Yanjing Ren, Jingwei Li, Zuoru Yang, Patrick P. C. Lee, and Xiaosong Zhang.
    "Accelerating Encrypted Deduplication via SGX."
    Proceedings of the 2021 USENIX Annual Technical Conference (USENIX ATC 2021), July 2021.
    (AR: 64/341 = 18.8%)
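
The following Python sketch illustrates the general idea of server-aided MLE key generation, in which the chunk key depends on both the chunk content and a secret that clients do not hold. It does not use SGX and is not SGXDedup's interface: the `KeyManager` class, its method name, and the HMAC-based derivation are assumptions made for illustration, with the class boundary standing in for the protected environment that holds the secret.

```python
import hashlib
import hmac

class KeyManager:
    """Stand-in for the protected component that holds the global secret."""

    def __init__(self, global_secret: bytes):
        self._secret = global_secret  # never leaves the key manager

    def generate_key(self, fingerprint: bytes) -> bytes:
        # The key depends on the chunk fingerprint and the global secret.
        return hmac.new(self._secret, fingerprint, hashlib.sha256).digest()

def fingerprint(chunk: bytes) -> bytes:
    return hashlib.sha256(chunk).digest()

key_manager = KeyManager(b"hypothetical-global-secret")

# Two clients holding the same chunk obtain the same key, so their
# ciphertext chunks remain identical and can be deduplicated.
k1 = key_manager.generate_key(fingerprint(b"chunk data"))
k2 = key_manager.generate_key(fingerprint(b"chunk data"))

# An adversary who guesses the chunk content but lacks the secret
# cannot derive the key offline from the content alone.
offline_guess = hashlib.sha256(b"chunk data").digest()
```

In this simplified form the key manager sees the chunk fingerprint directly; practical designs add further protection for this exchange.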


Last updated in June 2021.