Information Leakage in Encrypted Deduplication via Frequency Analysis


In this project, we study how frequency analysis practically affects information leakage in encrypted deduplication storage, from both attack and defense perspectives. We first propose a new inference attack that exploits chunk locality to increase the coverage of inferred chunks. We conduct a trace-driven evaluation on both real-world and synthetic datasets, and show that the new inference attack can infer a significant fraction of plaintext chunks under backup workloads. To protect against frequency analysis, we propose two defense approaches, namely MinHash encryption and scrambling. Our trace-driven evaluation shows that the combined MinHash encryption and scrambling scheme effectively mitigates the inference attack, while maintaining high storage efficiency and incurring limited metadata access overhead.
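To make the threat concrete, the following is a minimal sketch of classical frequency analysis against deterministic chunk encryption. It is only the textbook baseline, not the locality-based attack proposed in this project. The function names and the toy data are hypothetical, and the sketch assumes the adversary holds an auxiliary plaintext dataset (for example, an older backup) whose chunk frequencies resemble those of the target.

```python
from collections import Counter


def rank_by_frequency(chunks):
    """Return the distinct chunks ordered from most to least frequent."""
    counts = Counter(chunks)
    # Sort by descending count; break ties by value so the order is deterministic.
    return [c for c, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def frequency_analysis(ciphertext_chunks, auxiliary_plaintext_chunks):
    """Pair each ciphertext chunk with the plaintext chunk of the same frequency rank."""
    cipher_ranked = rank_by_frequency(ciphertext_chunks)
    plain_ranked = rank_by_frequency(auxiliary_plaintext_chunks)
    # zip stops at the shorter list, so unmatched chunks are simply left uninferred.
    return dict(zip(cipher_ranked, plain_ranked))


# Toy data (hypothetical): deterministic encryption preserves chunk frequencies,
# so the rank order of ciphertext chunks mirrors that of the plaintext chunks.
auxiliary = ["A", "A", "A", "B", "B", "C"]
target = ["x", "x", "x", "y", "y", "z"]
inferred = frequency_analysis(target, auxiliary)
```

In this toy case every ciphertext chunk is paired with the plaintext chunk of equal rank. In real workloads many chunks share the same frequency, so rank matching alone is ambiguous for most of them; this is the limitation that motivates using additional signals such as chunk locality.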

This website provides both attack and defense toolkits that operate on the FSL dataset to demonstrate how frequency analysis can be launched and defended against. It also provides a deduplication-based prototype to demonstrate the metadata access overhead of our combined defense scheme.
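As a rough illustration of the two defenses named above, the sketch below rests on two assumptions that are not spelled out on this page: that MinHash encryption derives one key per segment from the minimum chunk fingerprint in that segment, and that scrambling can be modeled as a random reordering of chunks within a segment. All function names are hypothetical, and an HMAC tag stands in for a real deterministic cipher. It is not the code shipped in the package.

```python
import hashlib
import hmac
import random


def fingerprint(chunk: bytes) -> bytes:
    """Content fingerprint of a chunk (SHA-256 is used here for illustration)."""
    return hashlib.sha256(chunk).digest()


def segment_key(segment, secret: bytes) -> bytes:
    """Assumed MinHash-style key: derived from the minimum fingerprint in the segment."""
    min_fp = min(fingerprint(c) for c in segment)
    return hmac.new(secret, min_fp, hashlib.sha256).digest()


def ciphertext_id(chunk: bytes, key: bytes) -> bytes:
    """Stand-in for deterministic encryption: equal (chunk, key) pairs give equal output."""
    return hmac.new(key, chunk, hashlib.sha256).digest()


def protect_segment(segment, secret: bytes, rng: random.Random):
    """Encrypt every chunk under the segment key, then emit them in scrambled order."""
    key = segment_key(segment, secret)
    scrambled = list(segment)
    rng.shuffle(scrambled)  # assumed model of scrambling: hide the original chunk order
    return [ciphertext_id(c, key) for c in scrambled]


secret = b"demo-secret"  # hypothetical system-wide secret
seg_a = [b"chunk-1", b"chunk-2", b"chunk-3"]
seg_b = [b"chunk-3", b"chunk-2", b"chunk-1"]  # same chunks, different order
out_a = protect_segment(seg_a, secret, random.Random(1))
out_b = protect_segment(seg_b, secret, random.Random(2))
```

Under these assumptions, two segments that contain the same chunks share the same minimum fingerprint and therefore the same key, so their ciphertext chunks still deduplicate. A chunk that appears in segments with different minimum fingerprints is encrypted under different keys, which perturbs the ciphertext frequency distribution at some cost in storage efficiency.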



Please find the README file in the package.

Demo Video


The software is developed by the Applied Distributed Systems Laboratory in the Department of Computer Science and Engineering at the Chinese University of Hong Kong (CUHK).


The source code is released under the GNU General Public License (GPL).