Encoding-Aware Replication: Enabling Efficient and Reliable Transition from Replication to Erasure Coding in Clustered File Systems

Introduction

To balance performance and storage efficiency, modern clustered file systems (CFSes) often first store data with random replication (i.e., distributing replicas across randomly selected nodes) and later encode the replicated data with erasure coding. We argue that random replication, although commonly used, does not take erasure coding into account and hence raises both performance and availability issues for the subsequent encoding operation. We propose encoding-aware replication, which carefully places the replicas so as to (i) avoid cross-rack downloads of data blocks during encoding, (ii) preserve availability without data relocation after encoding, and (iii) maintain load balancing as in random replication. We implement encoding-aware replication on HDFS and show via testbed experiments that it achieves significant encoding throughput gains over random replication. We also show via discrete-event simulations that encoding-aware replication remains effective under various parameter choices in a large-scale setting, and that it distributes replicas as evenly as random replication does.
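
To make goal (i) concrete, the following is a toy, rack-level sketch of why random placement tends to force cross-rack downloads when a stripe is encoded. It is not the placement algorithm used in the EAR release. The function names, the `co_located_placement` layout, and the parameters (10 blocks per stripe, 20 racks, 2 racks per block) are all assumptions chosen for illustration.

```python
import random


def random_placement(k, num_racks, racks_per_block, rng):
    """Rack-level view of random replication (simplified assumption): each
    block's copies land on `racks_per_block` distinct racks chosen uniformly."""
    return [set(rng.sample(range(num_racks), racks_per_block)) for _ in range(k)]


def co_located_placement(k, num_racks, racks_per_block, rng):
    """Hypothetical encoding-friendly layout: every block of the stripe keeps
    one copy in a shared rack; its remaining copies go to other random racks."""
    shared = rng.randrange(num_racks)
    others = [r for r in range(num_racks) if r != shared]
    return [{shared, *rng.sample(others, racks_per_block - 1)} for _ in range(k)]


def cross_rack_downloads(placement, num_racks):
    """Blocks the encoder must fetch from other racks, assuming it runs in
    the rack that already holds the most blocks of the stripe."""
    return min(
        sum(1 for racks in placement if rack not in racks)
        for rack in range(num_racks)
    )


if __name__ == "__main__":
    rng = random.Random(2015)  # fixed seed so repeated runs agree
    k, num_racks, racks_per_block = 10, 20, 2  # illustrative values only
    rand = random_placement(k, num_racks, racks_per_block, rng)
    near = co_located_placement(k, num_racks, racks_per_block, rng)
    print("random placement:     ", cross_rack_downloads(rand, num_racks))
    print("co-located placement: ", cross_rack_downloads(near, num_racks))
```

In this model the co-located layout needs no cross-rack downloads by construction, while the random layout usually needs several. The sketch deliberately ignores goals (ii) and (iii), so it says nothing about availability after encoding or about load balance.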

Publication

Runhui Li, Yuchong Hu, and Patrick P. C. Lee.
"Enabling Efficient and Reliable Transition from Replication to Erasure Coding for Clustered File Systems."
Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2015) (Regular paper), Rio de Janeiro, Brazil, June 2015.
[pdf] [pptx]

Downloads

Facebook's Hadoop-20:
- hadoop-20-master.zip (md5sum: 7394336f791d06d3003d64082de05385)
EAR Hadoop Release:
- Version 1.0.2 (August 2016): EAR-Hadoop-1.0.2.tar.gz (md5sum: 129d42795a7232c459a5d046056d1423)
- Version 1.0.1 (August 2015): EAR-Hadoop-1.0.1.tar.gz (md5sum: 8e3c5e5076c545927207a4c844e936a5)
- Version 1.0.0 (May 2015): EAR-Hadoop-1.0.0.tar.gz (md5sum: 25b044f8348a78cb50150c86d751a2f9)
EAR Simulator:
- Version 1.0.0 (May 2015): EAR-Simulator-1.0.0.tar.gz (md5sum: e7f42dddf9ae7d53991f73f9e8c3a9c5)
ChangeLog
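
The md5sum values listed above can be checked after downloading. A minimal sketch using Python's standard `hashlib` module follows. The helper names `md5_of_file` and `matches_published_md5` are made up for this example, and the archive is assumed to have been saved locally under the file name shown in the list.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks so that
    large archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected):
    """Compare a file's MD5 digest with a published md5sum string."""
    return md5_of_file(path) == expected.strip().lower()
```

For example, `matches_published_md5("EAR-Hadoop-1.0.2.tar.gz", "129d42795a7232c459a5d046056d1423")` should return `True` for an intact copy of the Version 1.0.2 archive. An MD5 match only guards against corrupted or truncated downloads; it is not a defense against deliberate tampering.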

People

Encoding-Aware Replication is developed by the Advanced Network and System Research Laboratory in the Department of Computer Science and Engineering at the Chinese University of Hong Kong.

Runhui Li (PhD)
Yuchong Hu (Postdoc)
Patrick P. C. Lee (Faculty)

License

The source code of Encoding-Aware Replication is released under the GNU General Public License (GPL).