Optimal Repair Layering for Erasure-Coded Data Centers: From Theory to Practice

Introduction

Repair performance in hierarchical data centers is often bottlenecked by cross-rack network transfer. Recent theoretical results show that cross-rack repair traffic can be minimized through repair layering, which partitions a repair operation into inner-rack and cross-rack layers. However, how repair layering should be implemented and deployed in practice remains an open issue. In this work, we address this issue by proposing a practical repair layering framework called DoubleR. We design two families of practical double regenerating codes (DRC), which not only minimize the cross-rack repair traffic, but also have several practical properties that improve on state-of-the-art regenerating codes. We implement and deploy DoubleR atop the Hadoop Distributed File System (HDFS), and show that DoubleR maintains the theoretical guarantees of DRC and improves the repair performance of regenerating codes in both node recovery and degraded read operations.
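
As a toy illustration of the layering idea only (not the DRC constructions, and not DoubleR's code), the sketch below assumes a single-parity XOR code whose six blocks are placed two per rack across three racks. All names (xor_blocks, direct_repair, layered_repair) and the layout are hypothetical. It compares how many blocks cross rack boundaries when every surviving block is shipped directly versus when each rack first combines its own blocks.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of a non-empty list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def direct_repair(surviving, target_rack):
    """No layering: every surviving block is sent as-is to the target rack."""
    all_blocks = [b for blocks in surviving.values() for b in blocks]
    cross_rack = sum(len(blocks) for rack, blocks in surviving.items()
                     if rack != target_rack)
    return xor_blocks(all_blocks), cross_rack

def layered_repair(surviving, target_rack):
    """Layering: XOR inside each rack first, then send one partial per rack."""
    partials = []
    cross_rack = 0
    for rack, blocks in surviving.items():
        if not blocks:
            continue
        partials.append(xor_blocks(blocks))  # inner-rack layer
        if rack != target_rack:
            cross_rack += 1                  # cross-rack layer: one block per rack
    return xor_blocks(partials), cross_rack

# Assumed layout: rack 0 = [d0, d1], rack 1 = [d2, d3], rack 2 = [d4, parity].
data = [bytes([i] * 8) for i in range(1, 6)]
parity = xor_blocks(data)

# Block d0 in rack 0 is lost; the repair runs in rack 0.
surviving = {0: [data[1]], 1: [data[2], data[3]], 2: [data[4], parity]}

direct_block, direct_cost = direct_repair(surviving, target_rack=0)
layered_block, layered_cost = layered_repair(surviving, target_rack=0)

assert direct_block == layered_block == data[0]
print(direct_cost, layered_cost)  # prints "4 2"
```

In this toy setting both strategies rebuild the same block, but layering sends two blocks across racks instead of four. XOR makes the inner-rack combination trivial; the regenerating-code setting that DRC targets is more involved.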

Publications

Downloads

DoubleR depends on the following third-party packages. We provide copies here only so that users can try DoubleR; we do not own these packages.

DoubleR source code:

People

DoubleR is co-developed by (i) the School of Computer Science and Technology at Huazhong University of Science and Technology (HUST) and (ii) the Applied Distributed Systems Lab in the Department of Computer Science and Engineering at the Chinese University of Hong Kong (CUHK).

License

The source code of DoubleR is released under the GNU General Public License (GPL).

Contacts

Please contact Xiaolu Li if you have any questions or comments.