CHR: A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

Introduction

In practical distributed storage systems, storage nodes are of heterogeneous types and have different transmission bandwidths. Thus, traditional recovery solutions that simply minimize the number of data blocks being read may no longer be optimal in a heterogeneous environment.

We seek to optimize failure recovery for XOR-based erasure codes under heterogeneous settings. We formulate the recovery problem as an optimization problem in which storage nodes are associated with generic costs, each of which denotes the cost of reading one symbol from the corresponding storage node. Our objective is to minimize the total recovery cost.
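To make the formulation concrete: if node i has per-symbol read cost c_i and a recovery solution reads y_i symbols from it, the total recovery cost is the sum of c_i * y_i over the surviving nodes. The following is a minimal brute-force sketch of that objective for a single failed data node in RDP. It is not CHR's actual algorithm or interface. The function name `rdp_min_cost_recovery`, the fixed prime `P = 5`, and the node numbering (data nodes 0 to P-2, row-parity node P-1, diagonal-parity node P) are assumptions made for illustration, and the layout follows the standard RDP definition.

```c
#include <assert.h>
#include <float.h>
#include <string.h>

#define P 5            /* RDP prime parameter; kept small for brute force (assumption) */
#define ROWS (P - 1)   /* symbols per node in one stripe */
#define NODES (P + 1)  /* P-1 data nodes, row-parity node P-1, diagonal-parity node P */

/* Mark the symbols read to rebuild lost symbol (r, f) from its row parity set. */
static void mark_row(int r, int f, unsigned char need[ROWS][NODES]) {
    for (int c = 0; c < P; c++)           /* data nodes plus the row-parity node */
        if (c != f) need[r][c] = 1;
}

/* Mark the symbols read to rebuild (r, f) from its diagonal parity set.
   Returns 0 when (r, f) lies on the unstored "missing" diagonal P-1. */
static int mark_diag(int r, int f, unsigned char need[ROWS][NODES]) {
    int d = (r + f) % P;
    if (d == P - 1) return 0;
    for (int c = 0; c < P; c++) {
        int rr = (d - c + P) % P;         /* row of diagonal d in column c */
        if (c != f && rr < ROWS) need[rr][c] = 1;
    }
    need[d][P] = 1;                       /* the diagonal parity symbol itself */
    return 1;
}

/* Hypothetical helper: try every row/diagonal choice for failed data node f
   and keep the cheapest. cost[i] is the cost of reading one symbol from node i.
   On return, best_reads[i] is the number of symbols read from node i. */
double rdp_min_cost_recovery(int f, const double cost[NODES], int best_reads[NODES]) {
    assert(f >= 0 && f <= P - 2);         /* data nodes only in this sketch */
    double best = DBL_MAX;
    for (unsigned mask = 0; mask < (1u << ROWS); mask++) {
        unsigned char need[ROWS][NODES];
        memset(need, 0, sizeof need);
        int ok = 1;
        for (int r = 0; r < ROWS && ok; r++) {
            if (mask & (1u << r)) ok = mark_diag(r, f, need);
            else mark_row(r, f, need);
        }
        if (!ok) continue;                /* this choice used the missing diagonal */
        int reads[NODES] = {0};
        double total = 0.0;
        for (int c = 0; c < NODES; c++) {
            for (int r = 0; r < ROWS; r++) reads[c] += need[r][c];
            total += cost[c] * reads[c];  /* objective: sum of c_i * y_i */
        }
        if (total < best) {
            best = total;
            memcpy(best_reads, reads, sizeof reads);
        }
    }
    return best;
}
```

With every entry of `cost` set to 1, the objective reduces to the number of symbols read, which is what the traditional homogeneous approach minimizes. Raising the cost of a slow node makes solutions that read from it more expensive, so the search tends to shift reads to cheaper nodes. The exhaustive search grows as 2^(P-1) and is only meant to illustrate the problem on small stripes.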

We implement the recovery scheme as a C language API called CHR, which provides an optimal recovery solution for single-node failures in RAID-6-coded storage systems with heterogeneous storage nodes. Specifically, CHR focuses on two RAID-6 codes: RDP and EVENODD. Given the inputs (e.g., the coding scheme, the stripe size, the failed node, and the transmission bandwidth of each storage node), CHR returns an optimal recovery solution, which specifies which symbols to download from the surviving nodes to recover the data of the failed node. The details of CHR are described in the following paper:

Publication

Download

People

CHR is developed by the Key Laboratory of High Performance Computing in the Department of Computer Science and Engineering at the University of Science and Technology of China (USTC). This project is also affiliated with the Advanced Network and System Research Laboratory at CUHK.

Faculty:

Students:

Please contact Yunfeng Zhu if you have any questions.