CodFS: An Erasure-Coded Clustered Storage System for Efficient Updates and Recovery

Introduction

Many modern storage systems adopt erasure coding to provide data availability guarantees with low redundancy. Log-based storage is often used to append new data rather than overwrite existing data, which achieves high update efficiency, but it incurs significant I/O overhead during recovery because updates must be reassembled from data and parity chunks. We propose parity logging with reserved space, which comprises two key design features: (1) it combines in-place data updates with log-based parity updates to balance the costs of updates and recovery, and (2) it keeps parity updates in a reserved space next to the parity chunk to mitigate disk seeks. We further propose a workload-aware scheme that dynamically predicts and adjusts the reserved space size. We prototype an erasure-coded clustered storage system called CodFS and conduct testbed experiments on different update schemes under synthetic and real-world workloads. We show that our proposed update scheme achieves both high update performance and high recovery performance, a combination that neither pure in-place nor pure log-based update schemes can achieve.
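The update scheme described above can be illustrated with a small sketch. It is not the CodFS implementation: it assumes a toy stripe with a single XOR parity chunk (CodFS itself relies on Jerasure for its codes), it models the reserved space as a fixed number of delta slots rather than a byte region on disk, and it leaves out the workload-aware sizing. The names `Stripe`, `update`, `merge`, `recover`, and `reserved_slots` are hypothetical.

```python
from functools import reduce


def xor(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class Stripe:
    """Toy stripe: k data chunks plus one XOR parity chunk (an assumption)."""

    def __init__(self, data_chunks, reserved_slots=2):
        self.data = [bytearray(c) for c in data_chunks]
        self.parity = bytearray(reduce(xor, data_chunks))
        # Parity deltas logged in the space reserved next to the parity chunk.
        self.reserved = []
        self.reserved_slots = reserved_slots

    def update(self, idx, offset, new_bytes):
        """Overwrite data in place and log the parity delta."""
        end = offset + len(new_bytes)
        old = bytes(self.data[idx][offset:end])
        delta = xor(old, new_bytes)
        self.data[idx][offset:end] = new_bytes  # in-place data update
        if len(self.reserved) == self.reserved_slots:
            self.merge()  # reserved space is full: fold deltas into parity
        self.reserved.append((offset, delta))  # log-based parity update

    def merge(self):
        """Apply all logged deltas to the parity chunk and empty the log."""
        for offset, delta in self.reserved:
            end = offset + len(delta)
            self.parity[offset:end] = xor(self.parity[offset:end], delta)
        self.reserved.clear()

    def recover(self, lost_idx):
        """Rebuild one lost data chunk from the parity and the survivors."""
        self.merge()  # parity must be up to date before decoding
        survivors = [c for i, c in enumerate(self.data) if i != lost_idx]
        return reduce(xor, survivors, bytes(self.parity))


stripe = Stripe([b"aaaa", b"bbbb", b"cccc"])
stripe.update(1, 1, b"XY")          # data chunk 1 changes in place
rebuilt = stripe.recover(1)          # pretend chunk 1 was lost
```

In this sketch the data chunks always hold the latest contents, so a read never has to replay a log, while recovery only needs to apply the deltas stored beside the parity chunk before decoding.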

Publication

Download

People

CodFS is developed by the Advanced Network and System Research Laboratory in the Department of Computer Science and Engineering at the Chinese University of Hong Kong (CUHK).

Please contact Patrick P. C. Lee if you have any questions.

License

The source code of CodFS is released under the GNU General Public License (GPL).

Acknowledgments

CodFS uses the open-source library Jerasure (Revision 1.2A), developed by Prof. James S. Plank.

This work is supported by grants AoE/E-02/08 and ECS CUHK419212 from the University Grants Committee of Hong Kong, and by grant ITS/250/11 from the Innovation and Technology Fund of Hong Kong.