Installation Steps

The following installation steps are based on Ubuntu 12.04 and should be conducted on both the NameNode and all DataNodes. Example command sketches for these steps are given after the list.

  1. Install HADOOP-0.22.0

    install gcc, g++, ant, and a Java JDK

    install and configure openssh-server (make sure each node can access every other node via passwordless public-key SSH)

    configure HADOOP_HOME (the directory where you will place Hadoop) as an environment variable

    download and untar hadoop-0.22.0.tar.gz (http://archive.apache.org/dist/hadoop/core/hadoop-0.22.0/hadoop-0.22.0.tar.gz), and move it to $HADOOP_HOME

  2. Download and extract our FastDR source code

    tar zxvf fastdr-1.0.0.tar.gz

  3. Generate the two .so files and move them to /usr/lib/

    cd fastdr-1.0.0/native/

    make

    sudo mv libdecode_basic.so libdecode_crdr.so /usr/lib/

  4. Integrate FastDR into HADOOP-0.22.0 by running install.sh

    mv fastdr-1.0.0 $HADOOP_HOME/

    bash install.sh
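
For reference, a rough shell sketch of step 1 on Ubuntu 12.04 is given below. The package names, the placeholder hostname "hadoop@datanode1", and the download/installation paths are assumptions to adapt to your own environment.

    # Step 1 sketch (run on every node): prerequisites, passwordless SSH, Hadoop 0.22.0
    sudo apt-get update
    sudo apt-get install -y gcc g++ ant openjdk-6-jdk openssh-server

    # Generate an SSH key pair and copy the public key to every other node
    # (repeat ssh-copy-id once per node in the cluster)
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    ssh-copy-id hadoop@datanode1

    # Set HADOOP_HOME (also add this export to ~/.bashrc so it persists),
    # then download Hadoop 0.22.0 and move it there
    export HADOOP_HOME=$HOME/hadoop-0.22.0
    cd /tmp
    wget http://archive.apache.org/dist/hadoop/core/hadoop-0.22.0/hadoop-0.22.0.tar.gz
    tar zxvf hadoop-0.22.0.tar.gz
    mv hadoop-0.22.0 $HADOOP_HOME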
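
Similarly, a sketch of steps 2-4, assuming fastdr-1.0.0.tar.gz has been copied to the current directory and HADOOP_HOME is already set; the exact location of install.sh inside the source tree is an assumption.

    # Steps 2-4 sketch (run on every node)
    tar zxvf fastdr-1.0.0.tar.gz

    # Build the native decoders and install the two shared libraries
    cd fastdr-1.0.0/native/
    make
    sudo mv libdecode_basic.so libdecode_crdr.so /usr/lib/
    cd ../..

    # Adjust the hardcoded parameters noted below, then integrate FastDR into Hadoop
    mv fastdr-1.0.0 $HADOOP_HOME/
    cd $HADOOP_HOME/fastdr-1.0.0
    bash install.sh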

Note that, before running install.sh, you should modify the following hardcoded parameters to match your experimental setup:
  1. the costs for storage nodes (in "getCosts" from fastdr-1.0.0/java/org/apache/hadoop/hdfs/DistributedBDRFileSystem.java)

  2. block placement sequence (in "initialIdx2Ip" from fastdr-1.0.0/java/org/apache/hadoop/hdfs/server/namenode/BlockPlacementPolicyBDR.java)

Configurations

The following configurations should be applied on both the NameNode and all DataNodes; an example configuration sketch is given after the list.

  1. Configure HADOOP-0.22.0

    In $HADOOP_HOME/conf/masters, enter the hostname of your NameNode; in $HADOOP_HOME/conf/slaves, enter the hostnames of your DataNodes

    In $HADOOP_HOME/conf/core-site.xml, configure *hadoop.tmp.dir* (the local directory under which HDFS stores its data) and *fs.default.name* (the URI of the HDFS NameNode)

    In $HADOOP_HOME/conf/hdfs-site.xml, configure the file as follows:

    configure *dfs.replication* as 1

    configure *dfs.block.size* (in bytes)

    configure *fs.hdfs.impl* (the file system implementation used for HDFS) as "org.apache.hadoop.hdfs.DistributedBDRFileSystem"

    configure *dfs.block.replicator.classname* (block placement policy) as "org.apache.hadoop.hdfs.server.namenode.BlockPlacementPolicyBDR"

    configure *hdfs.bdr.data.length* (number of data nodes in an erasure-coded group, e.g., 6 for CRS(6,3,4))

    configure *hdfs.bdr.parity.length* (number of parity nodes in an erasure-coded group, e.g., 3 for CRS(6,3,4))

    configure *hdfs.bdr.strip.size* (number of blocks per storage node in an erasure-coded group, e.g., 4 for CRS(6,3,4))

    configure *hdfs.bdr.debug* (true: debug mode; false: non-debug mode)

    configure *hdfs.bdr.use.crdr* (the degraded-read method: true for EG; false for the basic approach)

    In $HADOOP_HOME/conf/hadoop-env.sh, add the following lines:

    export JAVA_HOME=*the absolute path of your Java JDK*

    export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

  2. Configure environment variables

    configure JAVA_HOME (the directory of your Java JDK) as an environment variable

    export PATH=$PATH:$JAVA_HOME/bin

    export PATH=$PATH:$HADOOP_HOME/bin
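
For illustration, minimal versions of core-site.xml and hdfs-site.xml are sketched below for CRS(6,3,4) with a 64 MB block size. The hostname, port, local path, and parameter values are placeholders; replace them with your own settings.

    <?xml version="1.0"?>
    <!-- example $HADOOP_HOME/conf/core-site.xml -->
    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hdfs-tmp</value>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://namenode-host:9000</value>
      </property>
    </configuration>

    <?xml version="1.0"?>
    <!-- example $HADOOP_HOME/conf/hdfs-site.xml for CRS(6,3,4), 64 MB blocks -->
    <configuration>
      <property><name>dfs.replication</name><value>1</value></property>
      <property><name>dfs.block.size</name><value>67108864</value></property>
      <property><name>fs.hdfs.impl</name>
                <value>org.apache.hadoop.hdfs.DistributedBDRFileSystem</value></property>
      <property><name>dfs.block.replicator.classname</name>
                <value>org.apache.hadoop.hdfs.server.namenode.BlockPlacementPolicyBDR</value></property>
      <property><name>hdfs.bdr.data.length</name><value>6</value></property>
      <property><name>hdfs.bdr.parity.length</name><value>3</value></property>
      <property><name>hdfs.bdr.strip.size</name><value>4</value></property>
      <property><name>hdfs.bdr.debug</name><value>false</value></property>
      <property><name>hdfs.bdr.use.crdr</name><value>true</value></property>
    </configuration>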

Running FastDR

  1. Format and run HDFS. The following steps are conducted on the NameNode

    hadoop namenode -format

    start-dfs.sh

  2. Test FastDR. The following steps are conducted on the client node (an example command sketch follows this list)

    generate a file of one stripe size and write it to HDFS

    remove the blocks on one specific DataNode and restart HDFS, so as to artificially create a node failure

    import the related Hadoop jars and the newly generated hadoop-0.22.0-raid.jar, and develop a client program for reading the data blocks
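
A sketch of one possible test run is shown below, assuming the example CRS(6,3,4) configuration with 64 MB blocks, so that one stripe of data spans 6 x 4 = 24 blocks (1536 MB). The file name, the DataNode data directory, and the client class name DegradedReadClient are placeholders.

    # On the client node: generate a file of one stripe of data and write it to HDFS
    dd if=/dev/urandom of=stripe.dat bs=1M count=1536
    hadoop fs -put stripe.dat /stripe.dat

    # Simulate a node failure: stop HDFS from the NameNode, delete the block files on
    # one chosen DataNode (the path assumes the default layout under hadoop.tmp.dir),
    # then restart HDFS from the NameNode
    stop-dfs.sh
    rm -f /home/hadoop/hdfs-tmp/dfs/data/current/blk_*
    start-dfs.sh

    # On the client node: compile and run a client program against the Hadoop jars and
    # the newly generated hadoop-0.22.0-raid.jar, and read the file back through HDFS
    javac -cp "$HADOOP_HOME/*:$HADOOP_HOME/lib/*" DegradedReadClient.java
    java -cp ".:$HADOOP_HOME/*:$HADOOP_HOME/lib/*" DegradedReadClient /stripe.dat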

By varying the following configurations, you can conduct extensive experiments. Have fun :)
  1. block size

  2. the degraded read approach

  3. number of data nodes in an erasure-coded group

  4. number of parity nodes in an erasure-coded group

  5. number of blocks per storage node in an erasure-coded group