Installation Steps

The following installation steps are based on Ubuntu 12.04 and should be conducted on both the NameNode and all DataNodes. Example command sketches for these steps are given after the list.

  1. Install HADOOP-0.22.0

    install gcc, g++, ant, and a Java JDK

    install and configure openssh-server (make sure each node can access every other node via passwordless public-key SSH)

    configure HADOOP_HOME (the directory where you will place Hadoop) as an environment variable

    download and untar hadoop-0.22.0.tar.gz (http://archive.apache.org/dist/hadoop/core/hadoop-0.22.0/hadoop-0.22.0.tar.gz), and move it to $HADOOP_HOME

  2. Download and extract our FastDR source code

    tar zxvf fastdr-1.0.0.tar.gz

  3. Generate the two .so files and move them to /usr/lib/

    cd fastdr-1.0.0/native/

    make

    sudo mv libdecode_basic.so libdecode_crdr.so /usr/lib/

  4. Integrate FastDR into HADOOP-0.22.0 by running install.sh

    mv fastdr-1.0.0 $HADOOP_HOME/

    bash install.sh
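
For reference, a rough shell sketch of step 1 on Ubuntu 12.04 is given below. The package names, the placeholder hostname "hadoop@datanode1", and the download/installation paths are assumptions to adapt to your own environment.

    # Step 1 sketch (run on every node): prerequisites, passwordless SSH, Hadoop 0.22.0
    sudo apt-get update
    sudo apt-get install -y gcc g++ ant openjdk-6-jdk openssh-server

    # Generate an SSH key pair and copy the public key to every other node
    # (repeat ssh-copy-id once per node in the cluster)
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    ssh-copy-id hadoop@datanode1

    # Set HADOOP_HOME (also add this export to ~/.bashrc so it persists),
    # then download Hadoop 0.22.0 and move it there
    export HADOOP_HOME=$HOME/hadoop-0.22.0
    cd /tmp
    wget http://archive.apache.org/dist/hadoop/core/hadoop-0.22.0/hadoop-0.22.0.tar.gz
    tar zxvf hadoop-0.22.0.tar.gz
    mv hadoop-0.22.0 $HADOOP_HOME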
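
Similarly, a sketch of steps 2-4, assuming fastdr-1.0.0.tar.gz has been copied to the current directory and HADOOP_HOME is already set; the exact location of install.sh inside the source tree is an assumption.

    # Steps 2-4 sketch (run on every node)
    tar zxvf fastdr-1.0.0.tar.gz

    # Build the native decoders and install the two shared libraries
    cd fastdr-1.0.0/native/
    make
    sudo mv libdecode_basic.so libdecode_crdr.so /usr/lib/
    cd ../..

    # Adjust the hardcoded parameters noted below, then integrate FastDR into Hadoop
    mv fastdr-1.0.0 $HADOOP_HOME/
    cd $HADOOP_HOME/fastdr-1.0.0
    bash install.sh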

Note that, before running install.sh, you should modify the following hardcoded parameters to match your experimental setup:
  1. the costs for storage nodes (in "getCosts" from fastdr-1.0.0/java/org/apache/hadoop/hdfs/DistributedBDRFileSystem.java)

  2. block placement sequence (in "initialIdx2Ip" from fastdr-1.0.0/java/org/apache/hadoop/hdfs/server/namenode/BlockPlacementPolicyBDR.java)

Configurations

The following configurations should be applied on both the NameNode and all DataNodes; an example configuration sketch is given after the list.

  1. Configure HADOOP-0.22.0

    In $HADOOP_HOME/conf/masters, enter the hostname of your NameNode; in $HADOOP_HOME/conf/slaves, enter the hostnames of your DataNodes

    In $HADOOP_HOME/conf/core-site.xml, configure *hadoop.tmp.dir* (the local directory under which HDFS stores its data) and *fs.default.name* (the URI of the HDFS NameNode)

    In $HADOOP_HOME/conf/hdfs-site.xml, configure the file as follows:

    configure *dfs.replication* as 1

    configure *dfs.block.size* (in bytes)

    configure *fs.hdfs.impl* (the file system implementation used for HDFS) as "org.apache.hadoop.hdfs.DistributedBDRFileSystem"

    configure *dfs.block.replicator.classname* (block placement policy) as "org.apache.hadoop.hdfs.server.namenode.BlockPlacementPolicyBDR"

    configure *hdfs.bdr.data.length* (number of data nodes in an erasure-coded group, e.g., 6 for CRS(6,3,4))

    configure *hdfs.bdr.parity.length* (number of parity nodes in an erasure-coded group, e.g., 3 for CRS(6,3,4))

    configure *hdfs.bdr.strip.size* (number of blocks per storage node in an erasure-coded group, e.g., 4 for CRS(6,3,4))

    configure *hdfs.bdr.debug* (true: debug mode; false: non-debug mode)

    configure *hdfs.bdr.use.crdr* (the degraded-read method: true for EG; false for the basic approach)

    In $HADOOP_HOME/conf/hadoop-env.sh, add the following lines:

    export JAVA_HOME=*the absolute path of your Java JDK*

    export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

  2. Configure environment variables

    configure JAVA_HOME (the directory of your Java JDK) as an environment variable

    export PATH=$PATH:$JAVA_HOME/bin

    export PATH=$PATH:$HADOOP_HOME/bin
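
For illustration, minimal versions of core-site.xml and hdfs-site.xml are sketched below for CRS(6,3,4) with a 64 MB block size. The hostname, port, local path, and parameter values are placeholders; replace them with your own settings.

    <?xml version="1.0"?>
    <!-- example $HADOOP_HOME/conf/core-site.xml -->
    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hdfs-tmp</value>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://namenode-host:9000</value>
      </property>
    </configuration>

    <?xml version="1.0"?>
    <!-- example $HADOOP_HOME/conf/hdfs-site.xml for CRS(6,3,4), 64 MB blocks -->
    <configuration>
      <property><name>dfs.replication</name><value>1</value></property>
      <property><name>dfs.block.size</name><value>67108864</value></property>
      <property><name>fs.hdfs.impl</name>
                <value>org.apache.hadoop.hdfs.DistributedBDRFileSystem</value></property>
      <property><name>dfs.block.replicator.classname</name>
                <value>org.apache.hadoop.hdfs.server.namenode.BlockPlacementPolicyBDR</value></property>
      <property><name>hdfs.bdr.data.length</name><value>6</value></property>
      <property><name>hdfs.bdr.parity.length</name><value>3</value></property>
      <property><name>hdfs.bdr.strip.size</name><value>4</value></property>
      <property><name>hdfs.bdr.debug</name><value>false</value></property>
      <property><name>hdfs.bdr.use.crdr</name><value>true</value></property>
    </configuration>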

Running FastDR

  1. Format and run HDFS. The following steps are conducted on the NameNode

    hadoop namenode -format

    start-dfs.sh

  2. Test FastDR. The following steps are conducted on the client node (an example command sketch follows this list)

    generate a file of one stripe size and write it to HDFS

    remove the blocks on one specific DataNode and restart HDFS, so as to artificially create a node failure

    import the related Hadoop jars and the newly generated hadoop-0.22.0-raid.jar, and develop a client program for reading the data blocks
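
A sketch of one possible test run is shown below, assuming the example CRS(6,3,4) configuration with 64 MB blocks, so that one stripe of data spans 6 x 4 = 24 blocks (1536 MB). The file name, the DataNode data directory, and the client class name DegradedReadClient are placeholders.

    # On the client node: generate a file of one stripe of data and write it to HDFS
    dd if=/dev/urandom of=stripe.dat bs=1M count=1536
    hadoop fs -put stripe.dat /stripe.dat

    # Simulate a node failure: stop HDFS from the NameNode, delete the block files on
    # one chosen DataNode (the path assumes the default layout under hadoop.tmp.dir),
    # then restart HDFS from the NameNode
    stop-dfs.sh
    rm -f /home/hadoop/hdfs-tmp/dfs/data/current/blk_*
    start-dfs.sh

    # On the client node: compile and run a client program against the Hadoop jars and
    # the newly generated hadoop-0.22.0-raid.jar, and read the file back through HDFS
    javac -cp "$HADOOP_HOME/*:$HADOOP_HOME/lib/*" DegradedReadClient.java
    java -cp ".:$HADOOP_HOME/*:$HADOOP_HOME/lib/*" DegradedReadClient /stripe.dat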

By varying the following configurations, you can conduct extensive experiments. Have fun :)
  1. block size

  2. the degraded read approach

  3. number of data nodes in an erasure-coded group

  4. number of parity nodes in an erasure-coded group

  5. number of blocks per storage node in an erasure-coded group