Create directories and configure their ownership and permissions on the appropriate hosts as described below. If any of these directories already exist, we recommend deleting and recreating them.
Use the following instructions to create appropriate directories:
- We strongly suggest that you edit and source the files included in scripts.zip (downloaded in Download Companion Files). Alternatively, you can copy their contents to your ~/.bash_profile to set up these environment variables.
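If you take the ~/.bash_profile route, the following is a minimal sketch of what the definitions might look like, using the example values from this section; the actual files in scripts.zip may differ, and the paths should be adjusted for your cluster:

# Sample directory layout; space-separated lists are quoted here and
# deliberately expanded unquoted in the commands below, so each path
# becomes a separate argument.
export DFS_NAME_DIR="/grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn"
export FS_CHECKPOINT_DIR="/grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn"
export DFS_DATA_DIR="/grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn"
export MAPREDUCE_LOCAL_DIR="/grid/hadoop/mapred /grid1/hadoop/mapred /grid2/hadoop/mapred"
export HDFS_USER=hdfs
export MAPRED_USER=mapred
export HADOOP_GROUP=hadoop
export HDFS_LOG_DIR=/var/log/hadoop/hdfs
export MAPRED_LOG_DIR=/var/log/hadoop/mapred
export HDFS_PID_DIR=/var/run/hadoop/hdfs
export MAPRED_PID_DIR=/var/run/hadoop/mapred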
On the node that hosts the NameNode service, execute the following commands:
mkdir -p $DFS_NAME_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $DFS_NAME_DIR
chmod -R 755 $DFS_NAME_DIR
where:
- $DFS_NAME_DIR is the space-separated list of directories where the NameNode stores the file system image. For example, /grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn (see the expanded commands after this list).
- $HDFS_USER is the user owning the HDFS services. For example, hdfs.
- $HADOOP_GROUP is a common group shared by services. For example, hadoop.
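With the sample values above, shell word splitting turns the space-separated list into separate arguments, so the three commands expand to:

mkdir -p /grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn
chown -R hdfs:hadoop /grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn
chmod -R 755 /grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn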
On all the nodes that can potentially host the SecondaryNameNode service, execute the following commands:
mkdir -p $FS_CHECKPOINT_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $FS_CHECKPOINT_DIR
chmod -R 755 $FS_CHECKPOINT_DIR
where:
- $FS_CHECKPOINT_DIR is the space-separated list of directories where the SecondaryNameNode stores the checkpoint image. For example, /grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn (a sketch for creating these on each candidate host follows this list).
- $HDFS_USER is the user owning the HDFS services. For example, hdfs.
- $HADOOP_GROUP is a common group shared by services. For example, hadoop.
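Because more than one node may be a SecondaryNameNode candidate, the same commands must be run on each such host. The following is a minimal sketch using ssh, assuming root access and the hypothetical host names snn-host1 and snn-host2; literal paths are used because a non-interactive remote shell may not source your profile:

# snn-host1 and snn-host2 are placeholders for your candidate hosts.
for host in snn-host1 snn-host2; do
  ssh root@"$host" 'mkdir -p /grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn &&
    chown -R hdfs:hadoop /grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn &&
    chmod -R 755 /grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn'
done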
On all DataNodes, execute the following commands:
mkdir -p $DFS_DATA_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $DFS_DATA_DIR
chmod -R 750 $DFS_DATA_DIR
On the JobTracker node and all DataNodes, execute the following commands:
mkdir -p $MAPREDUCE_LOCAL_DIR
chown -R $MAPRED_USER:$HADOOP_GROUP $MAPREDUCE_LOCAL_DIR
chmod -R 755 $MAPREDUCE_LOCAL_DIR
where:
- $DFS_DATA_DIR is the space-separated list of directories where DataNodes store the blocks. For example, /grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn (the 750 mode on these directories is explained after this list).
- $HDFS_USER is the user owning the HDFS services. For example, hdfs.
- $MAPREDUCE_LOCAL_DIR is the space-separated list of directories where MapReduce stores temporary data. For example, /grid/hadoop/mapred /grid1/hadoop/mapred /grid2/hadoop/mapred.
- $MAPRED_USER is the user owning the MapReduce services. For example, mapred.
- $HADOOP_GROUP is a common group shared by services. For example, hadoop.
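Note that the DataNode directories use mode 750 (drwxr-x---) rather than 755, so block data is accessible only to the hdfs owner and the hadoop group, not to other users. To confirm the resulting ownership and modes, assuming the sample paths above:

ls -ld /grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn
# Expect drwxr-x--- with hdfs:hadoop ownership for the data directories
ls -ld /grid/hadoop/mapred /grid1/hadoop/mapred /grid2/hadoop/mapred
# Expect drwxr-xr-x with mapred:hadoop ownership for the MapReduce local directories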
On all nodes, execute the following commands:
mkdir -p $HDFS_LOG_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $HDFS_LOG_DIR
chmod -R 755 $HDFS_LOG_DIR

mkdir -p $MAPRED_LOG_DIR
chown -R $MAPRED_USER:$HADOOP_GROUP $MAPRED_LOG_DIR
chmod -R 755 $MAPRED_LOG_DIR

mkdir -p $HDFS_PID_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $HDFS_PID_DIR
chmod -R 755 $HDFS_PID_DIR

mkdir -p $MAPRED_PID_DIR
chown -R $MAPRED_USER:$HADOOP_GROUP $MAPRED_PID_DIR
chmod -R 755 $MAPRED_PID_DIR
where:
- $HDFS_LOG_DIR is the directory for storing the HDFS logs. This directory name is a combination of a directory and the $HDFS_USER. For example, /var/log/hadoop/hdfs, where hdfs is the $HDFS_USER.
- $HDFS_PID_DIR is the directory for storing the HDFS process ID. This directory name is a combination of a directory and the $HDFS_USER. For example, /var/run/hadoop/hdfs, where hdfs is the $HDFS_USER.
- $MAPRED_LOG_DIR is the directory for storing the MapReduce logs. This directory name is a combination of a directory and the $MAPRED_USER. For example, /var/log/hadoop/mapred, where mapred is the $MAPRED_USER.
- $MAPRED_PID_DIR is the directory for storing the MapReduce process ID. This directory name is a combination of a directory and the $MAPRED_USER. For example, /var/run/hadoop/mapred, where mapred is the $MAPRED_USER. A verification sketch for these directories follows this list.
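Finally, to confirm the result on each node, you can list the log and PID directories and check the owner, group, and mode; a minimal sketch using the variables defined earlier:

# Each directory should show mode drwxr-xr-x (755), the hadoop group,
# and the hdfs or mapred owner as appropriate.
for d in $HDFS_LOG_DIR $MAPRED_LOG_DIR $HDFS_PID_DIR $MAPRED_PID_DIR; do
  ls -ld "$d"
done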