Hadoop configuration file details

1. hadoop-env.sh

Sets the environment for the Hadoop daemons; the most important variable here is JAVA_HOME.

2. core-site.xml

parameter: explanation
fs.defaultFS: The URI (protocol, host name, and port) of the cluster's NameNode. The host is the NameNode's host name or IP address, and the port is the port on which the NameNode listens for RPC; if no port is specified, 8020 is assumed. Every machine in the cluster needs to know the NameNode's address: DataNodes register with the NameNode first so that their data can be used, and independent client programs contact the NameNode through this URI to obtain a file's block list.
io.file.buffer.size: The buffer size, in bytes, used for read/write operations on SequenceFiles.
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.1.100:9000</value>
        <description>192.168.1.100 is the server IP; a host name can be used instead</description>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
        <description>The value is in bytes; 131072 bytes = 128 KB</description>
    </property>
</configuration>
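Hadoop reads these files as plain XML, so the `<name>`/`<value>` pairs are easy to inspect programmatically. Below is a minimal sketch of a parser for a `*-site.xml` file; the embedded config text and its values are illustrative, not from a real cluster.

```python
import xml.etree.ElementTree as ET

def parse_hadoop_site(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml file."""
    props = {}
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            # Hadoop property names/values should not carry stray whitespace.
            props[name.strip()] = (value or "").strip()
    return props

# Illustrative core-site.xml content (host/port are example values).
core_site = """
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.1.100:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
</configuration>
"""

conf = parse_hadoop_site(core_site)
print(conf["fs.defaultFS"])               # hdfs://192.168.1.100:9000
print(int(conf["io.file.buffer.size"]))   # 131072
```

A quick check like this is handy for catching typos such as a stray space inside a `<name>` tag before restarting the daemons.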

3. hdfs-site.xml

attribute: meaning
dfs.namenode.name.dir: A comma-separated list of directories in which the NameNode stores its persistent metadata. The NameNode writes an identical copy of the metadata to every directory in the list. Example: file:/data/hadoop/dfs/name
dfs.datanode.data.dir: A comma-separated list of directories in which the DataNode stores its data blocks. Each block is stored in only one of these directories. Example: file:/data/hadoop/dfs/data
dfs.namenode.checkpoint.dir: A comma-separated list of directories in which the secondary NameNode stores checkpoints. A copy of the checkpoint is written to every directory in the list.
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Replication factor; can be set to 1 for pseudo-distributed mode</description>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/namenode</value>
        <description>Path on the local file system where the NameNode persistently stores the namespace and transaction logs</description>
    </property>
    <property>
        <name>dfs.namenode.hosts</name>
        <value>datanode1, datanode2</value>
        <description>datanode1 and datanode2 are the host names of the DataNode servers</description>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>268435456</value>
        <description>HDFS block size of 256 MB for large file systems; the default is 128 MB in Hadoop 2.x and later (64 MB in 1.x)</description>
    </property>
    <property>
        <name>dfs.namenode.handler.count</name>
        <value>100</value>
        <description>More NameNode server threads to handle RPCs from DataNodes</description>
    </property>
</configuration>
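To see what `dfs.blocksize` means in practice, a file is split into fixed-size blocks and each block is replicated `dfs.replication` times. The arithmetic can be sketched as follows (the file sizes are made-up examples):

```python
import math

def num_blocks(file_size_bytes, block_size_bytes):
    """How many HDFS blocks a file of the given size occupies."""
    # Even an empty or tiny file occupies one (partially filled) block entry.
    return max(1, math.ceil(file_size_bytes / block_size_bytes))

BLOCK_256M = 268435456  # the dfs.blocksize value from hdfs-site.xml above

print(num_blocks(1 * 1024**3, BLOCK_256M))  # a 1 GiB file -> 4 blocks
print(num_blocks(1, BLOCK_256M))            # a 1-byte file -> 1 block
```

Note that a block only occupies as much disk space as the data it holds, so small files do not waste 256 MB each; the cost of many small files is NameNode metadata, not disk.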

4. yarn-site.xml

attribute: meaning
yarn.resourcemanager.hostname: Host name of the machine running the ResourceManager. The default is 0.0.0.0. Example: 10.200.4.117
yarn.resourcemanager.address: Host name and port of the ResourceManager's RPC server. Example: 10.200.4.117:8032
yarn.nodemanager.local-dirs: A comma-separated list of directories used as local scratch space by YARN containers; the data is cleared when the application finishes. It is best to spread these directories across all local disks to improve disk I/O. YARN local storage typically uses the same disks and partitions (but different directories) as DataNode block storage.
yarn.nodemanager.aux-services: A comma-separated list of auxiliary services run by the NodeManager. Each service is implemented by the class named in the property yarn.nodemanager.aux-services.&lt;service-name&gt;.class. No auxiliary services are specified by default.
<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>192.168.1.100:8081</value>
        <description>The IP address 192.168.1.100 can be replaced with a host name</description>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>192.168.1.100:8082</value>
        <description>The IP address 192.168.1.100 can be replaced with a host name</description>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>192.168.1.100:8083</value>
        <description>The IP address 192.168.1.100 can be replaced with a host name</description>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>192.168.1.100:8084</value>
        <description>The IP address 192.168.1.100 can be replaced with a host name</description>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.1.100:8085</value>
        <description>The IP address 192.168.1.100 can be replaced with a host name</description>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
        <description>Common choices (as fully qualified class names): CapacityScheduler, FairScheduler, or FifoScheduler</description>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>100</value>
        <description>Unit: MB</description>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>256</value>
        <description>Unit: MB</description>
    </property>
    <property>
        <name>yarn.resourcemanager.nodes.include-path</name>
        <value>nodeManager1, nodeManager2</value>
        <description>nodeManager1 and nodeManager2 are the host names of the NodeManager servers</description>
    </property>
</configuration>
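The minimum/maximum allocation settings bound every container request: YARN rounds a memory request up to a multiple of the minimum allocation and caps it at the maximum. A simplified model of that normalization (an illustration under the values above, not the actual scheduler code):

```python
import math

def normalize_request(requested_mb, min_alloc_mb=100, max_alloc_mb=256):
    """Round a container memory request up to a multiple of the minimum
    allocation, then clamp it into [min_alloc_mb, max_alloc_mb]."""
    rounded = math.ceil(requested_mb / min_alloc_mb) * min_alloc_mb
    return min(max(rounded, min_alloc_mb), max_alloc_mb)

print(normalize_request(1))    # 100  (bumped up to the minimum)
print(normalize_request(150))  # 200  (rounded up to a multiple of 100)
print(normalize_request(999))  # 256  (capped at the maximum)
```

This is why a job asking for slightly more than a multiple of the minimum allocation still consumes a whole extra increment of cluster memory.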

5. slaves (workers in Hadoop 3.x)

[root@Hadoop171 hadoop]# vim workers

List the host names of the DataNode (worker) servers here, one per line. In Hadoop 2.x this file is named slaves; Hadoop 3.x renamed it to workers, which matches the command above.
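The workers file is just one hostname per line; blank lines and #-comments are conventionally ignored. A small sketch of reading it (the host names are the example values used elsewhere in this article):

```python
def read_workers(text):
    """Return the list of worker host names from a workers/slaves file."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if line and not line.startswith("#"):
            hosts.append(line)
    return hosts

workers_file = """# DataNode hosts
datanode1
datanode2
"""

print(read_workers(workers_file))  # ['datanode1', 'datanode2']
```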

6. mapred-site.xml

parameter: explanation
mapreduce.framework.name: Sets the execution framework to Hadoop YARN
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>The execution framework is set to Hadoop YARN</description>
    </property>
</configuration>
parameter: explanation
mapreduce.jobhistory.address: RPC address of the JobHistory server; the default port is 10020
mapreduce.jobhistory.webapp.address: Web UI address of the JobHistory server; the default port is 19888
<configuration>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>192.168.1.100:10020</value>
        <description>The IP address 192.168.1.100 can be replaced with a host name</description>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>192.168.1.100:19888</value>
        <description>The IP address 192.168.1.100 can be replaced with a host name</description>
    </property>
</configuration>
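Once the JobHistory server is running, its web UI lives at the configured webapp address. A trivial sketch of building that URL from the config value (the host is the example IP used above; the /jobhistory path is an assumption based on the JobHistory web UI):

```python
# Illustrative value matching mapreduce.jobhistory.webapp.address above.
webapp_address = "192.168.1.100:19888"

# Assumed UI path for the JobHistory server's landing page.
history_url = f"http://{webapp_address}/jobhistory"
print(history_url)  # http://192.168.1.100:19888/jobhistory
```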

Web access ports

NameNode: default port 50070 (9870 in Hadoop 3.x)
ResourceManager: default port 8088
MapReduce JobHistory Server: default port 19888
SecondaryNameNode: default port 50090
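The default ports above, collected into a small lookup table for reference (Hadoop 2.x values; the NameNode port changed in 3.x as noted):

```python
# Default web UI ports for the main Hadoop daemons (Hadoop 2.x defaults).
DEFAULT_WEB_PORTS = {
    "NameNode": 50070,                      # 9870 in Hadoop 3.x
    "ResourceManager": 8088,
    "MapReduce JobHistory Server": 19888,
    "SecondaryNameNode": 50090,
}

print(DEFAULT_WEB_PORTS["ResourceManager"])  # 8088
```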

Tags: Hadoop

Posted on Sat, 11 Sep 2021 01:26:25 -0400 by rish1103