Chapter 3: Basic Hadoop Commands and Java APIs

Contents

3.1 Common HDFS commands in Hadoop

3.1.1 Shell-based operations

        1. Create a directory

        2. Upload files to HDFS

        3. List files on HDFS

        4. View the contents of a file on HDFS

        5. Copy files from HDFS to the local system

        6. Delete files on HDFS

3.2 Operations based on the Java API

3.2.1 Preliminary preparation

        (1) Installing Hadoop on Windows

        (2) Creating the project in IDEA

        (3) Adding jar packages to the project

3.2.2 Java API-based operations

3.1 Common HDFS commands in Hadoop

        As we already know, HDFS is a distributed storage system that can hold very large numbers of files. To work with those files, for example reading files, uploading files, deleting files, and creating directories, you can use the commands described below. HDFS provides two access methods: one based on the Shell and one based on the Java API.
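
        All of the Shell commands in this section are subcommands of the hdfs dfs client. As a quick reference (these lines are a hint, not part of the original examples), the built-in help lists every available file system command and its options:

hdfs dfs -help        #List all available file system commands
hdfs dfs -help mkdir  #Show the detailed usage of a single command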

3.1.1 Shell-based operations

        The following describes the commands most often used when operating HDFS from the Shell.

        1. Create a directory

        To create a directory on HDFS, use the mkdir command. The command format is as follows:

hdfs dfs -mkdir <directory path>

        Command example:

hdfs dfs -mkdir /demo   #Create a demo folder in the root directory of HDFS

hdfs dfs -mkdir -p /demo/test  #Recursively create the folder /demo/test under the HDFS root directory

        2. Upload files to HDFS

        When a file is uploaded, its data is copied to the DataNodes. The upload succeeds only after all of the target DataNodes have received the complete data. The command format is as follows:

hdfs dfs -put <local file> <HDFS path>

        Command example:

hdfs dfs -put test.txt /demo   #Put the test.txt file in the demo folder
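
        As an extra check (not part of the original example), fsck can show whether the uploaded file's blocks have reached the DataNodes; the path here assumes the upload above:

hdfs fsck /demo/test.txt -files -blocks -locations  #Show the file's blocks and which DataNodes hold the replicas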

        3. List files on HDFS

        Use the -ls command to list files on HDFS. Note that HDFS has no concept of a "current working directory". The command format is as follows:

hdfs dfs -ls <HDFS path>

        Command example:

hdfs dfs -ls /demo
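
        When you need to see a whole directory tree rather than a single level, -ls also accepts the -R flag (an additional example beyond the original):

hdfs dfs -ls -R /demo   #List everything under /demo recursively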

        4. View the contents of a file on HDFS

        View a file's contents with the -cat command. The command format is as follows:

hdfs dfs -cat <HDFS file>

        Command example:

hdfs dfs -cat /demo/test.txt
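
        For a large file, -cat prints the entire contents; if you only want a quick look, -tail shows the last kilobyte of the file (an additional example beyond the original):

hdfs dfs -tail /demo/test.txt   #Show the last 1 KB of the file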

        5. Copy files from HDFS to the local system

        Use the "- get file 1 file 2" command to copy the file in a directory in HDFS to a file in the local system. The command format is as follows:

hdfs dfs -get <HDFS file> <local path>

        Command example:

hdfs dfs -get /demo/test.txt /  #Copy the test.txt file in the demo folder to the local root directory

        6. Delete files on HDFS

        Delete files on HDFS with the -rm command; add the -r flag to delete a directory and its contents recursively. The command format is as follows:

hdfs dfs -rm -r <HDFS path>

        Command example:

hdfs dfs -rm -r /demo/test.txt  #Delete the test.txt file in the demo folder
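
        Note that a single file can also be deleted with plain -rm; the -r flag is only required when the target is a directory (extra examples, not in the original):

hdfs dfs -rm /demo/test.txt   #Delete a single file; -r is not needed
hdfs dfs -rm -r /demo         #Delete the /demo directory and everything inside it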

3.2 Operations based on the Java API

3.2.1 Preliminary preparation

        (1) Installing Hadoop on Windows

        First of all, we need a copy of Hadoop on our Windows machine. No configuration is required at this stage; simply extract the Hadoop package somewhere on the computer.

        (2) Creating the project in IDEA

        Create a project named HdfsDemo in the IDEA editor. The directory structure is shown in the figure below; note that the out directory does not exist when the project is first created, it is generated only after the project is built and run.

        Then create a package under the src directory; our Java code goes inside this package. At the same time, copy the two Hadoop configuration files from the virtual machine into the src directory, as shown in the figure below.

        The contents of the two files are shown in the figures. In both files, the host name mapping (hadoop5 in my cluster) was replaced with the corresponding IP address, because Windows cannot resolve that host name. One of the IP addresses is the address of my Secondary NameNode, and the other is the host IP address of my NameNode.
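
        For reference, here is a minimal sketch of what such a configuration might look like. It assumes the two files are core-site.xml and hdfs-site.xml (the usual HDFS client configuration files); the IP address and port below are placeholders and must be replaced with the values from your own cluster:

<!-- core-site.xml (sketch): point the client at the NameNode by IP instead of by host name -->
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <!-- Hypothetical NameNode IP and port; use your own NameNode address here -->
        <value>hdfs://192.168.1.100:9000</value>
    </property>
</configuration>

        The Secondary NameNode address mentioned above would be set by IP in the same way in hdfs-site.xml, under the dfs.namenode.secondary.http-address property.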

        (3) Adding jar packages to the project

        Click File -> Project Structure in the upper left corner of IDEA.

        The first time we open it there is nothing there; click the + sign in the panel on the right, and then choose "JARs or Directories".

        Now we can browse for directories and add jar packages. Find the Hadoop folder we extracted earlier and add the jar packages under its share/hadoop directory; without them the Hadoop API cannot be used in the project.

 

        For example, when adding packages, go into the share/hadoop/mapreduce folder and import the jar packages in that directory; then go into the lib directory under mapreduce and import the jar packages there as well. Jar packages are generally found in these two kinds of locations (a module directory and its lib subdirectory); you can search online for more details. The packages added this way are enough for using the API to access our HDFS.

3.2.2 Java API-based operations

        This section introduces how to access HDFS through the Java API. First, let's look at the main classes involved in HDFS file operations.

        Configuration class: an object of this class encapsulates the configuration of the client or server.

        FileSystem class: an object of this class represents a file system, and its methods are used to operate on files. The object is obtained through the static get method: FileSystem fs = FileSystem.get(conf).

        FSDataInputStream and FSDataOutputStream: these two classes are the input and output streams of HDFS, obtained through the open and create methods of FileSystem respectively.
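
        Before looking at the full program, the listing below is a minimal, self-contained sketch (not part of the original code) showing how these two stream classes are typically used. The HDFS path /demo/hello.txt is only an illustrative placeholder, and the cluster address is assumed to come from the core-site.xml placed on the classpath earlier:

package HdfsDemo;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StreamDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        //The connection address is read from the core-site.xml on the classpath
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/demo/hello.txt");   //Hypothetical path used only for this sketch

        //create() returns an FSDataOutputStream for writing a new file
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeUTF("hello hdfs");
        }

        //open() returns an FSDataInputStream for reading an existing file
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }
    }
}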

        The code is as follows:

package HdfsDemo;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDemo {
    public static void main(String[] args) {
        //Call the function you want to run; the others stay commented out
        createFolder();                //Create a folder on HDFS
        //uploadFile();                //Upload a file to HDFS
        //downloadFile();              //Download a file from HDFS
        //listFile(new Path("/"));     //List the directory tree under a path
    }

    private static void createFolder() {
        Configuration conf = new Configuration();
        try {
            //Get the object of the file system through the configuration information
            FileSystem fs = FileSystem.get(conf);
            //Create a folder under the specified path
            Path path = new Path("/HdfsDemo");
            fs.mkdirs(path);
        } catch (IOException e) {
            e.printStackTrace();
        }

    }
    public static void listFile(Path path){
        //Define a configuration object
        Configuration conf = new Configuration();
        try {
            FileSystem fs = FileSystem.get(conf);
            //The path passed in is the directory whose contents will be listed
            //listStatus puts the metadata of everything under the given path into an array of FileStatus
            //A FileStatus object encapsulates the metadata of a file or directory: length, block size, permissions and so on
            FileStatus[] fileStatusArray = fs.listStatus(path);
            for (int i = 0 ; i < fileStatusArray.length ; i++) {
                FileStatus fileStatus = fileStatusArray[i];
                //If the entry is a directory, print its path and recurse into it; otherwise just print its path
                if (fileStatus.isDirectory()) {
                    System.out.println("The current path is:"+ fileStatus.getPath());
                    listFile(fileStatus.getPath());
                }else {
                    System.out.println("The current path is:"+fileStatus.getPath());
                }
            }

        } catch(IOException e) {
            e.printStackTrace();
        }


    }
    public static void uploadFile() {
        Configuration conf = new Configuration();
        try {
            FileSystem fs = FileSystem.get(conf);
            //Define the local file path and the destination path on HDFS
            Path src = new Path("Local file path");
            Path dest = new Path("HDFS destination path");
            //Upload the file from the local machine to HDFS
            fs.copyFromLocalFile(src, dest);
        } catch (IOException e) {
            e.printStackTrace();
        }


    }
    public static void downloadFile(){
        Configuration conf = new Configuration();
        try {
            FileSystem fs = FileSystem.get(conf);
            //Define the path of the file on HDFS and the local path to download it to
            Path src = new Path("/HdfsDemo/a.txt");
            Path dest = new Path("D://a.txt");
            //Download the file from HDFS to the local machine
            fs.copyToLocalFile(src, dest);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}

        As you can see, the main function can call createFolder(), uploadFile(), downloadFile(), and listFile(); whichever function you uncomment and call is the operation that will be performed. The paths in the code can be changed to suit your own environment.

        At this point, we have covered the basic Shell commands of a Hadoop cluster and the basic operations using the Java API.
