Catalogue
3.1 Common commands of HDFS in Hadoop
3.1.1 Shell-based operation
1. Create directory command
2. Upload files to HDFS
3. List files on HDFS
4. View the contents of a file under HDFS
5. Copy files from HDFS to the local system
6. Delete files under HDFS
3.2 Java API based operation
3.2.1 Preliminary preparation
(1) Install Hadoop on Windows
(2) Create the project in IDEA
(3) Add jar packages to the project
3.2.2 Java API based operations
3.1 Common commands of HDFS in Hadoop
As we already know, HDFS is distributed storage and can hold large numbers of files. To work with those files, for example to read, upload, or delete files, or to create directories, you can use the commands described below. HDFS provides two access methods: one based on the Shell and one based on the Java API.
3.1.1 Shell-based operation
The following describes the commands we most often use when operating HDFS in the Shell.
1. Create directory command
To create a directory in HDFS, use mkdir. The command format is as follows:
hdfs dfs -mkdir directory name
Command example:
hdfs dfs -mkdir /demo #Create a demo folder in the root directory of HDFS
hdfs dfs -mkdir -p /demo/test #Create the folder /demo/test recursively under the HDFS root directory
2. Upload files to HDFS
When a file is uploaded, it is first copied to the DataNodes. The upload succeeds only when all of the target DataNodes have received the complete data. The command format is as follows:
hdfs dfs -put local file HDFS path
Command example:
hdfs dfs -put test.txt /demo #Put the test.txt file into the demo folder
3. List files on HDFS
Use the -ls command to list files on HDFS. Note that there is no concept of a "current working directory" in HDFS. The command format is as follows:
hdfs dfs -ls path
Command example:
hdfs dfs -ls /demo
4. View the contents of a file under HDFS
View through "- cat file name", and the command format is as follows
hdfs dfs -cat file name
Command example:
hdfs dfs -cat /demo/test.txt
5. Copy files from HDFS to the local system
Use the "-get file1 file2" command to copy a file from a directory in HDFS to a file in the local system. The command format is as follows:
hdfs dfs -get HDFS file local path
Command example:
hdfs dfs -get /demo/test.txt / #Copy the test.txt file in the demo folder to the local root directory
6. Delete files under HDFS
Delete files under HDFS with the "-rm -r file" command. The command format is as follows:
hdfs dfs -rm -r file
Command example:
hdfs dfs -rm -r /demo/test.txt #Delete the test.txt file in the demo folder
3.2 Java API based operation
3.2.1 Preliminary preparation
(1) Install Hadoop on Windows
First, put Hadoop on the Windows machine without configuring anything, that is, simply decompress the Hadoop package on your computer.
(2) Create the project in IDEA
Create a project named HdfsDemo in the IDEA editor. The directory structure is shown below; the out directory is not there when the project is first created, it is generated later.
Then create a package under the src directory; our Java code goes inside this package. At the same time, we need to copy the two Hadoop configuration files from our virtual machine into the src directory, as shown in the figure below.
The configuration of the two files is shown in the figures. The hostname mapping in both files was changed to an IP address, because Windows cannot resolve the mapping name (hadoop5 here), so the IP address is used instead.
This IP address is the address of my Secondary NameNode.
This IP address is the host IP address of my NameNode.
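Since the figures are not reproduced here, the following is a minimal sketch of what the two configuration files usually look like. I am assuming they are core-site.xml and hdfs-site.xml; the IP addresses and ports below are placeholders and must be replaced with the values of your own cluster.
core-site.xml (fs.defaultFS points to the NameNode host):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- placeholder: use the host IP of your NameNode -->
    <value>hdfs://192.168.1.101:9000</value>
  </property>
</configuration>
hdfs-site.xml (the Secondary NameNode address):
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <!-- placeholder: use the IP of your Secondary NameNode -->
    <value>192.168.1.102:50090</value>
  </property>
</configuration>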
(3) Add jar packages to the project
Click File > Project Structure in the upper left corner of IDEA.
The first time you open it there is nothing there; click the + sign in the panel on the right, and then choose JARs or directories.
Now we can look for directories and add jar packages. Find the Hadoop folder, and then add the jar packages under its share/hadoop folder; not all of them are needed.
For example, enter the folder share/hadoop/mapreduce and import the packages in that directory, then enter the lib directory under mapreduce and import the packages there as well. Generally, the jar packages are in these two places; you can search the Internet for the details. This is enough to use the API to access our HDFS.
3.2.2 Java API based operations
This section introduces how to access HDFS through the Java API. First, let's look at the main classes involved in file operations on HDFS:
Configuration class: an object of this class encapsulates the configuration of the client or server.
FileSystem class: an object of this class represents a file system, and its methods are used to operate on files. The object is obtained through the static get method of FileSystem: FileSystem fs = FileSystem.get(conf).
FSDataInputStream and FSDataOutputStream: these two classes are the input and output streams of HDFS, obtained through the open and create methods of FileSystem respectively (see the sketch right below).
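To see how these classes fit together before the full demo, here is a minimal sketch (an illustration, not part of the project code below) that writes a small file through FSDataOutputStream and reads it back through FSDataInputStream; the class name StreamDemo and the path /HdfsDemo/hello.txt are just illustrative.
package HdfsDemo;

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class StreamDemo {
    public static void main(String[] args) throws Exception {
        // The Configuration object picks up core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/HdfsDemo/hello.txt"); // illustrative path

        // Write a file on HDFS through FSDataOutputStream (obtained from FileSystem.create)
        FSDataOutputStream out = fs.create(file, true); // true = overwrite if it already exists
        out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        out.close();

        // Read the file back through FSDataInputStream (obtained from FileSystem.open)
        FSDataInputStream in = fs.open(file);
        IOUtils.copyBytes(in, System.out, 4096, false); // print the contents to the console
        in.close();

        fs.close();
    }
}
Note that FileSystem.get(conf) returns an HDFS file system only because fs.defaultFS in the configuration files on the classpath points to the NameNode; without them it would fall back to the local file system.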
The code is as follows:
package HdfsDemo;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDemo {
    public static void main(String[] args) {
        //Call the function you need below
        createFolder();                //Create folder function
        //uploadFile();                //Upload file function
        //downloadFile();              //Download file function
        //listFile(new Path("/"));     //Show directory function
    }

    private static void createFolder() {
        Configuration conf = new Configuration();
        try {
            //Get the file system object through the configuration information
            FileSystem fs = FileSystem.get(conf);
            //Create a folder under the specified path
            Path path = new Path("/HdfsDemo");
            fs.mkdirs(path);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void listFile(Path path) {
        //Define a configuration object
        Configuration conf = new Configuration();
        try {
            FileSystem fs = FileSystem.get(conf);
            //listStatus puts the metadata of all files and directories under the given path into a FileStatus array.
            //A FileStatus object encapsulates metadata such as file length, block size and permissions.
            FileStatus[] fileStatusArray = fs.listStatus(path);
            for (int i = 0; i < fileStatusArray.length; i++) {
                FileStatus fileStatus = fileStatusArray[i];
                //If the entry is a directory, print it and recurse into it
                if (fileStatus.isDirectory()) {
                    System.out.println("The current path is: " + fileStatus.getPath());
                    listFile(fileStatus.getPath());
                } else {
                    System.out.println("The current path is: " + fileStatus.getPath());
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void uploadFile() {
        Configuration conf = new Configuration();
        try {
            FileSystem fs = FileSystem.get(conf);
            //Define the local file path and the target path on HDFS
            Path src = new Path("Local file path");
            Path dest = new Path("Server path");
            //Upload the file from the local machine to the server
            fs.copyFromLocalFile(src, dest);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void downloadFile() {
        Configuration conf = new Configuration();
        try {
            FileSystem fs = FileSystem.get(conf);
            //Define the HDFS file path and the local target path
            Path src = new Path("/HdfsDemo/a.txt");
            Path dest = new Path("D://a.txt");
            //Download the file from the server to the local machine
            fs.copyToLocalFile(src, dest);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
As you can see, the main function can call createFolder(), uploadFile(), downloadFile(), and listFile(); whichever function you call, you get the corresponding result. The paths in the code can be changed to match your own environment.
So far, we have covered the basic Shell commands of a Hadoop cluster and the basic operations with the Java API.
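For example, after running createFolder() you can verify the result with the -ls command from Section 3.1:
hdfs dfs -ls /
The /HdfsDemo directory created by the program should appear in the listing.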