Preface
Most Hadoop articles on blogs today stop at Hadoop 2.x. Based on Dark Horse Programmer's complete Hadoop 3.x big data tutorial, this series adds and updates the new features that 2.x does not have. Like, favorite, and follow so you don't miss the next post!
1.1 Analysis of the implementation approach
- Parse command-line arguments using Google Options.
- Read the data directory to be collected and generate an upload task. Each upload task has a task file that lists the files to be uploaded to HDFS.
- Execute the task: read the task file and upload the files it lists to HDFS one by one. During and after the upload, special marks must be added to the task file.
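The three steps above can be sketched in plain Java as follows. This is a minimal, hypothetical sketch rather than the tutorial's code: the class name UploadTaskSketch, the task-file naming, the _COPYING/_DONE suffixes used as the "special marks", and the Uploader interface (a stand-in for a real HDFS call such as Hadoop's FileSystem.copyFromLocalFile) are all assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class UploadTaskSketch {

    // Hypothetical "special marks": suffixes appended to the task file name.
    public static final String COPYING_SUFFIX = "_COPYING";
    public static final String DONE_SUFFIX = "_DONE";

    /** Hypothetical stand-in for the real HDFS upload call. */
    public interface Uploader {
        void upload(Path localFile) throws IOException;
    }

    /** Step 2: writes the absolute paths of all regular files in sourceDir into a new task file. */
    public static Path createTask(Path sourceDir, Path pendingDir) throws IOException {
        Files.createDirectories(pendingDir);
        List<String> files;
        try (Stream<Path> entries = Files.list(sourceDir)) {
            files = entries.filter(Files::isRegularFile)
                    .map(p -> p.toAbsolutePath().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
        // createTempFile guarantees a unique task file name inside pendingDir.
        Path taskFile = Files.createTempFile(pendingDir, "task_", "");
        Files.write(taskFile, files, StandardCharsets.UTF_8);
        return taskFile;
    }

    /** Step 3: marks the task as in progress, uploads each listed file, then marks it as done. */
    public static Path executeTask(Path taskFile, Uploader uploader) throws IOException {
        String name = taskFile.getFileName().toString();
        Path copying = Files.move(taskFile, taskFile.resolveSibling(name + COPYING_SUFFIX));
        for (String line : Files.readAllLines(copying, StandardCharsets.UTF_8)) {
            if (!line.isEmpty()) {
                uploader.upload(Paths.get(line));
            }
        }
        // Reached only if every upload succeeded.
        return Files.move(copying, copying.resolveSibling(name + DONE_SUFFIX));
    }
}
```

Marking by renaming has a useful side effect: if an upload throws halfway, the task file keeps its _COPYING suffix, so an interrupted task can be told apart from a finished one and retried.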
1.2 Parsing command-line arguments with Google Options
To keep the program flexible, the user should be able to specify where to collect data from and where in HDFS to upload it. Since these parameters are passed on the command line, Google Options is used to parse them. Its GitHub address is: https://github.com/pcj/google-options
1.2.1 Introduction to Google Options
Google Options is the command-line argument parser from the Bazel project. Its com.google.devtools.common.options package has been split out into a separate jar for general use.
Bazel is Google's open-source build tool. It is very fast, reportedly more than five times faster than Maven, because it relies on caching and incremental builds: after a one-line code change, Bazel needs only about 0.5 s, whereas Maven rebuilds the project. Bazel is also easy to extend to other languages. It supports Java and C++ natively, and now also supports Rust, Go, Scala, and others.
1.2.2 Installing Google Options
```xml
<dependency>
    <groupId>com.github.pcj</groupId>
    <artifactId>google-options</artifactId>
    <version>1.0.0</version>
</dependency>
```
1.2.3 Usage
- Create a class that defines all command-line options. This class must extend OptionsBase:
```java
package example;

import com.google.devtools.common.options.Option;
import com.google.devtools.common.options.OptionsBase;

import java.util.List;

/**
 * Command-line options definition for example server.
 */
public class ServerOptions extends OptionsBase {

  @Option(
      name = "help",
      abbrev = 'h',
      help = "Prints usage info.",
      defaultValue = "false" // a help flag stays off unless -h/--help is given
  )
  public boolean help;

  @Option(
      name = "host",
      abbrev = 'o',
      help = "The server host.",
      category = "startup",
      defaultValue = ""
  )
  public String host;

  @Option(
      name = "port",
      abbrev = 'p',
      help = "The server port.",
      category = "startup",
      defaultValue = "8080"
  )
  public int port;

  @Option(
      name = "dir",
      abbrev = 'd',
      help = "Name of directory to serve static files.",
      category = "startup",
      allowMultiple = true, // -d may be repeated; values are collected into the list
      defaultValue = ""
  )
  public List<String> dirs;
}
```
- Parse the options and use them:
```java
package example;

import com.google.devtools.common.options.OptionsParser;

import java.util.Collections;

public class Server {

  public static void main(String[] args) {
    OptionsParser parser = OptionsParser.newOptionsParser(ServerOptions.class);
    // Reports an error and exits on unknown options or invalid values.
    parser.parseAndExitUponError(args);
    ServerOptions options = parser.getOptions(ServerOptions.class);
    // A host and at least one static-file directory are required; the port must not be negative.
    if (options.host.isEmpty() || options.port < 0 || options.dirs.isEmpty()) {
      printUsage(parser);
      return;
    }

    System.out.format("Starting server at %s:%d...%n", options.host, options.port);
    for (String dirname : options.dirs) {
      System.out.format("--> Serving static files at <%s>%n", dirname);
    }
  }

  private static void printUsage(OptionsParser parser) {
    System.out.println("Usage: java -jar server.jar OPTIONS");
    System.out.println(parser.describeOptions(Collections.<String, String>emptyMap(),
                                              OptionsParser.HelpVerbosity.LONG));
  }
}
```
1.2.4 Developing parameter parsing for the sentiment reporting program
1.2.4.1 Creating the parameter entity class with Google Options
- Create a SentimentOptions class in the cn.itcast.sentiment_upload.arg package that extends OptionsBase.
- Define the following parameters:
(1) help, abbreviation h: displays the command's help information (default parameter)
(2) source, abbreviation s: the location to collect data from
(3) temp_dir, abbreviation t, default "/tmp/sentiment": the temporary directory in which the upload tasks are generated
(4) output, abbreviation o: the HDFS path to upload to
Reference code:
```java
package cn.itcast.sentiment_upload.arg;

import com.google.devtools.common.options.Option;
import com.google.devtools.common.options.OptionsBase;

/**
 * Parameter entity class:
 * (1) help, -h: print the help information
 * (2) source, -s: where to collect data from
 * (3) pending_dir, -p: directory in which upload tasks are generated (default "/tmp/pending/sentiment")
 * (4) output, -o: HDFS path to upload to
 */
public class SentimentOptions extends OptionsBase {

    @Option(
            name = "help",
            abbrev = 'h',
            help = "Print the help information",
            defaultValue = "false" // off unless -h/--help is passed
    )
    public boolean help;

    @Option(
            name = "source",
            abbrev = 's',
            help = "Where to collect data from",
            defaultValue = ""
    )
    public String sourceDir;

    @Option(
            name = "pending_dir",
            abbrev = 'p',
            help = "Directory in which upload tasks are generated",
            defaultValue = "/tmp/pending/sentiment"
    )
    public String pendingDir;

    @Option(
            name = "output",
            abbrev = 'o',
            help = "HDFS path to upload to",
            defaultValue = ""
    )
    public String output;
}
```
1.2.4.2 Parsing parameters in the main method
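A minimal sketch of the main method, reusing the same OptionsParser calls as the Server example in 1.2.3. The entry class SentimentUploadMain, the helpers parse and shouldPrintUsage, and the jar name sentiment-upload.jar are hypothetical. The options class is repeated as a nested class so the file compiles on its own; help defaults to "false" here so that usage is printed only when -h/--help is passed or a required option is missing.

```java
import com.google.devtools.common.options.Option;
import com.google.devtools.common.options.OptionsBase;
import com.google.devtools.common.options.OptionsParser;
import com.google.devtools.common.options.OptionsParsingException;

import java.util.Collections;

public class SentimentUploadMain {

    /** Compact copy of the SentimentOptions fields (public, so the parser can instantiate it). */
    public static class SentimentOptions extends OptionsBase {
        @Option(name = "help", abbrev = 'h', help = "Print the help information", defaultValue = "false")
        public boolean help;

        @Option(name = "source", abbrev = 's', help = "Where to collect data from", defaultValue = "")
        public String sourceDir;

        @Option(name = "pending_dir", abbrev = 'p', help = "Directory in which upload tasks are generated",
                defaultValue = "/tmp/pending/sentiment")
        public String pendingDir;

        @Option(name = "output", abbrev = 'o', help = "HDFS path to upload to", defaultValue = "")
        public String output;
    }

    /** Hypothetical helper: parses args and throws instead of exiting, which is handy in tests. */
    public static SentimentOptions parse(String... args) throws OptionsParsingException {
        OptionsParser parser = OptionsParser.newOptionsParser(SentimentOptions.class);
        parser.parse(args);
        return parser.getOptions(SentimentOptions.class);
    }

    /** Hypothetical helper: source and output are mandatory; -h/--help always prints usage. */
    public static boolean shouldPrintUsage(SentimentOptions options) {
        return options.help || options.sourceDir.isEmpty() || options.output.isEmpty();
    }

    public static void main(String[] args) {
        OptionsParser parser = OptionsParser.newOptionsParser(SentimentOptions.class);
        // Reports an error and exits on unknown options or invalid values.
        parser.parseAndExitUponError(args);
        SentimentOptions options = parser.getOptions(SentimentOptions.class);
        if (shouldPrintUsage(options)) {
            System.out.println("Usage: java -jar sentiment-upload.jar OPTIONS");
            System.out.println(parser.describeOptions(
                    Collections.<String, String>emptyMap(), OptionsParser.HelpVerbosity.LONG));
            return;
        }
        System.out.format("Collecting from %s, pending tasks in %s, uploading to %s%n",
                options.sourceDir, options.pendingDir, options.output);
        // Task generation and upload (section 1.1) would be invoked from here.
    }
}
```

For example, `java -jar sentiment-upload.jar -s /data/sentiment -o /sentiment/upload` would leave pending_dir at its default, while running it with no arguments prints the generated option descriptions.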
Postscript
📢 Blog home page: https://manor.blog.csdn.net
📢 Likes 👍, favorites ⭐, and comments 📝 are welcome! Please point out any errors!
📢 This article was written by manor and first published on the CSDN blog 🙉
📢 New articles in the Hadoop series are published every day! ✨