A new approach to command-line parsing (illustrated in Go)

Author: Ke Zhi
Source: Ali technical official account

I. Overview

Almost every back-end programmer uses command-line parsing, yet compared with business logic it is a minor detail. If the requirements are simple, handling the command line is simple too, and any back-end programmer can do it easily: the Go standard library provides the flag package for exactly this purpose.

However, as soon as we want a slightly richer command line, things become complicated. We have to consider how to handle optional and required options, how to set default values for optional ones, and how to handle subcommands, subcommands of subcommands, the parameters of subcommands, and so on.

cobra is currently the most widely used and most powerful command-line parsing library for Go, but its rich feature set also makes it far more complex than the standard library's flag package. To reduce that complexity, cobra even offers code generation that produces the command-line skeleton automatically. Code generation saves development time, but it also makes the code less intuitive.

This article sets aside the conventional picture of the command line and re-examines its basic concepts, arriving at a parsing method that is powerful yet very simple to use. The method supports any number of subcommands as well as optional and required parameters, and optional parameters can have default values. Configuration files, environment variables and command-line parameters can be used together, with precedence increasing in that order, which fits the twelve-factor app principles well.

II. Existing command line parsing methods

The Go standard library's flag package provides a very simple way to parse the command line. After defining the command-line parameters, you only need to call flag.Parse.

// demo.go
var limit int
flag.IntVar(&limit, "limit", 10, "the max number of results")
flag.Parse()
fmt.Println("the limit is", limit)

// Output
$ go run demo.go 
the limit is 10
$ go run demo.go -limit 100
the limit is 100

As you can see, the flag package is very simple to use. When defining a command-line parameter you specify its default value and a usage description; after that, a single call to flag.Parse parses the parameters.

The flag package has no built-in notion of subcommands. You can parse subcommands yourself, or you can use the cobra library directly.
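For comparison, here is a minimal sketch of what parsing a subcommand yourself might look like with only the standard library. The program name demo, the times subcommand and the helper runEchoTimes are hypothetical names chosen for illustration. Note that the flag package stops parsing at the first non-flag argument, so in this sketch flags must come before the text.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// runEchoTimes parses the flags of a hypothetical "times" subcommand with its
// own FlagSet and returns the lines the subcommand would print.
func runEchoTimes(args []string) ([]string, error) {
	fs := flag.NewFlagSet("times", flag.ContinueOnError)
	times := fs.Int("times", 1, "times to echo the input")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	var lines []string
	for i := 0; i < *times; i++ {
		lines = append(lines, "Echo: "+strings.Join(fs.Args(), " "))
	}
	return lines, nil
}

func main() {
	// Dispatch on the first argument by hand; everything after it belongs
	// to the subcommand.
	if len(os.Args) < 2 {
		fmt.Println("usage: demo times [-times N] text...")
		return
	}
	switch os.Args[1] {
	case "times":
		lines, err := runEchoTimes(os.Args[2:])
		if err != nil {
			os.Exit(2)
		}
		for _, line := range lines {
			fmt.Println(line)
		}
	default:
		fmt.Println("unknown subcommand:", os.Args[1])
	}
}
```

Every additional subcommand needs another case and another FlagSet, and nested subcommands need nested dispatch, which is the bookkeeping a library is expected to take over.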

Here is an example from cobra's documentation that demonstrates how the library is used:

package main

import (
  "fmt"
  "strings"

  "github.com/spf13/cobra"
)

func main() {
  var echoTimes int

  var cmdPrint = &cobra.Command{
    Use:   "print [string to print]",
    Short: "Print anything to the screen",
    Long: `print is for printing anything back to the screen.
For many years people have printed back to the screen.`,
    Args: cobra.MinimumNArgs(1),
    Run: func(cmd *cobra.Command, args []string) {
      fmt.Println("Print: " + strings.Join(args, " "))
    },
  }

  var cmdEcho = &cobra.Command{
    Use:   "echo [string to echo]",
    Short: "Echo anything to the screen",
    Long: `echo is for echoing anything back.
Echo works a lot like print, except it has a child command.`,
    Args: cobra.MinimumNArgs(1),
    Run: func(cmd *cobra.Command, args []string) {
      fmt.Println("Echo: " + strings.Join(args, " "))
    },
  }

  var cmdTimes = &cobra.Command{
    Use:   "times [string to echo]",
    Short: "Echo anything to the screen more times",
    Long: `echo things multiple times back to the user by providing
a count and a string.`,
    Args: cobra.MinimumNArgs(1),
    Run: func(cmd *cobra.Command, args []string) {
      for i := 0; i < echoTimes; i++ {
        fmt.Println("Echo: " + strings.Join(args, " "))
      }
    },
  }

  cmdTimes.Flags().IntVarP(&echoTimes, "times", "t", 1, "times to echo the input")

  var rootCmd = &cobra.Command{Use: "app"}
  rootCmd.AddCommand(cmdPrint, cmdEcho)
  cmdEcho.AddCommand(cmdTimes)
  rootCmd.Execute()
}

You can see that adding subcommands makes the code a little more complex, but the logic is still clear. Commands and subcommands follow the same definition template, and subcommands can define subcommands of their own.

$ go run cobra.go echo times hello --times 3
Echo: hello
Echo: hello
Echo: hello

cobra is powerful and logically clear, which is why it is so widely adopted. However, two things about it leave me unsatisfied. Neither is a big problem, but both keep nagging at me.

1. The parameter definition is separated from the command logic

The definition of --times above shows that parameters are defined separately from the command logic (the Run function here). With a large number of subcommands, we tend to put the command definitions in different files or even different directories. The command definitions are then scattered, while the parameter definitions of all commands stay concentrated in one place.

Of course, cobra can solve this too: move the parameter definitions from the main function into init functions, and place each init function next to the subcommand it belongs to. For example, the subcommand times is defined in times.go, and an init function in the same file defines the parameters of times. With many parameters, however, this requires a large number of global variables, which is a thorn in the side of anyone who wants clear, concise code without side effects.

Why not put the parameter definitions inside the command function, as with the flag package? The code would be more compact and the logic more intuitive.

// Why can't I write like this?
// (Hypothetical: cobra has no package-level IntVarP or Parse functions.)
func times() {
    var echoTimes int
    cobra.IntVarP(&echoTimes, "times", "t", 1, "times to echo the input")
    cobra.Parse()
}

The usual answer is that the times function can only be called after the command line has been parsed, so the command-line parameters must be defined in advance. If the parameters were defined inside times, they would be parsed only when times is called, which seems as backwards as asking a phone to pick its theme based on the color of its case. But is that really so?

2. The ordering of parent and child commands is not flexible enough

When developing a tool with subcommands, or even multi-level subcommands, we often have to choose between cmd {resource} {action} and cmd {action} {resource}, that is, which of resource and action is the subcommand and which is the parameter. Kubernetes, for example, uses the action as the subcommand (kubectl get pods ..., kubectl get deploy ...). When the actions differ greatly from resource to resource, the resource is usually chosen as the subcommand instead (aliyun ecs ..., aliyun ram ...).

In practice, we often cannot tell at the start whether the action or the resource makes the better subcommand, and with multi-level subcommands the choice is even harder.

Without a library, developers may initialize resources in the parent command and run the business logic in the child command, which makes it very hard to swap parent and child later. This is actually flawed logic: calling a subcommand does not imply that the parent command must run. A command-line process exits after the command finishes, so resources initialized by the parent command are never reused by the subcommand.

cobra's design steers you away from this mistake. Each subcommand provides a Run function, and that function should cover the whole lifecycle: initialize resources, execute the business logic, and release the resources. However, cobra still requires the parent command to be defined: the echo command must exist before the echo times subcommand can be attached to it. In many scenarios the parent command executes no logic at all, especially when the resource is the parent command; its only job is to print usage information.

cobra makes defining child and parent commands very simple, but swapping them still means modifying the links between parent and child. Is there a way to make this easier?

III. Rethinking the command line

There are many terms for the elements of a command line, such as argument, flag and option. cobra's design is based on the following definitions:
Commands represent actions, Args are things and Flags are modifiers for those actions.

Further concepts are built on top of these definitions: persistent flags, which also apply to all subcommands of the command that defines them; local flags, which apply only to the command that defines them; required flags, which must be provided; and so on.

These definitions are the core of cobra's design. To solve the two problems described above, we need to re-examine them, so let us work out from scratch, step by step, what a command line actually is.

1 The command line is just a string that can be parsed and executed by the shell

$ cmd arg1 arg2 arg3

A command and its parameters are essentially one string, and the shell decides what that string means. To the shell, a command line consists of a command followed by parameters, separated by whitespace.

Anything else? No. There are no parent commands, child commands, persistent parameters or local parameters. Whether a parameter starts with a double dash (--), a single dash (-) or any other character makes no difference: it is just a string. The shell passes these strings to the program being executed, where they end up in an array (os.Args in Go).

2 Arguments, flags and options

As described above, an argument is any whitespace-separated string that follows the command, and the program is free to give each argument a different meaning.

Arguments that start with a single or double dash look special. In the code they serve a distinct purpose: they associate a value with a variable. Such an argument is called a flag. Recall that os.Args holds many arguments, none of which is directly tied to a variable in the program. A flag is essentially a key-value pair; by associating the key with a variable, the code lets the command line assign a value to that variable.

flag.IntVar(&limit, "limit", 10, "the max number of results")

// Variable binding: when -limit 100 is given on the command line, the value 100 is assigned to the variable limit

Flags let us assign a value to a variable in the code directly from the command line. This raises a new question: if no value is assigned, can the program still run? If it cannot, the argument (a flag is just a special argument) is required; otherwise it is optional. Another possibility is that the command line defines several variables and the program can run as long as any one of them has a value, that is, as long as any one of several flags is given. Seen this way, these flags or arguments can also be called options.

This analysis shows that argument, flag and option are intertwined concepts, different in some respects and overlapping in others. A flag is an argument that starts with a dash, and the argument (if any) that follows the flag name is the flag's value. Such arguments may be required, optional, or one of several alternatives, so they can also be called options.

3 Subcommands

From the analysis above it follows easily that a subcommand is just a special argument. It looks no different from other arguments (unlike a flag, it has no leading dash), but it triggers a special action or function (any action can be wrapped in a function).

Comparing flags with subcommands reveals an unexpected connection: a flag is associated with a variable, and a subcommand is associated with a function. They serve the same purpose. The argument that follows a flag is the value of its variable, and all the arguments that follow a subcommand are the arguments of its function (not function parameters at the language level).

A more interesting question: why does a flag need to start with a dash? Could a variable be associated without the dash? Obviously yes, because subcommands have no dash, and associating a variable is no different from associating a function. In essence the association is made through the name of the flag or subcommand. So what does the dash do?

Whether a name is associated with a variable or with a function is determined by the name itself, which is predefined in the code. No dash is needed to tell flags from subcommands or to associate the variable or function.

For example:

// Arguments without a dash can also be associated with variables or functions
for _, arg := range os.Args {
    switch arg {
    case "limit": // Set the limit variable
    case "scan": // Call the scan function
    }
}

As you can see, flags play no special role in the core functionality; the dash mainly improves readability. Note, however, that although flags are not strictly necessary, once we have them we can use their form to implement extra features. For example, in netstat -lnt, -lnt is syntactic sugar for -l -n -t.

4 Composition of the command line

With the analysis above, we can classify the arguments of a command line as follows:

  • Flag: an argument that starts with a single or double dash. A flag consists of a flag name and a flag argument

    • --flagname flagarg
  • Non-flag argument
  • Subcommand: a subcommand can itself have subcommands, flags and non-flag arguments

$ command --flag flagarg subcommand subcmdarg --subcmdflag subcmdflagarg

IV. Heuristic command-line parsing

Let us return to the first requirement: implementing any subcommand should be as simple as using the standard library's flag package. That means the command-line parameters are parsed only when the subcommand's function executes. If we can tell the subcommand apart from the other arguments, we can first call the function that corresponds to the subcommand and then parse the subcommand's parameters inside it.

flag can call Parse in main because the shell has already established that the first item of the string is the command itself and everything after it is an argument. Likewise, if we can identify the subcommand, the following code becomes possible:

func command(){
    // Define flags
    // Call Parse function
}

The key question is how to tell subcommands apart from other arguments. Flag names start with a single or double dash and are easy to recognize. The remaining arguments must be divided into subcommands, subcommand arguments and flag arguments. On reflection, although we want to avoid defining parameters in advance, subcommands can be defined in advance. By comparing the arguments that are not flag names with the predefined subcommands, we can identify the subcommands.

To demonstrate how subcommands are identified, take the cobra code above as an example. Assuming cobra.go is compiled into a program named app, it can be invoked as:

$ app echo times hello --times 3

In cobra's terms, times is a subcommand of echo, and echo is a subcommand of app. Here we instead treat echo times as a whole as one subcommand of app.

1. Simple analysis process

  1. Define the echo subcommand and associate it with the function echo; define the echo times subcommand and associate it with the function echoTimes
  2. Parse the string echo times hello --times 3
  3. Examine the first argument: echo matches the predefined echo subcommand, but it is also a prefix of the echo times command. Only the next argument can tell us whether the user is calling echo or echo times
  4. Examine the second argument: times matches the echo times subcommand, which is not a prefix of any other subcommand. The subcommand is therefore echo times, and all remaining arguments are its arguments
  5. If the second argument were hello instead, only the echo subcommand would match, and the echo function would be called rather than echoTimes

2 Heuristic detection process

The analysis above is fairly simple, but in practice we often want to allow flags anywhere on the command line. Suppose we add an option that controls the output color, --color red. Logically, the color option describes echo rather than times, so we want to support the following command line:

$ app echo --color red times hello --times 3

The subcommand we expect to call is still echo times. The arguments in between complicate matters, though, because red could be the flag argument of --color, part of the subcommand name, or an argument of the subcommand. Worse, the user might mistakenly write --color times.

Heuristic detection means that when we reach red, we do not know whether it is a subcommand (or part of a subcommand prefix) or an argument. We therefore first assume it extends the subcommand prefix and try to match it; if nothing matches, we treat it as an argument.

  1. When red is reached, search the predefined subcommands for echo red. Nothing is found, so red is treated as an argument
  2. When times is reached, search the predefined subcommands for echo times. This time the echo times subcommand is found

Note that we never need to decide whether red is the flag argument of --color or a non-flag argument of the subcommand. As long as it matches no subcommand, it must be some kind of argument to the subcommand.

3. Arbitrary ordering of subcommands

A subcommand is essentially a string. The heuristic analysis above can recognize any subcommand string, provided the string has been defined in advance, that is, associated with a function. With this design, parent and child commands are purely logical concepts that have nothing to do with the code structure. To reorder them, all we need to do is adjust the mapping.

Maintaining the mapping:

# Associate with the echoTimes function
"echo times" => echoTimes

# Swapping parent and child only changes the mapping
"times echo" => echoTimes

V. Cortana: an implementation of heuristic command-line parsing

To put these ideas into practice, I developed the Cortana project. Cortana uses a B-tree to maintain the mapping between subcommands and functions. Thanks to the B-tree's prefix search, when the user enters any subcommand prefix, the program automatically lists all available subcommands. The heuristic parsing mechanism identifies the subcommand before any flag or subcommand argument is parsed, and uses it to look up the mapped function. Inside that function, the subcommand's arguments are actually parsed and bound to variables. In addition, Cortana uses Go struct tags to simplify variable binding.

Here is the cobra example reimplemented with Cortana:

package main

import (
  "fmt"
  "strings"

  "github.com/shafreeck/cortana"
)

func print() {
  cortana.Title("Print anything to the screen")
  cortana.Description(`print is for printing anything back to the screen.
For many years people have printed back to the screen.`)
  args := struct {
    Texts []string `cortana:"texts"`
  }{}

  cortana.Parse(&args)
  fmt.Println(strings.Join(args.Texts, " "))
}

func echo() {
  cortana.Title("Echo anything to the screen")
  cortana.Description(`echo is for echoing anything back. 
Echo works a lot like print, except it has a child command.`)
  args := struct {
    Texts []string `cortana:"texts"`
  }{}

  cortana.Parse(&args)
  fmt.Println(strings.Join(args.Texts, " "))
}

func echoTimes() {
  cortana.Title("Echo anything to the screen more times")
  cortana.Description(`echo things multiple times back to the user by providing
  a count and a string.`)
  args := struct {
    Times int      `cortana:"--times, -t, 1, times to echo the input"`
    Texts []string `cortana:"texts"`
  }{}
  cortana.Parse(&args)

  for i := 0; i < args.Times; i++ {
    fmt.Println(strings.Join(args.Texts, " "))
  }
}

func main() {
  cortana.AddCommand("print", print, "print anything to the screen")
  cortana.AddCommand("echo", echo, "echo anything to the screen")
  cortana.AddCommand("echo times", echoTimes, "echo anything to the screen more times")
  cortana.Launch()
}

The commands are used exactly as with cobra; only the automatically generated help text differs slightly.

# Running without any subcommand prints the automatically generated help
$ ./app
Available commands:

print                         print anything to the screen
echo                          echo anything to the screen
echo times                    echo anything to the screen more times

# The -h, --help flag is enabled by default; developers do not need to do anything
$ ./app print -h
Print anything to the screen

print is for printing anything back to the screen.
For many years people have printed back to the screen.

Usage: print [texts...]

  -h, --help                     help for the command
  
# echo any content
$ ./app echo hello world
 hello world
 
# echo any number of times
$ ./app echo times hello world --times 3
 hello world
 hello world
 hello world

# The --times flag can appear anywhere
$ ./app echo --times 3 times hello world
 hello world
 hello world
 hello world

1 Options and defaults

args := struct {
    Times int      `cortana:"--times, -t, 1, times to echo the input"`
    Texts []string `cortana:"texts"`
}{}

The echo times command has a --times flag. Everything else is the content to echo. That content is simply command-line arguments, and it may arrive as several arguments because it contains spaces.

As mentioned above, a flag essentially binds a value to a variable. The flag name, --times here, is associated with the variable args.Times. The non-flag arguments have no names, so they are all bound to a slice, args.Texts.

Cortana defines its own struct tag, which specifies the long flag name, short flag name, default value and description of an option. The format is: cortana:"long, short, default, description"

  • Long flag name: --flagname. Every flag supports a long name. If it is omitted, the field name is used
  • Short flag name: -f. Can be omitted
  • Default: any value that matches the field type. If omitted, the default is empty. A single dash "-" means the user must provide a value
  • Description: describes the option and is used to generate the help text. It can contain any printable characters (including commas and spaces)

To make it easier to remember, the tag name cortana can also be written as lsdd, the initials of the four parts above.
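To illustrate how a four-part tag of this shape can be read, here is a minimal sketch using reflection. It is not Cortana's actual parser: the option type and parseTag function are hypothetical. Splitting into at most four parts is what allows the description to contain commas.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// option holds the four parts of a "long, short, default, description" tag.
type option struct {
	Long, Short, Default, Description string
}

// parseTag splits a tag value into at most four parts, so commas inside the
// description are preserved, and trims the whitespace around each part.
func parseTag(tag string) option {
	parts := strings.SplitN(tag, ",", 4)
	for len(parts) < 4 {
		parts = append(parts, "")
	}
	for i := range parts {
		parts[i] = strings.TrimSpace(parts[i])
	}
	return option{parts[0], parts[1], parts[2], parts[3]}
}

func main() {
	args := struct {
		Times int      `cortana:"--times, -t, 1, times to echo the input"`
		Texts []string `cortana:"texts"`
	}{}

	// Walk the struct fields and print the parsed tag of each one.
	t := reflect.TypeOf(args)
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		fmt.Printf("%s: %+v\n", f.Name, parseTag(f.Tag.Get("cortana")))
	}
}
```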

2 Subcommands and aliases

AddCommand can add any subcommand; in essence it establishes the mapping between a subcommand and its handler function.

cortana.AddCommand("echo", echo, "echo anything to the screen")

In this example the print and echo commands do the same thing, so we can simply link them with an alias:

// Define print as the alias of echo command
cortana.Alias("print", "echo")

Running the print command now actually runs echo:

$ ./app print -h
Echo anything to the screen

echo is for echoing anything back. 
Echo works a lot like print, except it has a child command.

Available commands:

echo times                    echo anything to the screen more times


Usage: echo [texts...]

  -h, --help                     help for the command

The alias mechanism is very flexible: an alias can stand for any combination of command and arguments. Suppose we want a three subcommand that prints any string three times. An alias does it directly:

cortana.Alias("three", "echo times --times 3")
# three is an alias for echo times --times 3
$ ./app three hello world
 hello world
 hello world
 hello world

3 Help flags and commands

Cortana automatically generates help information for every command. This behavior can be disabled with cortana.DisableHelpFlag, and the flag name can be customized with cortana.HelpFlag.

cortana.Use(cortana.HelpFlag("--usage", "-u"))
# Use --usage to print help information
$ ./app echo --usage
Echo anything to the screen

echo is for echoing anything back. 
Echo works a lot like print, except it has a child command.

Available commands:

echo times                    echo anything to the screen more times


Usage: echo [texts...]

  -u, --usage                    help for the command

Cortana does not provide a help subcommand by default, but one is easy to implement with the alias mechanism:

cortana.Alias("help", "--help")
// The alias implements a help command that prints the help information of any subcommand
$ ./app help echo times
Echo anything to the screen more times

echo things multiple times back to the user by providing
        a count and a string.

Usage: echo times [options] [texts...]

  -t, --times <times>            times to echo the input. (default=1)
  -h, --help                     help for the command

4 Configuration files and environment variables

Besides binding variables to command-line arguments, Cortana also lets users bind them to configuration files and environment variables. Cortana does not parse configuration files or environment variables itself; users can rely on third-party libraries for that. Cortana's main job here is to merge values from different sources by priority, in the following order:

Default value < configuration file < environment variable < command-line argument

Cortana is designed to let users work with configuration in any format; they only need to implement the Unmarshaler interface. For example, to use JSON as the configuration file:

cortana.AddConfig("app.json", cortana.UnmarshalFunc(json.Unmarshal))

Because parsing is delegated entirely to a third-party library, users are free to decide how configuration entries are bound to variables, for example with json struct tags.
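The priority rule can be sketched as layered maps applied from lowest to highest priority, so that later layers overwrite earlier ones. This is an illustration only, not how Cortana stores values internally: merge is a hypothetical helper, and the environment and flag layers are written as literals where a real program would fill them from os.LookupEnv and the parsed command line.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// merge applies the layers in order, so a value in a later layer overrides
// the same key in an earlier one.
func merge(layers ...map[string]string) map[string]string {
	out := map[string]string{}
	for _, layer := range layers {
		for key, value := range layer {
			out[key] = value
		}
	}
	return out
}

func main() {
	defaults := map[string]string{"times": "1", "color": "none"}

	// Stand-in for a configuration file decoded by a JSON library.
	var config map[string]string
	if err := json.Unmarshal([]byte(`{"times": "2", "color": "red"}`), &config); err != nil {
		fmt.Println("bad config:", err)
		return
	}

	env := map[string]string{"times": "3"}   // stand-in for environment variables
	flags := map[string]string{"times": "5"} // stand-in for command-line arguments

	merged := merge(defaults, config, env, flags)
	fmt.Println(merged["times"], merged["color"])
	// prints: 5 red
}
```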

5 No subcommands?

Cortana's design decouples command lookup from argument parsing, so the two can be used independently. In a program without subcommands, for example, arguments can be parsed directly in the main function:

func main() {
  args := struct {
    Version bool `cortana:"--version, -v, , print the command version"`
  }{}
  cortana.Parse(&args)
  if args.Version {
    fmt.Println("v0.1.1")
    return
  }
  // ...
}
$ ./app --version
v0.1.1

VI. Summary

Everyone uses command-line parsing, but it is rarely critical. Unless a tool is primarily a command-line tool, most programs do not need to pay much attention to it. I am therefore sincerely grateful to readers who were interested enough in the topic to read all the way to the end.

The flag package is easy to use and cobra is rich in features; together they meet almost every need. Yet while writing command-line programs I always felt that the existing libraries fell short. The flag package only solves flag parsing. cobra supports both subcommands and arguments, but it couples the parsing of the two, which separates parameter definitions from the functions that use them. Cortana's central goal is to decouple command lookup from argument parsing. By going back to what command-line arguments really are, I arrived at a heuristic parsing method that achieves this goal. The decoupling gives Cortana the rich functionality of cobra together with the ease of use of flag. Achieving powerful functionality through a very simple mechanism and careful design is deeply satisfying, and I hope this article shares some of that pleasure with you.

Project address: https://github.com/shafreeck/cortana


Tags: Go Windows Database Kubernetes shell

Posted on Wed, 10 Nov 2021 21:37:06 -0500 by phr0stbyte