Strings deserve special attention in Go, because Go implements them differently from many other languages.
What is a string?
In Go, a string is a read-only slice of bytes. We can create a string by enclosing its contents in double quotes "". Let's look at a simple example that creates and prints a string.
```go
package main

import (
	"fmt"
)

func main() {
	name := "Hello World"
	fmt.Println(name)
}
```
The above program will output Hello World.
The strings in Go are Unicode compatible and encoded using UTF-8.
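Here is a small sketch of the "slice of bytes" idea. Indexing a string yields a single byte, and slicing it yields another string. The helper name firstByteAndPrefix is hypothetical and exists only for illustration.

```go
package main

import "fmt"

// firstByteAndPrefix returns the first byte of s and the substring made of
// its first n bytes. It assumes 1 <= n <= len(s); otherwise the index and
// slice expressions panic.
func firstByteAndPrefix(s string, n int) (byte, string) {
	return s[0], s[:n]
}

func main() {
	name := "Hello World"
	b, prefix := firstByteAndPrefix(name, 5)
	fmt.Println(len(name)) // 11: len counts bytes
	fmt.Println(b)         // 72: the byte value of 'H'
	fmt.Printf("%T\n", b)  // uint8: byte is an alias for uint8
	fmt.Println(prefix)    // Hello
}
```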
Accessing individual bytes of a string
Since a string is a slice of bytes, we can access each byte of the string individually.
```go
package main

import (
	"fmt"
)

func printBytes(s string) {
	for i := 0; i < len(s); i++ {
		fmt.Printf("%x ", s[i])
	}
}

func main() {
	name := "Hello World"
	printBytes(name)
}
```
In printBytes, len(s) returns the number of bytes in the string, and a for loop prints each of those bytes in hexadecimal. %x is the format specifier for hexadecimal. The program outputs 48 65 6c 6c 6f 20 57 6f 72 6c 64, which are the bytes of "Hello World" in UTF-8 encoding. To understand strings in Go well, you need a basic understanding of Unicode and UTF-8. I recommend reading https://naveenr.net/unicode-character-set-and-utf-8-utf-16-utf-32-encoding/ to learn what they are.
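As an aside, fmt can print the bytes of a string in hexadecimal without an explicit loop. The space flag in the % x verb inserts a space between bytes, and unlike the loop version it leaves no trailing space. Below is a minimal sketch; the helper name hexBytes is hypothetical.

```go
package main

import "fmt"

// hexBytes formats every byte of s as two lowercase hex digits. The space
// flag in "% x" makes fmt separate the bytes with single spaces.
func hexBytes(s string) string {
	return fmt.Sprintf("% x", s)
}

func main() {
	fmt.Println(hexBytes("Hello World")) // 48 65 6c 6c 6f 20 57 6f 72 6c 64
}
```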
Let's slightly modify the above program to print each character of the string.
```go
package main

import (
	"fmt"
)

func printBytes(s string) {
	for i := 0; i < len(s); i++ {
		fmt.Printf("%x ", s[i])
	}
}

func printChars(s string) {
	for i := 0; i < len(s); i++ {
		fmt.Printf("%c ", s[i])
	}
}

func main() {
	name := "Hello World"
	printBytes(name)
	fmt.Printf("\n")
	printChars(name)
}
```
In printChars, the %c format specifier prints each byte of the string as a character. The output of this program is:
```
48 65 6c 6c 6f 20 57 6f 72 6c 64
H e l l o   W o r l d
```
The program above appears to print each character of the string correctly, but it contains a serious bug. Let's run it on a string that contains a non-ASCII character to see what goes wrong.
```go
package main

import (
	"fmt"
)

func printBytes(s string) {
	for i := 0; i < len(s); i++ {
		fmt.Printf("%x ", s[i])
	}
}

func printChars(s string) {
	for i := 0; i < len(s); i++ {
		fmt.Printf("%c ", s[i])
	}
}

func main() {
	name := "Hello World"
	printBytes(name)
	fmt.Printf("\n")
	printChars(name)
	fmt.Printf("\n")
	name = "Señor"
	printBytes(name)
	fmt.Printf("\n")
	printChars(name)
}
```
The output of the above code is:
```
48 65 6c 6c 6f 20 57 6f 72 6c 64
H e l l o   W o r l d
53 65 c3 b1 6f 72
S e Ã ± o r
```
When printChars is given Señor, it prints S e Ã ± o r instead of S e ñ o r. Why does the program work perfectly for Hello World but fail for Señor? Every character in Hello World is ASCII, and UTF-8 encodes each ASCII character in a single byte. The Unicode code point of ñ, however, is U+00F1, and its UTF-8 encoding occupies two bytes, c3 and b1. printChars assumes that every character is encoded in exactly one byte, so it prints c3 and b1 as two separate characters, Ã and ±. This assumption is wrong: in UTF-8, a code point can occupy anywhere from one to four bytes. So what should we do? This is where rune comes in.
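The two-byte encoding of ñ can be checked directly with the standard unicode/utf8 package. The sketch below uses utf8.EncodeRune and utf8.RuneLen. The helper name encode is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode returns the UTF-8 bytes of a single code point.
func encode(r rune) []byte {
	buf := make([]byte, utf8.UTFMax) // UTFMax (4) is the longest UTF-8 encoding
	n := utf8.EncodeRune(buf, r)     // n is the number of bytes written
	return buf[:n]
}

func main() {
	fmt.Printf("%U -> % x\n", 'S', encode('S')) // U+0053 -> 53
	fmt.Printf("%U -> % x\n", 'ñ', encode('ñ')) // U+00F1 -> c3 b1
	fmt.Println(utf8.RuneLen('ñ'))              // 2
}
```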
rune
rune is a built-in type in Go and an alias for int32. A rune represents a Unicode code point: no matter how many bytes a code point occupies in UTF-8, it can be represented by a single rune. Let's modify the program above to print characters using runes.
```go
package main

import (
	"fmt"
)

func printBytes(s string) {
	for i := 0; i < len(s); i++ {
		fmt.Printf("%x ", s[i])
	}
}

func printChars(s string) {
	runes := []rune(s)
	for i := 0; i < len(runes); i++ {
		fmt.Printf("%c ", runes[i])
	}
}

func main() {
	name := "Hello World"
	printBytes(name)
	fmt.Printf("\n")
	printChars(name)
	fmt.Printf("\n\n")
	name = "Señor"
	printBytes(name)
	fmt.Printf("\n")
	printChars(name)
}
```
In printChars, the string is first converted to a slice of runes. We then loop over the runes and print each one with %c. The output of the program is:
```
48 65 6c 6c 6f 20 57 6f 72 6c 64
H e l l o   W o r l d

53 65 c3 b1 6f 72
S e ñ o r
```
This time the output is exactly what we want :)
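Because rune is only an alias for int32, a rune value can be assigned to an int32 without a conversion, and fmt reports its type as int32. The sketch below prints a rune's type, decimal value, code point and glyph. The helper name describe is hypothetical.

```go
package main

import "fmt"

// describe shows a rune's type, its decimal value, its Unicode code point
// and the character itself.
func describe(r rune) string {
	var i int32 = r // compiles without a conversion: rune is an alias for int32
	return fmt.Sprintf("%T %d %U %c", r, i, r, r)
}

func main() {
	fmt.Println(describe('ñ')) // int32 241 U+00F1 ñ
	fmt.Println(describe('S')) // int32 83 U+0053 S
}
```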
for range loop on a string
The program above is a perfectly good way to iterate over the runes of a string, but Go offers a simpler one: the for range loop.
```go
package main

import (
	"fmt"
)

func printCharsAndBytes(s string) {
	// index is the byte offset where each rune starts; r is the rune itself.
	for index, r := range s {
		fmt.Printf("%c starts at byte %d\n", r, index)
	}
}

func main() {
	name := "Señor"
	printCharsAndBytes(name)
}
```
In printCharsAndBytes, the string is traversed using a for range loop. On each iteration, the loop yields the current rune together with the byte index at which that rune starts. The output of the program is:
```
S starts at byte 0
e starts at byte 1
ñ starts at byte 2
o starts at byte 4
r starts at byte 5
```
The output makes it clear that ñ occupies two bytes: it starts at byte 2, and the next rune starts at byte 4.
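To see roughly what the for range loop does for us, the sketch below walks a string by hand with utf8.DecodeRuneInString. At each step it decodes one rune, records where the rune starts, and advances by the rune's width in bytes. An invalid byte decodes as utf8.RuneError with width 1, which matches how range treats invalid UTF-8. The helper name runeStarts is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeStarts returns the byte offset of every rune in s, advancing through
// the string one decoded rune at a time.
func runeStarts(s string) []int {
	var starts []int
	for i := 0; i < len(s); {
		_, width := utf8.DecodeRuneInString(s[i:]) // width is 1 to 4 bytes
		starts = append(starts, i)
		i += width
	}
	return starts
}

func main() {
	fmt.Println(runeStarts("Señor"))  // [0 1 2 4 5]
	fmt.Println(runeStarts("a\xffb")) // [0 1 2]: the invalid byte 0xff counts as one rune
}
```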
Constructing strings from byte slices
```go
package main

import (
	"fmt"
)

func main() {
	byteSlice := []byte{0x43, 0x61, 0x66, 0xC3, 0xA9}
	str := string(byteSlice)
	fmt.Println(str)
}
```
In the program above, byteSlice contains the UTF-8 encoded bytes of the string Café, written in hexadecimal. The program outputs Café.
What happens if we replace the hexadecimal values with their decimal equivalents? Will the program still work? Let's try:
```go
package main

import (
	"fmt"
)

func main() {
	byteSlice := []byte{67, 97, 102, 195, 169} // decimal equivalent of {'\x43', '\x61', '\x66', '\xC3', '\xA9'}
	str := string(byteSlice)
	fmt.Println(str)
}
```
The program above also outputs Café.
Constructing strings with rune slices
```go
package main

import (
	"fmt"
)

func main() {
	runeSlice := []rune{0x0053, 0x0065, 0x00f1, 0x006f, 0x0072}
	str := string(runeSlice)
	fmt.Println(str)
}
```
In the program above, runeSlice contains the Unicode code points of the string Señor, written in hexadecimal. The program outputs Señor.
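Converting a rune slice to a string UTF-8 encodes each rune in turn and concatenates the results. The sketch below builds the same string by hand with strings.Builder.WriteRune to make that explicit. It also shows that a value outside the valid code point range is converted to the replacement character U+FFFD. The helper name joinRunes is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// joinRunes builds the same string as string(runes) by appending the
// UTF-8 encoding of each rune to a strings.Builder.
func joinRunes(runes []rune) string {
	var b strings.Builder
	for _, r := range runes {
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	runeSlice := []rune{0x0053, 0x0065, 0x00f1, 0x006f, 0x0072}
	fmt.Println(joinRunes(runeSlice))                      // Señor
	fmt.Println(joinRunes(runeSlice) == string(runeSlice)) // true

	// 0x110000 is beyond the last Unicode code point, so it becomes U+FFFD.
	fmt.Println(string([]rune{0x110000}) == "\uFFFD") // true
}
```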
Length of string
The function func RuneCountInString(s string) (n int) in the unicode/utf8 package returns the length of a string in runes. It takes a string as its argument and returns the number of runes the string contains.
```go
package main

import (
	"fmt"
	"unicode/utf8"
)

func length(s string) {
	fmt.Printf("length of %s is %d\n", s, utf8.RuneCountInString(s))
}

func main() {
	word1 := "Señor"
	length(word1)
	word2 := "Pets"
	length(word2)
}
```
The output result of the above program is:
```
length of Señor is 5
length of Pets is 4
```
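The built-in len counts bytes, not runes, so the two only agree for pure ASCII strings. The sketch below compares len(s), utf8.RuneCountInString(s) and len([]rune(s)) for the two words above. The helper name lengths is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lengths reports the byte length of s and two equivalent ways of
// counting its runes.
func lengths(s string) (byteLen, runeCount, sliceLen int) {
	return len(s), utf8.RuneCountInString(s), len([]rune(s))
}

func main() {
	for _, w := range []string{"Señor", "Pets"} {
		b, r, l := lengths(w)
		fmt.Printf("%s: len=%d runes=%d len([]rune)=%d\n", w, b, r, l)
	}
	// Señor: len=6 runes=5 len([]rune)=5
	// Pets: len=4 runes=4 len([]rune)=4
}
```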
Strings are immutable
The string in Go is immutable. Once a string is created, it cannot be modified.
```go
package main

import (
	"fmt"
)

func mutate(s string) string {
	s[0] = 'a' // any valid Unicode character within single quotes is a rune
	return s
}

func main() {
	h := "hello"
	fmt.Println(mutate(h))
}
```
In mutate, we try to change the first character of the string to 'a'. This is not allowed because strings are immutable, so the program fails to compile with the error main.go:8: cannot assign to s[0].
To work around this, convert the string into a slice of runes, make the desired changes to the slice, and then convert the slice back into a string. This produces a new string; the original one is left unchanged.
```go
package main

import (
	"fmt"
)

func mutate(s []rune) string {
	s[0] = 'a'
	return string(s)
}

func main() {
	h := "hello"
	fmt.Println(mutate([]rune(h)))
}
```
The mutate function receives a rune slice, sets its first element to 'a', converts the slice back into a string and returns it. In main, we convert h into a rune slice and pass it to mutate. The program outputs aello.
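When only the first character needs to change, converting the whole string to runes is not strictly necessary. An alternative sketch decodes just the first rune to learn its width in bytes, then concatenates the replacement with the rest of the string. This is a different technique from the one above. The helper name replaceFirstRune is hypothetical. As before, the original string is never modified; a new string is returned.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// replaceFirstRune returns a new string whose first rune is r. Only the
// first rune of s is decoded; the remaining bytes are kept as they are.
func replaceFirstRune(s string, r rune) string {
	if s == "" {
		return s // nothing to replace
	}
	_, width := utf8.DecodeRuneInString(s) // byte width of the first rune
	return string(r) + s[width:]
}

func main() {
	h := "hello"
	fmt.Println(replaceFirstRune(h, 'a'))       // aello
	fmt.Println(replaceFirstRune("ñandú", 'N')) // Nandú: the two-byte ñ is replaced as a whole
	fmt.Println(h)                              // hello: the original string is untouched
}
```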