_02 Strings and Characters

String Literals

let someString = "Some string literal value"

Multiline string literal value

let quotation = """
The White Rabbit put on his spectacles.  "Where shall I begin,
please your Majesty?" he asked.

"Begin at the beginning," the King said gravely, "and go on
till you come to the end; then stop."
"""

The following two strings are identical:

let singleLineString = "These are the same."
let multilineString = """
These are the same.
"""

When your source code includes a line break inside a multiline string literal, that line break also appears in the string's value. If you want to use line breaks to make your source code easier to read, but you don't want them to be part of the string's value, write a backslash (\) at the end of those lines:

let softWrappedQuotation = """
The White Rabbit put on his spectacles.  "Where shall I begin, \
please your Majesty?" he asked.

"Begin at the beginning," the King said gravely, "and go on \
till you come to the end; then stop."
"""

To make a multiline string literal that begins or ends with a line break, write a blank line as the first or last line. For example:

let lineBreaks = """

This string starts with a line break.
It also ends with a line break.

"""

A multiline string can be indented to match the surrounding code. The whitespace before the closing quotation marks (""") tells Swift what whitespace to ignore before all of the other lines. However, if you write whitespace at the beginning of a line in addition to what's before the closing quotation marks, that whitespace is included.
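As a small illustration of the indentation rule above (the constant name linesWithIndentation is only a name chosen for this sketch), the closing delimiter below is indented by four spaces, so four spaces are stripped from every line and only the extra indentation on the middle line survives:

```swift
// The closing """ is indented four spaces, so Swift strips four
// spaces from the start of every line in the literal.
let linesWithIndentation = """
    This line doesn't begin with whitespace.
        This line begins with four spaces.
    This line doesn't begin with whitespace.
    """

print(linesWithIndentation)
```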

Special Characters
String literals can include the following special characters:
The escaped special characters \0 (null character), \\ (backslash), \t (horizontal tab), \n (line feed), \r (carriage return), \" (double quotation mark), and \' (single quotation mark)
An arbitrary Unicode scalar value, written as \u{n}, where n is a 1-8 digit hexadecimal number

let wiseWords = "\"Imagination is more important than knowledge\" - Einstein"
// "Imagination is more important than knowledge" - Einstein
let dollarSign = "\u{24}"        // $, Unicode scalar U+0024
let blackHeart = "\u{2665}"      // ♥, Unicode scalar U+2665
let sparklingHeart = "\u{1F496}" // 💖, Unicode scalar U+1F496

Because multiline string literals use three double quotation marks instead of just one, you can include a double quotation mark (") inside a multiline string literal without escaping it. To include the text """ in a multiline string, escape at least one of the quotation marks. For example:

let threeDoubleQuotationMarks = """
Escaping the first quotation mark \"""
Escaping all three quotation marks \"\"\"
"""

Extended string delimiter
A string literal can be placed within extended delimiters to include special characters in the string without invoking their effect. Place the string within quotation marks (") and surround it with number signs (#). For example, printing the string literal #"Line 1\nLine 2"# prints the line feed escape sequence (\n) rather than printing the string across two lines. If you need the special effect of a character in such a literal, match the number of number signs after the escape character (\). For example, if the string is #"Line 1\nLine 2"# and you want to break the line, use #"Line 1\#nLine 2"# instead. Similarly, ###"Line1\###nLine2"### also breaks the line.
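The difference between the delimiter forms described above can be checked directly. This is a minimal sketch; the constant names raw, broken, and alsoBroken are chosen only for illustration:

```swift
// Extended delimiters: the escape sequence is kept as literal text.
let raw = #"Line 1\nLine 2"#
// Matching the number of # signs after the backslash re-enables the escape.
let broken = #"Line 1\#nLine 2"#
// The same idea with three number signs.
let alsoBroken = ###"Line1\###nLine2"###

print(raw)      // one line, containing a backslash followed by "n"
print(broken)   // two lines
print(raw.count, broken.count)   // 14 13
```

The raw form is one character longer because the backslash and the letter n are stored as two separate characters instead of a single line feed.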

String literals created using extended delimiters can also be multiline string literals. You can use extended delimiters to include the text """ in a multiline string, overriding the default behavior that ends the literal. For example:

let threeMoreDoubleQuotationMarks = #"""
Here are three more double quotes: """
"""#

Initialize an empty string

var emptyString = ""               // empty string literal
var anotherEmptyString = String()  // initializer syntax
// These two strings are both empty, and are equivalent to each other

if emptyString.isEmpty { // Use isEmpty to check whether the string is empty
    print("Nothing to see here") // printed
}

String Mutability

var variableString = "Horse"
variableString += " and carriage"
// variableString is now "Horse and carriage"

let constantString = "Highlander"
constantString += " and another Highlander"
// this reports a compile-time error - a constant string cannot be modified

String is a value type
Swift's String type is a value type. If you create a new String value, that value is copied when it's passed to a function or method, or when it's assigned to a constant or variable. In each case, a new copy of the existing String value is created, and the new copy is passed or assigned, not the original version.

Swift's copy-by-default String behavior ensures that when a function or method passes you a String value, it's clear that you own that exact value, regardless of where it came from. You can be confident that the string you are passed won't be modified unless you modify it yourself.
Behind the scenes, Swift's compiler optimizes the use of strings so that they are actually copied only when absolutely necessary. This means you always get good performance when using strings as value types.
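A minimal sketch of these copy semantics follows. The function decorated(_:) and the variable names are hypothetical, introduced only for this illustration:

```swift
// A hypothetical helper: it receives its own copy of the string.
func decorated(_ text: String) -> String {
    var local = text          // mutating the local copy...
    local += " and carriage"
    return local              // ...never touches the caller's value
}

let original = "Horse"
var assignedCopy = original   // assignment also yields an independent value
assignedCopy += "!"

let result = decorated(original)
print(original)      // Horse
print(assignedCopy)  // Horse!
print(result)        // Horse and carriage
```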

Working with Characters

for character in "Dog!🐶" {
    print(character)
}
// D
// o
// g
// !
// 🐶

Alternatively, you can create a stand-alone Character constant or variable from a single-character string literal by providing a Character type annotation:

let exclamationMark: Character = "!"

String values can be constructed by passing an array of character values as a parameter to an initializer:

let catCharacters: [Character] = ["C", "a", "t", "!", "🐱"]
let catString = String(catCharacters) // Construct a string from an array of characters
print(catString) // Cat!🐱

Concatenating Strings and Characters

let string1 = "hello"
let string2 = " there"
var welcome = string1 + string2 // "hello there"
welcome += string2 // "hello there there"

You can use the append() method of String type to add characters:

let exclamationMark: Character = "!"
welcome.append(exclamationMark) // "hello there there!"

You can't append a String or Character to an existing Character variable, because a Character value must contain exactly one character.

If you're using multiline string literals to build up the lines of a longer string, you want every line in the string to end with a line break, including the last line. For example:

let badStart = """
one
two
"""
let end = """
three
"""
print(badStart + end)
// Prints two lines:
// one
// twothree

let goodStart = """
one
two

"""
print(goodStart + end)
// Prints three lines:
// one
// two
// three

String Interpolation
String interpolation is a way to construct a new String value from a mix of constants, variables, literals, and expressions by including their values inside a string literal. You can use string interpolation in both single-line and multiline string literals. Each item that you insert into the string literal is wrapped in a pair of parentheses, prefixed by a backslash (\):

let multiplier = 3
let message = "\(multiplier) times 2.5 is \(Double(multiplier) * 2.5)"
// message is "3 times 2.5 is 7.5"

You can use extended string delimiters to create strings containing characters that would otherwise be treated as string interpolation. For example:

print(#"Write an interpolated string in Swift using \(multiplier)."#)
// Prints "Write an interpolated string in Swift using \(multiplier)."

To use string interpolation inside a string that uses extended delimiters, match the number of number signs after the backslash to the number of number signs at the beginning and end of the string. For example:

print(#"6 times 7 is \#(6 * 7)."#)
// Prints "6 times 7 is 42."

Unicode
Unicode is an international standard for encoding, representing, and processing text in different writing systems. It enables you to represent almost any character from any language in a standardized form, and to read and write those characters to and from an external source such as a text file or web page. Swift's String and Character types are fully Unicode-compliant, as described in this section.

Unicode Scalar Value
Behind the scenes, Swift's native String type is built from Unicode scalar values. A Unicode scalar value is a unique 21-bit number for a character or modifier, such as U+0061 for LATIN SMALL LETTER A ("a"), or U+1F425 for FRONT-FACING BABY CHICK ("🐥").

Note that not all 21-bit Unicode scalar values are assigned to a character. Some scalars are reserved for future assignment or for use in UTF-16 encoding. Scalar values that have been assigned to a character typically also have a name, such as LATIN SMALL LETTER A and FRONT-FACING BABY CHICK in the examples above.
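The point about values reserved for UTF-16 can be observed with the standard library's failable Unicode.Scalar initializer. This is a small sketch; the constant names are chosen here for illustration:

```swift
// Unicode.Scalar(_: UInt32) is failable: it returns nil for values that
// aren't valid scalars, such as the UTF-16 surrogate range D800...DFFF.
let chick = Unicode.Scalar(0x1F425 as UInt32)
let surrogate = Unicode.Scalar(0xD800 as UInt32)

if let scalar = chick {
    // Prints the character followed by the scalar's numeric value.
    print(Character(scalar), scalar.value)
}
print(surrogate == nil)   // true
```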

Extended Grapheme Clusters
Every instance of Swift's Character type represents a single extended grapheme cluster. An extended grapheme cluster is a sequence of one or more Unicode scalars that (when combined) produce a single human-readable character.

Here's an example. The letter é can be represented as the single Unicode scalar é (LATIN SMALL LETTER E WITH ACUTE, or U+00E9). However, the same letter can also be represented as a pair of scalars: a standard letter e (LATIN SMALL LETTER E, or U+0065), followed by the COMBINING ACUTE ACCENT scalar (U+0301). The COMBINING ACUTE ACCENT scalar is graphically applied to the scalar that precedes it, turning an e into an é when it's rendered by a Unicode-aware text-rendering system.

In both cases, the letter é is represented as a single Swift Character value that represents an extended grapheme cluster. In the first case, the cluster contains a single scalar; in the second case, it's a cluster of two scalars:

let eAcute: Character = "\u{E9}"                 // é
let combinedEAcute: Character = "\u{65}\u{301}"  // e followed by ́
// eAcute is é, combinedEAcute is é

Extended grapheme clusters are a flexible way to represent many complex script characters as a single Character value. For example, Hangul syllables from the Korean alphabet can be represented as either a precomposed or a decomposed sequence. Both of these representations qualify as a single Character value in Swift:

let precomposed: Character = "\u{D55C}"                 // 한
let decomposed: Character = "\u{1112}\u{1161}\u{11AB}"  // ᄒ, ᅡ, ᆫ
// precomposed is 한, decomposed is 한

Extended grapheme clusters enable scalars for enclosing marks (such as COMBINING ENCLOSING CIRCLE, or U+20DD) to enclose other Unicode scalars as part of a single Character value:

let enclosedEAcute: Character = "\u{E9}\u{20DD}"
// enclosedEAcute is é⃝

Unicode scalars for regional indicator symbols can be combined in pairs to make a single Character value, such as this combination of REGIONAL INDICATOR SYMBOL LETTER U (U+1F1FA) and REGIONAL INDICATOR SYMBOL LETTER S (U+1F1F8):

let regionalIndicatorForUS: Character = "\u{1F1FA}\u{1F1F8}"
// regionalIndicatorForUS is 🇺🇸

Character Count

let unusualMenagerie = "Koala 🐨, Snail 🐌, Penguin 🐧, Dromedary 🐪"
print("unusualMenagerie has \(unusualMenagerie.count) characters")
// Prints "unusualMenagerie has 40 characters"

Note that Swift's use of extended grapheme clusters for Character values means that string concatenation and modification may not always affect a string's character count.

For example, if you initialize a new string with the four-character word cafe, and then append a COMBINING ACUTE ACCENT (U+0301) to the end of the string, the resulting string will still have a character count of 4, with a fourth character of é, not e:

var word = "cafe"
print("the number of characters in \(word) is \(word.count)")
// Prints "the number of characters in cafe is 4"

word += "\u{301}"    // COMBINING ACUTE ACCENT, U+0301

print("the number of characters in \(word) is \(word.count)")
// Prints "the number of characters in café is 4"

An extended grapheme cluster can be composed of multiple Unicode scalars. This means that different characters, and different representations of the same character, can require different amounts of memory to store. Because of this, characters in Swift don't each take up the same amount of memory within a string's representation. As a result, the number of characters in a string can't be calculated without iterating through the string to determine its extended grapheme cluster boundaries. If you are working with particularly long string values, be aware that the count property must iterate over the Unicode scalars in the entire string in order to determine the characters for that string.
The count of the characters returned by the count property isn't always the same as the length property of an NSString that contains the same characters. The length of an NSString is based on the number of 16-bit code units within the string's UTF-16 representation, not the number of Unicode extended grapheme clusters within the string.
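The gap between the three ways of counting can be seen without importing Foundation. In this sketch, utf16.count is used as a stand-in for an NSString's length, on the assumption stated above that the length counts UTF-16 code units. The variable names are chosen for illustration:

```swift
var word = "cafe"
word += "\u{301}"                  // COMBINING ACUTE ACCENT, U+0301

let flag = "\u{1F1FA}\u{1F1F8}"    // 🇺🇸, two regional indicator scalars

// characters, Unicode scalars, UTF-16 code units
print(word.count, word.unicodeScalars.count, word.utf16.count)   // 4 5 5
print(flag.count, flag.unicodeScalars.count, flag.utf16.count)   // 1 2 4
```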

Accessing and modifying strings
Use the startIndex property to access the position of the first Character of a String. The endIndex property is the position after the last character in a String. As a result, the endIndex property isn't a valid argument to a string's subscript. If a String is empty, startIndex and endIndex are equal.

You access the indices before and after a given index using the index(before:) and index(after:) methods of String. To access an index farther away from the given index, you can use the index(_:offsetBy:) method instead of calling one of these methods multiple times.

let greeting = "Guten Tag!"
greeting[greeting.startIndex] // G
greeting[greeting.index(before: greeting.endIndex)] // !
greeting[greeting.index(after: greeting.startIndex)] // u
let index = greeting.index(greeting.startIndex, offsetBy: 7)
greeting[index] // a

// Accessing an index outside the string's range triggers a runtime error:
greeting[greeting.endIndex] // Error
greeting.index(after: greeting.endIndex) // Error
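One way to avoid the out-of-range errors shown above is the standard library's index(_:offsetBy:limitedBy:) method, which returns nil rather than trapping when the offset would move past the limit. The helper character(in:atOffset:) below is hypothetical, a sketch written for this note and not part of String's API:

```swift
// Hypothetical helper: returns nil instead of trapping when the offset
// falls outside the string.
func character(in text: String, atOffset offset: Int) -> Character? {
    guard offset >= 0,
          let index = text.index(text.startIndex,
                                 offsetBy: offset,
                                 limitedBy: text.endIndex),
          index < text.endIndex   // endIndex itself isn't subscriptable
    else { return nil }
    return text[index]
}

let greeting = "Guten Tag!"
if let found = character(in: greeting, atOffset: 7) {
    print(found)                                       // a
}
print(character(in: greeting, atOffset: 10) == nil)    // true
```

The explicit check against endIndex is needed because an offset equal to the string's length lands exactly on the limit, which the method treats as reachable.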

Use the indices property to access all of the indices of individual characters in a string.

for index in greeting.indices {
    print("\(greeting[index]) ", terminator: "")
}
// Prints "G u t e n   T a g ! "

By default, the print() function ends its output with a line break. To print a value without a line break after it, pass an empty string as the terminator, for example print(someValue, terminator: "").

You can use the startIndex and endIndex properties and the index(before:), index(after:), and index(_:offsetBy:) methods on any type that conforms to the Collection protocol. This includes String, as shown here, as well as collection types such as Array, Dictionary, and Set.
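The effect of the terminator can be inspected by printing into a string instead of standard output, using the to: parameter of print. This is a small sketch; the variable name output is chosen here:

```swift
var output = ""
// print(_:separator:terminator:to:) writes into any TextOutputStream;
// String conforms, so the exact characters written can be examined.
print("a", terminator: "", to: &output)   // no line break appended
print("b", to: &output)                   // default terminator is "\n"
print("c", terminator: "!", to: &output)  // any string can be the terminator
// output is now "ab\nc!"
```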

Insert and delete
To insert a single character into a string at a specified index, use the insert(_:at:) method; to insert the contents of another string at a specified index, use the insert(contentsOf:at:) method.

var welcome = "hello"
welcome.insert("!", at: welcome.endIndex) // "hello!"
welcome.insert(contentsOf: " there", at: welcome.index(before: welcome.endIndex)) // "hello there!"

To remove a single character from a string at a specified index, use the remove(at:) method; to remove a substring at a specified range, use the removeSubrange(_:) method:

welcome.remove(at: welcome.index(before: welcome.endIndex)) // "hello there"
let range = welcome.index(welcome.endIndex, offsetBy: -6)..<welcome.endIndex
welcome.removeSubrange(range) // "hello"

You can use the insert(_:at:), insert(contentsOf:at:), remove(at:), and removeSubrange(_:) methods on any type that conforms to the RangeReplaceableCollection protocol. This includes String, as shown here, as well as collection types such as Array. (Dictionary and Set don't conform to RangeReplaceableCollection.)
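Because insert(_:at:) is declared by RangeReplaceableCollection, one generic function can serve both String and Array. The function wrapped(_:with:) below is a hypothetical helper written for this sketch:

```swift
// Hypothetical generic helper: works for any RangeReplaceableCollection.
func wrapped<C: RangeReplaceableCollection>(_ collection: C, with element: C.Element) -> C {
    var copy = collection
    copy.insert(element, at: copy.startIndex)
    copy.insert(element, at: copy.endIndex)   // endIndex is read after the first insert
    return copy
}

let word: String = "hello"
let star: Character = "*"
let wrappedWord = wrapped(word, with: star)        // "*hello*"
let wrappedNumbers = wrapped([1, 2, 3], with: 0)   // [0, 1, 2, 3, 0]
print(wrappedWord, wrappedNumbers)
```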

Substrings
When you get a substring from a string, for example by using a subscript or a method like prefix(_:), the result is an instance of Substring, not another string. Unlike strings, substrings are meant to be used only for a short time while you perform actions on a string. When you're ready to store the result for a longer time, convert the substring to an instance of String. For example:

let greeting = "Hello, world!"
let index = greeting.firstIndex(of: ",") ?? greeting.endIndex
let beginning = greeting[..<index] // "Hello"; reuses the memory used by greeting

// Convert the result to a String for long-term storage.
let newString = String(beginning) // created from the substring, with its own storage

Like strings, each substring has a region of memory where the characters that make up the substring are stored. The difference between strings and substrings is that, as a performance optimization, a substring can reuse part of the memory that's used to store the original string, or part of the memory that's used to store another substring. (Strings have a similar optimization, but if two strings share memory, they're equal.) This performance optimization means you don't have to pay the performance cost of copying memory until you modify either the string or the substring. As mentioned above, substrings aren't suitable for long-term storage: because they reuse the storage of the original string, the entire original string must be kept in memory as long as any of its substrings are being used.

In the example above, greeting is a string, which means it has a region of memory where the characters that make up the string are stored. Because beginning is a substring of greeting, it reuses the memory that greeting uses. In contrast, newString is a string: when it's created from the substring, it gets its own storage.

Both String and Substring conform to the StringProtocol protocol, which means it's often convenient for string-manipulation functions to accept a StringProtocol value. You can call such functions with either a String or a Substring value.
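A function that accepts StringProtocol values might look like the following sketch. The function name shout(_:) is hypothetical, and uppercased() is used because it's available on every StringProtocol type:

```swift
// Hypothetical helper that accepts String and Substring alike.
func shout<S: StringProtocol>(_ text: S) -> String {
    return text.uppercased() + "!"
}

let greeting = "Hello, world!"
let index = greeting.firstIndex(of: ",") ?? greeting.endIndex
let beginning = greeting[..<index]   // a Substring

print(shout(beginning))   // HELLO!
print(shout(greeting))    // HELLO, WORLD!!
```

Calling the function with the substring directly avoids creating an intermediate String just to satisfy a parameter type.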

Compare Strings

let quotation = "We're a lot alike, you and I."
let sameQuotation = "We're a lot alike, you and I."
if quotation == sameQuotation {
    print("These two strings are considered equal") // printed
}

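Two String values (or two Character values) are considered equal if their extended grapheme clusters are canonically equivalent. Extended grapheme clusters are canonically equivalent if they have the same linguistic meaning and appearance, even if they're composed of different Unicode scalars behind the scenes.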

// "Voulez-vous un café?" using LATIN SMALL LETTER E WITH ACUTE
let eAcuteQuestion = "Voulez-vous un caf\u{E9}?"

// "Voulez-vous un café?" using LATIN SMALL LETTER E and COMBINING ACUTE ACCENT
let combinedEAcuteQuestion = "Voulez-vous un caf\u{65}\u{301}?"

if eAcuteQuestion == combinedEAcuteQuestion {
    print("These two strings are considered equal") // printed
}

Conversely, LATIN CAPITAL LETTER A (U+0041, or "A"), as used in English, is not equivalent to CYRILLIC CAPITAL LETTER A (U+0410, or "А"), as used in Russian. The characters are visually similar, but don't have the same linguistic meaning:

let latinCapitalLetterA: Character = "\u{41}"
let cyrillicCapitalLetterA: Character = "\u{0410}"
if latinCapitalLetterA != cyrillicCapitalLetterA {
    print("These two characters aren't equivalent.") // printed
}
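Canonical equivalence applies to == on strings and characters, but the underlying scalar and byte sequences still differ. The following sketch makes that visible; the constant names are chosen for illustration:

```swift
let precomposed = "caf\u{E9}"         // é as a single scalar
let decomposed = "caf\u{65}\u{301}"   // e followed by COMBINING ACUTE ACCENT

// Equal as strings...
print(precomposed == decomposed)                                             // true
// ...but not scalar-for-scalar identical.
print(precomposed.unicodeScalars.elementsEqual(decomposed.unicodeScalars))   // false
print(precomposed.utf8.count, decomposed.utf8.count)                         // 5 6
```

This matters when strings are compared or hashed at the byte level outside of Swift, for example after being written to a file.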

Prefix and Suffix Equality
To check whether a string has a particular string prefix or suffix, call the string's hasPrefix(_:) and hasSuffix(_:) methods, both of which take a single argument of type String and return a Boolean value.

let romeoAndJuliet = [
    "Act 1 Scene 1: Verona, A public place",
    "Act 1 Scene 2: Capulet's mansion",
    "Act 1 Scene 3: A room in Capulet's mansion",
    "Act 1 Scene 4: A street outside Capulet's mansion",
    "Act 1 Scene 5: The Great Hall in Capulet's mansion",
    "Act 2 Scene 1: Outside Capulet's mansion",
    "Act 2 Scene 2: Capulet's orchard",
    "Act 2 Scene 3: Outside Friar Lawrence's cell",
    "Act 2 Scene 4: A street in Verona",
    "Act 2 Scene 5: Capulet's mansion",
    "Act 2 Scene 6: Friar Lawrence's cell"
]

var act1SceneCount = 0
for scene in romeoAndJuliet {
    if scene.hasPrefix("Act 1 ") {
        act1SceneCount += 1
    }
}
print("There are \(act1SceneCount) scenes in Act 1")
// Prints "There are 5 scenes in Act 1"

var mansionCount = 0
var cellCount = 0
for scene in romeoAndJuliet {
    if scene.hasSuffix("Capulet's mansion") {
        mansionCount += 1
    } else if scene.hasSuffix("Friar Lawrence's cell") {
        cellCount += 1
    }
}
print("\(mansionCount) mansion scenes; \(cellCount) cell scenes")
// Prints "6 mansion scenes; 2 cell scenes"
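The same counts can also be written with filter instead of explicit loops. This is an alternative sketch, not the approach used above, and it uses a shortened, hypothetical scene list so that it stays self-contained:

```swift
// A shortened, hypothetical scene list used only for this sketch.
let scenes = [
    "Act 1 Scene 1: Verona, A public place",
    "Act 1 Scene 2: Capulet's mansion",
    "Act 2 Scene 3: Outside Friar Lawrence's cell",
]

// Keep only the matching scenes, then count them.
let act1Count = scenes.filter { $0.hasPrefix("Act 1 ") }.count
let cellScenes = scenes.filter { $0.hasSuffix("Friar Lawrence's cell") }.count
print(act1Count, cellScenes)   // 2 1
```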

Unicode representation of string
When a Unicode string is written to a text file or some other storage, the Unicode scalars in that string are encoded in one of several Unicode-defined encoding forms. Each form encodes the string in small chunks known as code units. These include the UTF-8 encoding form (which encodes a string as 8-bit code units), the UTF-16 encoding form (which encodes a string as 16-bit code units), and the UTF-32 encoding form (which encodes a string as 32-bit code units).

Swift provides several different ways to access Unicode representations of strings. You can iterate over the string with a for-in statement to access its individual Character values as Unicode extended grapheme clusters.

Alternatively, access a String value in one of three other Unicode-compliant representations:
A collection of UTF-8 code units (accessed with the string's utf8 property)
A collection of UTF-16 code units (accessed with the string's utf16 property)
A collection of 21-bit Unicode scalar values, equivalent to the string's UTF-32 encoding form (accessed with the string's unicodeScalars property)

Each example below shows a different representation of the following string, which is made up of the characters D, o, g, ‼ (DOUBLE EXCLAMATION MARK, or Unicode scalar U+203C), and the 🐶 character (DOG FACE, or Unicode scalar U+1F436):

let dogString = "Dog‼🐶"

UTF-8 representation
You can access a UTF-8 representation of a String by iterating over its utf8 property. This property is of type String.UTF8View, which is a collection of unsigned 8-bit (UInt8) values, one for each byte in the string's UTF-8 representation:

for codeUnit in dogString.utf8 {
    print("\(codeUnit) ", terminator: "")
}
print("")
// Prints "68 111 103 226 128 188 240 159 144 182 "

In the example above, the first three decimal code unit values (68, 111, 103) represent the characters D, o, and g, whose UTF-8 representation is the same as their ASCII representation. The next three decimal code unit values (226, 128, 188) are a three-byte UTF-8 representation of the DOUBLE EXCLAMATION MARK character. The last four code unit values (240, 159, 144, 182) are a four-byte UTF-8 representation of the DOG FACE character.
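The byte values listed above can be turned back into a string with the standard library initializer String(decoding:as:). This is a minimal sketch; the constant names bytes and decoded are chosen here:

```swift
// The ten UTF-8 code units from the example above.
let bytes: [UInt8] = [68, 111, 103, 226, 128, 188, 240, 159, 144, 182]

// Decode the bytes as UTF-8 to rebuild the original string.
let decoded = String(decoding: bytes, as: UTF8.self)
print(decoded)                        // Dog‼🐶
print(Array(decoded.utf8) == bytes)   // true
```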

UTF-16 representation
You can access a UTF-16 representation of a String by iterating over its utf16 property. This property is of type String.UTF16View, which is a collection of unsigned 16-bit (UInt16) values, one for each 16-bit code unit in the string's UTF-16 representation:

for codeUnit in dogString.utf16 {
    print("\(codeUnit) ", terminator: "")
}
print("")
// Prints "68 111 103 8252 55357 56374 "
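The last two values in that output (55357 and 56374) are a surrogate pair: U+1F436 doesn't fit in 16 bits, so UTF-16 splits it into a high and a low code unit. The arithmetic can be sketched as follows, with variable names chosen for illustration:

```swift
let scalarValue: UInt32 = 0x1F436               // 🐶
let offset = scalarValue - 0x10000              // 20 significant bits remain
let high = UInt16(0xD800 + (offset >> 10))      // top 10 bits go in the high surrogate
let low = UInt16(0xDC00 + (offset & 0x3FF))     // bottom 10 bits go in the low surrogate
print(high, low)                                // 55357 56374
```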

Unicode Scalar Representation
You can access a Unicode scalar representation of a String value by iterating over its unicodeScalars property. This property is of type UnicodeScalarView, which is a collection of values of type UnicodeScalar.
Each UnicodeScalar has a value property that returns the scalar's 21-bit value, represented within a UInt32 value:

for scalar in dogString.unicodeScalars {
    print("\(scalar.value) ", terminator: "")
}
print("")
// Prints "68 111 103 8252 128054 "

The value properties of the first three UnicodeScalar values (68, 111, 103) once again represent the characters D, o, and g.
The fourth value (8252) is the decimal equivalent of the hexadecimal value 203C, which represents the Unicode scalar U+203C for the DOUBLE EXCLAMATION MARK character.
The value property of the fifth and final UnicodeScalar (128054) is the decimal equivalent of the hexadecimal value 1F436, which represents the Unicode scalar U+1F436 for the DOG FACE character.

As an alternative to querying its value properties, each UnicodeScalar value can also be used to construct new string values, such as using string interpolation:

for scalar in dogString.unicodeScalars {
    print("\(scalar) ")
}
// D
// o
// g
// ‼
// 🐶

12 October 2021, 12:30 | Views: 6666
