Python string learning notes

Strings

Basic characteristics

A string is essentially a sequence of characters. Python strings are immutable.

Python has no separate single-character type; a single character is simply a string of length 1.
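
As a small illustration of the two characteristics above, the sketch below indexes a string and then attempts an in-place change. The variable names are chosen only for this example.

```python
s = "hello"

# Indexing a string yields another str of length 1, not a separate char type.
first = s[0]
print(type(first).__name__, len(first))  # str 1

# Item assignment is rejected because str is immutable.
error_message = ""
try:
    s[0] = "j"
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# "Modifying" a string really means building a new one.
t = "j" + s[1:]
print(s, t)  # hello jello
```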

Encoding

  • Python 3 strings are Unicode and can represent characters from any written language in the world. A str is a sequence of Unicode code points rather than fixed 16-bit units, and ASCII is a subset of Unicode.
  • Use the built-in function ord() to get the code point of a character.
  • Use the built-in function chr() to get the character for a given code point.
>>> ord('马')
39532
>>> chr(39532)
'马'
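
The following sketch shows the difference between a character's code point and its encoded bytes. It assumes the character is 马 (code point 39532) and that UTF-8 is the target encoding; both are choices made for this illustration.

```python
ch = "马"

code_point = ord(ch)          # integer code point of the character
round_trip = chr(code_point)  # back to the character
print(code_point, hex(code_point), round_trip)  # 39532 0x9a6c 马

# One character, but three bytes once encoded as UTF-8.
utf8_bytes = ch.encode("utf-8")
print(len(ch), len(utf8_bytes))  # 1 3

# ASCII characters keep the same value in Unicode.
ascii_value = ord("A")
print(ascii_value)  # 65
```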

Representation (creation)

a = "I'm Tom"  # A pair of double quotes

b = 'Tom said:"I am Tom"'  # A pair of single quotes

c = '''Tom said:"I'm Tom"'''  # A pair of three single quotes

d = """Tom said:"I'm Tom" """  # A pair of three double quotes

Escape character

Use \ to introduce an escape sequence; a \ at the end of a line continues the statement on the next line.

  • \n represents a newline
  • \t represents a tab
  • \' represents a literal single quotation mark
  • \" represents a literal double quotation mark
  • \\ represents a literal backslash
  • \r represents a carriage return
  • \b represents a backspace

Note: in Python, add r before the string to make it a raw string, in which backslashes are not treated as escape characters

k = r'good mor\ning'
print(k)  # good mor\ning
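
A typical use of raw strings is text that contains many backslashes, such as Windows paths. The path below is a made-up example.

```python
normal = "C:\\temp\\new"   # backslashes must be doubled in a normal literal
raw = r"C:\temp\new"       # raw string: backslashes are kept literally
print(normal == raw)       # True

# In a normal literal "\n" is one character; in a raw literal it is two.
print(len("\n"), len(r"\n"))  # 1 2
```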

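String concatenation

  • str + str
  • Adjacent string literals separated only by whitespace

Note: both forms produce a new string object; adjacent literals are joined when the code is compiled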

>>> 'a' + 'b'
'ab'
>>> 'c' 'd'
'cd'

String repetition

  • str * int
>>> 'jack'*3
'jackjackjack'

Common operations

1. Get length
  • The len() function returns the length of a string

    mystr = "It's a fine day today. It's beautiful everywhere"
    print(len(mystr))  # 48, the number of characters in the string
    
2. Search content
  • find

    • Returns the starting index of the first occurrence of the searched content in the string, or -1 if it does not exist
    • S.find(sub[, start[, end]]) -> int
  • rfind

    • Similar to the find() function, but starting from the right

      str1 = 'hello'
      print(str1.rfind('l'))  # 3
      
  • index

    • Works like find(), except that find() returns -1 when the content is not found, whereas index() raises a ValueError
  • rindex

    • Similar to index(), but starting from the right
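
The difference between the two families of search methods can be sketched as follows. The loop at the end is one possible way to collect every match using the optional start argument.

```python
text = "hello"

# find() signals "not found" with -1 ...
print(text.find("l"))  # 2
print(text.find("z"))  # -1

# ... while index() raises ValueError instead.
index_failed = False
try:
    text.index("z")
except ValueError:
    index_failed = True
    print("index() raised ValueError")

# The optional start argument allows scanning for every occurrence.
positions = []
start = text.find("l")
while start != -1:
    positions.append(start)
    start = text.find("l", start + 1)
print(positions)  # [2, 3]
```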
3. Judgment
  • startswith

    • Determine whether the string starts with the specified content

    • S.startswith(prefix[, start[, end]]) -> bool

      print('hello'.startswith('he'))  # True
      
  • endswith

    • Determine whether the string ends with the specified content

    • S.endswith(suffix[, start[, end]]) -> bool

      print('hello'.endswith('o'))  # True
      
  • isalpha

    • Determine whether the string consists only of letters

      mystr = 'hello world'
      print(mystr.isalpha()) # False because there is a space in the middle
      
  • isdigit

    • Determine whether the string consists only of digits; any other character, such as a decimal point or a minus sign, makes the result False

      print('good'.isdigit())  # False
      
      print('123'.isdigit())  # True
      
      print('3.14'.isdigit())  # False
      
  • isalnum

    • Determine whether the string consists only of letters and digits. Returns False as soon as any other character appears

      print('hello123'.isalnum())  # True
      print('hello'.isalnum())  # True
      
  • isspace

    • Returns True if the string is non-empty and contains only whitespace characters (spaces, tabs, line breaks), False otherwise

      print('  '.isspace())  # True
      
  • isascii

    • Returns True if all characters in the string are ASCII; otherwise, returns False.

    • isascii() is a str method (added in Python 3.7); calling it on a non-string value such as an int raises an AttributeError

      a = 'a'
      print(a.isascii())  # True
      
  • isupper

    • Returns True if all the letters in the string are uppercase; otherwise, returns False

      b = 'Hello'
      print(b.isupper())  # False
      c = 'LUCY'
      print(c.isupper())  # True
      
  • islower

    • Returns True if all the letters in the string are lowercase; otherwise, returns False

      b = 'Hello'
      print(b.islower())  # False
      c = 'lucy'
      print(c.islower())  # True
      
  • isnumeric

    • Checks whether a string consists only of numeric characters. In Python 3 every str is Unicode, so the method is available on all strings

      • The numeric characters here include: Arabic digits, Roman numerals, circled numbers, and Simplified and Traditional Chinese numerals
    • The "u" prefix is a holdover from Python 2; Python 3 accepts it but does not need it

      str1 = "一壹①②③⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩ❶❷❸❺❺❻❼❽❾❿2009"
      print(str1.isnumeric())  # True
      
  • isprintable

    • Determines whether it is a printable string. If all characters are printable, returns True. Otherwise, returns False

    • Non-printable characters include carriage return, line feed, and tab

      str1 = 'abc'
      print(str1.isprintable())  # True
      
      str2 = 'abc\tdef'
      print(str2.isprintable())  # False
      
  • istitle

    • Determine whether every word starts with an uppercase letter and all its other letters are lowercase

      str1 = 'LuCy'
      print(str1.istitle())  # False
      str2 = 'Lucy Ha'
      print(str2.istitle())  # True
      
  • isidentifier

    • Determine whether a string is a valid Python identifier

      str1 = 'True'
      print(str1.isidentifier())  # True: isidentifier() does not check for keywords; use keyword.iskeyword() for that
      str2 = '3abc'
      print(str2.isidentifier())  # False
      str3 = 'username'
      print(str3.isidentifier())  # True
      
    • A string is considered a valid identifier if it contains only alphanumeric or underscore characters. A valid identifier cannot begin with a number or contain any spaces

  • isdecimal

    • Returns True if the string contains only decimal characters, False otherwise

      str1 = "this2009"
      print(str1.isdecimal())  # False
      str2 = "23443434"
      print(str2.isdecimal())  # True
      
    • In Python 2 this method existed only on unicode objects; in Python 3 it is available on every str
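
The three numeric checks above differ in how broad they are. The sketch below compares them on a few sample characters chosen for this illustration: an ordinary digit, a superscript two, a fraction, a Chinese numeral, and a decimal number written with a dot.

```python
samples = ["5", "²", "½", "五", "3.14"]

# Map each sample to (isdecimal, isdigit, isnumeric).
results = {s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples}

for s, flags in results.items():
    print(repr(s), flags)
# '5'    (True, True, True)
# '²'    (False, True, True)
# '½'    (False, False, True)
# '五'   (False, False, True)
# '3.14' (False, False, False)
```

In short, isdecimal() is the strictest and isnumeric() the most permissive; none of them accepts a decimal point, so they are not a substitute for trying float(s).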

4. Calculate the number of occurrences
  • count

    • Returns the number of times the queried string appears in the original string between start and end

    • S.count(sub[, start[, end]]) -> int

      str1 = 'hello'
      print(str1.count('l'))  # 2
      
5. Replacement
  • replace

    • Replace the specified content in the string. If count is given, at most count replacements are made, from left to right

    • S.replace(old, new[, count]) -> str

      msg = "he's awesome, he's showy, he's handsome"
      msg1 = msg.replace('he', 'lucy')  # Replaces all occurrences by default
      msg2 = msg.replace('he', 'lucy', 2)  # From left to right, replaces only the first two
      print(msg1)  # lucy's awesome, lucy's showy, lucy's handsome
      print(msg2)  # lucy's awesome, lucy's showy, he's handsome
      
      • If the content to be replaced is not in the string, no error is raised and the string is returned unchanged

        s1 = 'abcd'
        print(s1.replace('lucy', 'cc'))  # abcd
        
    • In the whole process, a new string is actually created, and the original string does not change.

6. Split string
  • split

    • Splits a string into a list

    • The maximum number of splits defaults to -1, which means unlimited and can be omitted; you can also specify the maximum number of splits yourself

      x = 'zhangsan-hahaha-tom-tony-lucy'
      y = x.split('-', -1)
      z = x.rsplit('-')
      print(y)  # ['zhangsan', 'hahaha', 'tom', 'tony', 'lucy']
      print(z)  # ['zhangsan', 'hahaha', 'tom', 'tony', 'lucy']
      
      x = 'zhangsan-hahaha-tom-tony-lucy'
      print(x.split('-', 2))  # ['zhangsan', 'hahaha', 'tom-tony-lucy']
      
      x = '-hahaha-tom-tony-lucy'
      y = x.split('-')
      
      print(y)  # ['', 'hahaha', 'tom', 'tony', 'lucy']
      
    • With no separator given, the string is split on runs of whitespace (spaces, line breaks, and tabs)

      s = 'my name is lucy'
      s1 = s.split()
      print(s1)  # ['my', 'name', 'is', 'lucy']
      
  • rsplit

    • The usage is basically the same as split, only from right to left

      x = 'zhangsan-hahaha-tom-tony-lucy'
      print(x.rsplit('-', 2))  # ['zhangsan-hahaha-tom', 'tony', 'lucy']
      
  • splitlines

    • Splits at line boundaries and returns a list with the lines as elements

      str1 = 'hello\nworld'
      print(str1.splitlines())  # ['hello', 'world']
      
  • partition

    • Takes a string sep as the separator and divides the original string at its first occurrence into three parts: the part before sep, sep itself, and the part after sep. These three parts make up a tuple

      print('agdaXhhXhjjs'.partition('X'))  # ('agda', 'X', 'hhXhjjs')
      
  • rpartition

    • Similar to the partition() function, but starting from the right

      print('agdaXhhXhjjs'.rpartition('X'))  # ('agdaXhh', 'X', 'hjjs')
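
A common use of partition() is parsing "key = value" text, where only the first separator should count. The sketch below uses a made-up input line and also contrasts split() with split(' ').

```python
line = "name = lucy = admin"

# partition() splits at the first separator and always returns a 3-tuple.
key, sep, value = line.partition("=")
pair = (key.strip(), value.strip())
print(pair)  # ('name', 'lucy = admin')

# If the separator is missing, the tuple is (original, '', '').
missing = "no separator".partition("=")
print(missing)  # ('no separator', '', '')

# split() with no argument collapses runs of whitespace;
# split(' ') treats every single space as a separator.
spaced = "a  b"
print(spaced.split())     # ['a', 'b']
print(spaced.split(" "))  # ['a', '', 'b']
```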
      
7. Modify case
  • capitalize

    • Uppercase the first character of the string and lowercase the rest

      mystr = 'hello world'
      print(mystr.capitalize()) # Hello world
      
  • title

    • Capitalize the first letter of each word

      mystr = 'hello world'
      print(mystr.title()) # Hello World
      
  • lower

    • All lowercase

      mystr = 'hElLo WorLD'
      print(mystr.lower()) # hello world
      
  • upper

    • All capitalized

      mystr = 'hello world'
      print(mystr.upper())  # HELLO WORLD
      
  • casefold()

    • Converts the string to a caseless form for comparison; like lower(), but it also folds characters that lower() leaves unchanged

      s1 = 'I Love Python'
      print(s1.casefold())  # i love python
      
  • swapcase()

    • Swaps case: lowercase letters become uppercase and uppercase letters become lowercase

      s1 = 'I Love Python'
      print(s1.swapcase())  # i lOVE pYTHON
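
For plain English text lower() and casefold() give the same result. The difference shows up with characters such as the German ß; the words below are chosen only to illustrate that.

```python
a = "Straße"
b = "STRASSE"

print(a.lower())     # straße
print(a.casefold())  # strasse

# lower() leaves ß alone, so a case-insensitive comparison fails ...
lower_equal = a.lower() == b.lower()
# ... while casefold() folds ß to "ss" and the comparison succeeds.
casefold_equal = a.casefold() == b.casefold()
print(lower_equal, casefold_equal)  # False True
```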
      
8. Space handling
  • ljust

    • Returns a string of the specified length, left-justified and padded with spaces on the right

      s = 'hello'
      print(s.ljust(10))  # hello followed by five spaces
      
    • If its length is greater than the specified length, no processing will be done

    • Fill characters can be specified, and the default is space

      print('lucy'.ljust(10, '+'))  # lucy++++++
      
  • rjust

    • Returns a string of the specified length, right-justified and padded with spaces on the left

      s = 'hello'
      print(s.rjust(10))  #      hello (five spaces added on the left)
      
    • Fill characters can be specified, and the default is space

  • center

    • Returns a string of the specified length, centered and padded with spaces on both sides

      s = 'hello'
      print(s.center(10))  # hello centered, with spaces added on both sides
      
    • Fill characters can be specified, and the default is space

  • Remove leading and trailing whitespace, including spaces, tabs, and line breaks

    • lstrip

      • Removes whitespace from the left end of the string

        mystr = '    he   llo      '
        print(repr(mystr.lstrip()))  # 'he   llo      ' (only the left side is stripped; middle and right spaces are kept)
        
    • rstrip

      • Removes whitespace from the right end of the string

        mystr = '    he   llo      '
        print(repr(mystr.rstrip()))  # '    he   llo' (only the right side is stripped)
        
    • strip

      • Removes whitespace from both ends of the string

        mystr = '    he   llo      '
        print(repr(mystr.strip()))  # 'he   llo'
        
      • Specify the characters to remove

        s = 'fgk too k white ser'
        s1 = s.strip('fkgres')  # Strips any of these characters from both ends
        print(repr(s1))  # ' too k white '
        
  • expandtabs()

    • Replaces each tab character ('\t') in the string with spaces up to the next tab stop. Tab stops are 8 columns apart by default

      s1 = 'a\tbcd'
      print(s1)
      print(s1.expandtabs())
      print(s1.expandtabs(tabsize=4))
      print(s1.expandtabs(tabsize=0))
      # a	bcd
      # a       bcd
      # a   bcd
      # abcd
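
The justification methods are handy for lining up plain-text columns. Here is a minimal sketch with made-up data; the column widths 12 and 5 are arbitrary choices for the example.

```python
rows = [("apple", 3), ("kiwi", 12), ("watermelon", 150)]

lines = []
for name, qty in rows:
    # Left-justify the name in 12 columns (padded with dots),
    # right-justify the number in 5 columns (padded with spaces).
    lines.append(name.ljust(12, ".") + str(qty).rjust(5))

print("\n".join(lines))
# apple.......    3
# kiwi........   12
# watermelon..  150
```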
      
9. String concatenation
  • join

    • S.join(iterable)

      s = 'lucy'
      s1 = '+'.join(s)
      print(s1)  # l+u+c+y
      
      print('+'.join({'name': 'lucy', 'age': 18}))  # name+age
      
    • Function: can quickly convert a list or tuple into a string, separated by specified characters

      • Premise: the elements in a list or tuple must be of str type

        l1 = ['my', 'name', 'is', 'lucy']
        s1 = ' '.join(l1)
        print(s1)  # my name is lucy
        
    • This method is recommended for concatenating many strings. It is more efficient than repeated str + str, because join builds the result once, while each + creates a new intermediate string object.
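
The str-only requirement and the "collect, then join once" pattern can be sketched like this. The data is invented for the example.

```python
numbers = [1, 2, 3]

# join() requires every element to be a str.
join_failed = False
try:
    ",".join(numbers)
except TypeError:
    join_failed = True
    print("join() rejected the integers")

# Convert the elements first, for example with map(str, ...).
csv_line = ",".join(map(str, numbers))
print(csv_line)  # 1,2,3

# Building a long string piece by piece: collect the parts, join once at the end.
parts = []
for i in range(5):
    parts.append(f"item{i}")
result = "-".join(parts)
print(result)  # item0-item1-item2-item3-item4
```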

10. Encryption and decryption (mapping replacement)
  • maketrans

    Create a conversion table for character mapping.

    str.maketrans(intab, outtab, delchars)

    intab -- a string of characters to replace in a string.
    outtab -- the corresponding string of mapped characters.
    delchars -- optional parameter, indicating that each character in the string will be mapped to None
    
    intab and outtab are strings and must be the same length
    
  • translate

    The characters in the string are converted according to the conversion table given by the maketrans() function.

    Note: characters listed in delchars are mapped to None and deleted, even if they also appear in intab; the remaining characters are then converted

in_str = 'afcxyo'
out_str = '123456'

# maketrans() is a static method of str and builds the translation table
# map_table is a dict that maps code points to code points
map_table = str.maketrans(in_str, out_str)

# Use translate() for conversion
my_love = 'I love fairy'
new_my_love = my_love.translate(map_table)

print(new_my_love)  # I l6ve 21ir5

# Same mapping, but the third argument additionally deletes 'y' and 'o'
in_str = 'afcxyo'
out_str = '123456'

# maketrans() generates the transformation table, which must be called with str
map_table = str.maketrans(in_str, out_str, 'yo')

# Use translate() for conversion
my_love = 'I love fairy'
new_my_love = my_love.translate(map_table)

print(new_my_love)  # I lve 21ir
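
Since this section is about encryption and decryption, here is a minimal sketch of a shift (Caesar) cipher built on maketrans() and translate(). The shift of 3 and the sample text are assumptions made for the illustration, and this is a toy cipher rather than real encryption.

```python
import string

SHIFT = 3  # hypothetical key chosen for this illustration
lower = string.ascii_lowercase
shifted = lower[SHIFT:] + lower[:SHIFT]

# One table for each direction.
encrypt_table = str.maketrans(lower, shifted)
decrypt_table = str.maketrans(shifted, lower)

secret = "i love python".translate(encrypt_table)
restored = secret.translate(decrypt_table)
print(secret)    # l oryh sbwkrq
print(restored)  # i love python

# maketrans() also accepts a single dict; a value of None deletes the character.
table = str.maketrans({"a": "4", "e": "3", " ": None})
leet = "a fine tea".translate(table)
print(leet)  # 4fin3t34
```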
11. Fill 0 before string
  • zfill()

    • Returns a string of a specified length. The original string is right justified and filled with 0
    a = 3
    b = str(a).zfill(4)
    print(b)  # 0003
    
    • Use scenario

    Numbers stored as strings do not sort the way we expect: for example, '11' sorts before '2'. This causes problems when, say, merging files named with numbers, because the merge order may change. Padding the numbers with leading zeros so that they all have the same length solves the problem.
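
The sorting problem described above can be reproduced in a few lines. The file names are invented, and the padding width of 3 is an arbitrary choice for the example.

```python
names = ["2.txt", "11.txt", "1.txt"]

# Plain string sorting compares character by character, so "11" comes before "2".
plain = sorted(names)
print(plain)  # ['1.txt', '11.txt', '2.txt']

# Pad the numeric part to a common width before sorting.
padded = sorted(name.split(".")[0].zfill(3) + ".txt" for name in names)
print(padded)  # ['001.txt', '002.txt', '011.txt']

# zfill() keeps a leading sign in front of the zeros.
signed = "-7".zfill(4)
print(signed)  # -007
```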

Slicing

Slicing copies a specified section of a string and produces a new string.

m[start:end:step] includes the character at start and excludes the character at end

  • step: the step size. The default is 1

    m = 'abcdefghigklmnopqrstuvwxyz'
    print(m[2:9])  # cdefghi
    

    The step cannot be 0; otherwise an error is raised

    m = 'abcdefghigklmnopqrstuvwxyz'
    print(m[2:9:0])  # ValueError: slice step cannot be zero
    
  • When the step is negative, characters are taken from right to left, so start must be to the right of end

    m = 'abcdefghigklmnopqrstuvwxyz'
    print(m[3:15:-1])  # empty string: nothing lies between 3 and 15 when moving left
    print(m[15:3:-1])  # ponmlkgihgfe
    
  • If start and end are negative, they are counted from the right end of the string

    m = 'abcdefghigklmnopqrstuvwxyz'
    print(m[-9:-5])  # rstu
    
  • Reverse order

    m = 'abcdefghigklmnopqrstuvwxyz'
    print(m[::-1])  # zyxwvutsrqponmlkgihgfedcba
    
  • If only start is set, the slice runs to the end of the string

    m = 'abcdefghigklmnopqrstuvwxyz'
    print(m[2:])  # cdefghigklmnopqrstuvwxyz
    
  • If only end is set, the slice starts from the beginning of the string

    m = 'abcdefghigklmnopqrstuvwxyz'
    print(m[:9])  # abcdefghi
    
  • If start or end falls outside the range [0, string length - 1], no error is raised; the bounds are clipped to the string

    m = 'abcdefghigklmnopqrstuvwxyz'
    print(m[-100:-1])  # abcdefghigklmnopqrstuvwxy
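
Two small helpers show how the slicing rules above combine in practice. The function names is_palindrome and chunks are hypothetical, introduced only for this sketch.

```python
def is_palindrome(text):
    """Return True if text reads the same forwards and backwards."""
    return text == text[::-1]


def chunks(text, size):
    """Split text into consecutive pieces of at most `size` characters."""
    # The last slice may run past the end; slicing clips it without an error.
    return [text[i:i + size] for i in range(0, len(text), size)]


print(is_palindrome("level"))   # True
print(is_palindrome("python"))  # False
print(chunks("abcdefgh", 3))    # ['abc', 'def', 'gh']
print("abcdefgh"[::2])          # aceg
```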
    

String interning

String interning: keeping only one copy of each distinct immutable string. The copies are stored in the string intern pool.

Python supports string interning. CPython automatically interns string literals that follow identifier rules (containing only underscores (_), letters, and digits). This is an implementation detail, so the results of the is comparisons below can vary between versions and between the interactive prompt and a script.

>>> a = 'abc_123'
>>> b = 'abc_123'
>>> a is b
True
>>> c = 'abc#'
>>> d = 'abc#'
>>> c is d
False
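
When sharing must not depend on automatic behavior, the standard library function sys.intern() can be called explicitly. The sketch below builds two equal strings at run time and interns both; the variable names are chosen for the example.

```python
import sys

# Build two equal strings at run time so that they start as separate values.
parts = ["abc", "#"]
c = "".join(parts)
d = "".join(parts)
print(c == d)  # True: same value

# sys.intern() returns the single shared copy from the intern table.
c_interned = sys.intern(c)
d_interned = sys.intern(d)
print(c_interned is d_interned)  # True
```

In everyday code, compare strings with == and reserve is for identity checks such as `x is None`.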

Mutable strings

In Python, strings are immutable objects and do not support in-place modification. To change the value, you have to create a new string object. When you do need frequent in-place modification, you can use an io.StringIO object or the array module, which modify a buffer instead of creating a new string each time.

>>> import io
>>> s = 'hello, Lucy'
>>> sio = io.StringIO(s)
>>> sio
<_io.StringIO object at 0x7f8bbfdd8948>
>>> sio.seek(4)
4
>>> sio.write('k')
1
>>> sio.getvalue()
'hellk, Lucy'
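
As a script rather than an interactive session, the same idea might look like the sketch below. It also shows a second common workaround, a list of characters joined back into a string, which is an addition not taken from the notes above.

```python
import io

# Option 1: io.StringIO as a writable text buffer.
buf = io.StringIO()
for word in ("hello", ", ", "Lucy"):
    buf.write(word)
buf.seek(4)     # move to index 4 (0-based)
buf.write("k")  # overwrite one character in place
from_buffer = buf.getvalue()
print(from_buffer)  # hellk, Lucy

# Option 2: a list of characters, joined back into a str at the end.
chars = list("hello, Lucy")
chars[4] = "k"
from_list = "".join(chars)
print(from_list)  # hellk, Lucy
```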

Tags: Python ascii encoding

Posted on Thu, 04 Jun 2020 09:20:04 -0400 by Xyox