Fluent Python reading notes - Chapter 2 - sequence types (list, tuple, etc.)

Chapter 2: An Array of Sequences

2.1 Overview of built-in sequences

The Python standard library offers a rich selection of sequence types implemented in C:

  • Container sequences

    list, tuple and collections.deque can hold items of different types.

  • Flat sequences

    str, bytes, bytearray, memoryview and array.array can only hold items of one simple type.

Container sequences hold references to the objects they contain, while flat sequences store the value of each item directly in their own memory space rather than as references to separate objects. In other words, a flat sequence is a compact, contiguous block of memory, but it can only hold primitive values such as characters, bytes and numbers.

Sequence types can also be grouped by mutability:

  • Mutable sequences

    list, bytearray, array.array, collections.deque, memoryview

  • Immutable sequences

    tuple, str, bytes

2.2 List comprehensions

Let's start with lists, and with list comprehensions (listcomps) in particular:

>>> x = 'ABC'
>>> dummy = [ord(x) for x in x]
>>> x, dummy
('ABC', [65, 66, 67])

Note: in Python 2.x, the variable assigned in the for clause of a list comprehension leaked into the surrounding scope and could overwrite a variable with the same name. Python 3 does not have this problem: a listcomp has its own local scope, so x above is still 'ABC' afterwards.

Syntax tip: Python ignores line breaks inside pairs of [], {} or (), so a multi-line listcomp or literal does not need the \ line-continuation character.

Listcomps versus filter and map

>>> symbols = '$¢£¥€¤'
>>> beyond_ascii = [ord(s) for s in symbols if ord(s) > 127]
>>> beyond_ascii = list(filter(lambda c: c > 127, map(ord, symbols)))

The filter/map combination produces the same result, but it is clearly harder to read, and it is not necessarily faster than the listcomp.

In short, a listcomp does only one thing: build a new list. To fill other sequence types, use a generator expression.

2.3 Generator expressions

To build a tuple, an array.array or another kind of sequence, you could build a list with a listcomp and then convert it, but a generator expression (genexp) is the better choice. A genexp yields items one by one instead of building a whole list just to feed it to a constructor, so it saves memory.

A genexp uses the same syntax as a listcomp, except that it is enclosed in parentheses instead of square brackets.

>>> symbols = '$¢£¥€¤'
>>> tuple(ord(x) for x in symbols)
(36, 162, 163, 165, 8364, 164)
>>> import array
>>> array.array('I', (ord(x) for x in symbols))  # 'I' is the typecode for C unsigned int
array('I', [36, 162, 163, 165, 8364, 164])

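Note: if a generator expression is the single argument of a function call, as in tuple(...) above, it needs no extra parentheses. When there are other arguments, as in the array.array(...) call, the genexp must be enclosed in its own parentheses.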
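A small sketch of both cases (the names total, codes and gen are just for illustration): a genexp that is the sole argument needs no extra parentheses, one passed next to another argument keeps its own, and next() shows that items are produced on demand rather than all at once.

```python
symbols = '$¢£¥€¤'

# Sole argument: the call's own parentheses are enough.
total = sum(ord(s) for s in symbols)

# Not the sole argument: the genexp needs its own parentheses.
codes = sorted((ord(s) for s in symbols), reverse=True)

# A genexp is lazy: each item is computed only when requested.
gen = (ord(s) for s in symbols)
first = next(gen)  # only the first item has been computed so far

print(total, codes, first)
```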

2.4 Tuples

The first thing to be clear about tuples is that tuples are not just immutable lists.

Tuples as records

Each item in a tuple has a fixed position, and that position can carry meaning. This lets us use a tuple as a record: each position holds the data for one field.

>>> lax_coordinates = (33.9425, -118.4080)
>>> city, year, pop, chg, area = ('Tokyo', 2003, 32450, 0.66, 8014)
>>> traveler_ids = [('USA', '31195855'), ('BRA', 'CE342567'), ('ESP', 'XDZ208556')]
>>> for country, _ in traveler_ids:
...     print(country)
...
USA
BRA
ESP

Note: _ is a dummy variable for the item we don't care about (here, the passport number).

Tuple unpacking

In the example above, the items of ('Tokyo', 2003, 32450, 0.66, 8014) are assigned to the variables city, year, pop, chg and area in a single statement. This is tuple unpacking. Here is another example:

>>> lax_coordinates = (33.9425, -118.408056)
>>> latitude, longitude = lax_coordinates
>>> latitude
33.9425

In Python, the values of two variables can be swapped without a temporary variable:

>>> a, b = 1, 2
>>> b, a = a, b
>>> a, b
(2, 1)

Both are examples of parallel assignment: the items of an iterable are assigned to a tuple of variables, one item per variable.

You can also use * to unpack an iterable into the arguments of a function call:

>>> divmod(20, 8)  # divmod(a, b) returns the quotient and the remainder as a tuple: (a // b, a % b)
(2, 4)
>>> t = (20, 8)
>>> divmod(*t)
(2, 4)
>>> quotient, remainder = divmod(*t)
>>> quotient, remainder
(2, 4)

Or use * to grab the excess items:

>>> a, b, *rest = range(5)
>>> a, b, rest  # the starred variable always receives a list, even if it is empty
(0, 1, [2, 3, 4])
>>> a, b, *rest = range(2)
>>> a, b, rest
(0, 1, [])

Note: in parallel assignment, the * prefix can be applied to exactly one variable, but that variable can appear in any position:

>>> a, *body, c, d = range(5)
>>> a, body, c, d
(0, [1, 2], 3, 4)

Nested tuple unpacking

The tuple that receives the values can itself contain nested tuples, as in (a, b, (c, d)). As long as its nesting structure matches that of the value being unpacked, Python assigns each item correctly:

>>> metro_area = ('Tokyo', 'Japan', 36.933, (35.689, 139.691))
>>> name, cc, pop, (lat, lon) = metro_area
>>> lat, lon
(35.689, 139.691)

Named tuples

As mentioned earlier, tuples can be used as records in which each position has a meaning. To give the fields of such a record names, use collections.namedtuple, a factory function that produces subclasses of tuple enhanced with field names and a class name:

>>> from collections import namedtuple
>>> City = namedtuple('City', 'name country population coordinates')
>>> tokyo = City('Tokyo', 'JP', 36.933, (35.689, 139.6916))
>>> tokyo
City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689, 139.6916))
>>> tokyo.country
'JP'
>>> tokyo[3]
(35.689, 139.6916)

  • Creating a named tuple takes two arguments: a class name and the field names. The field names can be given as an iterable of strings or as a single string with the names separated by spaces.

  • You can access the fields by name or by position.

Besides what they inherit from tuple, named tuples have a few attributes of their own: the _fields class attribute, the class method _make(iterable) and the instance method _asdict().

>>> City._fields
('name', 'country', 'population', 'coordinates')
>>> delhi_data = ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889))
>>> delhi = City._make(delhi_data)
>>> delhi._asdict()
{'name': 'Delhi NCR',
 'country': 'IN',
 'population': 21.935,
 'coordinates': (28.613889, 77.208889)}

  • _fields is a tuple with the field names of the class.
  • _make() builds an instance from an iterable; City._make(delhi_data) is equivalent to City(*delhi_data).
  • _asdict() returns the data of the named tuple as a dict (an OrderedDict before Python 3.8).
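A short sketch tying these points together. It passes the field names as a list of strings (the other form mentioned above) and relies on the fact that a named tuple, being a tuple, is immutable:

```python
from collections import namedtuple

# Field names given as an iterable of strings instead of one space-separated string.
City = namedtuple('City', ['name', 'country', 'population', 'coordinates'])

delhi_data = ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889))
delhi = City._make(delhi_data)

assert delhi == City(*delhi_data)  # _make(iterable) is equivalent to City(*iterable)
assert delhi.name == delhi[0]      # access by field name or by position
assert isinstance(delhi, tuple)    # a named tuple is still a tuple...

try:
    delhi.population = 22.0        # ...so its fields cannot be reassigned
except AttributeError as exc:
    print('immutable:', exc)

print(City._fields)
print(delhi._asdict())
```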

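Similarities between tuples and lists

Tuples support the list operations that don't change the sequence in place, such as indexing, slicing, len(), in, +, *, count() and index(). Methods that add, remove or reorder items in place (append, pop, sort, reverse, etc.) exist only on lists.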
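A quick check of that claim (the sample values are arbitrary):

```python
t = (10, 20, 20, 30)
l = list(t)

# Operations that don't change the sequence work on both types.
assert t.count(20) == l.count(20) == 2
assert t.index(30) == l.index(30) == 3
assert t[1:3] == (20, 20) and l[1:3] == [20, 20]
assert list(reversed(t)) == list(reversed(l)) == [30, 20, 20, 10]

# Methods that add, remove or reorder items in place are list-only.
in_place = ['append', 'extend', 'insert', 'pop', 'remove', 'sort', 'reverse']
list_only = [name for name in in_place if hasattr(l, name) and not hasattr(t, name)]
print(list_only)
```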

2.5 Slicing

In python, sequence types such as list, tuple and string all support slicing.

Slices and ranges exclude the last item. This matches the zero-based indexing of Python, C and many other languages: intervals are closed on the left and open on the right.

This convention has several advantages:

  • When only the stop position is given, it is easy to see the length: range(3) and mylist[:3] both produce three items.
  • When both start and stop are given, the length is simply stop - start.
  • It is easy to split a sequence into two non-overlapping parts at any index x: mylist[:x] and mylist[x:].
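The three advantages can be checked directly (my_list here is arbitrary sample data):

```python
my_list = [10, 20, 30, 40, 50, 60]

# Only the stop is given: the length equals the stop.
assert len(range(3)) == len(my_list[:3]) == 3

# Start and stop are given: the length is stop - start.
start, stop = 1, 4
assert len(my_list[start:stop]) == stop - start

# Split at any index x into two non-overlapping parts.
x = 2
head, tail = my_list[:x], my_list[x:]
assert head + tail == my_list
print(head, tail)  # [10, 20] [30, 40, 50, 60]
```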

Slicing with a step

s[a:b:c] takes items from s between a and b with a step (stride) of c. The step can be negative, which returns the items in reverse order.

>>> s = 'bicycle'
>>> s[::3]
'bye'
>>> s[::-1]
'elcycib'
>>> s[::-2]
'eccb'

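The notation a:b:c is only valid inside [] when used as the indexing or subscript operator, where it produces a slice object: s[a:b:c] evaluates slice(a, b, c) and passes it to the sequence. You can also create slice objects explicitly and give them names, for example: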

>>> SKU = slice(0, 2)
>>> s[SKU]
'bi'
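
Multidimensional slicing and ellipsis (...)

The [] operator can also take several indexes or slices separated by commas, as in a[i, j] or a[m:n, k:l]. NumPy uses this for multidimensional arrays, where ... (the Ellipsis object) is a shortcut for "all remaining dimensions": on a four-dimensional array x, x[i, ...] means x[i, :, :, :]. The built-in sequence types are one-dimensional and accept only a single index or slice.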
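To see what [] actually passes to an object, here is a minimal sketch using a hypothetical Echo class whose __getitem__ simply returns the key it receives. It shows that a:b:c arrives as a slice object and that comma-separated indexes (including ...) arrive as a tuple:

```python
class Echo:
    """Hypothetical helper: returns whatever key the [] operator passes in."""

    def __getitem__(self, key):
        return key


e = Echo()
print(e[1:4])     # slice(1, 4, None)
print(e[1, 2:3])  # (1, slice(2, 3, None))
print(e[..., 0])  # (Ellipsis, 0)

# Built-in sequences are one-dimensional and reject a tuple index.
numbers = [1, 2, 3]
try:
    numbers[0, 1]
except TypeError as exc:
    print('TypeError:', exc)
```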

Assigning to slices

When a slice appears on the left side of an assignment or as the target of a del statement, you can graft, excise or otherwise modify a mutable sequence in place:

>>> l = list(range(10))
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> l[2:5] = [20, 30]
>>> l
[0, 1, 20, 30, 5, 6, 7, 8, 9]
>>> del l[5:7]
>>> l
[0, 1, 20, 30, 5, 8, 9]

Note: when the target of an assignment is a slice, the right side must be an iterable, even if it holds just one item.
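A minimal sketch of that rule, continuing with a fresh list of ten integers:

```python
l = list(range(10))

try:
    l[2:5] = 100    # a bare int is not iterable
except TypeError as exc:
    print('TypeError:', exc)

l[2:5] = [100]      # wrap the single value in an iterable
print(l)            # [0, 1, 100, 5, 6, 7, 8, 9]
```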

2.6 Using + and * with sequences

>>> l = [1, 2, 3]
>>> l * 5
[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
>>> 5 * 'abcd'
'abcdabcdabcdabcdabcd'

Both + and * always build a new sequence and never modify their operands.

However, beware of expressions like a * n when the items of a are references to mutable objects: the result may surprise you.
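A classic illustration, assuming we want a 3 x 3 board made of lists (the names are illustrative). Multiplying the outer list copies references to the same inner list, while a listcomp builds a new inner list on each iteration:

```python
# Pitfall: the outer list holds three references to the same inner list.
weird_board = [['_'] * 3] * 3
weird_board[1][2] = 'X'
print(weird_board)  # [['_', '_', 'X'], ['_', '_', 'X'], ['_', '_', 'X']]

# Fix: the listcomp creates a new inner list for each row.
board = [['_'] * 3 for _ in range(3)]
board[1][2] = 'X'
print(board)        # [['_', '_', '_'], ['_', '_', 'X'], ['_', '_', '_']]
```

Note that ['_'] * 3 is harmless: its items are immutable strings.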

2.7 Augmented assignment with sequences

The behavior of the augmented assignment operators += and *= depends on their first operand.

The special method behind += is __iadd__ (in-place addition). If a class does not implement __iadd__, Python falls back to calling __add__. Likewise, *= uses __imul__ and falls back to __mul__.

>>> a += b

In the expression above, if a implements __iadd__ (as mutable sequences generally do), a is changed in place. If a does not implement __iadd__, the expression has the same effect as a = a + b: first a + b is evaluated, producing a new object, which is then bound to a. Immutable sequences such as tuples cannot be changed in place, so for them += always builds a new object.
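A minimal sketch of the difference, keeping a second reference to each original object so we can tell whether += mutated it or rebound the name:

```python
# Mutable sequence: list implements __iadd__, so += changes it in place.
l = [1, 2, 3]
alias = l
l += [4]
assert l is alias and alias == [1, 2, 3, 4]

# Immutable sequence: tuple has no __iadd__, so += falls back to __add__
# and binds t to a brand-new tuple; the original object is untouched.
t = (1, 2, 3)
original = t
t += (4,)
assert t is not original and original == (1, 2, 3)

print(hasattr(list, '__iadd__'), hasattr(tuple, '__iadd__'))  # True False
```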

There is a puzzle worth exploring about incremental assignment:

>>> t = (1, 2, [30, 40])
>>> t[2] += [50, 60]
Traceback (most recent call last):
  File "<ipython-input-53-d877fb0e9d36>", line 1, in <module>
    t[2] += [50, 60]
TypeError: 'tuple' object does not support item assignment
>>> t
(1, 2, [30, 40, 50, 60])
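
Both things happen: the list inside t is extended in place, and then the assignment back to t[2] fails because tuples don't support item assignment. Lessons from this puzzle:

  • Avoid putting mutable items in tuples.
  • Augmented assignment is not an atomic operation: here it raised an exception after doing part of its job.
  • Inspecting the bytecode with dis.dis is not too difficult and helps to see what happens under the hood.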
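The sketch below reruns the puzzle with the exception caught, then disassembles the statement s[a] += b (s, a and b are placeholder names) with dis.dis. Opcode names differ between Python versions, so no exact listing is shown; what matters is that the in-place add runs before STORE_SUBSCR, the step that fails for a tuple.

```python
import dis
import io

t = (1, 2, [30, 40])
try:
    t[2] += [50, 60]
except TypeError as exc:
    print('TypeError:', exc)
print(t)  # (1, 2, [30, 40, 50, 60]): the list was extended before the error

# Capture the disassembly of an augmented item assignment as text.
buf = io.StringIO()
dis.dis('s[a] += b', file=buf)
bytecode = buf.getvalue()
print(bytecode)
```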

2.8 list.sort and the sorted built-in function

list.sort sorts a list in place and returns None, a reminder that it does not create a new list.

The sorted built-in function accepts any iterable as an argument and returns a new list.

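Both take two optional keyword-only arguments: reverse (if True, items come out in descending order) and key (a one-argument function applied to each item to produce its sort key).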
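A sketch with an arbitrary list of fruit names, showing key, reverse, and the difference between the two:

```python
fruits = ['grape', 'raspberry', 'apple', 'banana']

ascending = sorted(fruits)                # ['apple', 'banana', 'grape', 'raspberry']
descending = sorted(fruits, reverse=True) # ['raspberry', 'grape', 'banana', 'apple']
by_len = sorted(fruits, key=len)          # ['grape', 'apple', 'banana', 'raspberry']
print(ascending, descending, by_len)
print(fruits)   # unchanged: sorted() returned new lists

result = fruits.sort()  # sorts in place
print(result)   # None
print(fruits)   # ['apple', 'banana', 'grape', 'raspberry']
```

With key=len, 'grape' stays ahead of 'apple' because both have length 5 and Python's sort is stable.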

2.9 Managing ordered sequences with bisect

The bisect module offers two main functions, bisect and insort. Both use the binary search algorithm to quickly find or insert items in a sorted sequence.
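A minimal sketch of both functions. The grade breakpoints and letters are illustrative values: bisect finds which interval a score falls into, and insort inserts items while keeping a list sorted.

```python
import bisect


def grade(score, breakpoints=(60, 70, 80, 90), grades='FDCBA'):
    # bisect.bisect (an alias of bisect_right) returns the insertion point
    # to the right of any equal items, so a score of 70 maps to 'C'.
    i = bisect.bisect(breakpoints, score)
    return grades[i]


letters = [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
print(letters)  # ['F', 'A', 'C', 'C', 'B', 'A', 'A']

# insort inserts each item at the position that keeps the list sorted.
seq = [1, 4, 9]
for item in (5, 0, 12):
    bisect.insort(seq, item)
print(seq)      # [0, 1, 4, 5, 9, 12]
```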

2.10 When a list is not the answer

Arrays

If a list will only contain numbers, array.array is more efficient than list: an array does not hold full-fledged float objects, only the packed bytes representing their machine values, just like an array in C.
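A minimal sketch, assuming the usual platform where the 'd' typecode (C double) takes 8 bytes per item; the assert checks that assumption. The values are arbitrary:

```python
from array import array

# 'd' stores C doubles as packed machine values, not as Python float objects.
floats = array('d', (i / 10 for i in range(1000)))
assert floats.itemsize == 8

raw = floats.tobytes()  # the packed bytes behind the array
print(len(raw))         # 8000

copy = array('d')
copy.frombytes(raw)     # rebuild an equal array from the raw bytes
assert copy == floats
```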

Memory views

memoryview is a built-in class that lets you handle slices of arrays (and other objects supporting the buffer protocol) without copying bytes. My understanding: the bytes in memory stay the same and only the way they are read changes (for example, as unsigned or as signed integers), so the values you get can be completely different.
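A sketch of both ideas. The first part reads the same bytes as unsigned and as signed integers via memoryview.cast. The second part writes through a byte view of an array; it assumes the 'h' typecode (C signed short) is 2 bytes, as on common platforms, and which byte of numbers[2] gets changed depends on the machine's byte order.

```python
from array import array

# The same three bytes, read two ways.
data = bytearray([255, 1, 128])
as_unsigned = memoryview(data)     # format 'B': unsigned char
as_signed = as_unsigned.cast('b')  # format 'b': signed char, no copy made
print(as_unsigned.tolist())        # [255, 1, 128]
print(as_signed.tolist())          # [-1, 1, -128]

# Writes through a view change the underlying object.
numbers = array('h', [-2, -1, 0, 1, 2])  # 'h': signed short
octets = memoryview(numbers).cast('B')   # view the same memory as bytes
octets[5] = 4                            # one of the two bytes of numbers[2]
print(numbers)  # array('h', [-2, -1, 1024, 1, 2]) on a little-endian machine
```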

NumPy and SciPy

No need to say more.

Deques and other queues

collections.deque (a thread-safe double-ended queue), plus the queue and multiprocessing modules, etc.
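A short sketch of a bounded deque (maxlen=10 is an arbitrary choice): once it is full, adding items at one end discards items from the other end.

```python
from collections import deque

dq = deque(range(10), maxlen=10)
dq.rotate(3)             # move 3 items from the right end to the left
print(dq)                # deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6], maxlen=10)
dq.rotate(-4)            # a negative n rotates to the left
print(dq)                # deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], maxlen=10)
dq.appendleft(-1)        # the deque is full, so one item falls off the right end
print(dq)                # deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)
dq.extend([11, 22, 33])  # three items fall off the left end
print(dq)                # deque([3, 4, 5, 6, 7, 8, 9, 11, 22, 33], maxlen=10)
```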

These notes only summarize the main contents of the book and briefly mention the less commonly used parts; I'll look those up when I need them. If you want to go deeper, I strongly recommend reading the original book: its sample code is very Pythonic and a pleasure to read.


Posted on Sat, 25 Sep 2021 06:10:56 -0400 by banacan