1 Introduction to Matplotlib
1.1 What is Matplotlib
- Dedicated to creating 2D charts (3D charts are supported too)
- Very simple to use
- Lets you build visualizations step by step and interactively
The name matplotlib is made up of:
- mat: matrix (2D data, 2D charts)
- plot: to draw
- lib: library
It also echoes MATLAB (matrix laboratory):
- mat: matrix
- lab: laboratory
1.2 What Matplotlib is for
Visualization is a key supporting tool throughout data mining: it helps us understand the data clearly and adjust our analysis methods accordingly.
- It presents data visually and more intuitively
- It makes data more objective and persuasive
For dynamic visualization in web pages, JavaScript libraries such as D3 and ECharts are used instead.
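1.3 A first Matplotlib figure
import matplotlib.pyplot as plt
# Jupyter/IPython only: render figures inline. Remove this line in a plain .py script.
%matplotlib inline

plt.figure()                    # Create the canvas
plt.plot([1, 0, 9], [4, 5, 6])  # Draw a line through (1, 4), (0, 5), (9, 6)
plt.show()                      # Display the figure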
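The same first figure can also be drawn with the object-oriented interface used in section 2.4. The sketch below assumes a script run without a display, so it selects the non-interactive Agg backend and saves the PNG to an in-memory buffer instead of calling plt.show(); those choices belong to the sketch, not to the original example.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run, so use the non-interactive Agg backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()                 # create a Figure with a single Axes
(line,) = ax.plot([1, 0, 9], [4, 5, 6])  # plot() returns a list of Line2D objects

buf = io.BytesIO()
fig.savefig(buf, format="png")           # render to an in-memory PNG instead of a window
plt.close(fig)                           # free the figure's resources
```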
1.4 The three-layer structure of Matplotlib
1.4.1 Container layer
The container layer is mainly composed of Canvas, Figure and Axes.
- Canvas is the lowest, system-level layer. In the drawing analogy it is the drawing board, i.e. the tool that holds the paper.
- Figure is the first layer above Canvas and the first application-level layer that users work with. In the drawing analogy it is the paper (canvas) itself.
- Axes is the second application-level layer. It corresponds to a coordinate system, i.e. a drawing area on the paper.
In more detail:
- Figure: the whole figure (plt.figure() sets the size and resolution of the canvas)
- Axes: the area where the data is drawn (created, for example, with plt.subplot())
- Axis: a single coordinate axis, which holds the data limits, the ticks and the tick labels
Key relationships:
- A Figure can contain multiple Axes, but an Axes belongs to exactly one Figure
- An Axes contains several Axis objects: two in a 2D coordinate system, three in a 3D one
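A minimal sketch of the container hierarchy, assuming Matplotlib 3.2 or later (where the "3d" projection is available without extra setup): one Figure holds two Axes, each Axes knows its Figure, a 2D Axes has an x and a y Axis, and a 3D Axes adds a z Axis. The figure size is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.axes3d import Axes3D

fig = plt.figure(figsize=(8, 4))                   # Figure: the whole picture
ax_2d = fig.add_subplot(1, 2, 1)                   # Axes: a 2D drawing area
ax_3d = fig.add_subplot(1, 2, 2, projection="3d")  # Axes: a 3D drawing area

canvas = fig.canvas                                # Canvas: the backend layer under the Figure
axis_objects_2d = [ax_2d.xaxis, ax_2d.yaxis]               # two Axis objects in 2D
axis_objects_3d = [ax_3d.xaxis, ax_3d.yaxis, ax_3d.zaxis]  # three Axis objects in 3D
plt.close(fig)
```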
1.4.2 Auxiliary display layer
The auxiliary display layer is everything in an Axes (drawing area) apart from the image drawn from the data. It mainly includes the Axes background (facecolor), the spines, the axes (axis), the axis labels, the ticks, the tick labels, the grid, the legend, the title, etc.
Settings at this layer make the image more intuitive and easier to understand, but they do not change what the data shows.
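The sketch below touches each auxiliary element named above on one Axes; the data, colors and title text are illustrative assumptions. Only the plot() call belongs to the image layer; everything else changes appearance, not data.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [3, 1, 4, 2], label="data")  # image layer (the data itself)

ax.set_facecolor("whitesmoke")            # Axes background (facecolor)
ax.spines["top"].set_visible(False)       # hide two of the four spines
ax.spines["right"].set_visible(False)
ax.set_xlabel("x label")                  # axis labels
ax.set_ylabel("y label")
ax.set_xticks([1, 2, 3, 4])               # tick positions
ax.set_xticklabels(["a", "b", "c", "d"])  # tick labels
ax.grid(linestyle="--", alpha=0.5)        # grid
ax.legend()                               # legend
ax.set_title("Auxiliary display layer")   # title
plt.close(fig)
```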
1.4.3 Image layer
The image layer is what plot, scatter, bar, hist, pie and similar functions draw in the Axes from the data.
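Each image-layer function returns the artist objects it adds to the Axes. This sketch, with made-up data, checks the return types of three of them against Matplotlib's public classes Line2D, PathCollection and BarContainer.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.collections import PathCollection
from matplotlib.container import BarContainer
from matplotlib.lines import Line2D

fig, (ax_line, ax_scatter, ax_bar) = plt.subplots(1, 3, figsize=(12, 4))
lines = ax_line.plot([1, 2, 3], [2, 3, 1])         # plot -> list of Line2D
points = ax_scatter.scatter([1, 2, 3], [2, 3, 1])  # scatter -> PathCollection
bars = ax_bar.bar([1, 2, 3], [2, 3, 1])            # bar -> BarContainer of Rectangles
plt.close(fig)
```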
Summary:
- Canvas (the drawing board) sits at the bottom and is usually not accessed directly by users
- Figure is built on Canvas
- Axes (the drawing area) is built on Figure
- Axis, legend and the other auxiliary display elements, as well as the image layer, are all built on Axes
2 Plot and basic drawing functions
2.1 Drawing and saving a line chart
To understand the basic plotting functions, we walk through the basic API with one running example: plotting weather and temperature changes.
2.1.1 The matplotlib.pyplot module
matplotlib.pyplot contains a collection of MATLAB-like plotting functions. Each function acts on the current Axes of the current Figure.
import matplotlib.pyplot as plt
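To see what "acts on the current Axes" means, this sketch creates two Axes, makes the left one current with plt.sca(), and calls plt.plot(); only the left Axes receives the line. The data are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, (left, right) = plt.subplots(1, 2)
plt.sca(left)                   # explicitly make `left` the current Axes
plt.plot([1, 2, 3], [3, 1, 2])  # pyplot draws on whatever Axes is current
current = plt.gca()             # get the current Axes
plt.close(fig)
```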
2.1.2 Drawing and displaying a line chart
Show the weather in Shanghai for one week, with the following temperatures from Monday to Sunday:
# 1. Create the canvas
# plt.figure()
plt.figure(figsize=(20, 8), dpi=80)

# 2. Draw the image
plt.plot([1, 2, 3, 4, 5, 6, 7], [17, 17, 18, 15, 11, 11, 13])

# Save the image
plt.savefig("test.png")

# 3. Display the image
plt.show()
Note: plt.show() releases the figure's resources. If you save the image after it has been displayed, you will only save an empty image, so call plt.savefig("xx.png") before plt.show().
2.1.3 Setting canvas properties and saving images
plt.figure(figsize=(width, height), dpi=dpi)
    figsize: the width and height of the figure, in inches
    dpi: dots per inch, i.e. the resolution (sharpness) of the image
    Returns a Figure object
plt.savefig(path)
    path: where to save the image
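The pixel size of a saved image is figsize multiplied by dpi, so the one-week weather figure from 2.1.2 comes out as 20 × 80 = 1600 by 8 × 80 = 640 pixels. The sketch below checks this by reading the width and height fields of the PNG header. It assumes Matplotlib's default savefig.dpi setting ("figure") and writes to an in-memory buffer instead of test.png; savefig() is still called before any plt.show().

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(20, 8), dpi=80)  # 20 x 8 inches at 80 dots per inch
plt.plot([1, 2, 3, 4, 5, 6, 7], [17, 17, 18, 15, 11, 11, 13])

buf = io.BytesIO()
plt.savefig(buf, format="png")  # save first; plt.show() would come after this
plt.close(fig)

# A PNG starts with an 8-byte signature, then the IHDR chunk (4-byte length, 4-byte type),
# followed by width and height as big-endian 32-bit unsigned integers.
width, height = struct.unpack(">II", buf.getvalue()[16:24])
print(width, height)
```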
2.2 Improving the line chart, part 1 (auxiliary display layer)
Case: showing temperature changes. Requirement: draw a line chart of the temperature every minute from 11:00 to 12:00 in a city, with temperatures ranging from 15 °C to 18 °C.
2.2.1 Preparing the data and drawing the initial line chart
import random
import matplotlib.pyplot as plt

# 1. Prepare the x and y data
x = range(60)
y_shanghai = [random.uniform(15, 18) for i in x]

# 2. Create the canvas
plt.figure(figsize=(20, 8), dpi=80)

# 3. Draw the image
plt.plot(x, y_shanghai)

# 4. Display the figure
plt.show()
2.2.2 Adding custom x and y ticks
plt.xticks(x, [labels], **kwargs)
    x: positions of the ticks to display
    [labels]: the label text shown at each tick
    **kwargs: appearance properties of the labels, such as rotation and color
plt.yticks(y, [labels], **kwargs)
    y: positions of the ticks to display
    [labels]: the label text shown at each tick
    **kwargs: appearance properties of the labels, such as rotation and color
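A sketch of plt.xticks() and plt.yticks() with the 60-minute data from 2.2.1, assuming English "11:MM" labels. Extra keyword arguments such as rotation and color are passed on to the tick-label text objects. fig.canvas.draw() is called so the labels are populated before they are read back.

```python
import random

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

x = range(60)
y_shanghai = [random.uniform(15, 18) for _ in x]
x_label = ["11:{:02d}".format(i) for i in x]  # "11:00", "11:01", ..., "11:59"

fig = plt.figure(figsize=(20, 8), dpi=80)
plt.plot(x, y_shanghai)
# One tick every 5 minutes; rotation and color are Text properties applied to the labels
plt.xticks(x[::5], x_label[::5], rotation=45, color="gray")
plt.yticks(range(0, 40, 5))

fig.canvas.draw()  # make sure the tick labels exist before reading them
ax = plt.gca()
shown = [t.get_text() for t in ax.get_xticklabels()]
plt.close(fig)
```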
Example:
# Modify the x and y ticks
# Prepare the x tick labels: "11:00", "11:01", ..., "11:59"
x_label = ["11:{:02d}".format(i) for i in x]
plt.xticks(x[::5], x_label[::5])  # one labelled tick every 5 minutes

# Prepare the y ticks
plt.yticks(range(0, 40, 5))  # equivalent: plt.yticks(range(40)[::5])
If the tick labels contain Chinese characters and the Chinese display problem has not been solved, they are rendered as empty boxes.
2.2.3 Solving the Chinese display problem
Add two lines of code:
plt.rcParams['font.sans-serif'] = ['SimHei']  # Display Chinese labels correctly (the SimHei font must be installed)
plt.rcParams['axes.unicode_minus'] = False    # Display the minus sign correctly
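SimHei is typically available on Windows but often missing on macOS and Linux. A more portable sketch picks whichever of several CJK fonts Matplotlib's font manager actually knows about and keeps DejaVu Sans, which ships with Matplotlib, as a fallback. The candidate list is an illustrative assumption, not an exhaustive one.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Illustrative candidates; which ones exist depends on the operating system
candidates = ["SimHei", "Microsoft YaHei", "PingFang SC", "Noto Sans CJK SC", "WenQuanYi Zen Hei"]
installed = {entry.name for entry in font_manager.fontManager.ttflist}
available = [name for name in candidates if name in installed]

# Installed CJK fonts first; DejaVu Sans ships with Matplotlib, so it always resolves
plt.rcParams["font.sans-serif"] = available + ["DejaVu Sans"]
plt.rcParams["axes.unicode_minus"] = False  # draw the minus sign as an ASCII hyphen
print("Using:", plt.rcParams["font.sans-serif"])
```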
2.2.4 Adding a grid
To read the values on the chart more easily:
plt.grid(linestyle="--", alpha=0.5)  # dashed grid lines, 50% opacity
2.2.5 Adding descriptive text
Add x- and y-axis labels and a title:
plt.xlabel("Time")
plt.ylabel("Temperature (°C)")
plt.title("Temperature in a city every minute from 11:00 to 12:00")
2.3 Improving the line chart, part 2 (image layer)
Requirement: add the temperature changes of another city.
Beijing's temperatures over the same period were also collected, ranging from 1 °C to 3 °C.
2.3.1 Multiple plot calls
Adding another line to the same Axes is simple: just call plot again. The lines must be distinguishable, though, as shown below:
import random
import matplotlib.pyplot as plt

# 1. Prepare the x and y data
x = range(60)
y_shanghai = [random.uniform(15, 18) for i in x]
y_beijing = [random.uniform(1, 3) for i in x]  # Add temperature data for Beijing

# 2. Create the canvas
plt.figure(figsize=(20, 8), dpi=80)

# 3. Draw the image
plt.plot(x, y_shanghai, color="r", linestyle="-.", label="Shanghai")
plt.plot(x, y_beijing, color="b", label="Beijing")  # Multiple plot calls draw multiple lines

# Show the legend
plt.legend()

# Modify the x and y ticks
x_label = ["11:{:02d}".format(i) for i in x]
plt.xticks(x[::5], x_label[::5])
plt.yticks(range(0, 40, 5))

# Add a grid
plt.grid(linestyle="--", alpha=0.5)

# Add descriptive text
plt.xlabel("Time")
plt.ylabel("Temperature (°C)")
plt.title("Temperature in Shanghai and Beijing every minute from 11:00 to 12:00")

# 4. Display the figure
plt.show()
Two new things are used here: different line styles to tell the lines apart, and a legend.
2.3.2 Setting the line style
Color character   Color      Style character   Line style
r                 red        -                 solid line
g                 green      --                dashed line
b                 blue       -.                dash-dot line
w                 white      :                 dotted line
c                 cyan       '' or ' '         no line (blank)
m                 magenta
y                 yellow
k                 black
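The color and style characters can be passed either as separate color= and linestyle= keywords or combined into a single format string such as "r--". This sketch, with arbitrary data, draws one line each way and reads the settings back; matplotlib.colors.same_color is used so the check does not depend on how the color is stored internally.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.colors import same_color

x = list(range(10))
fig, ax = plt.subplots()
(red,) = ax.plot(x, [v for v in x], "r--")                           # format string: red, dashed
(green,) = ax.plot(x, [2 * v for v in x], color="g", linestyle="-.")  # keywords: green, dash-dot
(blue,) = ax.plot(x, [3 * v for v in x], "b:")                        # format string: blue, dotted
plt.close(fig)
```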
2.3.3 Displaying the legend
Note: setting label= in plt.plot() alone does not display a legend; you also need to call plt.legend().
plt.legend()            # default location: "best"
plt.legend(loc="best")
plt.legend(loc=0)
    loc: where to show the legend
Location string     Location code
'best' (default)    0
'upper right'       1
'upper left'        2
'lower left'        3
'lower right'       4
'right'             5
'center left'       6
'center right'      7
'lower center'      8
'upper center'      9
'center'            10
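A sketch of the loc parameter with made-up data: loc="upper left" and loc=2 name the same position, and the legend's entries come from the label= arguments passed to plot(). Before legend() is called, the labels are stored but no legend exists.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(range(5), [1, 2, 3, 4, 5], color="r", label="Shanghai")
ax.plot(range(5), [5, 4, 3, 2, 1], color="b", label="Beijing")
legend_before = ax.get_legend()     # None: labels alone do not create a legend
leg = ax.legend(loc="upper left")   # equivalent to loc=2
labels = [text.get_text() for text in leg.get_texts()]
plt.close(fig)
```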
2.4 Multiple Axes in one figure with plt.subplots (object-oriented plotting)
Requirement: show the Shanghai and Beijing temperature charts in separate Axes of the same figure.
This can be done with the subplots function. (Older code uses subplot, which is less convenient.) subplots is recommended.
- matplotlib.pyplot.subplots(nrows=1, ncols=1, **fig_kw) creates a figure with a grid of Axes
Parameters:
    nrows, ncols: int, optional, default 1; the number of rows / columns of the subplot grid
    **fig_kw: all additional keyword arguments are passed to the figure() call
Returns:
    fig: the Figure object
    ax: one Axes object, or an array of Axes objects when there are several
On Axes objects, the methods for ticks, labels and titles have different names: set_xticks, set_yticks, set_xlabel, set_ylabel, set_title
For more Axes methods, see https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes
figure, axes = plt.subplots(nrows=1, ncols=2, **fig_kw)  # 1 row and 2 columns
axes[0].set_xxx()  # first plot
axes[1].set_xxx()  # second plot
Note: calling plt.function_name() is the procedural (state-based) style, while calling axes.set_method_name() is the object-oriented style.
import random
import matplotlib.pyplot as plt

# 1. Prepare the x and y data
x = range(60)
y_shanghai = [random.uniform(15, 18) for i in x]
y_beijing = [random.uniform(1, 3) for i in x]

# 2. Create the canvas with 1 row and 2 columns of Axes
# plt.figure(figsize=(20, 8), dpi=80)
figure, axes = plt.subplots(nrows=1, ncols=2, figsize=(20, 8), dpi=80)

# 3. Draw the images
axes[0].plot(x, y_shanghai, color="r", linestyle="-.", label="Shanghai")
axes[1].plot(x, y_beijing, color="b", label="Beijing")

# Show the legends
axes[0].legend()
axes[1].legend()

# Modify the x and y ticks
x_label = ["11:{:02d}".format(i) for i in x]
# In Matplotlib 3.5+, axes[0].set_xticks(x[::5], x_label[::5]) can replace the next two lines
axes[0].set_xticks(x[::5])
axes[0].set_xticklabels(x_label[::5])  # one label per tick
axes[0].set_yticks(range(0, 40, 5))
axes[1].set_xticks(x[::5])
axes[1].set_xticklabels(x_label[::5])
axes[1].set_yticks(range(0, 40, 5))

# Add grids
axes[0].grid(linestyle="--", alpha=0.5)
axes[1].grid(linestyle="--", alpha=0.5)

# Add descriptive text
axes[0].set_xlabel("Time")
axes[0].set_ylabel("Temperature (°C)")
axes[0].set_title("Temperature in Shanghai every minute from 11:00 to 12:00")
axes[1].set_xlabel("Time")
axes[1].set_ylabel("Temperature (°C)")
axes[1].set_title("Temperature in Beijing every minute from 11:00 to 12:00")

# 4. Display the figure
plt.show()
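Because the two Axes are configured identically, the repeated calls above can be folded into a loop. This is a refactoring sketch, not the original code: the cities dictionary is a hypothetical helper holding the temperature ranges from the text, and sharey=True (a real plt.subplots option) makes both panels use the same y-axis so the temperatures are directly comparable.

```python
import random

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical helper: city -> (min temp, max temp, line color), ranges taken from the text
cities = {"Shanghai": (15, 18, "r"), "Beijing": (1, 3, "b")}
x = range(60)
x_label = ["11:{:02d}".format(i) for i in x]

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(20, 8), dpi=80, sharey=True)
for ax, (name, (low, high, color)) in zip(axes, cities.items()):
    y = [random.uniform(low, high) for _ in x]
    ax.plot(x, y, color=color, label=name)
    ax.set_xticks(x[::5])
    ax.set_xticklabels(x_label[::5])
    ax.set_yticks(range(0, 40, 5))
    ax.grid(linestyle="--", alpha=0.5)
    ax.set_xlabel("Time")
    ax.set_ylabel("Temperature (°C)")
    ax.set_title(f"Temperature in {name} every minute from 11:00 to 12:00")
    ax.legend()
plt.close(fig)
```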
2.5 Use cases for line charts
A metric that changes over time:
- The number of daily active users of a company's products (by region)
- The number of app downloads per day
- How the number of user clicks changes over time after a new product feature goes live
- Extension: plotting mathematical functions
Note: besides line charts, plt.plot() can also draw the graphs of mathematical functions.
Drawing a mathematical function
import numpy as np
import matplotlib.pyplot as plt

# 1. Prepare the x and y data
# Sine function data
# x = np.linspace(-10, 10, 1000)  # 1000 evenly spaced numbers in [-10, 10]
# y = np.sin(x)
# Quadratic function data
x = np.linspace(-1, 1, 1000)
y = 2 * x * x

# 2. Create the canvas
plt.figure(figsize=(20, 8), dpi=80)

# 3. Draw the image
plt.plot(x, y)

# Add a grid
plt.grid(linestyle="--", alpha=0.5)

# 4. Display the image
plt.show()
3 Common chart types and what they show
3.1 Line chart
- Line chart: shows increases or decreases of a statistical quantity through the rise and fall of a line
Features: shows the trend of the data and how things change. (change)
3.2 Scatter plot
- Scatter plot: uses two sets of data to form coordinate points and examines how the points are distributed, in order to judge whether the two variables are associated or to summarize the distribution pattern of the points
Features: reveals whether there is a quantitative correlation trend between variables and exposes outliers (distribution pattern)
3.3 Bar chart
- Bar chart: data arranged in the columns or rows of a worksheet can be drawn as a bar chart.
Features: draws discrete data; you can see the size of each value at a glance and compare the differences between values. (statistics / comparison)
3.4 Histogram
- Histogram: represents the distribution of data with a series of vertical bars or line segments of different heights. Usually the horizontal axis shows the data range and the vertical axis shows the distribution.
Features:
- Draws continuous data to show the distribution of one or more groups of data (statistics)
- Also shows where the data are concentrated and where abnormal or isolated values lie
3.5 Pie chart
- Pie chart: shows the proportions of different categories and compares them by the angle of each slice.
Features: shows the percentage of each category (percentage)
4 Scatter plot (scatter)
Requirement: explore the relationship between house area and house price
import matplotlib.pyplot as plt

# 1. Prepare the data
# Housing area data
x = [225.98, 247.07, 253.14, 457.85, 241.58, 301.01, 20.67, 288.64, 163.56, 120.06, 207.83, 342.75, 147.9, 53.06, 224.72, 29.51, 21.61, 483.21, 245.25, 399.25, 343.35]
# Housing price data
y = [196.63, 203.88, 210.75, 372.74, 202.41, 247.61, 24.9, 239.34, 140.32, 104.15, 176.84, 288.23, 128.79, 49.64, 191.74, 33.1, 30.74, 400.02, 205.35, 330.64, 283.45]

# 2. Create the canvas
plt.figure(figsize=(20, 8), dpi=80)

# 3. Draw the image
plt.scatter(x, y)

# 4. Display the image
plt.show()
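The scatter plot suggests that area and price rise together. To put a number on that impression, this sketch computes the Pearson correlation coefficient of the same data with numpy.corrcoef; a value near +1 means a strong positive linear relationship. The numeric check is an illustrative addition, not part of the original example.

```python
import numpy as np

# Same housing data as in the scatter example
area = [225.98, 247.07, 253.14, 457.85, 241.58, 301.01, 20.67, 288.64, 163.56, 120.06, 207.83,
        342.75, 147.9, 53.06, 224.72, 29.51, 21.61, 483.21, 245.25, 399.25, 343.35]
price = [196.63, 203.88, 210.75, 372.74, 202.41, 247.61, 24.9, 239.34, 140.32, 104.15, 176.84,
         288.23, 128.79, 49.64, 191.74, 33.1, 30.74, 400.02, 205.35, 330.64, 283.45]

r = np.corrcoef(area, price)[0, 1]  # off-diagonal entry of the 2 x 2 correlation matrix
print(f"Pearson r = {r:.3f}")
```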
5 Bar chart (bar)
matplotlib.pyplot.bar(x, height, width=0.8, align='center', **kwargs)
Parameters:
    x: sequence of scalars; the x coordinates of the bars
    height: the bar heights (y values)
    width: scalar or array-like, optional; the width(s) of the bars (default 0.8)
    align: {'center', 'edge'}, optional, default 'center'; alignment of the bars to the x coordinates
        'center': center the bars on the x positions
        'edge': align the left edges of the bars with the x positions
    **kwargs:
        color: the color(s) of the bars
Returns:
    BarContainer: a container with all the bars and, optionally, error bars
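To make the align parameter concrete: with the default align='center' and width=0.4, the bar at x=0 spans -0.2 to 0.2; with align='edge', its left edge sits at x=0. The sketch reads the left edge of the first bar in each returned BarContainer; the heights are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
centered = ax.bar([0, 1, 2], [3, 5, 2], width=0.4)              # align='center' (default)
edged = ax.bar([0, 1, 2], [3, 5, 2], width=0.4, align="edge",   # left edge at each x
               color="orange")
left_centered = centered.patches[0].get_x()  # left edge of the first centered bar
left_edged = edged.patches[0].get_x()        # left edge of the first edge-aligned bar
plt.close(fig)
```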
5.1 Requirement 1: compare the box office revenue of each film
import matplotlib.pyplot as plt

# 1. Prepare the data
movie_names = ['Raytheon 3: dusk of gods', 'Justice League: Injustice for All', 'Murder of Orient Express', 'Journey to dream seeking circle', 'Global Storm', 'Demon subduing biography', 'chase', 'Seventy seven days', 'Secret War', 'Berserker', 'other']
tickets = [73853, 57767, 22354, 15969, 14839, 8725, 8716, 8318, 7916, 6764, 52222]

# 2. Create the canvas
plt.figure(figsize=(20, 8), dpi=80)
# Display Chinese labels correctly (only needed for Chinese titles; SimHei must be installed)
plt.rcParams['font.sans-serif'] = ['SimHei']

# 3. Draw the bar chart
x_ticks = range(len(movie_names))
plt.bar(x_ticks, tickets, color=['b', 'r', 'g', 'y', 'c', 'm', 'y', 'k', 'c', 'g', 'b'])

# Modify the x ticks
plt.xticks(x_ticks, movie_names)

# Add a title
plt.title("Box office revenue comparison")

# Add a grid
plt.grid(linestyle="--", alpha=0.5)

# 4. Display the image
plt.show()
5.2 Requirement 2: how can the box office comparison be made more convincing?
Compare box office over the same time periods.
Sometimes, to be fair, we need to compare the first-day and first-week box office of different films.
import matplotlib.pyplot as plt

# 1. Prepare the data
movie_name = ['Raytheon 3: dusk of gods', 'Justice League: Injustice for All', 'Journey to dream seeking circle']
first_day = [10587.6, 10062.5, 1275.7]
first_weekend = [36224.9, 34479.6, 11830]
x = range(len(movie_name))

# 2. Create the canvas
plt.figure(figsize=(20, 8), dpi=80)

# 3. Draw the bar chart: shift the second series right by one bar width (0.2)
plt.bar(x, first_day, width=0.2, label="First day box office")
# plt.bar([0.2, 1.2, 2.2], first_weekend, width=0.2, label="First week box office")
plt.bar([i + 0.2 for i in x], first_weekend, width=0.2, label="First week box office")

# Show the legend
plt.legend()

# Modify the ticks: place each label midway between its two bars
plt.xticks([0.1, 1.1, 2.1], movie_name)

# 4. Display the image
plt.show()
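The hard-coded offsets above (0.2 for the bars, 0.1 for the ticks) work for two series. A sketch that generalizes them, assuming every series uses the same bar width: series k is shifted by (k - (n - 1)/2) × width, which centers each group on its integer tick so no tick arithmetic is needed. The series dictionary is a hypothetical convenience; the numbers are the ones from the example.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

movie_name = ['Raytheon 3: dusk of gods', 'Justice League: Injustice for All', 'Journey to dream seeking circle']
series = {  # hypothetical helper: legend label -> values
    "First day box office": [10587.6, 10062.5, 1275.7],
    "First week box office": [36224.9, 34479.6, 11830],
}
width = 0.2
x = np.arange(len(movie_name))
n = len(series)

fig, ax = plt.subplots(figsize=(20, 8), dpi=80)
containers = []
for k, (label, values) in enumerate(series.items()):
    offset = (k - (n - 1) / 2) * width  # -0.1 and +0.1 for two series
    containers.append(ax.bar(x + offset, values, width=width, label=label))
ax.set_xticks(x)                        # ticks sit at the center of each group
ax.set_xticklabels(movie_name)
ax.legend()
plt.close(fig)
```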
6 Histogram (hist)
6.1 Introduction to histograms
Although a histogram looks similar to a bar chart, its meaning is completely different. A histogram involves statistics: the data are first divided into groups, and then the number of data points in each group is counted. In the coordinate system, the horizontal axis marks the endpoints of each group, the vertical axis represents the frequency, and the height of each rectangle is the corresponding frequency. Such a chart is called a frequency distribution histogram.
Example:
- The frequency distribution histogram of the heights of 36 students in Class 1, Grade 3 of a school is shown in the figure below
(1) Which height group contains the most students?
(2) How many students are taller than 160.5 cm?
Related concepts:
- Number of groups: in statistics, data are divided into groups by value range; the number of groups is called the number of groups (bins)
- Group distance: the difference between the two endpoints of a group (the bin width)
6.2 Histogram vs. bar chart
- Bar chart: the length of each rectangle → the frequency or amount of each category; the width (which only represents a category) → fixed. Well suited to analyzing small datasets.
- Histogram: describes the frequency distribution of a set of data. The length of each rectangle → the frequency or amount of each group, and the width → the group distance of that group, so both height and width are meaningful. Well suited to displaying statistical results for large datasets.
- Histograms help to understand the distribution of data, such as the mode, the approximate location of the median, and whether there are gaps or outliers.
1. A histogram shows the distribution of data, whereas a bar chart compares the sizes of values → this is the most fundamental difference.

A histogram shows how a set of data is distributed across the intervals, but it cannot show the exact value of a single data point within an interval.

In a bar chart, you can see the size of each value and compare them.
2. The x-axis of a histogram carries quantitative data; the x-axis of a bar chart carries categorical data.
- In a histogram, the x-axis variables are continuous intervals, usually expressed as numbers, such as "0–10 g, 10–20 g, ..." for apple weight or "0–10 min, 10–20 min, ..." for a length of time.
- In a bar chart, the x-axis variables are categorical data, such as country names or game genres.
- The columns of a histogram cannot be reordered, because the intervals on the x-axis are continuous and fixed.
- The columns of a bar chart can be reordered freely: sometimes by the names of the categories, sometimes by value.
3. Histogram columns have no gaps between them; bar chart columns do.
- This is because the intervals of a histogram are continuous, while the categories of a bar chart are discrete.
4. The column widths of a histogram can differ; the column widths of a bar chart must all be the same.
- In a bar chart, the column width must be the same because it carries no numerical meaning.
- In a histogram, the column width represents the length of the interval. When intervals differ, the widths can differ, but in theory they should be multiples of a unit length.
For example, the U.S. Census Bureau surveyed the commuting time of 124 million people. Because few people commuted 45–150 minutes, those intervals were widened to 45–60, 60–90 and 90–150 minutes, while all other intervals were 5 minutes wide.
- Here the y-axis shows "number of people / group distance". In this case the total area of all columns equals the total number of people surveyed, so the area of each column is meaningful.
- When the y-axis instead shows "count in interval / total count / group distance", the histogram is the frequency distribution histogram taught in middle school, where frequency means "count in interval / total count". In such a histogram the total area of all columns equals 1.
6.3 Drawing a histogram
Requirement: the distribution of film durations
We have the durations of 250 films and want to see how they are distributed, for example how many films last between 100 and 120 minutes and how often such durations occur. How should these data be presented?
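6.3.1 The histogram API
matplotlib.pyplot.hist(x, bins=None, density=False, **kwargs)
Parameters:
    x: (n,) array or sequence of (n,) arrays; the data
    bins: int, sequence, or 'auto', optional; the number of groups (or the group edges)
    density: bool, optional, default False; if True, plot the frequency density (the total area is 1) instead of the counts
Older tutorials pass normed= here; that parameter was deprecated in Matplotlib 2.1 and removed in 3.1, so use density instead.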
6.3.2 Drawing
- Set the group distance
- Set the number of groups (with little data, usually 5–12 groups; with much more data, consider a different way of displaying it)
- A common formula for the number of groups: bins = range / group distance = (max - min) / group distance, rounded down
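The group-count formula can be checked with numpy.histogram, which performs the same grouping and counting that plt.hist draws. To keep the sketch short it uses only the first 20 values of the time list used in this section, with a group distance of 5. Note that an integer bins value splits the range into equal groups, so the actual group width here is (142 - 98) / 8 = 5.5, not exactly 5; passing explicit edges avoids that.

```python
import numpy as np

# First 20 values of the `time` list in this section, to keep the sketch short
sample = [131, 98, 125, 131, 124, 139, 131, 117, 128, 108,
          135, 138, 131, 102, 107, 114, 119, 128, 121, 142]
distance = 5

group_num = (max(sample) - min(sample)) // distance   # (142 - 98) // 5 = 8
counts, edges = np.histogram(sample, bins=group_num)  # equal-width groups over [min, max]

# Explicit edges keep the group distance at exactly 5
exact_edges = np.arange(min(sample), max(sample) + distance, distance)
exact_counts, _ = np.histogram(sample, bins=exact_edges)
```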
import matplotlib.pyplot as plt

# 1. Prepare the data
time = [131, 98, 125, 131, 124, 139, 131, 117, 128, 108, 135, 138, 131, 102, 107, 114, 119, 128, 121, 142, 127, 130, 124, 101, 110, 116, 117, 110, 128, 128, 115, 99, 136, 126, 134, 95, 138, 117, 111,78, 132, 124, 113, 150, 110, 117, 86, 95, 144, 105, 126, 130,126, 130, 126, 116, 123, 106, 112, 138, 123, 86, 101, 99, 136,123, 117, 119, 105, 137, 123, 128, 125, 104, 109, 134, 125, 127,105, 120, 107, 129, 116, 108, 132, 103, 136, 118, 102, 120, 114,105, 115, 132, 145, 119, 121, 112, 139, 125, 138, 109, 132, 134,156, 106, 117, 127, 144, 139, 139, 119, 140, 83, 110, 102,123,107, 143, 115, 136, 118, 139, 123, 112, 118, 125, 109, 119, 133,112, 114, 122, 109, 106, 123, 116, 131, 127, 115, 118, 112, 135,115, 146, 137, 116, 103, 144, 83, 123, 111, 110, 111, 100, 154,136, 100, 118, 119, 133, 134, 106, 129, 126, 110, 111, 109, 141,120, 117, 106, 149, 122, 122, 110, 118, 127, 121, 114, 125, 126,114, 140, 103, 130, 141, 117, 106, 114, 121, 114, 133, 137, 92,121, 112, 146, 97, 137, 105, 98, 117, 112, 81, 97, 139, 113,134, 106, 144, 110, 137, 137, 111, 104, 117, 100, 111, 101, 110,105, 129, 137, 112, 120, 113, 133, 112, 83, 94, 146, 133, 101,131, 116, 111, 84, 137, 115, 122, 106, 144, 109, 123, 116, 111,111, 133, 150]

# 2. Create the canvas
plt.figure(figsize=(20, 8), dpi=80)

# 3. Draw the histogram
distance = 2
group_num = int((max(time) - min(time)) / distance)  # round down to get the number of groups
plt.hist(time, bins=group_num, density=True)

# Modify the x-axis ticks
plt.xticks(range(min(time), max(time) + 2, distance))

# Add a grid
plt.grid(linestyle="--", alpha=0.5)

# 4. Display the image
plt.show()
6.3.3 Points to note about histograms
- Pay attention to the group distance
The group distance affects the distribution that the histogram shows, so try several group distances when drawing a histogram.
- Pay attention to the variable represented by the y-axis
The y-axis can show the count (how many values fall into a group), the relative frequency (count / total), or the frequency density (count / total / group distance). Each choice gives the distribution described by the histogram a different meaning.
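The three y-axis choices can be computed side by side. This sketch reuses a short sample (the first 20 values of the time list in 6.3.2) with group edges 95, 100, ..., 145; the edges are an arbitrary choice for illustration. The final line checks that the hand-computed frequency density matches what numpy.histogram returns with density=True.

```python
import numpy as np

sample = [131, 98, 125, 131, 124, 139, 131, 117, 128, 108,
          135, 138, 131, 102, 107, 114, 119, 128, 121, 142]
edges = np.arange(95, 150, 5)  # group edges 95, 100, ..., 145 (group distance 5)
widths = np.diff(edges)

counts, _ = np.histogram(sample, bins=edges)  # count per group
relative = counts / counts.sum()              # count / total
density = relative / widths                   # count / total / group distance

density_np, _ = np.histogram(sample, bins=edges, density=True)
```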
7 Pie chart (pie)
7.1 Introduction to pie charts
A pie chart shows the proportions of different categories and compares them by the angle of each slice.
The pie is divided into slices according to each category's proportion. The whole pie represents the total, each slice (arc) represents the share of one category, and all slices together add up to 100%.
7.2 Drawing a pie chart
The pie API (pay attention to how the percentages are formatted):
plt.pie(x, labels=, autopct=, colors=)
- x: the quantities; the percentages are computed automatically
- labels: the name of each slice
- autopct: the format string for the percentage shown on each slice, e.g. '%1.2f%%'
- %1.2f%%: % starts the format specifier; 1.2f means a floating-point number with a minimum field width of 1 and 2 digits after the decimal point; %% is an escaped percent sign, so a literal % is printed
- colors: the color of each slice
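With autopct set, pie() returns three lists: the wedges, the label texts and the percentage texts. This sketch uses toy values 1, 1 and 2 (25%, 25% and 50%) so the formatted strings are easy to check; the labels are placeholders.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie([1, 1, 2], labels=["A", "B", "C"], autopct="%1.2f%%")
ax.axis("equal")  # keep the pie circular
percentages = [t.get_text() for t in autotexts]
plt.close(fig)
print(percentages)
```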
Requirement: show each film's share of screenings
import matplotlib.pyplot as plt

# 1. Prepare the data
movie_name = ['Raytheon 3: dusk of gods', 'Justice League: Injustice for All', 'Murder of Orient Express', 'Journey to dream seeking circle', 'Global Storm', 'Demon subduing biography', 'chase', 'Seventy seven days', 'Secret War', 'Berserker', 'other']
place_count = [60605, 54546, 45819, 28243, 13270, 9945, 7679, 6799, 6101, 4621, 20105]

# 2. Create the canvas
plt.figure(figsize=(20, 8), dpi=80)

# 3. Draw the pie chart
plt.pie(place_count, labels=movie_name, colors=['b', 'r', 'g', 'y', 'c', 'm', 'y', 'k', 'c', 'g', 'y'], autopct="%1.2f%%")

# Show the legend
plt.legend()

# Keep the pie chart round
plt.axis('equal')

# 4. Display the image
plt.show()
7.3 Adding axis('equal')
To keep the pie chart round, call plt.axis('equal') so that the x and y axes use the same scale; otherwise the pie may come out as an oval.
8 Summary
Video: Heima's four-day Python course, "Get started quickly with Python data mining": https://www.bilibili.com/video/BV1xt411v7z9?from=search&seid=1374736475069929050