Matplotlib, a common Python module

See video: Python Tutorial 4 days to start Python data mining quickly

1 Introduction to Matplotlib

1.1 what is Matplotlib

  • A library dedicated to drawing 2D charts (3D charts are also supported)
  • Very simple to use
  • Supports progressive and interactive data visualization

The name matplotlib comes from:

  • mat - matrix (2D data, 2D charts)
  • plot - draw
  • lib - library

The name is modeled on MATLAB (matrix laboratory):

  • mat - matrix
  • lab - laboratory

1.2 function of Matplotlib

Visualization is a key supporting tool throughout data mining: it helps us understand the data clearly and adjust our analysis methods accordingly.

  • It turns data into charts and presents it more intuitively
  • It makes the data more objective and persuasive

For dynamic visualization on web pages, JavaScript libraries such as D3 and ECharts are commonly used.

1.3 first Matplotlib diagram

import matplotlib.pyplot as plt
# Jupyter Notebook magic for inline display; remove this line in a plain Python script
%matplotlib inline

plt.figure()          # Create canvas
plt.plot([1, 0, 9], [4, 5, 6])    # Drawing
plt.show()           # Figure display

1.4 three layer structure of Matplotlib

1.4.1 container layer

The container layer is mainly composed of Canvas, Figure and Axes.

  1. Canvas is the lowest, system-level layer. It acts as the drawing board during plotting, that is, the tool on which the Figure is placed.
  2. Figure is the first layer above Canvas, and also the first application-level layer that users operate on. It plays the role of the sheet of canvas placed on the drawing board.
  3. Axes is the second application-level layer. It plays the role of a coordinate system / drawing area on the Figure.

Explanation:

  • Figure: the whole figure (plt.figure() can set its size and resolution)
  • Axes: the drawing area for the data (created with, for example, plt.subplots())
  • Axis: a single axis of a coordinate system, including its limits, ticks and tick labels

Their relationships are:

  • A Figure can contain multiple Axes (coordinate systems / drawing areas), but an Axes can belong to only one Figure
  • An Axes can contain multiple Axis objects (coordinate axes): two for a 2D coordinate system and three for a 3D one
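The relationships above can be checked directly on the objects. The following is a minimal sketch, not part of the original tutorial; it selects the non-interactive Agg backend (an assumption made so that it runs without a display) and the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# One Figure holding two Axes side by side
fig, (ax_left, ax_right) = plt.subplots(nrows=1, ncols=2)

# A Figure can contain several Axes ...
print(len(fig.axes))
# ... but each Axes belongs to exactly one Figure
print(ax_left.figure is fig)

# A 2D Axes owns two Axis objects: one for x and one for y
print(type(ax_left.xaxis).__name__, type(ax_left.yaxis).__name__)

plt.close(fig)
```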

1.4.2 auxiliary display layer

The auxiliary display layer is everything in Axes (the drawing area) other than the image drawn from the data. It mainly includes the Axes appearance (facecolor), spines, axis, axis label, ticks, tick labels, grid, legend, title, and so on.

Settings in this layer make the image more intuitive and easier to understand, but they do not substantially change the image itself.

1.4.3 image layer

The image layer refers to the images drawn in Axes from the data by functions such as plot, scatter, bar, hist and pie.

Summary:

  • Canvas (drawing board) is at the bottom layer and is usually not touched by users
  • Figure is built on Canvas
  • Axes (drawing area) is built on Figure
  • The auxiliary display layer (axis, legend, etc.) and the image layer are both built on Axes

2 Line chart (plot) and basic drawing functions

2.1 drawing and saving of line chart

To get familiar with the basic drawing functions, we work through the basic API with one running example: plotting temperature changes.

2.1.1 matplotlib.pyplot module

matplotlib.pyplot contains a series of MATLAB-like drawing functions. Each function acts on the current Axes (coordinate system) of the current Figure.

import matplotlib.pyplot as plt

 

2.1.2 line drawing and display

Show the temperature in Shanghai over one week, from Monday to Sunday:

# 1. Create canvas
# plt.figure()
plt.figure(figsize = (20, 8), dpi = 80)

# 2. Drawing an image
plt.plot([1, 2, 3, 4, 5, 6, 7], [17, 17, 18, 15, 11, 11, 13])

# Save image
plt.savefig("test.png")

# 3. Display image
plt.show()

Note: plt.show() releases the figure's resources. If you save the image after it has been displayed, only an empty image is saved, so plt.savefig("xx.png") must be called before plt.show().
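One way to make the save step less dependent on call order is to keep a reference to the Figure object and save through it. The sketch below is illustrative only: the file name and the use of a temporary directory are assumptions, and the Agg backend is chosen so that it runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no window is needed
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(20, 8), dpi=80)
plt.plot([1, 2, 3, 4, 5, 6, 7], [17, 17, 18, 15, 11, 11, 13])

# Save through the Figure object before showing or closing it
out_path = os.path.join(tempfile.gettempdir(), "weekly_temperature.png")
fig.savefig(out_path)

plt.close(fig)  # release the figure explicitly once it has been saved
print(os.path.getsize(out_path) > 0)
```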

2.1.3 setting canvas properties and image saving

plt.figure(figsize=(), dpi=)
    figsize: the size of the figure as (width, height), in inches
    dpi: dots per inch, which determines the sharpness of the image
    Returns a Figure object

plt.savefig(path)
    path: the path where the image is saved
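Since figsize is given in inches and dpi in dots per inch, the pixel size of the figure is their product. A small sketch of that arithmetic follows; the helper name pixel_size is hypothetical, and it assumes savefig is not given a different dpi.

```python
def pixel_size(figsize, dpi):
    """Return (width, height) in pixels for a figure size given in inches."""
    width_in, height_in = figsize
    return round(width_in * dpi), round(height_in * dpi)

# The settings used throughout this tutorial
print(pixel_size((20, 8), 80))   # (1600, 640)
```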

2.2 improve the original line chart 1 (auxiliary display layer)

Case: displaying temperature changes. Demand: draw a line chart of the temperature for every minute from 11:00 to 12:00 in a city, with the temperature in the range 15 ℃ to 18 ℃

2.2.1 prepare data and draw initial line chart

import random
import matplotlib.pyplot as plt

# 1. Prepare data x y
x = range(60)
y_shanghai = [random.uniform(15, 18) for i in x]

# 2. Create canvas
plt.figure(figsize=(20, 8), dpi=80)

# 3. Drawing an image
plt.plot(x, y_shanghai)

# 4. Display diagram
plt.show()

2.2.2 add custom x, y ticks

plt.xticks(x, [labels], **kwargs)
	x: positions at which ticks are displayed
	[labels]: optional, the label displayed at each position
	**kwargs: appearance properties of the labels, such as rotation and color
plt.yticks(y, [labels], **kwargs) 
	y: positions at which ticks are displayed
	[labels]: optional, the label displayed at each position
	**kwargs: appearance properties of the labels, such as rotation and color

Example:

# Modify the x and y ticks
# Prepare the tick labels for x
x_label = ["11:{:02d}".format(i) for i in x]
plt.xticks(x[::5], x_label[::5])
# Set the y ticks
plt.yticks(range(0, 40, 5))
# plt.yticks(range(40)[::5])

If the tick labels contain Chinese characters and no Chinese-capable font has been configured, the characters are displayed as empty boxes.

2.2.3 solving the Chinese display problem

Add two lines of code (the SimHei font must be installed on the system):

plt.rcParams['font.sans-serif'] = ['SimHei']  # Display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False  # Display the minus sign correctly
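Whether SimHei is actually available depends on the machine. The sketch below shows one possible way to pick a font only from those Matplotlib has found; the candidate list is a hypothetical preference order, not something prescribed by the tutorial, and the Agg backend is assumed so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for running without a display
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Hypothetical preference list; adjust it to the fonts installed on your system
candidates = ["SimHei", "Microsoft YaHei", "Noto Sans CJK SC", "DejaVu Sans"]

# Names of all fonts Matplotlib has found
installed = {font.name for font in font_manager.fontManager.ttflist}
available = [name for name in candidates if name in installed]

# Put the available candidates first and keep the existing defaults as fallback
if available:
    plt.rcParams["font.sans-serif"] = available + list(plt.rcParams["font.sans-serif"])
plt.rcParams["axes.unicode_minus"] = False

print(available)
```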

2.2.4 add grid display

A grid makes it easier to read values off the chart:

plt.grid(linestyle="--", alpha=0.5)

2.2.5 add description information

Add descriptions for the x-axis and y-axis, and a title:

plt.xlabel("Time")
plt.ylabel("Temperature")
plt.title("Temperature change of a city from 11:00 to 12:00 every minute")

2.3 improve the original line chart 2 (image layer)

Demand: add another city's temperature change

The temperature changes of the day in Beijing were collected, ranging from 1 ℃ to 3 ℃.

2.3.1 multiple plot

Adding another line to the same coordinate system is simple: just call plot again. The lines then need to be told apart, as shown below

# 1. Prepare data x y
x = range(60)
y_shanghai = [random.uniform(15, 18) for i in x]
y_beijing = [random.uniform(1, 3) for i in x]   # Increase temperature data in Beijing

# 2. Create canvas
plt.figure(figsize=(20, 8), dpi=80)

# 3. Drawing an image
plt.plot(x, y_shanghai, color="r", linestyle="-.", label="Shanghai")
plt.plot(x, y_beijing, color="b", label="Beijing")    # Call plot multiple times to draw multiple lines

# Show legend
plt.legend()		# Adds the box that describes each line

# Modify x, y ticks
# Prepare the tick labels for x
x_label = ["11:{:02d}".format(i) for i in x]
plt.xticks(x[::5], x_label[::5])
plt.yticks(range(0, 40, 5))

# Add grid display
plt.grid(linestyle="--", alpha=0.5)

# Add description
plt.xlabel("Time")
plt.ylabel("Temperature")
plt.title("Temperature changes in Shanghai and Beijing from 11:00 to 12:00 every minute")

# 4. Display diagram
plt.show()

Two new things appear here: a different line style to tell the lines apart, and a legend.

2.3.2 setting graphic style

Color character    Style character
r  red             -   solid line
g  green           --  dashed line
b  blue            -.  dash-dot line
w  white           :   dotted line
c  cyan            ''  blank (no line drawn)
m  magenta
y  yellow
k  black
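The color and style characters in the table can be passed either as separate keyword arguments or combined in a single format string. A minimal sketch with made-up data, assuming the Agg backend so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for running without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Keyword form: color and line style given separately
(line_a,) = ax.plot([0, 1, 2], [15, 17, 16], color="r", linestyle="-.")
# Shorthand form: one format string combining color and line style
(line_b,) = ax.plot([0, 1, 2], [1, 3, 2], "b--")

print(line_a.get_color(), line_a.get_linestyle())
print(line_b.get_color(), line_b.get_linestyle())

plt.close(fig)
```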

2.3.3 display legend

Note: setting label in plt.plot() alone does not display the legend. You need to call plt.legend() to display it.

plt.legend()	# Default loc is 'best'
plt.legend(loc="best")
plt.legend(loc=0)
	loc: location of the legend
Location String Location Code
'best' (default) 0
'upper right' 1
'upper left' 2
'lower left' 3
'lower right' 4
'right' 5
'center left' 6
'center right' 7
'lower center' 8
'upper center' 9
'center' 10

2.4 multiple coordinate systems - plt.subplots (object-oriented drawing method)

Demand: display the temperature charts of Shanghai and Beijing in different Axes of the same figure

This can be done with the subplots function. The older subplot function also exists but is less convenient, so subplots is recommended.

  • matplotlib.pyplot.subplots(nrows=1, ncols=1, **fig_kw) creates a figure with multiple Axes
Parameters: 
nrows, ncols: int, optional, default: 1, number of rows / columns of the subplot grid
**fig_kw: all additional keyword arguments are passed to the figure() call

Returns:
	fig: the Figure object
	ax: the Axes object, or an array of Axes objects when there is more than one
		Ticks, labels and titles are set with set_ methods on the Axes, for example:
		set_xticks
		set_yticks
		set_xlabel
		set_ylabel

More methods on Axes: see https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes

figure, axes = plt.subplots(nrows=1, ncols=2, **fig_kw)	# 1 row and 2 columns
	axes[0].set_<method name>(): first plot
	axes[1].set_<method name>(): second plot

Note: calling plt.<function name>() corresponds to the procedural drawing style, while calling axes.set_<method name>() corresponds to the object-oriented drawing style.

# 1. Prepare data x y
x = range(60)
y_shanghai = [random.uniform(15, 18) for i in x]
y_beijing = [random.uniform(1, 3) for i in x]

# 2. Create canvas
# plt.figure(figsize=(20, 8), dpi=80)
figure, axes = plt.subplots(nrows=1, ncols=2, figsize=(20, 8), dpi=80)

# 3. Drawing an image
axes[0].plot(x, y_shanghai, color="r", linestyle="-.", label="Shanghai")
axes[1].plot(x, y_beijing, color="b", label="Beijing")

# Show Legend 
axes[0].legend()
axes[1].legend()

# Modify x, y scale
# Prepare scale description for x
x_label = ["11:{:02d}".format(i) for i in x]
# In Matplotlib 3.5 and later, axes[0].set_xticks(x[::5], x_label[::5]) can replace the following two lines
axes[0].set_xticks(x[::5])
axes[0].set_xticklabels(x_label[::5])	# One label per tick position
axes[0].set_yticks(range(0, 40, 5))
axes[1].set_xticks(x[::5])
axes[1].set_xticklabels(x_label[::5])
axes[1].set_yticks(range(0, 40, 5))

# Add grid display
axes[0].grid(linestyle="--", alpha=0.5)
axes[1].grid(linestyle="--", alpha=0.5)

# Add description
axes[0].set_xlabel("Time")
axes[0].set_ylabel("Temperature")
axes[0].set_title("The temperature change of every minute from 11:00 to 12:00 in Shanghai")
axes[1].set_xlabel("Time")
axes[1].set_ylabel("Temperature")
axes[1].set_title("The temperature change of every minute from 11:00 to 12:00 in Beijing")

# 4. Display diagram
plt.show()
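Because every setting is applied to both Axes, the object-oriented version lends itself to a loop. The following is a sketch of that variation rather than the tutorial's own code; the loop structure, the shortened titles and the Agg backend are assumptions.

```python
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for running without a display
import matplotlib.pyplot as plt

x = range(60)
cities = [
    ("Shanghai", [random.uniform(15, 18) for _ in x], "r"),
    ("Beijing", [random.uniform(1, 3) for _ in x], "b"),
]
x_label = ["11:{:02d}".format(i) for i in x]

figure, axes = plt.subplots(nrows=1, ncols=2, figsize=(20, 8), dpi=80)

# Apply the same settings to each Axes instead of repeating every call
for ax, (city, temps, color) in zip(axes, cities):
    ax.plot(x, temps, color=color, label=city)
    ax.set_xticks(list(x[::5]))
    ax.set_xticklabels(x_label[::5])      # one label per tick position
    ax.set_yticks(list(range(0, 40, 5)))
    ax.grid(linestyle="--", alpha=0.5)
    ax.set_xlabel("Time")
    ax.set_ylabel("Temperature")
    ax.set_title("Temperature in {} from 11:00 to 12:00".format(city))
    ax.legend()

plt.close(figure)
```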

2.5 application scenario of line chart

A line chart suits any indicator that changes over time:

  • The number of daily active users of a company's product (by region)
  • The number of downloads of an app per day
  • The change in user clicks over time after a new product feature goes online
  • Extension: drawing graphs of various mathematical functions

Note: besides line charts, plt.plot() can also be used to draw graphs of mathematical functions

 
Drawing the graph of a mathematical function

import numpy as np
import matplotlib.pyplot as plt
# 1. Prepare x, y data
# Sine function alternative: use these two lines in place of the quadratic data below
# x = np.linspace(-10, 10, 1000)  # 1000 evenly spaced numbers in [-10, 10]
# y = np.sin(x)
# Quadratic function data
x = np.linspace(-1, 1, 1000)
y = 2 * x * x

# 2. Create canvas
plt.figure(figsize=(20, 8), dpi=80)

# 3. Drawing an image
plt.plot(x, y)

# Add grid display
plt.grid(linestyle="--", alpha=0.5)

# 4. Display image
plt.show()

3 types and significance of common figures

3.1 line chart

  • Line chart: a chart showing the increase or decrease of statistical quantity by the rise or fall of a line

Features: it can display the trend of data change and reflect the change of things. (change)

3.2 scatter diagram

  • Scatter diagram: use two sets of data to form multiple coordinate points, inspect the distribution of coordinate points, and judge whether there is some association between two variables or summarize the distribution mode of coordinate points

Features: judge whether there is quantitative correlation trend between variables, and display outliers (distribution law)

3.3 bar chart

  • Bar chart: data arranged in columns or rows of a worksheet can be drawn as a bar chart.

Features: plots discrete data, shows the size of each value at a glance, and makes differences between values easy to compare. (statistics / comparison)

3.4 histogram

  • Histogram: the distribution of data represented by a series of vertical stripes or line segments with different heights. Generally, the horizontal axis is used to represent the data range, and the vertical axis is used to represent the distribution.

characteristic:

  1. Draw continuous data to show the distribution of one or more groups of data (Statistics)
  2. The histogram can also be used to observe and estimate which data are relatively concentrated and where the abnormal or isolated data are distributed

3.5 pie chart

  • Pie chart: shows the proportions of different categories, which are compared by the size of their arcs.

Features: the share of each category in the whole (proportion)

4 scatter

Demand: exploring the relationship between housing area and housing price

# 1. Prepare data
x = [225.98, 247.07, 253.14, 457.85, 241.58, 301.01,  20.67, 288.64,
       163.56, 120.06, 207.83, 342.75, 147.9 ,  53.06, 224.72,  29.51,
        21.61, 483.21, 245.25, 399.25, 343.35]	# Housing area data

y = [196.63, 203.88, 210.75, 372.74, 202.41, 247.61,  24.9 , 239.34,
       140.32, 104.15, 176.84, 288.23, 128.79,  49.64, 191.74,  33.1 ,
        30.74, 400.02, 205.35, 330.64, 283.45]	# Housing price data

# 2. Create canvas
plt.figure(figsize=(20, 8), dpi=80)

# 3. Drawing an image
plt.scatter(x, y)

# 4. Display image
plt.show()
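The scatter plot suggests a relationship visually; one way to put a number on it is the Pearson correlation coefficient. The sketch below computes it in plain Python on the same data. The pearson helper is written here for illustration and is not part of Matplotlib or of the original tutorial.

```python
from math import sqrt

area = [225.98, 247.07, 253.14, 457.85, 241.58, 301.01, 20.67, 288.64,
        163.56, 120.06, 207.83, 342.75, 147.9, 53.06, 224.72, 29.51,
        21.61, 483.21, 245.25, 399.25, 343.35]   # housing area data
price = [196.63, 203.88, 210.75, 372.74, 202.41, 247.61, 24.9, 239.34,
         140.32, 104.15, 176.84, 288.23, 128.79, 49.64, 191.74, 33.1,
         30.74, 400.02, 205.35, 330.64, 283.45]  # housing price data

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / sqrt(var_x * var_y)

r = pearson(area, price)
print(round(r, 3))  # a value close to 1 indicates a strong positive linear relationship
```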

5 bar

matplotlib.pyplot.bar(x, height, width=0.8, bottom=None, *, align='center', **kwargs)

Parameters: 
x: sequence of scalars, the x coordinates of the bars
height: scalar or sequence of scalars, the heights of the bars (the y values)

width: scalar or array-like, optional, default: 0.8 (width of the bars)

align: {'center', 'edge'}, optional, default: 'center'
Alignment of the bars to the x coordinates
'center': center the base on the x positions
'edge': align the left edges of the bars with the x positions

**kwargs: 
color: the color(s) of the bars

Returns: 
BarContainer
Container with all the bars and optionally errorbars

5.1 demand 1 - compare box office revenue of each film

# 1. Prepare data
movie_names = ['Thor: Ragnarok','Justice League','Murder on the Orient Express','Coco','Geostorm','The Golden Monk','Manhunt','Seventy-Seven Days','Eternal Wave','The Brink','other']
tickets = [73853,57767,22354,15969,14839,8725,8716,8318,7916,6764,52222]

# 2. Create canvas
plt.figure(figsize=(20, 8), dpi=80)

# Display Chinese labels correctly (only needed when labels contain Chinese; requires the SimHei font)
plt.rcParams['font.sans-serif'] = ['SimHei']

# 3. Draw bar chart
x_ticks = range(len(movie_names))
plt.bar(x_ticks, tickets, color=['b','r','g','y','c','m','y','k','c','g','b'])

# Modify x scale
plt.xticks(x_ticks, movie_names)

# Add title
plt.title("Box office revenue comparison")

# Add grid display
plt.grid(linestyle="--", alpha=0.5)

# 4. Display image
plt.show()

5.2 demand 2 - how to make the box office comparison more convincing?

Compare box office over the same number of days

To be fair, we sometimes need to compare the first-day and first-week box office of different films

# 1. Prepare data
movie_name = ['Thor: Ragnarok','Justice League','Coco']

first_day = [10587.6,10062.5,1275.7]
first_weekend=[36224.9,34479.6,11830]
x = range(len(movie_name))

# 2. Create canvas
plt.figure(figsize=(20, 8), dpi=80)

# 3. Draw bar chart
plt.bar(x, first_day, width=0.2, label="First day box office")
# plt.bar([0.2, 1.2, 2.2], first_weekend, width=0.2, label = "first week box office")
plt.bar([i+0.2 for i in x], first_weekend, width=0.2, label="First week box office")

# Show Legend 
plt.legend()

# Modify scale
plt.xticks([0.1, 1.1, 2.1], movie_name)

# 4. Display image
plt.show()
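The offsets 0.2 and the tick positions [0.1, 1.1, 2.1] used above follow a simple pattern that generalizes to more series. A sketch of that arithmetic in plain Python; the helper name grouped_bar_positions is hypothetical, and it assumes the default align='center'.

```python
def grouped_bar_positions(n_groups, n_series, width):
    """Return per-series bar positions and the tick position centered under each group."""
    positions = [
        [group + series * width for group in range(n_groups)]
        for series in range(n_series)
    ]
    # With align='center', the middle of a group lies halfway between
    # the centers of its first and last bar
    ticks = [group + width * (n_series - 1) / 2 for group in range(n_groups)]
    return positions, ticks

# Three films, two series (first day and first week), bar width 0.2
positions, ticks = grouped_bar_positions(n_groups=3, n_series=2, width=0.2)
print(positions)
print(ticks)
```

Each inner list of positions can be passed as the x argument of one plt.bar call, and ticks as the first argument of plt.xticks.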

6 histogram

6.1 histogram introduction

A histogram looks similar to a bar chart, but its meaning is completely different. A histogram is a statistical concept: the data is first divided into groups, and then the number of data points in each group is counted. In the coordinate system, the horizontal axis marks the endpoints of each group, the vertical axis represents the frequency, and the height of each rectangle represents the frequency of the corresponding group. Such a statistical chart is called a frequency distribution histogram.

Example:

  • The frequency distribution histogram of the height of 36 students in class 1, grade 3 of a school is shown in the figure below

    (1) Which group has the most students in height?
    (2) How many students are over 160.5cm tall?

Related concepts:

  • Number of groups (bins): in statistics, data is divided into groups according to different ranges; the number of groups is called the number of bins
  • Bin width (group spacing): the difference between the two endpoints of each group

6.2 comparison between histogram and bar chart

  • Bar chart: the length of each rectangle represents the frequency or quantity of each group, and its width (representing a category) is fixed. It is suited to analyzing small datasets.
  • Histogram: describes the frequency distribution of a group of data. The length of each rectangle represents the frequency or quantity of each group, and its width represents the bin width of that group. Both height and width are therefore meaningful, which suits displaying statistics of large datasets.
  • Histograms help to understand the distribution of data, such as the mode, the approximate location of the median, and whether there are gaps or outliers in the data.

1. A histogram shows the distribution of data, while a bar chart compares the size of data. This is the most fundamental difference.

  • A histogram shows how a group of data is distributed across the intervals, but the specific value of an individual data point within an interval cannot be seen.

  • In a bar chart, you can see the size of each value and compare them.

2. The x-axis of a histogram is quantitative data, and the x-axis of a bar chart is categorical data.

  • In a histogram, the variables on the x-axis are continuous intervals, usually expressed as numbers, such as "0-10g, 10-20g..." for apple weight or "0-10min, 10-20min..." for a length of time.
  • In a bar chart, the variables on the x-axis are categories, such as different country names or different game types.
  • The columns of a histogram cannot be moved: the intervals on the x-axis are continuous and fixed.
  • The columns of a bar chart can be ordered freely. In some cases they are arranged by category name, and in others by the size of the value.

3. The columns of a histogram have no gaps between them, while the columns of a bar chart do

  • The intervals in a histogram are continuous, so there is no gap between the columns. The categories of a bar chart are discrete, so its columns are separated by gaps.

4. The column widths of a histogram can differ, while the column widths of a bar chart must be the same

  • The columns of a bar chart must all have the same width, because the width has no numerical meaning.
  • In a histogram, the width of a column represents the length of the interval. Columns can have different widths for different intervals, but in theory each width should be a multiple of the unit length.


For example, the U.S. Census Bureau surveyed the commuting time of 124 million people. Because few people commute for 45-150 minutes, the intervals in that range were widened to 45-60 minutes, 60-90 minutes and 90-150 minutes, while all other intervals have a width of 5 minutes.

  • In that histogram, the Y axis shows "number of people / bin width". In this case, the sum of the areas of the columns equals the total number of people surveyed, so the area of each column is meaningful.
  • When the Y axis instead represents "number in interval / total number / bin width", the histogram is the "frequency distribution histogram" taught in junior high school, where frequency means "number in interval / total number". In such a histogram, the sum of the areas of all columns equals 1.

6.3 histogram drawing

Demand: film duration distribution

Now there are 250 movie durations. I want to count the distribution of these movie durations, such as the number of movies with durations ranging from 100 minutes to 120 minutes, and the frequency of their occurrence. How do you present these data?

6.3.1 histogram drawing api

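matplotlib.pyplot.hist(x, bins=None, density=False, **kwargs)

Parameters: 
	x: (n,) array or sequence of (n,) arrays, the data
	bins: integer, sequence or 'auto', optional (the number of groups, or the group boundaries)
	density: bool, default False. If True, the Y axis shows the probability density (relative frequency / bin width) instead of counts. Older Matplotlib versions called this parameter normed, which has since been removed.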

6.3.2 drawing

  • Set the bin width (group spacing)
  • Set the number of groups (with little data, 5-12 groups are typical; with more data, consider a different way of displaying it)
    • The number of groups generally follows the formula: number of groups (bins) = range / bin width = (max - min) / bin width, rounded to an integer
# 1. Prepare data
time = [131,  98, 125, 131, 124, 139, 131, 117, 128, 108, 135, 138, 131, 102, 107, 114, 119, 128, 121, 142, 127, 130, 124, 101, 110, 116, 117, 110, 128, 128, 115,  99, 136, 126, 134,  95, 138, 117, 111,78, 132, 124, 113, 150, 110, 117,  86,  95, 144, 105, 126, 130,126, 130, 126, 116, 123, 106, 112, 138, 123,  86, 101,  99, 136,123, 117, 119, 105, 137, 123, 128, 125, 104, 109, 134, 125, 127,105, 120, 107, 129, 116, 108, 132, 103, 136, 118, 102, 120, 114,105, 115, 132, 145, 119, 121, 112, 139, 125, 138, 109, 132, 134,156, 106, 117, 127, 144, 139, 139, 119, 140,  83, 110, 102,123,107, 143, 115, 136, 118, 139, 123, 112, 118, 125, 109, 119, 133,112, 114, 122, 109, 106, 123, 116, 131, 127, 115, 118, 112, 135,115, 146, 137, 116, 103, 144,  83, 123, 111, 110, 111, 100, 154,136, 100, 118, 119, 133, 134, 106, 129, 126, 110, 111, 109, 141,120, 117, 106, 149, 122, 122, 110, 118, 127, 121, 114, 125, 126,114, 140, 103, 130, 141, 117, 106, 114, 121, 114, 133, 137,  92,121, 112, 146,  97, 137, 105,  98, 117, 112,  81,  97, 139, 113,134, 106, 144, 110, 137, 137, 111, 104, 117, 100, 111, 101, 110,105, 129, 137, 112, 120, 113, 133, 112,  83,  94, 146, 133, 101,131, 116, 111,  84, 137, 115, 122, 106, 144, 109, 123, 116, 111,111, 133, 150]

# 2. Create canvas
plt.figure(figsize=(20, 8), dpi=80)

# 3. Draw histogram
distance = 2
group_num = int((max(time) - min(time)) / distance)	# Rounding

plt.hist(time, bins=group_num, density=True)

# Modify x-axis scale
plt.xticks(range(min(time), max(time) + 2, distance))

# Add grid
plt.grid(linestyle="--", alpha=0.5)

# 4. Display image
plt.show()
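Passing an integer to bins splits the data range into equal parts, which only matches the intended bin width when the range is an exact multiple of it. An alternative is to compute explicit bin edges and pass them as bins. The sketch below is plain Python; the helper name bin_edges is hypothetical, and the values 78 and 156 are assumed to be the shortest and longest durations in the data above.

```python
def bin_edges(shortest, longest, distance):
    """Return bin edges covering [shortest, longest] in steps of `distance`."""
    n_bins = -(-(longest - shortest) // distance)   # ceiling division
    return [shortest + i * distance for i in range(n_bins + 1)]

edges = bin_edges(78, 156, 2)
print(len(edges) - 1)       # number of bins
print(edges[0], edges[-1])  # first and last edge
```

The resulting list can be used as plt.hist(time, bins=edges) and as the tick positions in plt.xticks(edges), so that ticks and bin boundaries always coincide.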

6.3.3 histogram points for attention

  1. Pay attention to the bin width
    The bin width affects the distribution that the histogram appears to show, so try several bin widths when drawing a histogram.
  2. Pay attention to the variable on the Y axis
    The Y axis can show the count (how many times values occur), the relative frequency (count / total), or the relative frequency / bin width. Each choice gives the distribution described by the histogram a different meaning.
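The three possible Y-axis variables can be computed by hand to see how they differ. The sketch below uses a small made-up sample with bins of unequal width; the data and bin edges are illustrative assumptions, and the last bin is treated as closed on the right.

```python
data = [1, 2, 2, 3, 3, 3, 4, 4, 7, 9]   # made-up sample
edges = [0, 2, 4, 10]                   # bins [0, 2), [2, 4), [4, 10]

# Count how many values fall into each bin
counts = [0] * (len(edges) - 1)
for value in data:
    for i in range(len(counts)):
        is_last = i == len(counts) - 1
        if edges[i] <= value < edges[i + 1] or (is_last and value == edges[-1]):
            counts[i] += 1
            break

total = len(data)
widths = [edges[i + 1] - edges[i] for i in range(len(counts))]
rel_freq = [c / total for c in counts]                 # count / total
density = [f / w for f, w in zip(rel_freq, widths)]    # relative frequency / bin width

# With density on the Y axis, the column areas add up to 1
area = sum(d * w for d, w in zip(density, widths))
print(counts, rel_freq, density, area)
```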

7 pie

7.1 pie chart introduction

A pie chart shows the proportions of different categories, which are compared by the size of their arcs.
The pie is divided into several sectors according to the proportion of each category. The whole pie represents the total amount of data, each sector represents the share of one category in the total, and all sectors together add up to 100%.

7.2 pie drawing

Pie API introduction (note how many decimal places of the percentage are displayed):

plt.pie(x, labels=, autopct=, colors=)

  • x: the quantities; the percentages are computed automatically
  • labels: the name of each part
  • autopct: format of the displayed percentage, for example %1.2f%%
    • %1.2f%%: the first % starts the format specification, 1.2f formats a floating-point number with a minimum width of 1 and two decimal places, and %% outputs a literal percent sign
  • colors: the color of each part
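The autopct string is an ordinary Python percent-format string applied to each sector's percentage, so its effect can be tried without drawing anything. A short sketch with made-up quantities:

```python
sizes = [30, 50, 20]   # made-up quantities
total = sum(sizes)

# Format each share the way autopct="%1.2f%%" would
labels = ["%1.2f%%" % (100 * size / total) for size in sizes]
print(labels)              # ['30.00%', '50.00%', '20.00%']

# Changing the digit after the dot changes the number of decimal places
print("%1.1f%%" % 27.456)  # 27.5%
```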

Demand: display the screening share of different films

# 1. Prepare data
movie_name = ['Thor: Ragnarok','Justice League','Murder on the Orient Express','Coco','Geostorm','The Golden Monk','Manhunt','Seventy-Seven Days','Eternal Wave','The Brink','other']

place_count = [60605,54546,45819,28243,13270,9945,7679,6799,6101,4621,20105]

# 2. Create canvas
plt.figure(figsize=(20, 8), dpi=80)

# 3. Draw pie chart
plt.pie(place_count, labels=movie_name, colors=['b','r','g','y','c','m','y','k','c','g','y'], autopct="%1.2f%%")

# Show Legend 
plt.legend()

# The displayed pie chart remains round
plt.axis('equal')

# 4. Display image
plt.show()

7.3 add axis

To keep the displayed pie chart round, call plt.axis('equal') so that both axes use the same scale; otherwise the pie chart may be drawn as an oval.

8 summary

Video: Black Horse Python tutorial, 4 days to start Python data mining quickly https://www.bilibili.com/video/BV1xt411v7z9?from=search&seid=1374736475069929050


Posted on Tue, 16 Jun 2020 04:00:58 -0400 by PhilipXVIII18