Data science-05

Data analysis-05

Basic functions of matplotlib

Basic drawing

1) Drawing core API

Case study: drawing a simple line

import numpy as np
import matplotlib.pyplot as plt

# Data for a simple line
x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 6, 9, 12, 15])

# Draw a horizontal and a vertical line across the whole chart
plt.axhline(y=6, ls=":", c="blue")  # Horizontal line at y=6, dotted
plt.axvline(x=4, ls="-", c="red")  # Vertical line at x=4, solid

# Draw vertical line segments
plt.vlines([2, 3, 3.5],  # x coordinate of each vertical segment
           [10, 20, 30],  # Starting y coordinate of each segment
           [25, 35, 45])  # Ending y coordinate of each segment

plt.plot(x, y)
plt.show()  # Show the figure (blocks until the window is closed)

2) Set line style and line width

linestyle: sets the line style. Common values are solid ('-'), dashed ('--'), dash-dot ('-.'), and dotted (':')

linewidth: sets the line width, in points

color: sets the line color, for example 'red', 'blue', or 'green'

alpha: sets the transparency (between 0 and 1)

Case: draw sine and cosine curves, and set their line style, line width, color, and transparency

# Draw sine and cosine curves
import numpy as np
import matplotlib.pyplot as plt
import math

x = np.arange(0, 2 * np.pi, 0.1)  # Values from 0 up to (but not including) 2π, in steps of 0.1
print(x)
y1 = np.sin(x)
y2 = np.cos(x)

# Draw the curves
plt.plot(x, y1, label="sin", linewidth=2)  # Solid line, line width 2 points
plt.plot(x, y2, label="cos", linestyle="--", linewidth=4)  # Dashed line, line width 4 points

plt.xlabel("x")  # x-axis text
plt.ylabel("y")  # y-axis text

# Set axis range
plt.xlim(0, 2 * math.pi)
plt.ylim(-1, 2)

plt.title("sin & cos")  # Chart title
plt.legend()  # Show the legend
plt.show()

3) Set axis range

Syntax:

# x_limit_min: <float> minimum value of the x-axis range
# x_limit_max: <float> maximum value of the x-axis range
plt.xlim(x_limit_min, x_limit_max)
# y_limit_min: <float> minimum value of the y-axis range
# y_limit_max: <float> maximum value of the y-axis range
plt.ylim(y_limit_min, y_limit_max)

4) Set axis ticks

Syntax:

# x_val_list: sequence of x-axis tick positions
# x_text_list: sequence of x-axis tick label strings [optional]
plt.xticks(x_val_list, x_text_list)
# y_val_list: sequence of y-axis tick positions
# y_text_list: sequence of y-axis tick label strings [optional]
plt.yticks(y_val_list, y_text_list)

Case: draw a quadratic function curve

# Draw a quadratic function curve
import numpy as np
import matplotlib.pyplot as plt
import math

x = np.arange(-5, 5, 0.1)  # Values from -5 up to (but not including) 5, in steps of 0.1
print(x)
y = x ** 2

# Draw the curve
plt.plot(x, y, label="$y = x ^ 2$",
         linewidth=2,  # Line width 2 points
         color="red",  # Color
         alpha=0.5)  # Transparency

plt.xlabel("x")  # x-axis text
plt.ylabel("y")  # y-axis text

# Set axis range
plt.xlim(-10, 10)
plt.ylim(-1, 30)

# Set ticks
x_tck = np.arange(-10, 10, 2)
x_txt = x_tck.astype("U")
plt.xticks(x_tck, x_txt)

y_tck = np.arange(-1, 30, 5)
y_txt = y_tck.astype("U")
plt.yticks(y_tck, y_txt)

plt.title("square")  # Chart title
plt.legend(loc="upper right")  # Place the legend in the upper right corner
plt.show()

Special syntax for tick text: LaTeX typesetting strings

r'$x^n+y^n=z^n$',   r'$\int\frac{1}{x} dx = \ln |x| + C$',     r'$-\frac{\pi}{2}$'

These strings render as:

$x^n+y^n=z^n$, $\int\frac{1}{x} dx = \ln |x| + C$, $-\frac{\pi}{2}$
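
As a minimal sketch of how such strings are used as tick labels, the example below labels the x-axis of a sine curve at multiples of π/2. The curve and the names tick_values and tick_labels are illustrative assumptions, not part of the original courseware.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 200)
plt.figure("latex-ticks")
plt.plot(x, np.sin(x), label=r"$y=\sin(x)$")

# Tick positions in data units, paired with LaTeX strings used as the tick text
tick_values = [-np.pi, -np.pi / 2, 0, np.pi / 2, np.pi]
tick_labels = [r"$-\pi$", r"$-\frac{\pi}{2}$", r"$0$", r"$\frac{\pi}{2}$", r"$\pi$"]
plt.xticks(tick_values, tick_labels)

plt.legend()
```

To see the window, drop the matplotlib.use("Agg") line and call plt.show() at the end.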

5) Set axis

Axis names: left / right / bottom / top

# Get the current Axes; ax.spines works like a dictionary: {'left': left axis, 'right': right axis, 'bottom': bottom axis, 'top': top axis}
ax = plt.gca()
# Get one of the axes by name
axis = ax.spines['axis name']
# Set the position of the axis. The method takes a 2-element tuple as its argument
# type: <str> reference type for moving the axis, usually 'data' (the axis is placed at a data value)
# val: reference value
axis.set_position(('data', val))
# Set the color of the axis
# color: <str> color value string
axis.set_color(color)

Case: format axis

# Set axis
import matplotlib.pyplot as plt

ax = plt.gca()
axis_b = ax.spines['bottom']  # Get the bottom axis
axis_b.set_position(('data', 0))  # Place the bottom axis at y=0 in data coordinates

axis_l = ax.spines['left']  # Get the left axis
axis_l.set_position(('data', 0))  # Place the left axis at x=0 in data coordinates

ax.spines['top'].set_color('none')  # Hide the top axis
ax.spines['right'].set_color('none')  # Hide the right axis

plt.show()

6) Legend

Show a legend for both curves and test the loc attribute.

# Define the curve's label when drawing the curve
# label: <str keyword parameter> supports LaTeX typesetting strings
plt.plot(xarray, yarray, ..., label='', ...)
# Set the location of the legend
# loc: <keyword parameter> display position of the legend (if loc is not set, a default position is used)
#    ===============   =============
#    Location String   Location Code
#    ===============   =============
#    'best'            0
#    'upper right'     1
#    'upper left'      2
#    'lower left'      3
#    'lower right'     4
#    'right'           5
#    'center left'     6
#    'center right'    7
#    'lower center'    8
#    'upper center'    9
#    'center'          10
#    ===============   =============
plt.legend(loc='')
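
A runnable sketch of the syntax above, assuming two illustrative curves (sine and cosine) that do not come from the original example. Each curve gets a LaTeX label, and the legend is placed with a location string from the table.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
plt.figure("legend-demo")
plt.plot(x, np.sin(x), label=r"$y=\sin(x)$")
plt.plot(x, np.cos(x), label=r"$y=\cos(x)$")

# 'lower left' is the location string for location code 3 in the table above
legend = plt.legend(loc="lower left")

# The legend holds one text entry per labelled curve
legend_texts = [t.get_text() for t in legend.get_texts()]
print(legend_texts)
```

Swapping "lower left" for any other string in the table is a quick way to test the loc parameter.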

7) Special points

Syntax:

# xarray: <sequence> x coordinates of all the points to be marked
# yarray: <sequence> y coordinates of all the points to be marked
plt.scatter(xarray, yarray,
           marker='',      # Point shape, see matplotlib.markers
           s=40,           # Size
           edgecolor='',   # Edge color
           facecolor='',   # Fill color
           zorder=3        # Drawing layer (the larger the number, the higher the layer)
)

Example: adding special points to a quadratic function image

# Draw special points
plt.scatter(x_tck,  # x coordinate array
            x_tck ** 2,  # y coordinate array
            marker="s",  # Point shape s:square
            s=40,  # size
            facecolor="blue",  # Fill color
            zorder=3)  # Layer number

For the available marker types, see help(matplotlib.markers)

See also appendix: matplotlib point style

8) Annotations

Syntax:

# Add an annotation to a point in the chart, including the annotation text, the arrow, and related settings.
plt.annotate(
    r'$\frac{\pi}{2}$',          # Text displayed in the annotation
    xycoords='data',             # Coordinate system of the target point ('data' means the data coordinate system)
    xy=(x, y),                   # Coordinates of the target point
    textcoords='offset points',  # Coordinate system of the annotation text ('offset points' means an offset, in points, from the target point)
    xytext=(x, y),               # Coordinates of the annotation text
    fontsize=14,                 # Font size of the annotation text
    arrowprops=dict()            # Dictionary defining the style of the arrow that points from the text to the target point
)

The arrowprops parameter uses a dictionary to define the style of the arrow that points to the target point

# Common keys of the arrowprops dictionary
arrowprops=dict(
    arrowstyle='',       # Arrow style
    connectionstyle=''   # Style of the connecting line
)

The arrow style strings are as follows

============   =============================================
Name           Attrs
============   =============================================
  '-'          None
  '->'         head_length=0.4,head_width=0.2
  '-['         widthB=1.0,lengthB=0.2,angleB=None
  '|-|'        widthA=1.0,widthB=1.0
  '-|>'        head_length=0.4,head_width=0.2
  '<-'         head_length=0.4,head_width=0.2
  '<->'        head_length=0.4,head_width=0.2
  '<|-'        head_length=0.4,head_width=0.2
  '<|-|>'      head_length=0.4,head_width=0.2
  'fancy'      head_length=0.4,head_width=0.4,tail_width=0.4
  'simple'     head_length=0.5,head_width=0.5,tail_width=0.2
  'wedge'      tail_width=0.3,shrink_factor=0.5
============   =============================================


The connection style strings are as follows

============   =============================================
Name           Attrs
============   =============================================
  'angle'      angleA=90,angleB=0,rad=0.0
  'angle3'     angleA=90,angleB=0
  'arc'        angleA=0,angleB=0,armA=None,armB=None,rad=0.0
  'arc3'       rad=0.0
  'bar'        armA=0.0,armB=0.0,fraction=0.3,angle=None
============   =============================================

Example: adding a note to a quadratic function image

# Add an annotation
plt.annotate(
    r'$y = x ^ 2$',              # Text displayed in the annotation
    xycoords='data',             # Coordinate system of the target point ('data' means the data coordinate system)
    xy=(4, 16),                  # Coordinates of the target point (4, 16)
    textcoords='offset points',  # Coordinate system of the annotation text ('offset points' means an offset, in points, from the target point)
    xytext=(20, 30),             # Offset of the annotation text from the target point
    fontsize=14,                 # Font size of the annotation text
    arrowprops=dict(             # Style of the arrow that points from the text to the target point
        arrowstyle="->", connectionstyle="angle3"
    )
)

Advanced drawing

Syntax: build figure windows manually; several windows can be built and then displayed together.

# Build a matplotlib window manually
plt.figure(
    'sub-fig',          # Window title bar text
    figsize=(4, 3),     # Window size <tuple>, (width, height) in inches
    facecolor=''        # Chart background color
)
plt.show()

plt.figure builds a new window. However, if a window with title='xxx' already exists, calling plt.figure with that title does not create another window; it makes the existing window the current one, so later drawing calls apply to it.
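
The behaviour described above can be checked with a short sketch. The titles FigureA and FigureB and the variable names are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

plt.close("all")  # start from a clean state

fig_a = plt.figure("FigureA", facecolor="lightgray")
fig_b = plt.figure("FigureB", facecolor="gray")  # FigureB is now the current window

# Asking for an existing title creates nothing new: FigureA becomes current again
fig_again = plt.figure("FigureA")
plt.title("drawn on FigureA")  # this lands on FigureA, not on FigureB

print(fig_again is fig_a)      # True
print(len(plt.get_fignums()))  # 2
```

Because plt.title acts on the current window, only FigureA ends up with an axes here; FigureB stays empty.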

Set parameters for the current window

Syntax: common parameters of the current window

# Set the chart title, displayed above the chart
plt.title(title, fontsize=12)
# Set the text of the horizontal axis
plt.xlabel(x_label_str, fontsize=12)
# Set the text of the vertical axis
plt.ylabel(y_label_str, fontsize=12)
# Set tick parameters; labelsize sets the tick label font size
plt.tick_params(..., labelsize=8, ...)
# Set the chart grid; linestyle sets the gridline style
#   '-'  or 'solid'    solid line
#   '--' or 'dashed'   dashed line
#   '-.' or 'dashdot'  dash-dot line
#   ':'  or 'dotted'   dotted line
plt.grid(linestyle='')
# Use a compact layout so that the chart's titles and labels fit inside the window
plt.tight_layout()

Example: drawing two image windows

# Draw two image windows
import matplotlib.pyplot as plt

plt.figure("FigureA", facecolor="lightgray")
plt.grid(linestyle="-.")  # Set gridlines

plt.figure("FigureB", facecolor="gray")
plt.xlabel("Date", fontsize=14)
plt.ylabel("Price", fontsize=14)
plt.grid(linestyle="--")  # Set gridlines
plt.tight_layout()  # Set compact layout

plt.show()

1) Subplots

Matrix layout

API for drawing a matrix subplot layout:

plt.figure('Subplot Layout', facecolor='lightgray')
# Split the figure into a matrix
# rows: number of rows
# cols: number of columns
# num: subplot number
plt.subplot(rows, cols, num)
#	1 2 3
#	4 5 6
#	7 8 9
plt.subplot(3, 3, 5)  # Operate on subplot 5 of the 3 * 3 matrix
plt.subplot(335)  # Shorthand

Case: draw a 3 * 3 matrix of subplots and write a number in each subplot.

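import matplotlib.pyplot as plt

plt.figure('Subplot Layout', facecolor='lightgray')

for i in range(9):
    plt.subplot(3, 3, i + 1)
    # The withdash=False argument was dropped here: it has been removed from newer Matplotlib releases
    plt.text(
        0.5, 0.5, str(i + 1),
        ha='center',
        va='center',
        size=36,
        alpha=0.5
    )
    plt.xticks([])
    plt.yticks([])

plt.tight_layout()
plt.show()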

Grid layout (rarely used)

Grid layout supports cell merging.

API for drawing a grid subplot layout:

import matplotlib.gridspec as mg
plt.figure('Grid Layout', facecolor='lightgray')
# Call GridSpec to split the figure into a grid
# rows: number of rows
# cols: number of columns
# gs = mg.GridSpec(rows, cols)
gs = mg.GridSpec(3, 3)  # Split into three rows and three columns
# Merge columns 0 and 1 of row 0 into one subplot
plt.subplot(gs[0, :2])
plt.text(0.5, 0.5, '1', ha='center', va='center', size=36)
plt.show()

Case: draw a custom grid layout.

import matplotlib.pyplot as plt
import matplotlib.gridspec as mg

plt.figure('GridLayout', facecolor='lightgray')
gridsubs = mg.GridSpec(3, 3)
# Merge columns 0 and 1 of row 0 into one subplot
plt.subplot(gridsubs[0, :2])
plt.text(0.5, 0.5, '1', ha='center', va='center', size=36)
plt.xticks([])
plt.yticks([])
plt.tight_layout()
plt.show()
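
The case above fills only one merged cell. As a hedged sketch of a complete custom layout, the example below covers the whole 3 * 3 grid with five subplots of different shapes. The particular cell arrangement is an illustrative assumption, chosen so that the slices do not overlap.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.gridspec as mg

plt.close("all")
fig = plt.figure("GridLayoutFull", facecolor="lightgray")
gs = mg.GridSpec(3, 3)

# Five non-overlapping slices that together cover all nine grid cells
cells = [
    gs[0, :2],   # row 0, columns 0-1
    gs[:2, 2],   # rows 0-1, column 2
    gs[1, 1],    # the center cell
    gs[1:, 0],   # rows 1-2, column 0
    gs[2, 1:],   # row 2, columns 1-2
]

for number, cell in enumerate(cells, start=1):
    plt.subplot(cell)
    plt.text(0.5, 0.5, str(number), ha="center", va="center", size=36)
    plt.xticks([])
    plt.yticks([])

plt.tight_layout()
```

Each slice of the GridSpec object follows ordinary NumPy-style indexing: the first index selects rows and the second selects columns.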

Free layout (rarely used)

API related to free layout:

plt.figure('Flow Layout', facecolor='lightgray')
# Set the position of the subplot by giving the coordinates of its lower-left corner, its width, and its height (all as fractions of the figure size)
# left_bottom_x: x coordinate of the lower-left corner
# left_bottom_y: y coordinate of the lower-left corner
# width: width
# height: height
# plt.axes([left_bottom_x, left_bottom_y, width, height])
plt.axes([0.03, 0.03, 0.94, 0.94])
plt.text(0.5, 0.5, '1', ha='center', va='center', size=36)
plt.show()

Case: test the free layout by positioning a subplot.

plt.figure('FlowLayout', facecolor='lightgray')

plt.axes([0.1, 0.2, 0.5, 0.3])
plt.text(0.5, 0.5, '1', ha='center', va='center', size=36)
plt.show()

2) Scatter plot

Different feature values can be represented by the coordinates, color, size, and shape of each point.

height  weight  gender  age group    race
180     80      male    middle-aged  Asia
160     50      female  young        America

API for drawing a scatter plot:

plt.scatter(
    x,              # Array of x coordinates
    y,              # Array of y coordinates
    marker='',      # Point shape
    s=10,           # Size
    color='',       # Color
    edgecolor='',   # Edge color
    facecolor='',   # Fill color
    zorder=''       # Drawing layer
)

numpy.random provides the normal function, which generates random numbers that follow a normal distribution

n = 100
# 172: expected value (mean)
# 20: standard deviation
# n: number of values to generate
x = np.random.normal(172, 20, n)
y = np.random.normal(60, 10, n)

Case: draw a 2D scatter plot.

# Scatter plot example
import matplotlib.pyplot as plt
import numpy as np

n = 40
# Expected value: the mean of the values the variable takes
# Standard deviation: the most commonly used measure of how dispersed a set of data is, and an important indicator of precision
x = np.random.normal(172, 20, n)  # Expected value, standard deviation, number of values
y = np.random.normal(60, 10, n)  # Expected value, standard deviation, number of values

x2 = np.random.normal(180, 20, n)  # Expected value, standard deviation, number of values
y2 = np.random.normal(70, 10, n)  # Expected value, standard deviation, number of values

plt.figure("scatter", facecolor="lightgray")
plt.title("Scatter Demo")
plt.scatter(x, y, c="red", marker="D")
plt.scatter(x2, y2, c="blue", marker="v")

plt.xlim(100, 240)
plt.ylim(0, 100)
plt.show()

For the available cmap color maps, see the attachment: cmap color mapping table
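
The case above gives every point in a group the same color. As a minimal sketch of how a color map comes into play, the example below colors each point by a per-point value through the c and cmap parameters. The choice of "squared distance from the mean" as that value, the "viridis" color map, and the seeded generator are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
n = 300
x = rng.normal(172, 20, n)  # expected value, standard deviation, number of values
y = rng.normal(60, 10, n)

# One number per point; the color map turns each number into a color
d = (x - 172) ** 2 + (y - 60) ** 2

plt.figure("scatter-cmap", facecolor="lightgray")
points = plt.scatter(x, y, c=d, cmap="viridis", s=30, alpha=0.7)
plt.colorbar(points, label="squared distance from the mean")
```

Any other color map name from the mapping table can be passed as cmap in the same way.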

3) Fill

Fills the area enclosed between two curves with a given color.

plt.fill_between(
    x,              # x coordinates
    sin_x,          # y coordinates of the points on the first boundary curve
    cos_x,          # y coordinates of the points on the second boundary curve
    sin_x < cos_x,  # Fill condition; the area is filled where it is True
    color='',       # Fill color
    alpha=0.2       # Transparency
)

Case: draw the two curves sin_y = sin(x) and cos_y = cos(x / 2) / 2 over [0, 8π], and fill the area between them

import matplotlib.pyplot as plt
import numpy as np

n = 1000
x = np.linspace(0, 8 * np.pi, n)  # n evenly spaced values over [0, 8π]

sin_y = np.sin(x)  # Values of sin(x)
cos_y = np.cos(x / 2) / 2  # Values of cos(x / 2) / 2

plt.figure('Fill', facecolor='lightgray')
plt.title('Fill', fontsize=20)
plt.xlabel('x', fontsize=14)  # x-axis label
plt.ylabel('y', fontsize=14)  # y axis
plt.tick_params(labelsize=10)  # scale
plt.grid(linestyle=':')

plt.plot(x, sin_y, c='dodgerblue', label=r'$y=sin(x)$')
plt.plot(x, cos_y, c='orangered', label=r'$y=\frac{1}{2}cos(\frac{x}{2})$')

# Fill the regions where cos_y < sin_y
plt.fill_between(x, cos_y, sin_y, cos_y < sin_y, color='dodgerblue', alpha=0.5)
# Fill the regions where cos_y > sin_y
plt.fill_between(x, cos_y, sin_y, cos_y > sin_y, color='orangered', alpha=0.5)

plt.legend()
plt.show()

4) Bar chart

API for drawing a bar chart:

# Make Chinese text and the minus sign display correctly
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
plt.figure('Bar', facecolor='lightgray')
plt.bar(
    x,          # Array of horizontal coordinates
    y,          # Array of bar heights
    width,      # Width of each bar
    color='',   # Fill color
    label='',   # Label shown in the legend
    alpha=0.2   # Transparency
)

Case: use a bar chart to draw the 12-month sales volume of apples, then add the sales volume of oranges.

import matplotlib.pyplot as plt
import numpy as np

apples = np.array([30, 25, 22, 36, 21, 29, 20, 24, 33, 19, 27, 15])
oranges = np.array([24, 33, 19, 27, 35, 20, 15, 27, 20, 32, 20, 22])

plt.figure('Bar', facecolor='lightgray')
plt.title('Bar', fontsize=20)
plt.xlabel('Month', fontsize=14)
plt.ylabel('Price', fontsize=14)
plt.tick_params(labelsize=10)
plt.grid(axis='y', linestyle=':')
plt.ylim((0, 40))

x = np.arange(len(apples))  # Bar positions 0 to 11, one per month

plt.bar(x - 0.2,  # Horizontal axis data
       apples,  # Vertical data
       0.4,  # Column width
       color='dodgerblue',
       label='Apple')
plt.bar(x + 0.2,  # Horizontal axis data
       oranges,  # Vertical data
       0.4,  # Column width
       color='orangered', label='Orange', alpha=0.75)

plt.xticks(x, ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])

plt.legend()
plt.show()

5) Histogram

API for drawing a histogram:

plt.hist(
    x,              # Array of values
    bins=10,        # Number of bins
    color='',       # Fill color
    edgecolor=''    # Edge color
)
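
To make the bins parameter concrete, here is a small sketch that uses numpy.histogram, which performs the same kind of binning without drawing anything. The sample values and the choice of 4 bins over (0, 10) are illustrative assumptions.

```python
import numpy as np

values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])

# Split the range 0-10 into 4 equal-width bins and count the values in each
counts, edges = np.histogram(values, bins=4, range=(0, 10))

print(edges.tolist())   # [0.0, 2.5, 5.0, 7.5, 10.0]
print(counts.tolist())  # [3, 5, 1, 1]
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both. That is why the value 5 is counted in the third bin rather than the second.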

Case: draw a statistical histogram to display the pixel brightness distribution of the picture:

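import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# scipy.misc.imread is no longer available in current SciPy releases;
# as a substitute, read the image with Pillow and convert it to grayscale ('L' mode)
img = np.array(Image.open('../data/forest.jpg').convert('L'))
print(img.shape)

pixes = img.ravel()
plt.figure('Image Hist', facecolor='lightgray')
plt.title('Image Hist', fontsize=18)
plt.xticks(np.linspace(0, 255, 11))
# density replaces the normed parameter, which has been removed from Matplotlib
plt.hist(x=pixes, bins=10, color='dodgerblue', range=(0, 255), edgecolor='white', density=False)
plt.show()

Extension: random number module and probability distribution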

numpy provides the random module for generating random number sequences that follow specific statistical laws.

A set of random numbers may be distributed like this:

Class weights: [63.2, 76.5, 65.7, 68.9, 59.4...]
Class heights: [163.2, 176.5, 165.7, 168.9, 159.4...]
Class arrival times: ['07:20:22', '07:30:48', '07:21:23', '07:24:58'...]

Or like this:

Class weight levels: [light, medium, heavy, overweight, medium, heavy, overweight, medium, heavy...]
Class height levels: [low, medium, medium, medium, medium, high, medium, medium, high...]
Number of students late for class (10 in total): [0, 1, 3, 0, 0, 1, 2, 0, 0,...]

Binomial distribution

The binomial distribution describes n repeated, independent Bernoulli trials. Each trial has only two possible outcomes, which are mutually exclusive, and the trials are independent of each other. The probability of success stays the same in every trial, as in coin tossing.

# Generate size random numbers. Each one is the number of successes in n trials, where the probability of success in each trial is p
np.random.binomial(n, p, size)

The binomial distribution can be used to estimate probabilities in scenarios such as these:

  1. A player's shooting percentage is 0.3. What is the probability of scoring exactly 5 goals in 10 shots?
sum(np.random.binomial(10, 0.3, 200000) == 5) / 200000
  2. Someone calls customer service three times, and the probability of getting through on each call is 0.6. What is the probability that no call is answered?
sum(np.random.binomial(3, 0.6, 200000) == 0) / 200000
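
The simulations above only approximate these probabilities. For comparison, the exact values follow from the binomial formula P(X = k) = C(n, k) * p^k * (1 - p)^(n - k). The sketch below evaluates it with the standard library; the helper name binomial_pmf is an illustrative assumption.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Exact probability of k successes in n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


p_five_goals = binomial_pmf(5, 10, 0.3)  # scenario 1: exactly 5 goals in 10 shots
p_no_answer = binomial_pmf(0, 3, 0.6)    # scenario 2: none of 3 calls is answered

print(round(p_five_goals, 4))  # 0.1029
print(round(p_no_answer, 4))   # 0.064
```

With 200000 samples, the simulated values should land close to these exact results.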

Example: simulate a player with a 30% shooting percentage taking 10 shots at a time, and plot how often each number of goals occurs

# Binomial distribution example
import numpy as np
import matplotlib.pyplot as mp

# binomial: sample from the binomial distribution
# n: number of trials, p: probability of success
r = np.random.binomial(10, 0.3, 200000)
# One bin per possible goal count, 0 to 10, centered on the integers
mp.hist(r, bins=np.arange(12) - 0.5, edgecolor='white', label='goals in 10 shots')
mp.legend()
mp.show()

Hypergeometric distribution

The hypergeometric distribution is a discrete probability distribution. It describes the number of objects of a specified kind obtained when objects are drawn without replacement from a finite population that contains M objects of that kind. Here are some examples of hypergeometric distributions:

(1) 10 products include 3 defective ones. If 4 are selected at random, the number of defective products selected follows a hypergeometric distribution;
(2) A bag holds 8 red balls and 4 white balls. If 5 balls are drawn at random, the number of red balls drawn follows a hypergeometric distribution;
(3) A class of 45 students includes 20 girls. If 7 students are chosen as representatives, the number of girls among them follows a hypergeometric distribution;
(4) 15 cards include 5 marked with the word "award". If 3 cards are drawn at random, the number of "award" cards drawn follows a hypergeometric distribution;
(5) Of 10 delegates, 5 support candidate A. If 3 are interviewed at random, the number who support candidate A follows a hypergeometric distribution;
(6) A dish holds 10 zongzi: 2 red bean paste, 3 meat, and 5 plain. If 3 are selected at random, the number of red bean paste zongzi selected follows a hypergeometric distribution.

API introduction:

# Generate size random numbers. Each one is the number of good samples obtained when nsample samples are drawn at random, without replacement, from a population of ngood good samples and nbad bad samples
np.random.hypergeometric(ngood, nbad, nsample, size)

Example 1: draw 3 apples from 6 good apples and 4 bad apples, and return the number of good apples (repeated 10 times)

import numpy as np

# Draw 3 apples from 6 good apples and 4 bad apples and return the number of good apples (10 times)
n = np.random.hypergeometric(6, 4, 3, 10)
print(n)
print(n.mean())

Execution result (the values vary from run to run):

[2 2 3 1 2 2 1 3 2 2]
2.0
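
A sample of only 10 draws gives a rough mean. As a hedged cross-check, the exact distribution for this apple example can be computed from the hypergeometric formula P(X = k) = C(ngood, k) * C(nbad, nsample - k) / C(ngood + nbad, nsample). The helper name hypergeom_pmf is an illustrative assumption.

```python
from math import comb


def hypergeom_pmf(k, ngood, nbad, nsample):
    """Exact probability of drawing k good items in nsample draws without replacement."""
    return comb(ngood, k) * comb(nbad, nsample - k) / comb(ngood + nbad, nsample)


# 6 good apples, 4 bad apples, 3 drawn: k good apples for k = 0..3
pmf = [hypergeom_pmf(k, 6, 4, 3) for k in range(4)]
mean = sum(k * p for k, p in enumerate(pmf))

print([round(p, 4) for p in pmf])  # [0.0333, 0.3, 0.5, 0.1667]
print(round(mean, 10))             # 1.8
```

The exact mean equals nsample * ngood / (ngood + nbad) = 3 * 6 / 10 = 1.8, so sample means such as the 2.0 above scatter around that value.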


Normal distribution

# Generate size random numbers that follow the standard normal distribution (expectation = 0, standard deviation = 1)
np.random.normal(size=size)
# Generate size random numbers that follow a normal distribution with expectation = 1 and standard deviation = 10
np.random.normal(loc=1, scale=10, size=size)

Probability density of the standard normal distribution: $f(x) = \frac{e^{-\frac{x^2}{2}}}{\sqrt{2\pi}}$
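
The density above can be evaluated directly. The sketch below, which uses only the standard library, computes its peak value and numerically integrates it over [-1, 1] to estimate the share of values within one standard deviation of the mean. The helper name normal_pdf and the midpoint-rule integration are illustrative assumptions.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density; the defaults give the standard normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))


peak = normal_pdf(0.0)  # the density is highest at the mean

# Midpoint-rule integral of the density over [-1, 1]
steps = 2000
width = 2.0 / steps
area = sum(normal_pdf(-1.0 + (i + 0.5) * width) for i in range(steps)) * width

print(round(peak, 4))  # 0.3989
print(round(area, 4))  # 0.6827
```

About 68% of the samples drawn by np.random.normal therefore fall within one standard deviation of the expectation.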

Case: generate 10000 random numbers that follow the standard normal distribution and draw a frequency histogram of the values.

import numpy as np
import matplotlib.pyplot as mp

samples = np.random.normal(size=10000)

mp.figure('Normal Distribution',facecolor='lightgray')
mp.title('Normal Distribution', fontsize=20)
mp.xlabel('Sample', fontsize=14)
mp.ylabel('Occurrence', fontsize=14)
mp.tick_params(labelsize=12)
mp.grid(axis='y', linestyle=':')
mp.hist(samples, 100, edgecolor='steelblue',
        facecolor='deepskyblue', label='Normal')
mp.legend()
mp.show()

6) Pie chart

Basic API for drawing a pie chart:

plt.pie(
    values,             # List of values
    explode=spaces,     # List of offsets of each wedge from the center
    labels=labels,      # Label list
    colors=colors,      # Color list
    autopct='%d%%',     # Format of the percentage labels
    shadow=True,        # Whether to draw a shadow
    startangle=90,      # Starting angle; wedges are drawn counterclockwise from it
    radius=1            # Radius
)

Case: draw pie chart to show the popularity of 6 programming languages:

import matplotlib.pyplot as plt
import numpy as np

plt.figure('pie', facecolor='lightgray')
plt.title('Pie', fontsize=20)
# Organize data
values = [15, 13.3, 8.5, 7.3, 4.62, 51.28]
spaces = [0.05, 0.01, 0.01, 0.01, 0.01, 0.01]
labels = ['Java', 'C', 'Python', 'C++', 'VB', 'Other']
colors = ['dodgerblue', 'orangered', 'limegreen', 'violet', 'gold','blue']
# Use equal axis scaling so the pie is drawn as a circle
plt.axis('equal')
plt.pie(
    values,  # List of values
    explode=spaces,  # List of offsets of each wedge from the center
    labels=labels,  # Label list
    colors=colors,  # Color list
    autopct='%d%%',  # Format of the percentage labels
    shadow=True,  # Whether to draw a shadow
    startangle=90,  # Starting angle; wedges are drawn counterclockwise from it
    radius=1  # Radius
)
plt.legend()
plt.show()

pandas visualization

Basic drawing

Series data visualization

Series provides a plot method that visualizes the data with the index as x and the values as y

import numpy as np
import pandas as pd

ts = pd.Series(np.random.randn(10),
        index=pd.date_range('1/1/2000', periods=10))
ts.plot()

DataFrame data visualization

DataFrame provides a plot method that takes one column as x and one or more columns as y:

df3 = pd.DataFrame(np.random.randn(10, 2), 
                   columns=['B', 'C'])
df3['A'] = np.arange(len(df3))
df3.plot(x='A', y=['B', 'C'])

Advanced drawing

The plot() method can draw different chart types through the kind keyword parameter, including:

kind         chart type
bar or barh  Bar chart (vertical or horizontal)
hist         Histogram
box          Box plot
scatter      Scatter plot
pie          Pie chart
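
A minimal sketch of the kind parameter in use, assuming a small DataFrame built from the first three months of the apple and orange figures used earlier. The frame and its column names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import pandas as pd
import matplotlib.pyplot as plt

sales = pd.DataFrame(
    {"apples": [30, 25, 22], "oranges": [24, 33, 19]},
    index=["Jan", "Feb", "Mar"],
)

# kind="bar" selects a vertical bar chart; sales.plot.bar() is the equivalent accessor form
ax = sales.plot(kind="bar")

# One bar per cell of the frame: 3 months x 2 columns
n_bars = sum(len(container) for container in ax.containers)
print(n_bars)  # 6
```

The call returns a matplotlib Axes object, so titles, labels, and ticks can be adjusted afterwards with the same methods used in the matplotlib sections.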

The relevant API is as follows:

# Bar chart
series.plot.bar()
dataFrame.plot.bar()
dataFrame.plot.barh()

Histogram

# Histogram
series.plot.hist(alpha=0.5, bins=5)
dataFrame.plot.hist(alpha=0.5, bins=5)

Scatter plot

# Scatter plot
df.plot.scatter(x='a', y='b', c=col, colormap='')

Pie chart

# Pie chart
series.plot.pie(figsize=(6, 6))
dataFrame.plot.pie(subplots=True, figsize=(6, 6), layout=(2, 2))

Box plot

# Box plot
# First find the upper edge, lower edge, median, and two quartiles of a set of data; then draw the box between the two quartiles; finally connect the upper and lower edges to the box with lines. The median is drawn as a line inside the box
df.plot.box()
# Grouped box plot
df.boxplot(by='X')

A box plot reflects the central tendency of a set of data, and the interquartile range reflects its dispersion:

  1. A high median means the overall level is high; a low median means it is low.
  2. A short box means the data is concentrated; a long box means the data is spread out.
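
The quantities a box plot draws can be computed by hand. The sketch below does so with NumPy for a small made-up data set, assuming Matplotlib's default rule that the edges (whiskers) reach the furthest data points within 1.5 times the interquartile range of the box. The data values are illustrative.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

# Points beyond these fences are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = data[(data >= lower_fence) & (data <= upper_fence)]
outliers = data[(data < lower_fence) | (data > upper_fence)]

lower_edge, upper_edge = inside.min(), inside.max()

print(q1, median, q3, iqr)     # 3.25 5.5 7.75 4.5
print(lower_edge, upper_edge)  # 1 9
print(outliers.tolist())       # [30]
```

Here the box spans 3.25 to 7.75, the median line sits at 5.5, and the value 30 lies outside the upper fence of 14.5, so it would appear as a separate point.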

Posted on Sun, 28 Jun 2020 02:35:47 -0400 by MikeA