Preface
In earlier issues, we covered the underlying architecture of the matplotlib module and its basic drawing steps, and learned how to draw line charts and bar charts.
When analyzing data, we choose the chart that suits the characteristics of the data. When we need to show how data is distributed (its "quality distribution"), we use a histogram.
In this issue, we will learn the attributes and methods that the matplotlib module provides for drawing histograms. Let's go~
1. Histogram overview
What is a histogram?
- A histogram is a visual representation of how data is distributed over continuous intervals or specific time periods
- A histogram, also known as a quality distribution chart, is a type of bar chart
- The horizontal axis shows the data values divided into intervals (bins), and the vertical axis shows how many values fall into each interval. The width of each interval can be chosen freely
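To make the two axes concrete, here is a small sketch using numpy.histogram, which performs the same binning that pyplot.hist() draws: it splits the value range into intervals and counts the values in each. The ten heights below are made-up sample data, and the choice of three bins over 150 to 180 is an assumption for illustration.

```python
import numpy as np

# Hypothetical sample of ten heights (cm), chosen only for illustration
heights = np.array([150, 152, 158, 161, 163, 165, 166, 171, 174, 178])

# Split the range 150-180 into three equal-width bins and count the values in each
counts, edges = np.histogram(heights, bins=3, range=(150, 180))

print(edges.tolist())   # bin edges (x-axis): [150.0, 160.0, 170.0, 180.0]
print(counts.tolist())  # bar heights (y-axis): [3, 4, 3]
```

Each bar of a histogram corresponds to one pair of neighbouring edges, and its height is the matching count.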
Histogram usage scenarios
- Showing a probability distribution, that is, the probability that values in a data set fall within specified ranges
- Showing the frequency distribution of data
- Locating the mode and the median
- Revealing gaps or outliers in the data of interest
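For the probability-distribution scenario, hist() accepts density=True, which rescales the bar heights so that the total area of the bars is 1. The sketch below assumes normally distributed sample data generated with numpy's default_rng and an arbitrary seed; neither comes from the original case.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical data: 1000 values drawn from a normal distribution
data = rng.normal(loc=165, scale=8, size=1000)

# density=True turns counts into an estimate of the probability density
n, bins, patches = plt.hist(data, bins=20, density=True, edgecolor="black")

# Probability of landing in each bin = bar height * bin width
probabilities = n * np.diff(bins)
print(probabilities.sum())  # close to 1, up to floating-point rounding

plt.xlabel("value")
plt.ylabel("probability density")
plt.show()
```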
Steps for drawing a histogram
- Import the matplotlib.pyplot module
- Prepare the data; numpy or pandas can be used to organize it
- Call pyplot.hist() to draw the histogram
Case study
In this case, we analyze the height distribution of a company's employees.
Prepare the case data: use numpy to randomly generate 200 height values
```python
import numpy as np

# 200 random integer heights in the range [140, 180)
x_value = np.random.randint(140, 180, 200)
```
Draw the histogram
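```python
import matplotlib.pyplot as plt

# x_value comes from the data-preparation step above
plt.hist(x_value, bins=10)
plt.title("data analysis")
plt.xlabel("height")
plt.ylabel("frequency")  # number of employees in each height interval
plt.show()
```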
2. Histogram attributes
Set color
- Bar fill color keyword: facecolor
- Bar border color keyword: edgecolor
Color values
- English color names, such as "red" and "yellow"
- Single-letter color abbreviations, such as "r" for red and "b" for blue
- An RGB tuple in the format (r, g, b), with each value in the range 0 to 1
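The sketch below draws the same data three times, once with each color format listed above. The data generator (numpy's default_rng with a fixed seed) and the specific colors are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x_value = rng.integers(140, 180, 200)  # 200 hypothetical heights, like the case data

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
# Color names
n, bins, patches = axes[0].hist(x_value, facecolor="yellow", edgecolor="red")
# One-letter abbreviations
axes[1].hist(x_value, facecolor="b", edgecolor="r")
# RGB tuples with components between 0 and 1
axes[2].hist(x_value, facecolor=(0.2, 0.6, 0.4), edgecolor=(0, 0, 0))
plt.show()
```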
Set the number of bars
- Keyword: bins
- Optional; the default is 10
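To see the effect of bins, the sketch below plots the same data with the default setting and with bins=20. It assumes matplotlib's default configuration (rcParams["hist.bins"] left at 10) and uses randomly generated heights with an arbitrary seed.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x_value = rng.integers(140, 180, 200)  # hypothetical heights

fig, (ax_default, ax_20) = plt.subplots(1, 2, figsize=(10, 3))
n_default, _, _ = ax_default.hist(x_value)  # bins omitted: 10 bars by default
n_20, _, _ = ax_20.hist(x_value, bins=20)   # 20 narrower bars
ax_default.set_title("default bins")
ax_20.set_title("bins=20")
plt.show()

print(len(n_default), len(n_20))  # 10 20
```

More bins show finer detail but each bar holds fewer values, so the shape gets noisier; fewer bins smooth the shape but can hide gaps.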
Set transparency
- Keyword: alpha
- Value range 0 to 1, where 0 is fully transparent and 1 is fully opaque; if alpha is not set, the bars are drawn opaque
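Transparency is most useful when two histograms overlap. In this sketch, two hypothetical, normally distributed groups (generated with an arbitrary seed) are drawn on the same axes with alpha=0.5, so the bars behind remain visible.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
# Two hypothetical, overlapping groups of heights
group_a = rng.normal(160, 6, 300)
group_b = rng.normal(170, 6, 300)

# alpha=0.5 makes each set of bars half transparent, so the overlap stays visible
_, _, patches_a = plt.hist(group_a, bins=20, alpha=0.5, label="group A")
_, _, patches_b = plt.hist(group_b, bins=20, alpha=0.5, label="group B")
plt.legend()
plt.show()
```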
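Set style
- Keyword: histtype
Value description
- 'bar': a traditional bar histogram; when several data sets are given, their bars are placed side by side (default)
- 'barstacked': a bar histogram in which several data sets are stacked on top of each other
- 'step': a step outline whose interior is not filled
- 'stepfilled': a step outline whose interior is filled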
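The four styles are easiest to compare side by side. This sketch passes two hypothetical data sets (random normal values with an arbitrary seed) so that the difference between 'bar' and 'barstacked' is visible, and draws each histtype in its own panel.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
# Two hypothetical data sets of 200 values each
data = [rng.normal(160, 6, 200), rng.normal(170, 6, 200)]

fig, axes = plt.subplots(2, 2, figsize=(10, 6))
results = {}
for ax, histtype in zip(axes.flat, ["bar", "barstacked", "step", "stepfilled"]):
    # Each call returns (n, bins, patches) for that style
    results[histtype] = ax.hist(data, bins=10, histtype=histtype)
    ax.set_title(f"histtype='{histtype}'")
fig.tight_layout()
plt.show()
```

With 'barstacked', the returned counts for the second data set are cumulative (they include the first set), which is why its bars sit on top of the first set's bars.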
For example, let's redraw the histogram from the first section as an unfilled outline with a red border:
```python
# Unfilled step outline with a red border
plt.hist(x_value, bins=10, edgecolor="r", histtype="step")
plt.show()
```
Next, keep the default bar style, set the border to red and the transparency to 0.5:
```python
# Filled bars with a red border at 50% opacity
plt.hist(x_value, bins=10, edgecolor="r", histtype="bar", alpha=0.5)
plt.show()
```
3. Histogram with a line chart
In a histogram, we can also add a line chart to make changes in the data easier to follow
- First, create a Figure and an Axes object with pyplot.subplots()
- Call hist() on the Axes object to draw the histogram. It returns the bar heights, the bin edges and the bar patches; the heights and edges provide the y and x data for the line chart
- Then call plot() on the same Axes object to draw the line chart
Let's modify the code from the first section:
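```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# n: bar heights, bins_num: bin edges (one more than the number of bars), pat: bar patches
n, bins_num, pat = ax.hist(x_value, bins=10, alpha=0.75)
# One point per bar, placed at the left edge of each bin
ax.plot(bins_num[:-1], n, marker="o", color="yellowgreen", linestyle="--")
plt.show()
```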
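The code above places each point at the left edge of its bar. A common alternative, shown here as a sketch rather than the article's method, is to plot the points at the bin midpoints so the line passes over the center of each bar. The data generator and seed are assumptions for reproducibility.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
x_value = rng.integers(140, 180, 200)  # hypothetical heights, like the case data

fig, ax = plt.subplots()
n, bins_num, patches = ax.hist(x_value, bins=10, alpha=0.75)

# Midpoint of each bin: the average of its left and right edges
centers = (bins_num[:-1] + bins_num[1:]) / 2
ax.plot(centers, n, marker="o", color="yellowgreen", linestyle="--")
plt.show()
```

Computing centers from the edges also keeps the line correct if bins changes or the bins have unequal widths.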
4. Stacked histogram
Sometimes we want to compare data collected from two different groups over the same value range
Prepare two sets of data:
```python
import numpy as np

# Two groups of 200 random integer heights in [140, 180)
x_value = np.random.randint(140, 180, 200)
x2_value = np.random.randint(140, 180, 200)
```
- Pass both data sets to hist() together as a list
- Set stacked=True so that the bars of the second data set are drawn on top of those of the first instead of beside them
```python
plt.hist([x_value, x2_value], bins=10, stacked=True)
plt.show()
```
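To check what stacking means numerically, this sketch compares the counts returned by hist() with numpy.histogram on the same bin edges. It uses numpy's default_rng with an arbitrary seed instead of np.random.randint, purely for reproducibility.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
x_value = rng.integers(140, 180, 200)
x2_value = rng.integers(140, 180, 200)

n, bins, patches = plt.hist([x_value, x2_value], bins=10, stacked=True)
plt.show()

# With stacked=True, n[1] is the top of the stack: both groups' counts added together
counts_2, _ = np.histogram(x2_value, bins=bins)
print(np.array_equal(n[1] - n[0], counts_2))  # True
```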
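5. Histogram with unequal bin widths
The histograms drawn above all use bins of equal width. To use unequal widths, we can pass a sequence of values to the bins attribute
- bins: an integer sets the number of bars; a sequence of values sets the bin edges, from the left edge of the first bar to the right edge of the last
After changing the code above, let's see the effect: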
```python
# Six edges define five bins with widths 15, 5, 10, 5 and 5
bin_num = [140, 155, 160, 170, 175, 180]
plt.hist([x_value, x2_value], bins=bin_num, alpha=0.75, stacked=True)
plt.show()
```
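With unequal widths, a wide bin collects more values simply because it is wider, so raw counts can exaggerate it. One option, sketched here as an addition to the article, is density=True, which divides each count by the total count times the bin width. The random data and seed are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
x_value = rng.integers(140, 180, 200)  # hypothetical heights

bin_num = [140, 155, 160, 170, 175, 180]
fig, (ax_count, ax_density) = plt.subplots(1, 2, figsize=(10, 3))
counts, edges, _ = ax_count.hist(x_value, bins=bin_num, edgecolor="black")
density, _, _ = ax_density.hist(x_value, bins=bin_num, density=True, edgecolor="black")
ax_count.set_title("raw counts")
ax_density.set_title("density=True")
plt.show()

widths = np.diff(edges)  # bin widths 15, 5, 10, 5, 5
# Each density value equals count / (total count * bin width)
print(np.allclose(density, counts / (counts.sum() * widths)))  # True
```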
6. Histogram of multiple categories
When we use a histogram to check how frequently values occur, we sometimes want to compare the frequencies of several categories of data.
In that case, we can pass the data sets to the x argument of hist() as a list; with histtype="bar", the bars of each category are placed side by side within each bin
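```python
# Three groups of random heights with 100, 200 and 300 values
x_value = [np.random.randint(140, 180, i) for i in [100, 200, 300]]
plt.hist(x_value, bins=10, edgecolor="r", histtype="bar", alpha=0.5,
         label=["A company", "B company", "C company"])
plt.legend()  # the labels only appear once a legend is drawn
plt.show()
```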
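When several categories are passed in, hist() returns one row of counts per category, all sharing the same bin edges. This sketch inspects those return values; the group sizes mirror the code above, while the generator and seed are assumptions for reproducibility.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(8)
sizes = [100, 200, 300]
x_value = [rng.integers(140, 180, size) for size in sizes]  # hypothetical data per company

n, bins, patches = plt.hist(x_value, bins=10, histtype="bar",
                            label=["A company", "B company", "C company"])
plt.legend()
plt.show()

# One row of counts per category, all sharing the same 10 bins
print(np.shape(n))                    # (3, 10)
print([int(row.sum()) for row in n])  # [100, 200, 300]
```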
Summary
In this issue, we learned in detail the attributes and methods that the matplotlib module provides for drawing various kinds of histograms. When we need to view how frequently data values occur, we can call hist() to draw a histogram, and we can also add a line chart as a visual aid
That's all for this issue. Likes and comments are welcome. See you in the next issue~