A detailed guide to drawing histograms with Python matplotlib

Preface

In earlier issues we covered the underlying architecture and basic drawing steps of the matplotlib module, and learned how to draw line charts and bar charts.

When analyzing data, we choose the chart type that suits the characteristics of the data. When we need to show how values are distributed, as in quality analysis, we use a histogram.

In this issue, we will learn the attributes and methods used to draw histograms in the matplotlib module. Let's go~

1. Histogram overview

  • What is a histogram?

    • A histogram is a visual representation of how data is distributed over continuous intervals or specific time periods
    • A histogram, also known as a quality distribution diagram, is a kind of bar graph
    • The x-axis represents the value intervals (bins), and the y-axis represents how many data points fall into each interval. The width of each interval can be chosen freely
  • Histogram usage scenarios

    • Showing a probability distribution: how likely a group of data is to fall within a specified range
    • Showing the frequency distribution of the data
    • Locating the mode and the median
    • Spotting gaps or outliers in the data
  • Histogram drawing steps

    1. Import the matplotlib.pyplot module
    2. Prepare the data; numpy or pandas can be used to organize it
    3. Call pyplot.hist() to draw the histogram
  • Example

    In this example, let's analyze the height distribution of a company's employees

    • Prepare the example data, using numpy to randomly generate 200 height values

      import numpy as np

      # 200 random integer heights in the range 140 ~ 179 (the upper bound 180 is excluded)
      x_value = np.random.randint(140,180,200)
    • Draw histogram

      import matplotlib.pyplot as plt

      # Draw the histogram with 10 equal-width bins
      plt.hist(x_value,bins=10)

      plt.title("Height distribution")
      plt.xlabel("height")
      plt.ylabel("count")  # hist() plots counts by default, not rates

      plt.show()
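
The usage scenarios listed earlier (frequency, mode, median) can also be checked numerically. The following is a minimal sketch using np.histogram, which computes the bin counts without drawing anything. The variable names and the fixed seed are assumptions made for this illustration and are not part of the original example.

```python
import numpy as np

# Assumed sample data: 200 integer heights in [140, 180); the seed only makes the run repeatable
rng = np.random.default_rng(seed=0)
heights = rng.integers(140, 180, size=200)

# counts has one entry per bin, edges has one more entry than counts
counts, edges = np.histogram(heights, bins=10)

# Frequency: the share of all samples that falls into each bin
freq = counts / counts.sum()

# The modal bin is the interval with the highest count
i = int(np.argmax(counts))
modal_bin = (edges[i], edges[i + 1])

print("counts:", counts)
print("frequency:", freq)
print("modal bin:", modal_bin)
print("median:", np.median(heights))
```

The same counts and edges are what plt.hist() draws, so this is a handy way to read exact numbers off a histogram.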

       

2. Histogram attributes

  • Set color

    • Set bar color keyword: facecolor

    • Set border color keyword: edgecolor

    • Accepted color values

      • English color names, such as red and yellow
      • Color abbreviations: red "r", blue "b"
      • RGB tuples in the form (r,g,b), with each value in the range 0 ~ 1
  • Set the number of bars (bins)

    • Keyword: bins
    • Optional. The default value is 10
  • Set transparency

    • Keyword: alpha
    • The value range is 0 ~ 1, where 0 is fully transparent and 1 is fully opaque. The default is None, which draws fully opaque bars
  • Set style

    • Keyword: histtype

    • Value description

      Value          Description
      'bar'          Traditional bars; multiple datasets are placed side by side (default)
      'barstacked'   Bars; multiple datasets are stacked on top of each other
      'step'         Outline only, not filled
      'stepfilled'   Outline with the area filled
  • Redraw the histogram from section 1 with unfilled bars (histtype="step") and a red border

    plt.hist(x_value,bins=10,edgecolor="r",histtype="step")

     

  • Set the border to red and the transparency to 0.5

    plt.hist(x_value,bins=10,edgecolor="r",histtype="bar",alpha=0.5)
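
To see several of these attributes working together, here is a small sketch that sets the fill color with an RGB tuple, the border color with a color name, and the transparency with alpha. The data, the seed, and the specific color values are assumptions chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed sample data: 200 integer heights in [140, 180)
rng = np.random.default_rng(seed=0)
heights = rng.integers(140, 180, size=200)

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(
    heights,
    bins=8,                     # number of bars
    facecolor=(0.2, 0.4, 0.8),  # fill color as an RGB tuple, each value in 0 ~ 1
    edgecolor="black",          # border color as a color name
    alpha=0.6,                  # 0 is fully transparent, 1 is fully opaque
)
ax.set_xlabel("height")
ax.set_ylabel("count")
```

Call plt.show() afterwards to display the figure. The returned patches are the individual bars, so their properties can also be adjusted one by one after drawing.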

     

3. Histogram with a line plot

We can also overlay a line plot on the histogram to make changes in the data easier to see

  • First, create the Axes object through pyplot.subplots()

  • Call the hist() method on the Axes object to draw the histogram. It returns the bin counts and bin edges, which provide the y and x data for the line plot

  • Then call plot() on the Axes object to draw the line plot

  • Let's modify the code from section 1

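    fig,ax = plt.subplots()

    # n: count per bin, bins_num: the 11 bin edges, pat: the drawn bars
    n,bins_num,pat = ax.hist(x_value,bins=10,alpha=0.75)

    # Plot each count at the left edge of its bin (the last edge is dropped)
    ax.plot(bins_num[:-1],n,marker = 'o',color="yellowgreen",linestyle="--")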
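
The code above places each marker at the left edge of its bin. If you would rather have the markers sit in the middle of the bars, one possible variant is to compute the bin centers from the returned edges. This is only a sketch; the data, the seed, and the names counts, edges and centers are assumptions for this illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed sample data: 200 integer heights in [140, 180)
rng = np.random.default_rng(seed=0)
heights = rng.integers(140, 180, size=200)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(heights, bins=10, alpha=0.75)

# The center of each bin is the midpoint between its left and right edge
centers = (edges[:-1] + edges[1:]) / 2

ax.plot(centers, counts, marker="o", color="yellowgreen", linestyle="--")
```

Both versions draw the same counts; only the horizontal position of the markers differs.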

     

4. Stacked histogram

Sometimes we want to compare data collected from two different groups over the same range of values

  • Prepare two sets of data:

    import numpy as np
    
    x_value = np.random.randint(140,180,200)
    x2_value = np.random.randint(140,180,200)
  • Histogram data: pass the two sets of data in as a list

  • Set the stacked keyword to True so that the bars of the second dataset are stacked on top of the first

    plt.hist([x_value,x2_value],bins=10,stacked=True)
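
One detail worth knowing when reading the return value: in the matplotlib versions we are aware of, the counts returned by hist() with stacked=True are cumulative, so the last row holds the total height of each stacked bar. The sketch below checks this against np.histogram on the combined data. The group names, the seed, and the labels are assumptions made for this illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed sample data: two groups of 200 integer heights in [140, 180)
rng = np.random.default_rng(seed=0)
group_a = rng.integers(140, 180, size=200)
group_b = rng.integers(140, 180, size=200)

fig, ax = plt.subplots()
n, edges, _ = ax.hist(
    [group_a, group_b], bins=10, stacked=True, label=["group A", "group B"]
)
ax.legend()

# Counts of both groups together, using the same bin edges as the plot
total_counts, _ = np.histogram(np.concatenate([group_a, group_b]), bins=edges)

print("first row (group A only):", n[0])
print("last row (A + B stacked):", n[-1])
print("combined counts:         ", total_counts)
```

If you need the counts of the second group on its own, subtract the first row from the last one.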

     

5. Histogram with unequal bin widths

The histograms drawn above all use equal-width bins. We can also pass a sequence of bin edges to the bins keyword

  • bins keyword: an integer sets the number of bars; a sequence sets the bin edges, which do not have to be equally spaced

  • Change the code above and see the effect

    bin_num = [140,155,160,170,175,180]
    plt.hist([x_value,x2_value],bins=bin_num,alpha=0.75,stacked=True)
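
With unequal bin widths, a wider bin naturally collects more samples, so raw counts can be misleading. A common remedy is density=True, which divides each count by the total number of samples and by the bin width so that the total area is 1. The following is a minimal sketch with np.histogram; the data and the seed are assumptions for this illustration, and the bin edges are the ones used above.

```python
import numpy as np

# Assumed sample data: 200 integer heights in [140, 180)
rng = np.random.default_rng(seed=0)
heights = rng.integers(140, 180, size=200)

edges = [140, 155, 160, 170, 175, 180]
widths = np.diff(edges)  # 15, 5, 10, 5, 5

counts, _ = np.histogram(heights, bins=edges)
density, _ = np.histogram(heights, bins=edges, density=True)

# density is count / (total samples * bin width), so area = sum(density * width) = 1
print("counts: ", counts)
print("density:", density)
print("area:   ", (density * widths).sum())
```

plt.hist() accepts the same density=True keyword, so the plotted bar heights can be normalized in the same way.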

     

6. Multi-category histogram

When we use a histogram to check the frequency of data, we sometimes want to compare several categories of data at once.

  • In that case, we can pass the datasets as a list to the x parameter of the hist() method

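    # Three groups with 100, 200 and 300 samples
    x_value = [np.random.randint(140,180,i) for i in [100,200,300]]

    plt.hist(x_value,bins=10,edgecolor="r",histtype="bar",alpha=0.5,label=["A company","B company","C company"])
    plt.legend()  # needed for the label values to appear on the plot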
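
Because the three groups have different sizes, the largest group will tend to have the tallest bars. One possible way to compare the shapes of the distributions instead is the weights keyword of hist(): giving every sample a weight of 1/len(group) makes the bars of each group sum to 1. This is a sketch under assumed data; the seed and variable names are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed sample data: three groups of 100, 200 and 300 integer heights in [140, 180)
rng = np.random.default_rng(seed=0)
groups = [rng.integers(140, 180, size=size) for size in (100, 200, 300)]

# One weight per sample; each group's weights add up to 1
weights = [np.full(len(g), 1 / len(g)) for g in groups]

fig, ax = plt.subplots()
shares, edges, _ = ax.hist(
    groups,
    bins=10,
    weights=weights,
    label=["A company", "B company", "C company"],
)
ax.set_ylabel("share of group")
ax.legend()
```

Each row of shares now holds the fraction of one group that falls into each bin, so the groups can be compared regardless of their size.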

     

Summary

In this issue, we learned in detail the attributes and methods for drawing various histograms with the matplotlib module. When we need to see how often data values occur, we can use the hist() method to draw a histogram, and we can also add a line plot to make it easier to read.

That's all for this issue. Likes and comments are welcome. See you in the next issue~

Tags: Python Back-end Programmer Data Analysis

Posted on Thu, 18 Nov 2021 03:06:00 -0500 by M4F