Artificial intelligence, day four: deep learning, artificial neural networks, convolutional neural networks, OpenCV, audio recording and playback, and using the Baidu AI platform

Deep learning

     Machine learning based on neural network algorithms

     Deep learning algorithms:
         BP (backpropagation) neural network
         Convolutional neural network (CNN)
         Recurrent neural network (RNN)
         Attention-based recurrent neural network
         Generative adversarial network (GAN)

     Neural network
         The biological neural network of the brain

     Artificial neural network

         A network of functions


Convolutional neural network

         A neural network whose neurons use the convolution operation as their function is called a convolutional neural network (CNN).


         Scope of use:
             Feature extraction of 2D graphics

         Structure of a convolutional neural network
             Common layers
                 Input layer: data input
                 Association weight layer (optional)
                 Convolution layer: performs the convolution
                 Pooling layer: reduces dimensionality while keeping the main features
                 Fully connected layer: fully connects the feature maps
                 Output layer: produces the output results

             Other layers
                 Regularization layer
                 Advanced layer (activation function / other functions)

         Hyperparameter: in machine learning, a parameter that is set before training rather than learned from the data, and that is tuned through repeated experiments, is called a hyperparameter.

         Convolution layer:
             Convolution operation:
                     1. Feature extraction
                     2. Feature dimensionality reduction

             Convolution kernel:
                 A convolution kernel is also called a filter.

                 Movement of the convolution kernel
                     The number of cells the convolution kernel moves across the input matrix at each step is called the stride.
                     stride >= 1

                 Size of the convolution kernel
                     The kernel size is also a hyperparameter; an odd number of rows and columns is generally chosen.

                 Content of the convolution kernel
                     The values (weights) in the kernel are not set by hand; they are obtained through repeated training iterations.

                 Common convolution kernels
                    -> Horizontal edge detection filter
                    -> Vertical edge detection filter
                    -> Filter that enhances the center of the picture
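
As a minimal sketch of how a kernel slides over an input matrix, the pure-Python example below applies a vertical edge detection filter with stride 1 and no padding. The function name `conv2d`, the 6x6 test image and the kernel values are illustrative assumptions, not taken from the course material, and the code assumes square inputs.

```python
def conv2d(image, kernel, stride=1):
    """Slide a square kernel over a square image (no padding)."""
    n, f = len(image), len(kernel)
    out = (n - f) // stride + 1
    result = []
    for i in range(out):
        row = []
        for j in range(out):
            acc = 0
            # Multiply the kernel with the window under it and sum the products
            for a in range(f):
                for b in range(f):
                    acc += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(acc)
        result.append(row)
    return result


# Hypothetical 6x6 image: bright left half (10), dark right half (0)
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

# One common choice of vertical edge detection kernel
vertical_edge = [[1, 0, -1],
                 [1, 0, -1],
                 [1, 0, -1]]

feature_map = conv2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

The large values in the middle columns of the output mark the position of the vertical edge between the bright and dark halves.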

             Padding (when the input size does not fit the convolution kernel):
                 The padding size is also a hyperparameter.
                 Padding process:
                     Surround the input matrix with a ring of zeros.

                     To keep the output the same size as the input at stride 1, the padding size is p = (f - 1)/2

                     f is the size of the convolution kernel
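
The padding step can be sketched in a few lines of pure Python. The helper name `zero_pad` and the 3x3 sample matrix are assumptions made for illustration; the padding size follows the p = (f - 1)/2 rule for an odd kernel size f.

```python
def zero_pad(matrix, p):
    """Return a copy of matrix surrounded by p rings of zeros."""
    width = len(matrix[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]           # top rows
    for row in matrix:
        padded.append([0] * p + list(row) + [0] * p)   # left and right columns
    padded.extend([0] * width for _ in range(p))       # bottom rows
    return padded


f = 3                 # kernel size (odd)
p = (f - 1) // 2      # padding that keeps the output size at stride 1

matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

padded = zero_pad(matrix, p)
for row in padded:
    print(row)
```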

             Size of the output matrix (feature map) after one convolution:
                 s: stride
                 f: size of the convolution kernel
                 n: size of the input matrix
                 p: padding

                 The size of the output matrix is: (n + 2p - f) / s + 1

                 n      f      s    p    output size
                 5x5    3x3    1    0    (5 + 2*0 - 3)/1 + 1 = 3
                 5x5    3x3    2    1    (5 + 2*1 - 3)/2 + 1 = 3

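
The output size formula can be checked with a small helper. The function name `conv_output_size` is an assumption; the integer (floor) division is also an assumption, chosen because it is a common convention when the stride does not divide evenly, which the notes do not cover.

```python
def conv_output_size(n, f, s=1, p=0):
    """Output size of a convolution: (n + 2p - f) / s + 1, rounded down."""
    return (n + 2 * p - f) // s + 1


# The two examples from the notes
print(conv_output_size(n=5, f=3, s=1, p=0))
print(conv_output_size(n=5, f=3, s=2, p=1))
```
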
             How multi-channel convolution works:
                 Everyday pictures are in color, typically RGB.
                 RGB24: one pixel occupies 24 bits:
                    R:G:B = 8:8:8

                 A 320 x 240 picture holds 320 * 240 * 3 bytes of pixel data.

                 Splitting the picture by RGB channel gives three matrices of pixel information.

                 Convolving a color image therefore means convolving the matrices of three different channels. This is called multi-channel convolution.

                 How to handle multi-channel convolution:
                     Each channel gets its own convolution kernel. The kernels applied at the same layer, one per channel, are collected into one convolution kernel group.
                     In practice, several different kernel groups are each convolved with the multi-channel data.

                 Result of multi-channel convolution:
                     Within a kernel group the per-channel results are added together, so each group outputs a single feature map, just like a single-channel convolution.
                     Several groups therefore output several feature maps; the number of feature maps equals the number of kernel groups.
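
A minimal sketch of multi-channel convolution, assuming stride 1 and no padding: each kernel group holds one kernel per channel, the per-channel results are summed into one feature map, and each group yields its own feature map. The helper names and the tiny constant-valued channels are hypothetical, chosen so the arithmetic is easy to follow.

```python
def conv2d(image, kernel):
    """Single-channel convolution, stride 1, no padding."""
    n, f = len(image), len(kernel)
    out = n - f + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(f) for b in range(f))
             for j in range(out)]
            for i in range(out)]


def conv_multichannel(channels, kernel_group):
    """Convolve each channel with its own kernel and sum the results."""
    maps = [conv2d(c, k) for c, k in zip(channels, kernel_group)]
    size = len(maps[0])
    return [[sum(m[i][j] for m in maps) for j in range(size)]
            for i in range(size)]


# Hypothetical 3x3 R, G and B channels filled with constant values
r = [[1] * 3 for _ in range(3)]
g = [[2] * 3 for _ in range(3)]
b = [[3] * 3 for _ in range(3)]
channels = [r, g, b]

ones = [[1, 1], [1, 1]]
zeros = [[0, 0], [0, 0]]
minus = [[-1, -1], [-1, -1]]

# Two kernel groups, each with one 2x2 kernel per channel
kernel_groups = [[ones, ones, ones],
                 [ones, zeros, minus]]

# One feature map per kernel group
feature_maps = [conv_multichannel(channels, grp) for grp in kernel_groups]
for fmap in feature_maps:
    print(fmap)
```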

                 Single-channel convolution case:
                    n: 32 (input: 32 * 32 = 1024 values)
                    s: 1
                    f: 5
                    p: 0

                    (n + 2p - f)/s + 1 = 27/1 + 1 = 28

                    Convolution output: 28 * 28 = 784
                    2 * 2 pooling (downsampling)
                    Pooling output: 14 * 14 = 196
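
The case above can be traced step by step. This is only a sketch of the arithmetic; the names `conv_size` and `pool_size` are assumptions, and the pooling window is assumed to move without overlap.

```python
def conv_size(n, f, s=1, p=0):
    # Output size of a convolution layer
    return (n + 2 * p - f) // s + 1


def pool_size(n, window):
    # Output size of non-overlapping pooling
    return n // window


n_input = 32
n_conv = conv_size(n_input, f=5, s=1, p=0)
n_pool = pool_size(n_conv, window=2)

for name, n in [("input", n_input), ("after convolution", n_conv),
                ("after pooling", n_pool)]:
    print("{}: {} * {} = {}".format(name, n, n, n * n))
```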

         Causes of model failure:
             Overfitting:
                 The features the model relies on are too detailed, so the model lacks generalization ability.
                     "A white horse is not a horse"
                     "A white swan is not a swan"

             Underfitting:
                 The model extracts too few features, resulting in recognition errors.
                     "Calling a deer a horse"

         Pooling layer (sampling):
             Role of pooling:
                 1. In a CNN, the pooling layer reduces the size of the feature maps, which speeds up computation and reduces the impact of noise, making each feature more robust.
                 2. It reduces the number of parameters to train and the degree of overfitting of the model.
             What is pooling:
                 Pooling is also called downsampling. In theory, the features obtained from the convolution layer could be used directly to train a classifier (such as softmax). In practice this requires a huge amount of computation and easily leads to overfitting, so the features are pooled first.
             Pooling methods:
                -> Max pooling
                     The maximum value in the pooling region represents the feature value of that region
                -> Mean pooling
                     The average value in the pooling region represents the feature value of that region
                -> Random pooling
                     A randomly selected value in the pooling region represents the feature value of that region

             Size of the pooling region: a hyperparameter, tuned through repeated experiments.
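
The three pooling methods can be sketched with one helper that applies a reducing function to each non-overlapping window. The name `pool2d` and the 4x4 sample feature map are assumptions. The random variant here picks uniformly from the window, which is a simplification of the idea described above.

```python
import random


def pool2d(matrix, size, reduce_fn):
    """Apply reduce_fn to each non-overlapping size x size window."""
    n = len(matrix)
    pooled = []
    for i in range(0, n - size + 1, size):
        row = []
        for j in range(0, n - size + 1, size):
            window = [matrix[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


feature_map = [[1, 3, 2, 4],
               [5, 7, 6, 8],
               [9, 2, 1, 0],
               [3, 4, 5, 6]]

max_pooled = pool2d(feature_map, 2, max)
mean_pooled = pool2d(feature_map, 2, lambda w: sum(w) / len(w))
random_pooled = pool2d(feature_map, 2, random.choice)

print(max_pooled)
print(mean_pooled)
print(random_pooled)
```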

         Activation function:
             Common activation functions:
                 Tanh function
                 ReLU function

             Role of the activation function:
                 A convolutional neural network is like a standard neural network: it needs an activation function to be nonlinear. After the convolution operation, the output value plus the bias is fed into the activation function, and the result is used as the input of the next layer.
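
A short sketch of that step, assuming a single shared bias per feature map: the bias is added to every convolution output and the sum is passed through ReLU or Tanh. The sample values and the helper name `activate` are illustrative.

```python
import math


def relu(x):
    # ReLU keeps positive values and clamps negatives to zero
    return max(0.0, x)


def activate(feature_map, bias, fn):
    """Add the bias to each convolution output, then apply the activation."""
    return [[fn(v + bias) for v in row] for row in feature_map]


# Hypothetical convolution output and bias
feature_map = [[-3.0, 0.5],
               [2.0, -0.5]]
bias = 0.5

relu_out = activate(feature_map, bias, relu)
tanh_out = activate(feature_map, bias, math.tanh)

print(relu_out)
print(tanh_out)
```
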
         Full connection:
             Multi-channel or single-channel convolution produces a set of tensors (feature maps). Flattening these multi-dimensional tensors into a one-dimensional tensor, each element of which is connected to every neuron of the next layer, is called full connection.
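
As a rough sketch, full connection can be split into a flatten step and a dense step in which every output neuron takes a weighted sum of all flattened values. The weights, biases and function names below are made up for illustration.

```python
def flatten(feature_maps):
    """Turn a list of 2D feature maps into one 1D list."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def dense(inputs, weights, biases):
    """Each output neuron is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(ws, inputs)) + b
            for ws, b in zip(weights, biases)]


# Two hypothetical 2x2 feature maps
feature_maps = [[[1, 2], [3, 4]],
                [[5, 6], [7, 8]]]

flat = flatten(feature_maps)

# Two output neurons, each with one weight per flattened value
weights = [[1] * 8,
           [0, 0, 0, 0, 1, 1, 1, 1]]
biases = [0, -1]

outputs = dense(flat, weights, biases)
print(flat)
print(outputs)
```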

         Common convolutional neural network structures:
             LeNet-5: a sequential (serial) convolutional neural network

             GoogLeNet: the Inception model, a parallel convolutional neural network

         Goals when building a convolutional neural network:
             1. Find the optimal neural network structure
             2. Find the optimal set of hyperparameters
             3. Solve the problems of feature extraction and feature dimensionality reduction


OpenCV

     A graphics and image processing library with interfaces for C++, Python and other programming languages

     How to install:
        pip install opencv-python    (add -i followed by a mirror index URL to install from a mirror)

Use OpenCV to capture video and detect and locate faces

        import cv2 as cv

        # Open the first camera in the system (similar to opening /dev/video0 on Linux)
        cap = cv.VideoCapture(0)
        # Directory holding the Haar cascade files; adjust it to your installation
        # (opencv-python also exposes this directory as cv.data.haarcascades)
        path = "D:\\Program Files\\Python36\\Lib\\site-packages\\cv2\\data\\"

        face_class = cv.CascadeClassifier(path + "haarcascade_frontalface_default.xml")
        # To make sure the picture displays properly, create a named window first
        cv.namedWindow("face")
        while True:
            # Read one frame from the camera
            ret, img = cap.read()
            if not ret:
                break
            # Convert the captured color image (BGR order in OpenCV) into a gray image
            img_gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
            # Arguments: image, scaleFactor=1.3, minNeighbors=5
            faces = face_class.detectMultiScale(img_gray, 1.3, 5)
            for (x, y, w, h) in faces:
                # Mark each detected face with a rectangle
                cv.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv.imshow("face", img)
            # Quit when the q key is pressed
            if cv.waitKey(1) & 0xFF == ord('q'):
                break
        # Destroy all displayed windows
        cv.destroyAllWindows()
        # Release the camera object
        cap.release()

Recording audio with Python

     Library: pyaudio
     Installation: pip install pyaudio


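    import pyaudio
    import wave

    # Number of frames read from the stream per buffer
    CHUNK = 1024
    # Bit depth of each sample (16-bit integers)
    FORMAT = pyaudio.paInt16
    # Number of channels
    CHANNELS = 2
    # Sample rate in Hz
    RATE = 44100
    # Recording time in seconds (example value)
    RECORD_SECONDS = 5
    WAVE_OUTPUT_FILENAME = "output.wav"

    p = pyaudio.PyAudio()

    # Open an input stream on the default recording device
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)

    print("* recording")

    frames = []

    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)

    print("* done recording")

    # Stop and close the stream, then release PyAudio
    stream.stop_stream()
    stream.close()
    p.terminate()

    # Write the recorded frames to a WAV file
    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()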
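
To check that a recording was written with the expected settings, the standard `wave` module can read the header back. The sketch below does not need a microphone: as an assumption for illustration, it first writes one second of silence to a temporary file using the same channel count, sample width and rate as the recording settings, then inspects it. The helper name `describe_wav` is hypothetical.

```python
import os
import tempfile
import wave

CHANNELS = 2
SAMPLE_WIDTH = 2      # bytes per sample, the width of a 16-bit sample
RATE = 44100


def describe_wav(path):
    """Read the header of a WAV file and return its basic parameters."""
    with wave.open(path, 'rb') as wf:
        return {
            "channels": wf.getnchannels(),
            "sample_width": wf.getsampwidth(),
            "rate": wf.getframerate(),
            "seconds": wf.getnframes() / wf.getframerate(),
        }


# Write one second of silence to a temporary WAV file
path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(path, 'wb') as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(SAMPLE_WIDTH)
    wf.setframerate(RATE)
    wf.writeframes(b"\x00" * (RATE * CHANNELS * SAMPLE_WIDTH))

info = describe_wav(path)
print(info)
```

Calling `describe_wav("output.wav")` on a real recording would report the same fields for that file.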

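Audio playback with Python

     Audio playback package: pygame
     Package installation: pip install pygame

        import pygame
        import time

        # Audio initialization
        pygame.mixer.init()
        # Load audio
        pygame.mixer.music.load("audio.mp3")
        # Start playing
        pygame.mixer.music.play()
        # Wait while the audio is playing
        while pygame.mixer.music.get_busy():
            time.sleep(0.1)
        # Stop playing
        pygame.mixer.music.stop()

        # Note: the wait is required; without it the script exits before any sound is heard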

Using the Baidu AI platform

     Face comparison


EasyDL image classification: calling the model's public cloud API from Python 3

import json
import base64
import requests

# This script sends its request with the requests library.
# Use pip (or pip3) to check whether the library is installed in your Python 3 environment:
#   pip freeze | grep requests
# If nothing is printed, install the library:
#   pip install requests

# Local file path of the target image; jpg/png/bmp formats are supported
IMAGE_FILEPATH = ""

# Optional request parameters
# top_num: the number of returned categories; defaults to 6 if not set
PARAMS = {"top_num": 2}

# Interface address shown in the service details
MODEL_API_URL = ""

# Calling the API requires an ACCESS_TOKEN. If you already have one, fill in the string below.
# Otherwise leave ACCESS_TOKEN empty and fill in the API_KEY and SECRET_KEY of the deployed model;
# a new ACCESS_TOKEN is then requested automatically and printed.
ACCESS_TOKEN = ""
API_KEY = "gPhZUzA3yk70zSplKKhw5Itb"
SECRET_KEY = ""

print("1. Read target picture '{}'".format(IMAGE_FILEPATH))
with open(IMAGE_FILEPATH, 'rb') as f:
    base64_data = base64.b64encode(f.read())
    base64_str = base64_data.decode('UTF8')
print("Put the BASE64-encoded image string into the 'image' field of PARAMS")
PARAMS["image"] = base64_str

if not ACCESS_TOKEN:
    print("2. ACCESS_TOKEN is empty; call the authentication interface to obtain a token")
    auth_url = ""               "&client_id={}&client_secret={}".format(API_KEY, SECRET_KEY)
    auth_resp = requests.get(auth_url)
    auth_resp_json = auth_resp.json()
    ACCESS_TOKEN = auth_resp_json["access_token"]
    print("new ACCESS_TOKEN: {}".format(ACCESS_TOKEN))
else:
    print("2. Use existing ACCESS_TOKEN")

print("3. Send the request to the model interface 'MODEL_API_URL'")
request_url = "{}?access_token={}".format(MODEL_API_URL, ACCESS_TOKEN)
response = requests.post(url=request_url, json=PARAMS)
response_json = response.json()
response_str = json.dumps(response_json, indent=4, ensure_ascii=False)
print("Result:\n{}".format(response_str))

Speech synthesis

from aip import AipSpeech

""" Your APPID, AK and SK """
APP_ID = '24873305'
API_KEY = 'uzWDokZaiYxGTH5Sn1UKnN85'
SECRET_KEY = ''

client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

result = client.synthesis("You black fellow, are very arrogant. I don't know whether you are a kiln maker or a charcoal seller", 'zh', 1, {
    'vol': 8, 'per': 3, 'spd': 4,
})

# On success the audio is returned as binary data; on failure a dict is returned (see the error codes)
if not isinstance(result, dict):
    with open('audio.mp3', 'wb') as f:
        f.write(result)

Speech recognition

from aip import AipSpeech

""" Your APPID, AK and SK """
APP_ID = '24873305'
API_KEY = 'uzWDokZaiYxGTH5Sn1UKnN85'
SECRET_KEY = ''

client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

# Read a file and return its binary content
def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()

# Recognize a local file
ret = client.asr(get_file_content('audio.wav'), 'wav', 16000, {
    'dev_pid': 1737,
})
print(ret)

Tags: OpenCV AI neural networks

Posted on Sun, 19 Sep 2021 00:35:41 -0400 by giannisptr