# Deep learning

Machine learning based on neural network algorithms.

Deep learning algorithms:
BP (back-propagation) neural network
Convolutional neural network
Recurrent neural network
Attention-based recurrent neural network
Generative adversarial network

Neural network:
Modeled on the brain's biological neural network

# artificial neural network

A network of functions

# Convolutional neural network

A neural network that uses the convolution operation as its neuron function is called a convolutional neural network.

Scope of use:
Feature extraction from 2D images

Structure of a convolutional neural network
Common layers:
Input layer: data input
Associated weight layer (* optional)
Convolution layer: performs the convolution operation
Pooling layer: dimensionality reduction of the main features
Fully connected layer: fully connects the feature maps
Output layer: produces the output results

Other layers:
Regularization layer
Advanced layers (activation function / other functions)

Hyperparameter: in machine learning, parameters that are not learned during training but must be tuned through repeated experiments are called hyperparameters.

Convolution layer:
Convolution operation:
Effects:
1. Feature extraction
2. Feature dimensionality reduction

Convolution kernel:
A convolution kernel is also called a filter.

Movement of the convolution kernel
The number of cells the convolution kernel moves at a time across the input matrix is called the stride.
Stride >= 1

Size of the convolution kernel
The kernel size is also a hyperparameter; odd numbers of rows and columns are generally chosen.

Content of the convolution kernel
The kernel weights also need to be learned through repeated training.

Common convolution kernels
-> Horizontal edge detection filter
-> Vertical edge detection filter
-> Center-enhancing (sharpening) filter

When the input size does not match the convolution kernel:
The padding size is also a hyperparameter.
Padding process:
Pad a ring of zeros around the input matrix

The padding size that preserves the input size (with stride 1) is p = (f - 1)/2

where f is the size of the convolution kernel
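The padding step above can be sketched with NumPy (a minimal illustration; the 5x5 input values are arbitrary example data):

```python
import numpy as np

# "Same" padding for a 3x3 kernel: p = (f - 1) / 2 = 1
f = 3
p = (f - 1) // 2

# A 5x5 input matrix (arbitrary example values)
x = np.arange(25).reshape(5, 5)

# Pad a ring of p zeros around the input matrix
x_padded = np.pad(x, p, mode="constant", constant_values=0)

print(x_padded.shape)  # (7, 7)
```

With p = 1, the 5x5 input grows to 7x7, so a 3x3 convolution at stride 1 yields a 5x5 output again, the same size as the input.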

After one convolution, the size of the output matrix (feature map) is determined by:
s: stride
f: convolution kernel size
n: input matrix size
p: padding

Output size: (n + 2p - f)/s + 1

n     f
5x5   3x3   s=1  p=0    (5 + 0 - 3)/1 + 1 = 3
5x5   3x3   s=2  p=1    (5 + 2 - 3)/2 + 1 = 3
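The output-size formula can be wrapped in a small helper that reproduces the two examples above (a minimal sketch; floor division is assumed when the stride does not divide evenly):

```python
def conv_output_size(n, f, s=1, p=0):
    """Output size of a convolution: (n + 2p - f) / s + 1."""
    return (n + 2 * p - f) // s + 1

# The two examples above:
print(conv_output_size(5, 3, s=1, p=0))  # 3
print(conv_output_size(5, 3, s=2, p=1))  # 3
```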
How to realize multi-channel convolution:
In daily life, pictures are in color, i.e. RGB images.
A 320x240 RGB24 image: each pixel occupies 24 bits (R:G:B = 8:8:8),
so the data size is 320 * 240 * 3.

Splitting by RGB channel yields three separate pixel matrices.

The convolution of a color image therefore becomes the convolution of three matrices,
one per channel. This is called multi-channel convolution.

How to handle multi-channel convolution:
A different convolution kernel is used for each channel. The kernels that convolve the
channels at the same level are placed in the same convolution kernel group.
In practice, several different kernel groups each convolve the multi-channel data separately.

Combining multi-channel convolution results:
The per-channel outputs of each kernel group are summed, so each group's result is
equivalent to a single-channel convolution.
Multiple feature maps are output; their number equals the number of kernel groups.
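A minimal NumPy sketch of one kernel group convolving a 3-channel input (random example data; stride 1, no padding, and a naive loop rather than an optimized implementation):

```python
import numpy as np

def conv2d_single(x, k):
    """Valid convolution of one channel with one kernel (stride 1, no padding)."""
    n, f = x.shape[0], k.shape[0]
    out = n - f + 1
    y = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            y[i, j] = np.sum(x[i:i + f, j:j + f] * k)
    return y

# A 3-channel 5x5 input (e.g. R, G, B) and one kernel group: one 3x3 kernel per channel
x = np.random.rand(3, 5, 5)
kernel_group = np.random.rand(3, 3, 3)

# Each kernel convolves its own channel; summing the three results yields a
# single feature map, equivalent to a single-channel convolution output.
feature_map = sum(conv2d_single(x[c], kernel_group[c]) for c in range(3))
print(feature_map.shape)  # (3, 3)
```

Using several kernel groups would produce several such feature maps, one per group.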

Single-channel convolution example:
n: 32x32 = 1024
s: 1
f: 5
p: 0

(n + 2p - f)/s + 1 = 27/1 + 1 = 28

28 * 28 = 784
After 2x2 subsampling (pooling):
14 * 14 = 196

Causes of model failure:
Overfitting:
The features the model requires are too specific, so the model lacks generalization ability.
Examples:
"A white horse is not a horse" -- a model trained only on white horses rejects horses of other colors
"A white swan is not a swan"

Underfitting:
The model extracts too few features, resulting in recognition errors.
Example:
Mistaking one thing for another ("calling a deer a horse")

Pooling layer (subsampling):
Role of pooling:
1. In a CNN, the pooling layer reduces the feature map size, speeds up computation, and
reduces the impact of noise, making each feature more robust.
2. It reduces overfitting of the network's training parameters and of the model.
What is pooling:
Pooling is also called downsampling. After the convolution layers extract image features,
those features could in theory be fed directly to a classifier (such as softmax), but this
would require a huge amount of computation and easily leads to overfitting.
Pooling methods:
-> Max pooling
Take the maximum value in the pooling region as that region's feature value
-> Mean pooling
Take the average value in the pooling region as that region's feature value
-> Stochastic pooling
Randomly select a value in the pooling region as that region's feature value

Size of the pooling region: a hyperparameter, chosen through repeated experiments.
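Max and mean pooling can be sketched with NumPy (a minimal illustration; assumes a square input, non-overlapping 2x2 regions, and stride equal to the region size):

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling over size x size regions (stride = size)."""
    n = x.shape[0] // size
    blocks = x[:n * size, :n * size].reshape(n, size, n, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))  # mean pooling

x = np.array([[1, 3, 2, 4],
              [5, 7, 6, 8],
              [9, 2, 1, 0],
              [3, 4, 5, 6]], dtype=float)

print(pool2d(x, 2, "max"))   # [[7. 8.], [9. 6.]]
print(pool2d(x, 2, "mean"))  # [[4. 5.], [4.5 3.]]
```

Each 2x2 region of the 4x4 input collapses to one value, halving each dimension, just like the 28x28 -> 14x14 step in the example above.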

Activation function:
Common activation functions:
sigmoid
tanh
ReLU

Role of the activation function:
A convolutional neural network, like a standard neural network, needs activation functions
to ensure nonlinearity: after the convolution operation, the output value plus a bias is fed
into the activation function, and the result is used as the input of the next layer.
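The three activation functions and the "output plus bias" step can be sketched as follows (the pre-activation values and bias are arbitrary example numbers):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

# Convolution output plus a bias, then the activation, feeds the next layer
z = np.array([-2.0, 0.0, 3.0])  # example pre-activation values
bias = 0.5
print(relu(z + bias))  # [0.  0.5 3.5]
print(sigmoid(0.0))    # 0.5
```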

Full connection:
After multi-channel / single-channel convolution, a set of tensors (feature maps) is obtained.
Flattening these multi-dimensional tensors into a one-dimensional tensor and connecting every
element to the next layer is called full connection.
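The flattening step can be sketched as follows (the feature-map count, sizes, and output width are arbitrary example values):

```python
import numpy as np

# Suppose the convolution/pooling stages produced 6 feature maps of size 14x14
feature_maps = np.random.rand(6, 14, 14)

# "Full connection" first flattens the multi-dimensional tensor into 1-D ...
flat = feature_maps.reshape(-1)
print(flat.shape)  # (1176,)

# ... then connects every element to each output neuron via a weight matrix
num_outputs = 10
W = np.random.rand(num_outputs, flat.size)
b = np.random.rand(num_outputs)
out = W @ flat + b
print(out.shape)  # (10,)
```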

Common convolutional neural network structures:
LeNet-5: a sequential (series) convolutional neural network

GoogLeNet (Inception modules): a parallel convolutional neural network

Purpose of convolutional neural network:
1. Get the optimal neural network structure
2. Get the optimal set of hyperparameters
3. Solve the problems of feature extraction and feature dimensionality reduction

# opencv

A graphics and image processing library with C++, Python, and other programming language interfaces.

How to install:
pip install opencv-python -i https://mirrors.aliyun.com/pypi/simple/

## Using opencv to capture video and detect and locate faces

```
import cv2 as cv

# Open the first camera in the system by default (like opening /dev/video0 on Linux)
cap = cv.VideoCapture(0)
path = "D:\\Program Files\\Python36\\Lib\\site-packages\\cv2\\data\\"
# Load the pre-trained Haar cascade classifier for frontal faces
face_class = cv.CascadeClassifier(path + "haarcascade_frontalface_default.xml")

# To make sure the picture displays correctly, create a namedWindow first
cv.namedWindow("video", cv.WINDOW_AUTOSIZE)

while True:
    # Read a frame captured by the camera
    ret, img = cap.read()
    if not ret:
        break
    # Convert the captured color image to grayscale (OpenCV frames are BGR)
    img_gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
    faces = face_class.detectMultiScale(img_gray, 1.3, 5)
    for (x, y, w, h) in faces:
        cv.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 3)
    cv.imshow("video", img)
    # Exit when the 'q' key is pressed
    if cv.waitKey(1) & 0xFF == ord('q'):
        break

# Destroy all displayed windows
cv.destroyAllWindows()
# Release the camera object
cap.release()
```

## Using python to implement a recording function

Library: pyaudio
Installation: pip install pyaudio

example:

```
import pyaudio
import wave

# Size of each buffer (frames per read)
CHUNK = 1024
# Bit depth of each sample
FORMAT = pyaudio.paInt16
# Number of channels
CHANNELS = 2
# Sample rate
RATE = 44100
# Recording time in seconds
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("* recording")

frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    # Read one chunk of audio data from the stream
    data = stream.read(CHUNK)
    frames.append(data)

print("* done recording")

stream.stop_stream()
stream.close()
p.terminate()

# Write the recorded frames to a WAV file
wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()
```

## python voice playback

Voice playback package pygame
Package installation: pip install pygame
example:

```
import pygame
import time

# Initialize the audio mixer
pygame.mixer.init()
# Load the audio file to play (example filename)
pygame.mixer.music.load("audio.mp3")
# Start playing
pygame.mixer.music.play()
# Wait while the audio plays (play() does not block)
time.sleep(3)
# Stop playing
pygame.mixer.music.stop()

# Note: there must be a delay, otherwise no sound will be heard
```

# Use of Baidu AI platform

## Face comparison

EasyDL

```
"""
EasyDL image classification: calling the public-cloud model API (Python 3)
"""

import json
import base64
import requests
"""
The requests library is used to send the request.
Use pip (or pip3) to check whether it is installed in your Python 3 environment:
pip freeze | grep requests
If the output is empty, install it:
pip install requests
"""

# Local file path of the target image; jpg/png/bmp formats are supported
IMAGE_FILEPATH = "1.jpg"

# Optional request parameters
# top_num: the number of categories returned; defaults to 6 if not set
PARAMS = {"top_num": 2}

# Interface address from the service details page
MODEL_API_URL = "https://aip.baidubce.com/rpc/2.0/ai_custom/v1/classification/mycheckface"

# An ACCESS_TOKEN is required to call the API. If you already have one, fill it in below.
# Otherwise leave ACCESS_TOKEN empty and fill in the API_KEY and SECRET_KEY of the deployed
# model; a new ACCESS_TOKEN will then be requested and printed automatically.
ACCESS_TOKEN = ""
API_KEY = "gPhZUzA3yk70zSplKKhw5Itb"
SECRET_KEY = "GlSQaRcgDlmALq2CkTUD1XbAA9QanCYb"

with open(IMAGE_FILEPATH, 'rb') as f:
    base64_data = base64.b64encode(f.read())
    base64_str = base64_data.decode('UTF8')
    print("1. Put the BASE64-encoded image string into the 'image' field of PARAMS")
    PARAMS["image"] = base64_str

if not ACCESS_TOKEN:
    print("2. ACCESS_TOKEN is empty; call the authentication interface to obtain a token")
    auth_url = "https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials" \
               "&client_id={}&client_secret={}".format(API_KEY, SECRET_KEY)
    auth_resp = requests.get(auth_url)
    auth_resp_json = auth_resp.json()
    ACCESS_TOKEN = auth_resp_json["access_token"]
    print("New ACCESS_TOKEN: {}".format(ACCESS_TOKEN))
else:
    print("2. Use the existing ACCESS_TOKEN")

print("3. Send the request to the model interface MODEL_API_URL")
request_url = "{}?access_token={}".format(MODEL_API_URL, ACCESS_TOKEN)
response = requests.post(url=request_url, json=PARAMS)
response_json = response.json()
response_str = json.dumps(response_json, indent=4, ensure_ascii=False)
print("result: {}".format(response_str))
# 'results' is a list of {name, score} entries; print the top-ranked category name
print(response_json['results'][0]['name'])
```

## speech synthesis

```
from aip import AipSpeech

""" Your APPID / AK / SK """
APP_ID = '24873305'
API_KEY = 'uzWDokZaiYxGTH5Sn1UKnN85'
SECRET_KEY = 'H3l5DBSfGbq7QsFgHAfPri04azPWVITs'

client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

result = client.synthesis("You black fellow, are very arrogant. I don't know whether you are a kiln maker or a charcoal seller", 'zh', 1, {
    'vol': 8, 'per': 3, 'spd': 4,
})

# On success the synthesized audio binary is returned; on error a dict is
# returned -- see the error code table
if not isinstance(result, dict):
    with open('audio.mp3', 'wb') as f:
        f.write(result)
```

## Speech recognition

```
from aip import AipSpeech

""" Your APPID / AK / SK """
APP_ID = '24873305'
API_KEY = 'uzWDokZaiYxGTH5Sn1UKnN85'
SECRET_KEY = 'H3l5DBSfGbq7QsFgHAfPri04azPWVITs'

client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)