# 1, Introduction

## (1) Linear criterion (LDA)

Linear discriminant analysis (LDA), also known as Fisher's linear discriminant analysis, is a supervised dimensionality reduction technique: every sample in its data set has a class label, which distinguishes it from PCA (unsupervised learning). LDA is widely used in pattern recognition (for example face recognition, ship recognition, and other image recognition tasks), so it is worth understanding how the algorithm works.

1. Fisher criterion

• Basic idea: for a two-class linear classification problem, take the vector at which the Fisher criterion function reaches its extreme value as the best projection direction; together with a suitable threshold, the hyperplane perpendicular to this direction is the classification surface between the two classes. Projecting the samples onto this direction yields the maximum between-class scatter and the minimum within-class scatter.
• The Fisher linear discriminant makes no assumption about the distribution of the samples. In many cases, however, when the sample dimension is relatively high and the number of samples is large, the samples are close to normally distributed after being projected onto a one-dimensional space. The projected samples can then be fitted with a normal distribution, and the fitted parameters can be used to determine the classification threshold.
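The projection direction described above has a closed form: w is proportional to S_W⁻¹(m₁ − m₂), where S_W is the within-class scatter matrix and m₁, m₂ are the class means. A minimal sketch on two-dimensional data invented for this illustration:

```python
import numpy as np

# Illustrative two-class data (invented for this sketch)
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
X2 = rng.normal(loc=[2.0, 1.0], scale=0.5, size=(50, 2))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
# Within-class scatter matrix: sum of the two class scatter matrices
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
# Fisher's best projection direction: w proportional to S_W^{-1} (m1 - m2)
w = np.linalg.solve(S_W, m1 - m2)
w /= np.linalg.norm(w)

# After projection onto w, the class means are far apart relative to the spread
proj1, proj2 = X1 @ w, X2 @ w
print(abs(proj1.mean() - proj2.mean()), proj1.std(), proj2.std())
```

On the projected one-dimensional samples, a threshold halfway between the two projected means would then serve as the classification surface.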

2. Perceptron criterion

• Basic idea: for a linear discriminant function, once the dimension of the pattern is known, the form of the discriminant function is already fixed, and linear discrimination reduces to determining the weight vector. The perceptron is a neural network model characterized by a randomly chosen initial weight vector. During training, the weights are corrected whenever a sample is misclassified, iterating step by step until the classification meets a predetermined standard. It can be proved that the perceptron algorithm converges: as long as the pattern classes are linearly separable, a solution for the weight vector is reached in a finite number of iterations.
• Advantages: simple and easy to implement.
• Disadvantages: the solution is not unique, and the algorithm does not converge when the data are not linearly separable.
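The iterative correction procedure can be sketched in a few lines. The data below are an invented, linearly separable toy set, and the learning rate lr is an assumed constant:

```python
import numpy as np

# Invented, linearly separable toy set; labels in {+1, -1}
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)  # initial weight vector (could also be chosen at random)
b = 0.0
lr = 1.0         # assumed constant learning rate
for _ in range(100):
    errors = 0
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:   # misclassified sample: correct the weights
            w += lr * yi * xi
            b += lr * yi
            errors += 1
    if errors == 0:                  # converged; guaranteed if linearly separable
        break

print(w, b)
```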

3. Least squares criterion

• The least squares criterion is the basic principle of least squares adjustment computation. It is an additional condition for solving an underdetermined system of linear equations: in any adjustment computation, the number of equations is smaller than the number of unknowns they contain, so the solution is not unique. Solving under the least squares criterion picks out a unique solution.
• If only the observations are random quantities in the adjustment, the least squares criterion is VᵀPV = min.
• If the parameters are random quantities as well as the observations, the criterion becomes VᵀPV + xᵀP_x x = min, which is called the generalized least squares criterion.
• Here V is the correction vector of the observation vector, P is the weight matrix of the observation vector, x is the correction vector of the parameter vector, and P_x is the weight matrix of the parameter vector.
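A minimal numerical sketch of the criterion VᵀPV = min (the matrices below are invented for illustration): for an over-determined system, the unique solution under this criterion comes from the normal equations (AᵀPA)x̂ = AᵀPl:

```python
import numpy as np

# Invented over-determined system: 4 observation equations, 2 unknowns
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])
l = np.array([1.02, 1.98, 3.05, -0.97])  # observation vector
P = np.diag([1.0, 1.0, 2.0, 2.0])        # weight matrix of the observations

# V^T P V = min leads to the normal equations (A^T P A) x = A^T P l
x_hat = np.linalg.solve(A.T @ P @ A, A.T @ P @ l)
V = A @ x_hat - l                        # correction vector of the observations
print(x_hat)
```

At the minimum the gradient condition AᵀPV = 0 holds, which is what makes the solution unique.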

## (2) Linear classification algorithm (support vector machine, SVM)

The support vector machine (SVM) is a binary classification model. Its basic form is the linear classifier with the largest margin in feature space, which distinguishes it from the perceptron; with kernel techniques, the SVM becomes an essentially nonlinear classifier. The learning strategy of the SVM is margin maximization, which can be formalized as a convex quadratic programming problem and is also equivalent to minimizing a regularized hinge loss. The learning algorithm of the SVM is therefore an optimization algorithm for convex quadratic programming.
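In the usual notation (a standard-textbook sketch, with samples x_i, labels y_i ∈ {−1, +1}, and a regularization constant λ), the two equivalent formulations mentioned above are:

```latex
% Hard-margin primal: maximize the geometric margin
\min_{w,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad\text{s.t.}\quad y_i\,(w\cdot x_i+b)\ \ge\ 1,\qquad i=1,\dots,N

% Equivalent regularized hinge-loss form
\min_{w,b}\ \sum_{i=1}^{N}\max\bigl(0,\ 1-y_i\,(w\cdot x_i+b)\bigr)+\lambda\,\lVert w\rVert^{2}
```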

The basic idea of SVM learning is to find the separating hyperplane that both correctly divides the training data set and has the largest geometric margin. As shown in the figure below, w · x + b = 0 is the separating hyperplane. For a linearly separable data set there are infinitely many such hyperplanes (the perceptron finds one of them), but the separating hyperplane with the largest geometric margin is unique.

# 2, Simulated dataset LDA algorithm exercise

1. Import package

```
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as lda  # Import the LDA algorithm
from sklearn.datasets import make_classification  # Import the classification data generator
import matplotlib.pyplot as plt  # Import plotting tools
import numpy as np
import pandas as pd
```

2. Generate the data set and plot it

```
x, y = make_classification(n_samples=500, n_features=2, n_redundant=0, n_classes=2,
                           n_informative=1, n_clusters_per_class=1, class_sep=0.5,
                           random_state=100)
"""
n_features: total number of features = n_informative + n_redundant + n_repeated
n_informative: number of informative features
n_redundant: redundant features, random linear combinations of the informative ones
n_repeated: repeated features, drawn at random from the informative and redundant ones
n_classes: number of classes
n_clusters_per_class: number of clusters each class consists of
"""
plt.scatter(x[:, 0], x[:, 1], marker='o', c=y)
plt.show()
```

3. Split the data set into a training set and a test set (ratio 6:4); after training, compute the accuracy on the test set

```
# Split into training and test sets for model training and testing
x_train = x[:300, :]
y_train = y[:300]
x_test = x[300:, :]
y_test = y[300:]
lda_test = lda()
lda_test.fit(x_train, y_train)
predict_y = lda_test.predict(x_test)  # Get the predicted results
count = 0
for i in range(len(predict_y)):
    if predict_y[i] == y_test[i]:
        count += 1
print("The number of accurate predictions is " + str(count))
print("The accuracy is " + str(count / len(predict_y)))
```
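As a cross-check, the manual counting loop can be replaced by the estimator's built-in score() method, which returns the mean accuracy directly. A self-contained sketch repeating the same generator settings and 6:4 split:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Same generator settings and 6:4 split as above, repeated so this snippet runs alone
x, y = make_classification(n_samples=500, n_features=2, n_redundant=0, n_classes=2,
                           n_informative=1, n_clusters_per_class=1, class_sep=0.5,
                           random_state=100)
x_train, y_train = x[:300], y[:300]
x_test, y_test = x[300:], y[300:]

clf = LinearDiscriminantAnalysis().fit(x_train, y_train)
acc = clf.score(x_test, y_test)  # score() returns the mean accuracy directly
print("The accuracy is", acc)
```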

# 3, SVM classification of the moons dataset

## (1) Linear kernel

1. Import package

```
# Import the moons dataset and the svm method
# This is linear svm
from sklearn import datasets  # Import datasets
from sklearn.svm import LinearSVC  # Import linear svm
from matplotlib.colors import ListedColormap
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import numpy as np
```

2. Obtain data

```
data_x, data_y = datasets.make_moons(noise=0.15, random_state=777)  # Generate the moons dataset
# random_state is the random seed; noise is the standard deviation of the Gaussian noise
plt.scatter(data_x[data_y == 0, 0], data_x[data_y == 0, 1])
plt.scatter(data_x[data_y == 1, 0], data_x[data_y == 1, 1])
data_x = data_x[data_y < 2, :2]  # Keep only samples whose label is less than 2, and only the first two features
plt.show()
```

3. Standardize the data

```
scaler = StandardScaler()  # Standardization
scaler.fit(data_x)  # Compute the mean and variance of the training data
data_x = scaler.transform(data_x)  # Use the mean and variance in scaler to standardize the data
liner_svc = LinearSVC(C=1e9, max_iter=100000)  # Linear svm classifier; max_iter is the iteration limit; the larger C is, the smaller the fault tolerance
liner_svc.fit(data_x, data_y)
```

4. Boundary drawing function

```
# Boundary drawing function
def plot_decision_boundary(model, axis):
    # axis = [x0_min, x0_max, x1_min, x1_max]
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1] - axis[0]) * 100)).reshape(-1, 1),
        np.linspace(axis[2], axis[3], int((axis[3] - axis[2]) * 100)).reshape(-1, 1))
    # The meshgrid function returns coordinate matrices built from coordinate vectors
    x_new = np.c_[x0.ravel(), x1.ravel()]
    y_predict = model.predict(x_new)  # Get the predicted values
    zz = y_predict.reshape(x0.shape)
    custom_cmap = ListedColormap(['#EF9A9A', '#FFF59D', '#90CAF9'])
    plt.contourf(x0, x1, zz, cmap=custom_cmap)
```

5. Draw and display parameters and intercept

```
# Draw and display the parameters and intercept
plot_decision_boundary(liner_svc, axis=[-3, 3, -3, 3])
plt.scatter(data_x[data_y == 0, 0], data_x[data_y == 0, 1], color='red')
plt.scatter(data_x[data_y == 1, 0], data_x[data_y == 1, 1], color='blue')
plt.show()
print('Parameter weights')
print(liner_svc.coef_)
print('Model intercept')
print(liner_svc.intercept_)
```
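The printed coef_ and intercept_ fully determine the separating line w · x + b = 0. A minimal sketch of how to recover a point on that line, using invented toy data rather than the moons set:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Invented, linearly separable toy data (not the moons set)
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

clf = LinearSVC(C=1e9, max_iter=100000).fit(X, y)
w = clf.coef_[0]        # parameter weights, as printed above
b = clf.intercept_[0]   # model intercept

# The boundary satisfies w[0]*x0 + w[1]*x1 + b = 0; solve for x0 at x1 = 0.5
x0_on_line = -(w[1] * 0.5 + b) / w[0]
print(w, b, x0_on_line)
```

For this toy set the classes sit symmetrically about x0 = 1, so the recovered boundary point should lie near there.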

## (2) Polynomial kernel

1. Import package

```
# Import the moons dataset and the svm method
# This is polynomial-kernel svm
from sklearn import datasets  # Import datasets
from sklearn.svm import LinearSVC  # Import linear svm
from sklearn.pipeline import Pipeline  # Import Pipeline
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, PolynomialFeatures  # Import polynomial features and standardization
```

2. Get data set

```
data_x, data_y = datasets.make_moons(noise=0.15, random_state=777)  # Generate the moons dataset
# random_state is the random seed; noise is the standard deviation of the Gaussian noise
plt.scatter(data_x[data_y == 0, 0], data_x[data_y == 0, 1])
plt.scatter(data_x[data_y == 1, 0], data_x[data_y == 1, 1])
data_x = data_x[data_y < 2, :2]  # Keep only samples whose label is less than 2, and only the first two features
plt.show()
```

3. Assemble the model with a Pipeline

```
def PolynomialSVC(degree, c=10):  # Polynomial svm
    return Pipeline([
        # Map the source data to polynomial features of the given degree
        ("poly_features", PolynomialFeatures(degree=degree)),
        # Standardization
        ("scaler", StandardScaler()),
        # Linear SVC classifier
        ("svm_clf", LinearSVC(C=c, loss="hinge", random_state=42, max_iter=10000))
    ])
```

4. Model training and drawing

```
# Model training and drawing
poly_svc = PolynomialSVC(degree=3)
poly_svc.fit(data_x, data_y)
plot_decision_boundary(poly_svc, axis=[-1.5, 2.5, -1.0, 1.5])  # Draw the boundary
plt.scatter(data_x[data_y == 0, 0], data_x[data_y == 0, 1], color='red')  # Draw the points
plt.scatter(data_x[data_y == 1, 0], data_x[data_y == 1, 1], color='blue')
plt.show()
print('Parameter weights')
print(poly_svc.named_steps['svm_clf'].coef_)
print('Model intercept')
print(poly_svc.named_steps['svm_clf'].intercept_)
```
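The pipeline above expands the features explicitly with PolynomialFeatures. The same polynomial decision boundary can also be obtained through the kernel trick, using SVC(kernel='poly'); the sketch below assumes degree 3 and coef0=1 (which includes the lower-degree terms, mimicking the explicit expansion):

```python
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same moons data as above, regenerated so the snippet runs on its own
data_x, data_y = datasets.make_moons(noise=0.15, random_state=777)

# Kernel-trick version: SVC(kernel='poly') instead of explicit PolynomialFeatures
poly_kernel_svc = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel='poly', degree=3, coef0=1, C=10))
])
poly_kernel_svc.fit(data_x, data_y)
acc = poly_kernel_svc.score(data_x, data_y)  # training accuracy
print(acc)
```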

## (3) Gaussian kernel

1. Import package

```
# Import packages
from sklearn import datasets  # Import datasets
from sklearn.svm import SVC  # Import svm
from sklearn.pipeline import Pipeline  # Import Pipeline
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler  # Import standardization
```

2. Build the RBF-kernel SVM with a Pipeline

```
def RBFKernelSVC(gamma=2.0):
    return Pipeline([
        ('std_scaler', StandardScaler()),
        ('svc', SVC(kernel='rbf', gamma=gamma))
    ])
```

3. Train the model and draw the graph

```
svc = RBFKernelSVC(gamma=100)  # The gamma parameter is very important: the larger gamma is, the narrower each Gaussian becomes and the more complex the decision boundary
svc.fit(data_x, data_y)
plot_decision_boundary(svc, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(data_x[data_y == 0, 0], data_x[data_y == 0, 1], color='red')  # Draw the points
plt.scatter(data_x[data_y == 1, 0], data_x[data_y == 1, 1], color='blue')
plt.show()
```
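To see the effect of gamma described in the comment, a small sketch (same moons data, illustrative gamma values chosen here) compares training accuracy as gamma grows; larger gamma gives each sample a narrower Gaussian and a more flexible, overfitting-prone boundary:

```python
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same moons data as above, regenerated so the snippet runs on its own
data_x, data_y = datasets.make_moons(noise=0.15, random_state=777)

scores = []
for gamma in (0.1, 1.0, 100.0):   # illustrative gamma values
    model = Pipeline([
        ('std_scaler', StandardScaler()),
        ('svc', SVC(kernel='rbf', gamma=gamma))
    ])
    model.fit(data_x, data_y)
    # Larger gamma -> narrower Gaussians -> boundary that hugs the training points
    scores.append(model.score(data_x, data_y))

print(scores)
```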

# 4, Summary

Advantages of LDA

• During dimensionality reduction, LDA can exploit class prior knowledge, whereas unsupervised methods such as PCA cannot use any class information.
• LDA performs better than PCA when the class information of the samples lies in their means rather than in their variances.
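The second point can be made concrete: on data invented so that the class means differ only along a low-variance direction, a 1-D PCA projection keeps the high-variance but uninformative axis, while LDA keeps the discriminative one:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Invented data: class means differ only along a LOW-variance axis
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.normal(0.0, 5.0, 2 * n),                              # high variance, class-independent
    np.r_[rng.normal(-1.0, 0.3, n), rng.normal(1.0, 0.3, n)]  # low variance, carries the classes
])
y = np.r_[np.zeros(n), np.ones(n)]

Z_pca = PCA(n_components=1).fit_transform(X)                  # keeps the high-variance axis
Z_lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y).transform(X)

def separation(Z):
    # Distance between projected class means, in units of the pooled std
    a, b = Z[y == 0].ravel(), Z[y == 1].ravel()
    return abs(a.mean() - b.mean()) / np.sqrt((a.var() + b.var()) / 2)

print(separation(Z_pca), separation(Z_lda))
```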