# 1, Introduction

## (1) Linear discriminant criteria (LDA)

Linear discriminant analysis (LDA), also known as Fisher linear discriminant analysis, is a supervised dimensionality reduction technique: every sample in the data set carries a class label, which distinguishes it from PCA, an unsupervised method. LDA is widely used in pattern recognition (for example face recognition, ship recognition, and other image recognition tasks), so it is worth understanding how the algorithm works.

1. Fisher criterion

- Basic idea: for a two-class linear classification problem, take the vector that maximizes the Fisher criterion function as the best projection direction, then choose a suitable threshold along that direction. The hyperplane perpendicular to the projection direction is the decision surface between the two classes. Projecting the samples onto this direction maximizes the between-class scatter relative to the within-class scatter.
- The Fisher linear discriminant makes no assumption about the distribution of the samples. In many cases, however, when the dimension is fairly high and the number of samples is fairly large, the samples are approximately normally distributed after projection onto one dimension. A normal distribution can then be fitted to each class in the one-dimensional space, and the fitted parameters can be used to set the classification threshold.
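
The projection step can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `fisher_direction` and `fisher_score`, the toy Gaussian data, and the midpoint threshold are all assumptions made for this example. It uses the standard closed form in which the best direction is proportional to the inverse within-class scatter matrix applied to the difference of the class means.

```python
import numpy as np

def fisher_direction(x0, x1):
    """Return the unit vector w proportional to S_w^{-1} (m1 - m0)."""
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    s_w = (x0 - m0).T @ (x0 - m0) + (x1 - m1).T @ (x1 - m1)
    w = np.linalg.solve(s_w, m1 - m0)
    return w / np.linalg.norm(w)

def fisher_score(w, x0, x1):
    """Fisher criterion: squared gap of projected means over projected scatter."""
    p0, p1 = x0 @ w, x1 @ w
    between = (p1.mean() - p0.mean()) ** 2
    within = ((p0 - p0.mean()) ** 2).sum() + ((p1 - p1.mean()) ** 2).sum()
    return between / within

# Toy data (assumed for illustration): two correlated Gaussian classes.
rng = np.random.default_rng(0)
cov = [[3.0, 2.5], [2.5, 3.0]]
class0 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
class1 = rng.multivariate_normal([2.0, 0.0], cov, size=200)

w = fisher_direction(class0, class1)
# Midpoint of the projected class means: one simple choice of threshold.
threshold = 0.5 * ((class0 @ w).mean() + (class1 @ w).mean())
print("direction:", w, "threshold:", threshold)
```

Because the two classes here share a strongly correlated covariance, the Fisher direction generally differs from the plain difference of the class means, which is the point of weighting by the within-class scatter.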

2. Perceptron criterion

- Basic idea: for a linear discriminant function, once the dimension of the patterns is known, the form of the function is fixed, so learning reduces to determining the weight vector. The perceptron is a neural network model that starts from a randomly chosen weight vector. During training it repeatedly corrects the weights using the misclassified samples, iterating until the classification meets a predetermined criterion. The perceptron algorithm provably converges: as long as the classes are linearly separable, a solution for the weight vector is found in a finite number of iterations.
- Advantages: simple and easy to implement.
- Disadvantages: the solution is not unique, and the algorithm does not converge when the classes are not linearly separable.
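
The update rule described above can be sketched as follows. This is a minimal version under stated assumptions: labels are encoded as -1 and +1, the learning rate `lr`, the epoch cap `max_epochs`, and the six toy points are chosen for illustration, and `train_perceptron` is a hypothetical helper name.

```python
import numpy as np

def train_perceptron(x, y, lr=1.0, max_epochs=1000, seed=0):
    """Train on labels y in {-1, +1}; return (w, b, converged)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=x.shape[1])  # random initial weight vector
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(x, y):
            # Misclassified: the sample lies on the wrong side or on the boundary.
            if yi * (xi @ w + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:  # a full pass with no mistakes: stop
            return w, b, True
    return w, b, False  # epoch cap reached, e.g. on non-separable data

# Toy linearly separable data (assumed for illustration)
x = np.array([[2.0, 2.0], [3.0, 1.0], [2.5, 3.0],
              [-1.0, -1.5], [-2.0, -0.5], [-1.5, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

w, b, converged = train_perceptron(x, y)
print(w, b, converged)
```

Changing `seed` typically yields a different final weight vector that still separates the data, which illustrates why the solution is not unique.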

3. Least squares criterion

- The least squares criterion is the basic principle of least squares adjustment. It serves as an additional condition for solving an underdetermined system of linear equations: in any adjustment computation the number of equations is always smaller than the number of unknowns they contain, so the solution is not unique. Imposing the least squares criterion yields a unique solution.
- If only the observations are random quantities in the adjustment, the least squares criterion is $V^T P V = \min$.
- If the parameters are random quantities as well as the observations, the criterion is $V^T P V + x^T P_x x = \min$, which is called the generalized least squares criterion.
- Here $V$ is the correction vector of the observation vector, $P$ is the weight matrix of the observation vector, $x$ is the correction vector of the parameter vector, and $P_x$ is the weight matrix of the parameter vector.
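
As a small numerical sketch of both criteria, assume a linear observation model $V = Bx - l$, where the design matrix `B`, the observation vector `l`, and all numbers below are hypothetical and introduced only for this example. Setting the gradient of each criterion to zero gives a set of normal equations that NumPy can solve directly.

```python
import numpy as np

# Hypothetical observation model: V = B @ x - l
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])               # design matrix: 4 observations, 2 parameters
l = np.array([1.02, 1.97, 3.05, -0.96])   # observed values
P = np.diag([1.0, 1.0, 2.0, 0.5])         # weight matrix of the observations

# Minimize V^T P V: solve the normal equations (B^T P B) x = B^T P l
x_ls = np.linalg.solve(B.T @ P @ B, B.T @ P @ l)

# Minimize V^T P V + x^T P_x x: the parameter weights are added to the normal matrix
P_x = 0.1 * np.eye(2)                     # weight matrix of the parameters (assumed)
x_gls = np.linalg.solve(B.T @ P @ B + P_x, B.T @ P @ l)

V = B @ x_ls - l
print("x_ls =", x_ls)
print("x_gls =", x_gls)
print("V^T P V =", V @ P @ V)
```

The extra term $x^T P_x x$ pulls the generalized solution toward zero, so `x_gls` is typically slightly smaller in norm than `x_ls`.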

## (2) Linear classification algorithm (support vector machine, SVM)

Support vector machine (SVM) is a binary classification model. Its basic form is the maximum-margin linear classifier defined in the feature space, and the maximum margin is what distinguishes it from the perceptron. SVM can also use the kernel trick, which makes it an essentially nonlinear classifier. The learning strategy of SVM is margin maximization, which can be formulated as a convex quadratic programming problem and is equivalent to minimizing a regularized hinge loss. The learning algorithm of SVM is therefore an optimization algorithm for convex quadratic programming.

The basic idea of SVM learning is to find the separating hyperplane that correctly divides the training data set and has the largest geometric margin. Here $w \cdot x + b = 0$ is the separating hyperplane. For a linearly separable data set there are infinitely many separating hyperplanes (the perceptron may return any of them), but the separating hyperplane with the largest geometric margin is unique.
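
In symbols, and as a sketch only: assume training pairs $(x_i, y_i)$, $i = 1, \dots, N$, with labels $y_i \in \{-1, +1\}$, slack variables $\xi_i$, and a penalty parameter $C > 0$. This notation is introduced here for illustration. For linearly separable data the maximum-margin problem is

$$\min_{w,b} \ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1, \quad i = 1, \dots, N.$$

Allowing some margin violations gives the soft-margin problem

$$\min_{w,b,\xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0,$$

which has the same minimizer as the regularized hinge loss

$$\min_{w,b} \ \sum_{i=1}^{N} \max\bigl(0,\ 1 - y_i (w \cdot x_i + b)\bigr) + \lambda \|w\|^2, \qquad \lambda = \frac{1}{2C}.$$

Dividing the soft-margin objective by $C$ and replacing each $\xi_i$ by its smallest feasible value, $\max(0,\ 1 - y_i (w \cdot x_i + b))$, gives the hinge-loss form.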

# 2, Simulated dataset LDA algorithm exercise

1. Import package

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as lda  # LDA algorithm
from sklearn.datasets import make_classification  # synthetic classification data generator
import matplotlib.pyplot as plt  # plotting
import numpy as np
import pandas as pd
```

2. Generate the data set and plot it

```python
# n_features: total number of features (informative + redundant + repeated;
#             any remaining features are filled with random noise)
# n_informative: number of informative features
# n_redundant: number of redundant features (random linear combinations of the informative ones)
# n_repeated: number of duplicated features (drawn at random from the informative and redundant ones)
# n_classes: number of classes
# n_clusters_per_class: number of clusters that make up each class
x, y = make_classification(n_samples=500, n_features=2, n_redundant=0,
                           n_classes=2, n_informative=1,
                           n_clusters_per_class=1, class_sep=0.5,
                           random_state=100)

plt.scatter(x[:, 0], x[:, 1], marker='o', c=y)
plt.show()
```

3. Split the data set into a training set and a test set at a ratio of 6:4, train the model, and measure the accuracy on the test set

```python
# Split into training and test sets: the first 300 samples for training,
# the remaining 200 for testing, so the two sets do not overlap
x_train, y_train = x[:300], y[:300]
x_test, y_test = x[300:], y[300:]

lda_test = lda()
lda_test.fit(x_train, y_train)
predict_y = lda_test.predict(x_test)  # predicted labels

# Count the correct predictions
count = int((predict_y == y_test).sum())
print("The number of correct predictions is " + str(count))
print("The accuracy is " + str(count / len(predict_y)))
```
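
The same 6:4 split can also be done with scikit-learn's own helpers. The sketch below is one possible variant, not the method used in this exercise: `train_test_split` shuffles and stratifies the data, and `accuracy_score` replaces the manual count. The `random_state=0` for the split is an arbitrary choice made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

x, y = make_classification(n_samples=500, n_features=2, n_redundant=0,
                           n_classes=2, n_informative=1,
                           n_clusters_per_class=1, class_sep=0.5,
                           random_state=100)

# test_size=0.4 gives the 6:4 split; stratify=y keeps the class balance in both parts.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.4, stratify=y, random_state=0)

model = LinearDiscriminantAnalysis().fit(x_train, y_train)
accuracy = accuracy_score(y_test, model.predict(x_test))
print("Test accuracy:", accuracy)
```

Because the split is drawn differently, the accuracy will generally not match the figure obtained from slicing the first 300 samples.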

# 3, SVM classification of the moons dataset

## (1) Linear kernel

1. Import package

```python
# Linear SVM on the moons dataset
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets  # dataset generators
from sklearn.svm import LinearSVC  # linear SVM
from matplotlib.colors import ListedColormap
from sklearn.preprocessing import StandardScaler
```

2. Obtain data

```python
# Generate the moons dataset: random_state is the random seed and
# noise is the standard deviation of the Gaussian noise added to the points
data_x, data_y = datasets.make_moons(noise=0.15, random_state=777)
plt.scatter(data_x[data_y == 0, 0], data_x[data_y == 0, 1])
plt.scatter(data_x[data_y == 1, 0], data_x[data_y == 1, 1])
data_x = data_x[data_y < 2, :2]  # keep samples with data_y < 2 and only the first two features
plt.show()
```

3. Standardize the data and train the model

```python
scaler = StandardScaler()
scaler.fit(data_x)  # compute the mean and standard deviation of the data
data_x = scaler.transform(data_x)  # standardize data_x with those statistics

# max_iter caps the number of solver iterations; C sets the penalty on
# misclassified samples: the larger C is, the less tolerance for errors
liner_svc = LinearSVC(C=1e9, max_iter=100000)
liner_svc.fit(data_x, data_y)
```

4. Boundary drawing function

```python
# Boundary drawing function
def plot_decision_boundary(model, axis):
    # np.meshgrid builds coordinate matrices from the two coordinate vectors
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1] - axis[0]) * 100)).reshape(-1, 1),
        np.linspace(axis[2], axis[3], int((axis[3] - axis[2]) * 100)).reshape(-1, 1))
    x_new = np.c_[x0.ravel(), x1.ravel()]  # one row per grid point
    y_predict = model.predict(x_new)  # predicted class for every grid point
    zz = y_predict.reshape(x0.shape)
    custom_cmap = ListedColormap(['#EF9A9A', '#FFF59D', '#90CAF9'])
    plt.contourf(x0, x1, zz, cmap=custom_cmap)
```
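
To see what the grid construction inside `plot_decision_boundary` produces, here is a tiny stand-alone sketch on a 3 by 2 grid. The threshold rule standing in for `model.predict` is hypothetical and only serves to show the reshaping step.

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 3)   # 3 points along the first axis
ys = np.linspace(0.0, 1.0, 2)   # 2 points along the second axis
x0, x1 = np.meshgrid(xs, ys)    # both arrays have shape (2, 3)

grid = np.c_[x0.ravel(), x1.ravel()]  # shape (6, 2): one (x, y) row per grid point
print(grid)

# Stand-in for model.predict (hypothetical rule): class 1 where x > 0.25
labels = (grid[:, 0] > 0.25).astype(int)
zz = labels.reshape(x0.shape)   # back to the grid shape that plt.contourf expects
print(zz)
```

Each row of `grid` is one point to classify, and reshaping the predictions back to the shape of `x0` lines every label up with its grid cell.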

5. Draw and display parameters and intercept

```python
# Draw the decision boundary and print the weights and intercept
plot_decision_boundary(liner_svc, axis=[-3, 3, -3, 3])
plt.scatter(data_x[data_y == 0, 0], data_x[data_y == 0, 1], color='red')
plt.scatter(data_x[data_y == 1, 0], data_x[data_y == 1, 1], color='blue')
plt.show()
print('Parameter weight')
print(liner_svc.coef_)
print('Model intercept')
print(liner_svc.intercept_)
```

## (2) Polynomial kernel

1. Import package

```python
# Polynomial-feature SVM on the moons dataset
from sklearn import datasets  # dataset generators
from sklearn.svm import LinearSVC  # linear SVM
from sklearn.pipeline import Pipeline  # chains the preprocessing steps and the classifier
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, PolynomialFeatures  # standardization and polynomial feature expansion
```

2. Get data set

```python
# Generate the moons dataset: random_state is the random seed and
# noise is the standard deviation of the Gaussian noise added to the points
data_x, data_y = datasets.make_moons(noise=0.15, random_state=777)
plt.scatter(data_x[data_y == 0, 0], data_x[data_y == 0, 1])
plt.scatter(data_x[data_y == 1, 0], data_x[data_y == 1, 1])
data_x = data_x[data_y < 2, :2]  # keep samples with data_y < 2 and only the first two features
plt.show()
```

3. Integrated programming with pipeline

```python
def PolynomialSVC(degree, c=10):  # polynomial SVM
    return Pipeline([
        # Expand the input features into polynomial terms up to the given degree
        ("poly_features", PolynomialFeatures(degree=degree)),
        # Standardization
        ("scaler", StandardScaler()),
        # Linear SVM classifier; C is taken from the c argument
        ("svm_clf", LinearSVC(C=c, loss="hinge", random_state=42, max_iter=10000))
    ])
```
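
The first pipeline step is what lets a linear classifier draw a curved boundary. As a small sketch of what `PolynomialFeatures` produces, the example below expands a single made-up sample with two features at degree 2; the sample values are chosen only for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

sample = np.array([[2.0, 3.0]])      # one sample with features a = 2, b = 3
poly = PolynomialFeatures(degree=2)
expanded = poly.fit_transform(sample)

# Columns are ordered as 1, a, b, a^2, a*b, b^2
print(expanded)  # [[1. 2. 3. 4. 6. 9.]]
```

With `degree=3` and two input features the expansion has 10 columns, and the linear SVM then fits a hyperplane in that larger space.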

4. Model training and drawing

```python
# Model training and drawing
poly_svc = PolynomialSVC(degree=3)
poly_svc.fit(data_x, data_y)
plot_decision_boundary(poly_svc, axis=[-1.5, 2.5, -1.0, 1.5])  # draw the boundary
plt.scatter(data_x[data_y == 0, 0], data_x[data_y == 0, 1], color='red')  # draw the points
plt.scatter(data_x[data_y == 1, 0], data_x[data_y == 1, 1], color='blue')
plt.show()
print('Parameter weight')
print(poly_svc.named_steps['svm_clf'].coef_)
print('Model intercept')
print(poly_svc.named_steps['svm_clf'].intercept_)
```

## (3) Gaussian kernel

1. Import package

```python
# Import packages
from sklearn import datasets  # dataset generators
from sklearn.svm import SVC  # kernel SVM
from sklearn.pipeline import Pipeline  # chains the preprocessing steps and the classifier
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler  # standardization
```

2. Define the RBF-kernel pipeline

```python
def RBFKernelSVC(gamma=2.0):
    return Pipeline([
        ('std_scaler', StandardScaler()),
        ('svc', SVC(kernel='rbf', gamma=gamma))
    ])
```

3. Carry out model training and draw graphics

```python
# gamma matters a lot: the larger gamma is, the smaller the region of influence
# of each support vector, and the more closely the boundary follows the training points
svc = RBFKernelSVC(gamma=100)
svc.fit(data_x, data_y)
plot_decision_boundary(svc, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(data_x[data_y == 0, 0], data_x[data_y == 0, 1], color='red')  # draw the points
plt.scatter(data_x[data_y == 1, 0], data_x[data_y == 1, 1], color='blue')
plt.show()
```
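
The effect of gamma is easier to see on the kernel itself. The sketch below computes the Gaussian (RBF) kernel, $K(a, b) = \exp(-\gamma \|a - b\|^2)$, by hand with NumPy. The helper `rbf_kernel`, the three sample points, and the gamma values are assumptions chosen for this illustration.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """K[i, j] = exp(-gamma * ||a[i] - b[j]||^2)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

points = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]])
for gamma in (0.1, 2.0, 100.0):
    k = rbf_kernel(points, points, gamma)
    # Similarity between the first point and its neighbour at distance 0.5
    print(gamma, k[0, 1])
```

At `gamma=100` two points only 0.5 apart already have a similarity close to zero, so every training point influences only its immediate surroundings and the decision boundary tends to overfit.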

# 4, Summary

Advantages of LDA

- During dimensionality reduction LDA can use prior knowledge of the class labels, whereas unsupervised methods such as PCA cannot.
- LDA performs better than PCA when the class information lies in the means of the samples rather than in their variances.
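
The difference in how the two methods use labels shows up directly in their scikit-learn interfaces. The following sketch uses a synthetic data set whose parameters are chosen only for illustration: both methods reduce four features to one, but only LDA receives `y`.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

x, y = make_classification(n_samples=300, n_features=4, n_informative=2,
                           n_redundant=0, n_classes=2,
                           n_clusters_per_class=1, random_state=0)

# LDA needs the labels; with 2 classes it keeps at most n_classes - 1 = 1 component.
x_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(x, y)

# PCA ignores the labels and keeps the direction of largest variance.
x_pca = PCA(n_components=1).fit_transform(x)

print(x_lda.shape, x_pca.shape)
```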

Advantages of SVM

- Few samples required: this does not mean that the absolute number of training samples is small, but that SVM needs relatively few samples compared with other classification algorithms at the same problem complexity. Because SVM uses kernel functions, it can also handle high-dimensional samples easily.
- Structural risk minimization: this risk is the accumulated error between the classifier's approximation of the true model of the problem and the true solution of the problem.
- Nonlinearity: SVM is good at handling data that is not linearly separable, mainly through slack variables (also called penalty variables) and the kernel trick. This part is the essence of SVM.
