# 1, Linear discriminant analysis

## (1) Introduction

Linear discriminant analysis (LDA) is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination can be used as a linear classifier or, more commonly, for dimensionality reduction before subsequent classification.

Linear discriminant analysis is a classical linear learning method. It was first proposed by Fisher in 1936 and is therefore also known as Fisher linear discriminant analysis. The idea is simple: given a training set, project the samples onto a straight line so that the projections of samples from the same class are as close together as possible and the projections of samples from different classes are as far apart as possible. To classify a new sample, project it onto the same line and assign its class according to the position of the projected point. [2]
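For the two-class case, the projection idea above can be written as the Fisher criterion. The symbols below ($\mu_1$, $\mu_2$ for the class means, $S_w$ for the within-class scatter matrix, $w$ for the projection direction, $c$ for the decision threshold) are notation assumed here to match the variable names used in the code later in this article:

```latex
J(w) = \frac{\left(w^{\top}(\mu_1 - \mu_2)\right)^{2}}{w^{\top} S_w\, w},
\qquad
S_w = \sum_{k=1}^{2} \sum_{x \in X_k} (x - \mu_k)(x - \mu_k)^{\top}

w^{*} \propto S_w^{-1}(\mu_1 - \mu_2),
\qquad
c = \tfrac{1}{2}\, w^{*\top}(\mu_1 + \mu_2)
```

A new sample $x$ is then assigned to the first class when $w^{*\top} x - c \ge 0$ and to the second class otherwise.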

LDA is closely related to analysis of variance (ANOVA) and regression analysis, both of which also try to express a dependent variable as a linear combination of other features or measurements. However, ANOVA uses categorical independent variables and a continuous dependent variable, while discriminant analysis uses continuous independent variables and a categorical dependent variable (i.e. the class label). Logistic regression and probit regression are more similar to LDA than ANOVA is, because they also use continuous independent variables to explain a categorical dependent variable.

The basic assumption of LDA is that the independent variables are normally distributed. When this assumption does not hold, the other methods mentioned above are usually preferred in practice.

LDA is also closely related to principal component analysis (PCA) and factor analysis, in that they all look for linear combinations of variables that best explain the data. LDA explicitly attempts to model the differences between classes of data. PCA, in contrast, does not take class differences into account, and factor analysis builds feature combinations based on differences rather than similarities. Discriminant analysis also differs from factor analysis in that it is not an interdependence technique: a distinction must be made between the independent variables and the dependent variables (also called criterion variables). LDA works when the measurements made on the independent variables for each observation are continuous quantities. When dealing with categorical independent variables, the equivalent technique is discriminant correspondence analysis.
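The difference between PCA and LDA is visible in how they are called. The following is a minimal sketch, assuming scikit-learn's `PCA` and `LinearDiscriminantAnalysis` classes and its bundled iris data (none of which appear in the text above): PCA is fitted on the features alone, while LDA also needs the class labels.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA ignores the labels: it only looks at the variance of X.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA uses the labels: with 3 classes it can give at most 3 - 1 = 2 components.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # (150, 2) (150, 2)
```

Both calls reduce the four iris features to two dimensions, but only the LDA projection is chosen with class separation in mind.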

## (2) Advantages

Decades after Fisher proposed it, LDA is still one of the most widely used and effective methods for dimensionality reduction and pattern classification. Typical applications include face detection, face recognition, vision-based horizon detection for flight, target tracking and detection, credit card fraud detection, image retrieval, and speech recognition.

LDA (including its multi-class generalization) is so widely used because it has the following advantages. It has an analytical solution obtained from a generalized eigenvalue problem, which avoids the local minima often encountered when building general nonlinear algorithms such as multilayer perceptrons. The output classes of the patterns do not need to be encoded, so LDA is particularly well suited to unbalanced pattern classes. Compared with neural network methods, LDA has no parameters to tune, so there are no learning parameters, optimization weights, or neuron activation functions to choose. It is also insensitive to normalization or randomization of the patterns, a sensitivity that is more pronounced in algorithms based on gradient descent [3]. In some practical cases, LDA achieves generalization performance equal to or even better than that of support vector machines (SVM), which are based on the principle of structural risk minimization, while being much more computationally efficient.

Canonical discriminant analysis (CDA) finds the coordinate axes that best separate the classes (K-1 canonical coordinates, where K is the number of classes). These linear functions are uncorrelated, and they define an optimal (K-1)-dimensional subspace through the n-dimensional data cloud that best separates the K classes (through their projections onto that subspace).
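The analytical solution mentioned above can be sketched in a few lines. This is an illustrative sketch rather than a reference implementation: the helper name `lda_directions` is hypothetical, the iris data is only a convenient example, and the scatter matrices use one common convention (unnormalized within-class scatter, class-size-weighted between-class scatter). It relies on `scipy.linalg.eigh`, which solves the generalized symmetric eigenvalue problem.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import load_iris


def lda_directions(X, y):
    """Return the K-1 leading eigenvalues and discriminant directions (as columns)."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for k in classes:
        Xk = X[y == k]
        mu_k = Xk.mean(axis=0)
        Sw += (Xk - mu_k).T @ (Xk - mu_k)
        diff = (mu_k - mean_all).reshape(-1, 1)
        Sb += len(Xk) * (diff @ diff.T)
    # Solve Sb v = lambda Sw v. eigh returns eigenvalues in ascending order
    # and scales the eigenvectors so that V.T @ Sw @ V is the identity.
    eigvals, eigvecs = eigh(Sb, Sw)
    order = np.argsort(eigvals)[::-1][: len(classes) - 1]
    return eigvals[order], eigvecs[:, order]


X, y = load_iris(return_X_y=True)
eigvals, W = lda_directions(X, y)
Z = X @ W  # samples projected onto the K-1 = 2 canonical coordinates
print(W.shape, Z.shape)
```

Because the whole computation is a single eigendecomposition, there is no iterative optimization and therefore no local minimum to get stuck in.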

Multi-class LDA: when there are more than two classes, the analysis used to derive the Fisher discriminant can be extended to find a subspace that retains the variability of all classes. This generalization is due to C. R. Rao. Suppose that each of the C classes has its own mean and that all classes share the same covariance.
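Under that assumption, the spread of the class means and the quantity to be maximized are commonly written as follows. The symbols are assumptions made here for illustration: $\mu_i$ is the mean of class $i$, $\mu$ is the mean of the class means, $\Sigma$ is the shared covariance, and $w$ is a candidate projection direction. Scaling conventions for $\Sigma_b$ vary between texts.

```latex
\Sigma_b = \frac{1}{C} \sum_{i=1}^{C} (\mu_i - \mu)(\mu_i - \mu)^{\top},
\qquad
S(w) = \frac{w^{\top} \Sigma_b\, w}{w^{\top} \Sigma\, w}
```

The directions that maximize $S(w)$ are eigenvectors of $\Sigma^{-1}\Sigma_b$, and since $\Sigma_b$ has rank at most $C-1$, at most $C-1$ of them carry discriminative information.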

A typical LDA implementation requires all samples to be available in advance. In some situations, however, no complete data set exists up front, or the observations arrive as a stream. LDA feature extraction must then be able to update its features as new samples are observed, without rerunning the algorithm on the whole data set. For example, in real-time applications such as mobile robotics or online face recognition, it is important that the extracted LDA features are updated as soon as new observations arrive. A technique that updates the LDA features simply by observing new samples is called an incremental LDA algorithm, and this idea has been studied extensively over the past two decades. Chatterjee and Roychowdhury proposed an incremental self-organizing LDA algorithm for updating the LDA features. Demir and Ozmehmet proposed online local learning algorithms that update the LDA features using error-correcting and Hebbian learning rules. Finally, Aliyari et al. derived fast incremental LDA algorithms.
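To make the idea of updating with each new sample concrete, here is a deliberately simple sketch. It is not any of the published algorithms cited above: it only keeps running class means and scatter matrices (with a Welford-style update) for the two-class case and recomputes the Fisher direction on demand. The class name `RunningTwoClassLDA` and the synthetic Gaussian data are hypothetical.

```python
import numpy as np


class RunningTwoClassLDA:
    """Keeps per-class counts, means and scatter matrices up to date, one sample at a time."""

    def __init__(self, n_features):
        self.n = np.zeros(2, dtype=int)
        self.mean = np.zeros((2, n_features))
        self.scatter = np.zeros((2, n_features, n_features))

    def update(self, x, label):
        x = np.asarray(x, dtype=float)
        self.n[label] += 1
        delta = x - self.mean[label]  # deviation from the old class mean
        self.mean[label] += delta / self.n[label]
        # Rank-one update of the class scatter using the old and new means
        self.scatter[label] += np.outer(delta, x - self.mean[label])

    def direction(self):
        # Fisher direction w = Sw^{-1} (mean_0 - mean_1) from the current statistics
        Sw = self.scatter[0] + self.scatter[1]
        return np.linalg.solve(Sw, self.mean[0] - self.mean[1])


rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0], size=(50, 2))
X1 = rng.normal(loc=[2.0, 1.0], size=(50, 2))

model = RunningTwoClassLDA(n_features=2)
for x in X0:
    model.update(x, 0)
for x in X1:
    model.update(x, 1)
w_stream = model.direction()
print(w_stream)
```

The streamed statistics are exactly those of a batch computation, so the resulting direction matches what the batch formula would give on the same samples; the published incremental algorithms go further and avoid the matrix solve as well.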

# 2, Programming implementation

LDA on a two-class dataset (the code below generates synthetic data with `make_classification`):

    import numpy as np
    import matplotlib.pyplot as plt
    # Recent scikit-learn versions no longer provide sklearn.datasets.samples_generator;
    # the generator is imported from sklearn.datasets directly.
    from sklearn.datasets import make_classification


    class LDA():
        def Train(self, X, y):
            """X is the training data set, y holds the training labels (0 or 1)."""
            X1 = np.array([X[i] for i in range(len(X)) if y[i] == 0])
            X2 = np.array([X[i] for i in range(len(X)) if y[i] == 1])

            # Find the class centers
            mju1 = np.mean(X1, axis=0)  # mju1 is an ndarray
            mju2 = np.mean(X2, axis=0)

            # Scatter matrix of each class (np.dot performs matrix multiplication)
            cov1 = np.dot((X1 - mju1).T, (X1 - mju1))
            cov2 = np.dot((X2 - mju2).T, (X2 - mju2))
            Sw = cov1 + cov2

            # Calculate w = Sw^{-1} (mju1 - mju2) as a column vector
            w = np.linalg.solve(Sw, (mju1 - mju2).reshape((len(mju1), 1)))

            # Record training results
            self.mju1 = mju1  # Center of class 0
            self.cov1 = cov1
            self.mju2 = mju2  # Center of class 1
            self.cov2 = cov2
            self.Sw = Sw      # Within-class scatter matrix
            self.w = w        # Discriminant weight vector

        def Test(self, X, y):
            """X is the test data set, y holds the test labels."""
            # Project the test samples onto w
            y_new = np.dot(X, self.w)

            # Threshold: projection of the midpoint between the two class centers
            nums = len(y)
            c = 0.5 * np.dot(self.w.T, (self.mju1 + self.mju2).reshape((len(self.mju1), 1)))

            # Fisher linear discriminant: h >= 0 means class 0, otherwise class 1
            h = (y_new - c).ravel()
            y_hat = [0 if h[i] >= 0 else 1 for i in range(nums)]

            # Calculate classification accuracy
            count = sum(1 for i in range(nums) if y_hat[i] == y[i])
            accuracy = count / nums

            # Display information
            print("Number of test samples:", nums)
            print("Number of correctly predicted samples:", count)
            print("Test accuracy:", accuracy)
            return accuracy


    if '__main__' == __name__:
        # Generate classification data
        n_samples = 500
        X, y = make_classification(n_samples=n_samples, n_features=2, n_redundant=0,
                                   n_classes=2, n_informative=1, n_clusters_per_class=1,
                                   class_sep=0.5, random_state=10)

        # LDA linear discriminant analysis (binary classification)
        lda = LDA()

        # 60% for training and 40% for testing
        Xtrain = X[:300, :]
        Ytrain = y[:300]
        Xtest = X[300:, :]
        Ytest = y[300:]

        lda.Train(Xtrain, Ytrain)
        accuracy = lda.Test(Xtest, Ytest)

        # Plot the raw data
        plt.scatter(X[:, 0], X[:, 1], marker='o', c=y)
        plt.xlabel("x1")
        plt.ylabel("x2")
        plt.title("Test accuracy:" + str(accuracy))
        plt.show()
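As a sanity check on a hand-written classifier like the one above, the same kind of data can be passed through scikit-learn's built-in `LinearDiscriminantAnalysis`. This is a sketch under assumptions: `train_test_split` with `random_state=0` replaces the manual slicing, so the split differs from the one above, and the library also takes class priors into account when placing the threshold, so the accuracy does not have to match exactly.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Same generator settings as in the hand-written example
X, y = make_classification(n_samples=500, n_features=2, n_redundant=0,
                           n_classes=2, n_informative=1, n_clusters_per_class=1,
                           class_sep=0.5, random_state=10)

# 60% for training and 40% for testing, shuffled
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

clf = LinearDiscriminantAnalysis()
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print("Test accuracy (scikit-learn LDA):", accuracy)
```

If the two accuracies differ widely, that usually points to a bug in the manual implementation or in how the data was split.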

Importing from `sklearn.datasets.samples_generator` fails on newer scikit-learn versions, because that module no longer exists. There are two ways to fix this.

Method 1: import directly from `sklearn.datasets`. For example,

`from sklearn.datasets.samples_generator import make_blobs`

is changed to:

`from sklearn.datasets import make_blobs`

This solves the problem.

Method 2: this is a version problem, so downgrade scikit-learn. In the current terminal, enter:

`pip install scikit-learn==0.22.1`

Although there are warnings, the code at least runs.
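If the same script has to run on both old and new scikit-learn installations, one option is to try the current import path first and fall back to the old one. This is only a sketch of that pattern; the small `make_blobs` call at the end is a hypothetical smoke test, not part of the original code.

```python
try:
    # Current location of the data generators
    from sklearn.datasets import make_blobs
except ImportError:
    # Fallback for very old scikit-learn releases
    from sklearn.datasets.samples_generator import make_blobs

# Quick check that the imported function works
X_demo, y_demo = make_blobs(n_samples=10, centers=2, random_state=0)
print(X_demo.shape)  # (10, 2)
```

This avoids pinning an old library version just to keep one import statement working.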

result:

LDA on the moon dataset:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_moons


    class LDA():
        def Train(self, X, y):
            """X is the training data set, y holds the training labels (0 or 1)."""
            X1 = np.array([X[i] for i in range(len(X)) if y[i] == 0])
            X2 = np.array([X[i] for i in range(len(X)) if y[i] == 1])

            # Find the class centers
            mju1 = np.mean(X1, axis=0)  # mju1 is an ndarray
            mju2 = np.mean(X2, axis=0)

            # Scatter matrix of each class (np.dot performs matrix multiplication)
            cov1 = np.dot((X1 - mju1).T, (X1 - mju1))
            cov2 = np.dot((X2 - mju2).T, (X2 - mju2))
            Sw = cov1 + cov2

            # Calculate w = Sw^{-1} (mju1 - mju2) as a column vector
            w = np.linalg.solve(Sw, (mju1 - mju2).reshape((len(mju1), 1)))

            # Record training results
            self.mju1 = mju1  # Center of class 0
            self.cov1 = cov1
            self.mju2 = mju2  # Center of class 1
            self.cov2 = cov2
            self.Sw = Sw      # Within-class scatter matrix
            self.w = w        # Discriminant weight vector

        def Test(self, X, y):
            """X is the test data set, y holds the test labels."""
            # Project the test samples onto w
            y_new = np.dot(X, self.w)

            # Threshold: projection of the midpoint between the two class centers
            nums = len(y)
            c = 0.5 * np.dot(self.w.T, (self.mju1 + self.mju2).reshape((len(self.mju1), 1)))

            # Fisher linear discriminant: h >= 0 means class 0, otherwise class 1
            h = (y_new - c).ravel()
            y_hat = [0 if h[i] >= 0 else 1 for i in range(nums)]

            # Calculate classification accuracy
            count = sum(1 for i in range(nums) if y_hat[i] == y[i])
            accuracy = count / nums

            # Display information
            print("Number of test samples:", nums)
            print("Number of correctly predicted samples:", count)
            print("Test accuracy:", accuracy)
            return accuracy


    if '__main__' == __name__:
        # Generate classification data
        X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

        # LDA linear discriminant analysis (binary classification)
        lda = LDA()

        # 60% for training and 40% for testing (no overlap between the two sets)
        Xtrain = X[:60, :]
        Ytrain = y[:60]
        Xtest = X[60:, :]
        Ytest = y[60:]

        lda.Train(Xtrain, Ytrain)
        accuracy = lda.Test(Xtest, Ytest)

        # Plot the raw data
        plt.scatter(X[:, 0], X[:, 1], marker='o', c=y)
        plt.xlabel("x1")
        plt.ylabel("x2")
        plt.title("Test accuracy:" + str(accuracy))
        plt.show()

Operation results:

SVM classification of moon dataset:

    import matplotlib.pyplot as plt
    from sklearn.pipeline import Pipeline
    import numpy as np
    import matplotlib as mpl
    from sklearn.datasets import make_moons
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC

    # To display Chinese characters in plots.
    # rcParams (rc configuration) change matplotlib's default properties, including
    # figure size, points per inch, line width, color, style, axes, grid, text and fonts.
    mpl.rcParams['font.sans-serif'] = [u'SimHei']
    mpl.rcParams['axes.unicode_minus'] = False

    # Generate the moon dataset
    X, y = make_moons(n_samples=100, noise=0.15, random_state=42)


    def plot_dataset(X, y, axes):
        # Draw the two classes with different markers
        plt.plot(X[:, 0][y == 0], X[:, 1][y == 0], "bs")
        plt.plot(X[:, 0][y == 1], X[:, 1][y == 1], "g^")
        plt.axis(axes)
        plt.grid(True, which='both')
        plt.xlabel(r"$x_1$", fontsize=20)
        plt.ylabel(r"$x_2$", fontsize=20, rotation=0)
        plt.title("Moon data", fontsize=20)


    plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
    plt.show()

result:

SVM binary classification of the iris dataset

    # SVM binary classification of the iris dataset
    import numpy as np
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    from sklearn import datasets, svm

    mpl.rcParams['font.sans-serif'] = ['SimHei']

    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

    X = X[y != 0, :2]  # Keep classes 1 and 2 and the first two features of X
    y = y[y != 0]

    n_sample = len(X)

    np.random.seed(0)
    order = np.random.permutation(n_sample)  # Random permutation of the samples
    X = X[order]
    y = y[order].astype(float)  # np.float was removed from NumPy; the built-in float is used

    # 90% for training and 10% for testing
    X_train = X[:int(.9 * n_sample)]
    y_train = y[:int(.9 * n_sample)]
    X_test = X[int(.9 * n_sample):]
    y_test = y[int(.9 * n_sample):]

    # Fit one model per kernel
    for fig_num, kernel in enumerate(('linear', 'rbf', 'poly')):
        # The radial basis function (RBF) kernel is the commonly used Gaussian kernel.
        # gamma is the kernel coefficient for "rbf", "poly" and "sigmoid".
        clf = svm.SVC(kernel=kernel, gamma=10)
        clf.fit(X_train, y_train)

        plt.figure(str(kernel))
        plt.xlabel('x1')
        plt.ylabel('x2')
        # zorder: drawing order along Z; larger values are drawn on top.
        # cmap=plt.cm.Paired: a colormap of paired, similar colors.
        plt.scatter(X[:, 0], X[:, 1], c=y, zorder=10, cmap=plt.cm.Paired,
                    edgecolor='k', s=20)

        # Circle the test data
        plt.scatter(X_test[:, 0], X_test[:, 1], s=80, facecolors='none',
                    zorder=10, edgecolor='k')

        # Adjust the x- and y-axis limits so that all data is displayed
        plt.axis('tight')
        x_min = X[:, 0].min()
        x_max = X[:, 0].max()
        y_min = X[:, 1].min()
        y_max = X[:, 1].max()

        XX, YY = np.mgrid[x_min:x_max:200j, y_min:y_max:200j]
        # Signed distance from each grid point to the separating hyperplane
        Z = clf.decision_function(np.c_[XX.ravel(), YY.ravel()])
        Z = Z.reshape(XX.shape)

        plt.contourf(XX, YY, Z > 0, cmap=plt.cm.Paired)
        # Decision boundary (level 0) and margins (levels -0.5 and 0.5)
        plt.contour(XX, YY, Z, colors=['r', 'k', 'b'],
                    linestyles=['--', '-', '--'], levels=[-0.5, 0, 0.5])
        plt.title(kernel)

    plt.show()

result:

SVM binary classification of the moon dataset

    # SVM binary classification of the moon dataset
    import matplotlib.pyplot as plt
    from sklearn.pipeline import Pipeline
    import numpy as np
    import matplotlib as mpl
    from sklearn.datasets import make_moons
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC

    # To display Chinese characters in plots
    mpl.rcParams['font.sans-serif'] = [u'SimHei']
    mpl.rcParams['axes.unicode_minus'] = False

    X, y = make_moons(n_samples=100, noise=0.15, random_state=42)


    def plot_dataset(X, y, axes):
        plt.plot(X[:, 0][y == 0], X[:, 1][y == 0], "bs")
        plt.plot(X[:, 0][y == 1], X[:, 1][y == 1], "g^")
        plt.axis(axes)
        plt.grid(True, which='both')
        plt.xlabel(r"$x_1$", fontsize=20)
        plt.ylabel(r"$x_2$", fontsize=20, rotation=0)
        plt.title("Moon data", fontsize=20)


    plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])

    polynomial_svm_clf = Pipeline([
        # Map the source data to third-degree polynomial features
        ("poly_features", PolynomialFeatures(degree=3)),
        # Standardization
        ("scaler", StandardScaler()),
        # Linear SVC classifier
        ("svm_clf", LinearSVC(C=10, loss="hinge", random_state=42))
    ])
    polynomial_svm_clf.fit(X, y)


    def plot_predictions(clf, axes):
        # Build a grid of points covering the plot area
        x0s = np.linspace(axes[0], axes[1], 100)
        x1s = np.linspace(axes[2], axes[3], 100)
        x0, x1 = np.meshgrid(x0s, x1s)
        X = np.c_[x0.ravel(), x1.ravel()]
        y_pred = clf.predict(X).reshape(x0.shape)
        y_decision = clf.decision_function(X).reshape(x0.shape)
        # print(y_pred)
        # print(y_decision)
        plt.contourf(x0, x1, y_pred, cmap=plt.cm.brg, alpha=0.2)
        plt.contourf(x0, x1, y_decision, cmap=plt.cm.brg, alpha=0.1)


    plot_predictions(polynomial_svm_clf, [-1.5, 2.5, -1, 1.5])
    plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
    plt.show()

Continuing from the code above (it reuses `X`, `y`, `Pipeline`, `StandardScaler`, `plot_dataset` and `plot_predictions`), an SVM with an RBF kernel is fitted with different hyperparameters:

    from sklearn.svm import SVC

    gamma1, gamma2 = 0.1, 5
    C1, C2 = 0.001, 1000
    hyperparams = (gamma1, C1), (gamma1, C2)

    # Fit one RBF-kernel SVM per (gamma, C) pair
    svm_clfs = []
    for gamma, C in hyperparams:
        rbf_kernel_svm_clf = Pipeline([
            ("scaler", StandardScaler()),
            ("svm_clf", SVC(kernel="rbf", gamma=gamma, C=C))
        ])
        rbf_kernel_svm_clf.fit(X, y)
        svm_clfs.append(rbf_kernel_svm_clf)

    # Plot the decision regions of each classifier in its own subplot
    plt.figure(figsize=(11, 7))
    for i, svm_clf in enumerate(svm_clfs):
        plt.subplot(221 + i)
        plot_predictions(svm_clf, [-1.5, 2.5, -1, 1.5])
        plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
        gamma, C = hyperparams[i]
        plt.title(r"$\gamma = {}, C = {}$".format(gamma, C), fontsize=16)
    plt.tight_layout()
    plt.show()
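The effect of `gamma` and `C` can also be inspected numerically instead of visually. The sketch below is an illustration under assumptions: it tries all four combinations of the two `gamma` and two `C` values defined in the code above (the plotting code uses only two of them), and it reports training accuracy and the number of support vectors, which are quantities chosen here for illustration. The `results` list is a hypothetical helper.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

results = []
for gamma, C in [(0.1, 0.001), (0.1, 1000), (5, 0.001), (5, 1000)]:
    clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svm_clf", SVC(kernel="rbf", gamma=gamma, C=C)),
    ])
    clf.fit(X, y)
    # n_support_ holds the number of support vectors per class
    n_sv = int(clf.named_steps["svm_clf"].n_support_.sum())
    train_acc = clf.score(X, y)
    results.append((gamma, C, n_sv, train_acc))

for gamma, C, n_sv, train_acc in results:
    print("gamma =", gamma, "C =", C, "support vectors =", n_sv,
          "training accuracy =", train_acc)
```

Training accuracy alone says nothing about generalization, so in practice these settings would be compared on held-out data or with cross-validation.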

result