Linear discriminant criterion and linear classification programming practice

1, Linear discriminant analysis

(1) Introduction

  Linear discriminant analysis (LDA) is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before subsequent classification.

Linear discriminant analysis is a classical linear learning method. It was first proposed by Fisher in 1936 and is therefore also known as Fisher linear discriminant analysis. The idea is very simple: given a training set, project the samples onto a line so that the projections of samples from the same class lie as close together as possible while the projections of samples from different classes lie as far apart as possible. To classify a new sample, project it onto the same line and decide its category from the position of its projection. [2]
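To make this idea concrete, here is a standard two-class formulation (added for reference, not quoted from [2]): Fisher's criterion chooses the projection direction w that maximizes the between-class scatter of the projections relative to their within-class scatter,

J(w) = \frac{w^{\top} S_B\, w}{w^{\top} S_W\, w}, \qquad
S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^{\top}, \qquad
S_W = \sum_{x \in C_1} (x - \mu_1)(x - \mu_1)^{\top} + \sum_{x \in C_2} (x - \mu_2)(x - \mu_2)^{\top},

whose maximizer is w \propto S_W^{-1}(\mu_1 - \mu_2). This closed-form direction is exactly what the code in Section 2 computes as w = Sw^(-1)(mju1 - mju2).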

LDA is closely related to analysis of variance (ANOVA) and regression analysis, which also try to express one dependent variable as a linear combination of other features or measurements. However, ANOVA uses categorical independent variables and a continuous dependent variable, whereas discriminant analysis uses continuous independent variables and a categorical dependent variable (the class label). Logistic regression and probit regression are more similar to LDA than ANOVA is, because they also explain a categorical dependent variable with continuous independent variables.

The basic assumption of LDA is that the independent variables are normally distributed; when this assumption cannot be met, the other methods mentioned above tend to be preferred in practice. LDA is also closely related to principal component analysis (PCA) and factor analysis, both of which look for linear combinations of variables that best explain the data. LDA explicitly attempts to model the difference between the data classes; PCA, by contrast, ignores class labels altogether, and factor analysis builds feature combinations from differences rather than similarities. Discriminant analysis also differs from factor analysis in that it is not an interdependence technique: a distinction must be made between independent variables and dependent variables (also called criterion variables). LDA works effectively when the measurements of the independent variables are continuous for every observation; the corresponding technique for categorical independent variables is discriminant correspondence analysis.
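To illustrate this difference concretely, here is a minimal sketch (my own addition, assuming scikit-learn is installed; it is not code from the referenced articles) that reduces the iris data to one dimension with both LDA and PCA. LDA uses the class labels y, while PCA ignores them:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA ignores the labels: it keeps the direction of maximum total variance
X_pca = PCA(n_components=1).fit_transform(X)

# LDA uses the labels: it keeps the direction that best separates the classes
X_lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y).transform(X)

print("PCA projection shape:", X_pca.shape)
print("LDA projection shape:", X_lda.shape)

Plotting per-class histograms of X_lda versus X_pca typically shows the LDA projection separating the classes more cleanly, because the labels were used when choosing the direction.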

(2) Advantages

 

It has been nearly 70 years since Fisher proposed LDA, and it remains one of the most widely used and effective methods for dimensionality reduction and pattern classification. Typical applications include face detection, face recognition, vision-based horizon detection in flight, target tracking and detection, credit card fraud detection, image retrieval, speech recognition, and so on. The main reason LDA is used so widely is that LDA (including its multi-class generalization) has the following advantages. An analytical solution can be obtained directly from a generalized eigenvalue problem, which avoids the local-minimum problems often encountered when training general nonlinear models such as multilayer perceptrons, and there is no need to encode the output categories of the patterns, so LDA shows a particularly clear advantage when the pattern classes are unbalanced. Compared with neural network methods, LDA requires no parameter tuning: there are no learning rates to set, weights to optimize, or neuron activation functions to select. It is also insensitive to normalization or random ordering of the patterns, which is a notable weakness of many gradient-descent-based algorithms [3]. In some practical cases, LDA achieves generalization performance equivalent to, or even better than, support vector machines (SVM), which are based on the structural risk minimization principle, while being far more computationally efficient.

Canonical discriminant analysis (CDA) finds the axes that best separate the categories (k-1 canonical coordinates, where k is the number of categories). These linear functions are uncorrelated and define an optimal (k-1)-dimensional subspace through the n-dimensional data cloud that best discriminates the k classes (via their projections into that subspace).

Multi-class LDA: when there are more than two classes, the analysis derived from the Fisher discriminant can be extended to find a subspace that preserves the variability between all the classes. This generalization is due to C. R. Rao. Assume that each of the C classes has its own mean and that all classes share the same covariance.
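As a rough NumPy sketch of this multi-class construction (an illustrative textbook version added here, not C. R. Rao's original derivation), the discriminant directions are the leading eigenvectors of Sw^(-1) Sb, of which at most C-1 are meaningful:

import numpy as np

def multiclass_lda_directions(X, y, n_components=None):
    """Return up to C-1 discriminant directions for data X with integer labels y."""
    classes = np.unique(y)
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((n_features, n_features))  # within-class scatter
    Sb = np.zeros((n_features, n_features))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        Sw += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - overall_mean).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    # Generalized eigenvalue problem Sb v = lambda Sw v, solved via pinv(Sw) @ Sb
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]
    k = n_components if n_components is not None else len(classes) - 1
    return eigvecs.real[:, order[:k]]

Projecting with X @ W, where W holds these directions, gives the (C-1)-dimensional subspace described above.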

Typical LDA requires all samples to be available in advance. In some situations, however, there is no complete data set ready beforehand, or the observations arrive as a stream. In such cases LDA feature extraction should be able to update the computed features as new samples arrive, rather than re-running the algorithm on the whole data set. For real-time applications such as mobile robotics or on-line face recognition, it is important that the extracted LDA features can be updated as new observations come in. Techniques that update the LDA features simply by observing new samples are called incremental LDA algorithms, and they have been studied extensively over the past two decades. Chatterjee and Roychowdhury proposed an incremental self-organizing LDA algorithm for updating LDA features; Demir and Ozmehmet proposed online local learning algorithms using error-correcting and Hebbian learning rules; and Aliyari et al. provided a fast incremental LDA algorithm.
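The following is only a simplified sketch of the incremental idea for the two-class case (running per-class means and scatters updated one sample at a time); it is not the algorithm of Chatterjee and Roychowdhury, Demir and Ozmehmet, or Aliyari et al.:

import numpy as np

class IncrementalTwoClassLDA:
    """Keep running class statistics and refresh the direction w after every new sample."""
    def __init__(self, n_features):
        self.n = [0, 0]                                       # samples seen per class
        self.mean = [np.zeros(n_features) for _ in range(2)]  # running class means
        self.scatter = [np.zeros((n_features, n_features)) for _ in range(2)]

    def partial_fit(self, x, label):
        x = np.asarray(x, dtype=float)
        k = int(label)
        self.n[k] += 1
        delta = x - self.mean[k]
        self.mean[k] += delta / self.n[k]                     # update the running mean
        self.scatter[k] += np.outer(delta, x - self.mean[k])  # update the running scatter

    def direction(self):
        Sw = self.scatter[0] + self.scatter[1]                # within-class scatter
        return np.linalg.pinv(Sw) @ (self.mean[0] - self.mean[1])

Each call to partial_fit touches only the new observation, so the projection direction can be refreshed in real time, which is the property the incremental LDA algorithms above are designed to provide (with more attention to efficiency and numerical stability).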

2, Programming implementation

  Working with a randomly generated two-class dataset (make_classification):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets.samples_generator import make_classification

class LDA():
    def Train(self, X, y):
        """X For training data sets, y For training label"""
        X1 = np.array([X[i] for i in range(len(X)) if y[i] == 0])
        X2 = np.array([X[i] for i in range(len(X)) if y[i] == 1])

        # Class mean vectors (centers)
        mju1 = np.mean(X1, axis=0)  # mju1 is an ndarray
        mju2 = np.mean(X2, axis=0)

        # dot(a, b, out=None) calculates matrix multiplication
        cov1 = np.dot((X1 - mju1).T, (X1 - mju1))
        cov2 = np.dot((X2 - mju2).T, (X2 - mju2))
        Sw = cov1 + cov2
        # Calculate w
        w = np.dot(np.mat(Sw).I, (mju1 - mju2).reshape((len(mju1), 1)))
        # Record training results
        self.mju1 = mju1  # Category 1 Classification Center
        self.cov1 = cov1
        self.mju2 = mju2  # Category 2 classification Center
        self.cov2 = cov2
        self.Sw = Sw  # within-class scatter matrix 
        self.w = w  # Discriminant weight matrix
    def Test(self, X, y):
        """X For the test data set, y For testing label"""
        # Project the test samples onto the discriminant direction w
        y_new = np.dot((X), self.w)
        # Compute the Fisher decision threshold
        nums = len(y)
        c1 = np.dot((self.mju1 - self.mju2).reshape(1, (len(self.mju1))), np.mat(self.Sw).I)
        c2 = np.dot(c1, (self.mju1 + self.mju2).reshape((len(self.mju1), 1)))
        c = 1/2 * c2  # projection of the midpoint of the two class centers
        h = y_new - c
        # Assign labels by comparing each projection with the threshold
        y_hat = []
        for i in range(nums):
            if h[i] >= 0:
                y_hat.append(0)
            else:
                y_hat.append(1)
        # Calculate classification accuracy
        count = 0
        for i in range(nums):
            if y_hat[i] == y[i]:
                count += 1
        precise = count / nums
        # display information
        print("Number of test samples:", nums)
        print("Predict the number of correct samples:", count)
        print("Test accuracy:", precise)
        return precise
if '__main__' == __name__:
    # Generate classified data
    n_samples = 500
    X, y = make_classification(n_samples=n_samples, n_features=2, n_redundant=0, n_classes=2,n_informative=1, n_clusters_per_class=1, class_sep=0.5, random_state=10)
    # LDA linear discriminant analysis (two classification)
    lda = LDA()
    # 60% for training and 40% for testing
    Xtrain = X[:300, :]
    Ytrain = y[:300]
    Xtest = X[300:, :]
    Ytest = y[300:]
    lda.Train(Xtrain, Ytrain)
    precise = lda.Test(Xtest, Ytest)
    # raw data
    plt.scatter(X[:, 0], X[:, 1], marker='o', c=y)
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.title("Test precise:" + str(precise))
    plt.show()

The following problem occurred while running the code: ModuleNotFoundError: No module named 'sklearn.datasets.samples_generator' (this module has been removed from recent scikit-learn releases).

Method 1: import the generator directly from sklearn.datasets, i.e. change
from sklearn.datasets.samples_generator import make_classification
to:
from sklearn.datasets import make_classification
Solved.

Method 2: a version problem; downgrade scikit-learn. Enter at the current terminal:
pip install scikit-learn==0.22.1

Although there are warnings, it can at least run.
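A third option (a small defensive sketch added here, not from the original post) is to let the script try the public import location first and fall back to the old module path only if needed:

try:
    # public location of the data generators in sklearn.datasets
    from sklearn.datasets import make_classification
except ImportError:
    # fallback for old environments that only expose the removed module path
    from sklearn.datasets.samples_generator import make_classification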

result:

  Process the moon dataset:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
class LDA():
    def Train(self, X, y):
        """X For training data sets, y For training label"""
        X1 = np.array([X[i] for i in range(len(X)) if y[i] == 0])
        X2 = np.array([X[i] for i in range(len(X)) if y[i] == 1])
        # Class mean vectors (centers)
        mju1 = np.mean(X1, axis=0)  # mju1 is an ndarray
        mju2 = np.mean(X2, axis=0)
        # dot(a, b, out=None) calculates matrix multiplication
        cov1 = np.dot((X1 - mju1).T, (X1 - mju1))
        cov2 = np.dot((X2 - mju2).T, (X2 - mju2))
        Sw = cov1 + cov2
        # Calculate w
        w = np.dot(np.mat(Sw).I, (mju1 - mju2).reshape((len(mju1), 1)))
        # Record training results
        self.mju1 = mju1  # Category 1 Classification Center
        self.cov1 = cov1
        self.mju2 = mju2  # Category 2 Classification Center
        self.cov2 = cov2
        self.Sw = Sw  # within-class scatter matrix 
        self.w = w  # Discriminant weight matrix
    def Test(self, X, y):
        """X For the test data set, y For testing label"""
        # Project the test samples onto the discriminant direction w
        y_new = np.dot((X), self.w)
        # Compute the Fisher decision threshold
        nums = len(y)
        c1 = np.dot((self.mju1 - self.mju2).reshape(1, (len(self.mju1))), np.mat(self.Sw).I)
        c2 = np.dot(c1, (self.mju1 + self.mju2).reshape((len(self.mju1), 1)))
        c = 1/2 * c2  # projection of the midpoint of the two class centers
        h = y_new - c
        # Assign labels by comparing each projection with the threshold
        y_hat = []
        for i in range(nums):
            if h[i] >= 0:
                y_hat.append(0)
            else:
                y_hat.append(1)
        # Calculate classification accuracy
        count = 0
        for i in range(nums):
            if y_hat[i] == y[i]:
                count += 1
        precise = count / nums
        # display information
        print("Number of test samples:", nums)
        print("Predict the number of correct samples:", count)
        print("Test accuracy:", precise)
        return precise
if '__main__' == __name__:
    # Generate classified data
    X, y = make_moons(n_samples=100, noise=0.15, random_state=42)
    # LDA linear discriminant analysis (two classification)
    lda = LDA()
    # 60% for training and 40% for testing
    Xtrain = X[:60, :]
    Ytrain = y[:60]
    Xtest = X[60:, :]
    Ytest = y[60:]
    lda.Train(Xtrain, Ytrain)
    precise = lda.Test(Xtest, Ytest)
    # raw data
    plt.scatter(X[:, 0], X[:, 1], marker='o', c=y)
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.title("Test precise:" + str(precise))
    plt.show()

Operation results:

SVM classification of moon dataset:

import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
import numpy as np
import matplotlib as mpl
from sklearn.datasets import make_moons
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
# To display Chinese labels correctly
mpl.rcParams['font.sans-serif'] = [u'SimHei']
# rc parameters override default properties such as figure size, dots per inch, line width, color and style, axes, ticks and grids, text, and fonts
mpl.rcParams['axes.unicode_minus'] = False
X, y = make_moons(n_samples=100, noise=0.15, random_state=42)#Generate moon dataset
def plot_dataset(X, y, axes):#Drawing graphics
    plt.plot(X[:, 0][y==0], X[:, 1][y==0], "bs")
    plt.plot(X[:, 0][y==1], X[:, 1][y==1], "g^")
    plt.axis(axes)
    plt.grid(True, which='both')
    plt.xlabel(r"$x_1$", fontsize=20)
    plt.ylabel(r"$x_2$", fontsize=20, rotation=0)
    plt.title("Moon data",fontsize=20)
plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
plt.show()

  result:

Two-class SVM on the iris dataset

# Two-class SVM classification of the iris dataset
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, svm
import pandas as pd
import matplotlib as mpl
mpl.rcParams['font.sans-serif'] = ['SimHei']
iris = datasets.load_iris()
X = iris.data                  
y = iris.target                
X = X[y != 0, :2]             # Select the first two properties of X
y = y[y != 0]
n_sample = len(X)              
np.random.seed(0)
order = np.random.permutation(n_sample)  # Permutation
X = X[order]
y = y[order].astype(float)
X_train = X[:int(.9 * n_sample)]
y_train = y[:int(.9 * n_sample)]
X_test = X[int(.9 * n_sample):]
y_test = y[int(.9 * n_sample):]
#Appropriate model
for fig_num, kernel in enumerate(('linear', 'rbf', 'poly')):  # 'rbf' is the radial basis function (Gaussian) kernel
    clf = svm.SVC(kernel=kernel, gamma=10)   # gamma is the kernel coefficient of "rbf", "poly" and "sigmoid".
    clf.fit(X_train, y_train)
    plt.figure(str(kernel))
    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.scatter(X[:, 0], X[:, 1], c=y, zorder=10, cmap=plt.cm.Paired, edgecolor='k', s=20)
    # zorder: drawing order along the z direction; larger values are drawn on top
    # the Paired colormap uses pairs of similar colors
    # Circle the test data
    plt.scatter(X_test[:, 0], X_test[:, 1], s=80, facecolors='none',zorder=10, edgecolor='k')
    plt.axis('tight')  #Change the x - and y-axis limits so that all data is displayed
    x_min = X[:, 0].min()
    x_max = X[:, 0].max()
    y_min = X[:, 1].min()
    y_max = X[:, 1].max()
    XX, YY = np.mgrid[x_min:x_max:200j, y_min:y_max:200j]   
    Z = clf.decision_function(np.c_[XX.ravel(), YY.ravel()])  # Distance from sample X to separation hyperplane
    Z = Z.reshape(XX.shape)
    plt.contourf(XX,YY,Z>0,cmap=plt.cm.Paired)  
    plt.contour(XX, YY, Z, colors=['r', 'k', 'b'],
                linestyles=['--', '-', '--'], levels=[-0.5, 0, 0.5])   # decision-function contours at -0.5, 0 and 0.5
    plt.title(kernel)
plt.show()

  result:

  SVM binary classification of the moon dataset

# SVM binary classification of the moon dataset
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
import numpy as np
import matplotlib as mpl
from sklearn.datasets import make_moons
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
# To display Chinese
mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False
X, y = make_moons(n_samples=100, noise=0.15, random_state=42)
def plot_dataset(X, y, axes):
    plt.plot(X[:, 0][y==0], X[:, 1][y==0], "bs")
    plt.plot(X[:, 0][y==1], X[:, 1][y==1], "g^")
    plt.axis(axes)
    plt.grid(True, which='both')
    plt.xlabel(r"$x_1$", fontsize=20)
    plt.ylabel(r"$x_2$", fontsize=20, rotation=0)
    plt.title("Moon data",fontsize=20)
plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
polynomial_svm_clf = Pipeline([
        # Map the input features to degree-3 polynomial features
        ("poly_features", PolynomialFeatures(degree=3)),
        # Standardize the features
        ("scaler", StandardScaler()),
        # Linear SVM classifier with hinge loss
        ("svm_clf", LinearSVC(C=10, loss="hinge", random_state=42))
    ])
polynomial_svm_clf.fit(X, y)
def plot_predictions(clf, axes):
    # Build a grid of points covering the plotting area
    x0s = np.linspace(axes[0], axes[1], 100)
    x1s = np.linspace(axes[2], axes[3], 100)
    x0, x1 = np.meshgrid(x0s, x1s)
    X = np.c_[x0.ravel(), x1.ravel()]
    y_pred = clf.predict(X).reshape(x0.shape)
    y_decision = clf.decision_function(X).reshape(x0.shape)
#     print(y_pred)
#     print(y_decision)  
    plt.contourf(x0, x1, y_pred, cmap=plt.cm.brg, alpha=0.2)
    plt.contourf(x0, x1, y_decision, cmap=plt.cm.brg, alpha=0.1)
plot_predictions(polynomial_svm_clf, [-1.5, 2.5, -1, 1.5])
plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
plt.show()

  The result is as follows:

from sklearn.svm import SVC
gamma1, gamma2 = 0.1, 5
C1, C2 = 0.001, 1000
hyperparams = (gamma1, C1), (gamma1, C2), (gamma2, C1), (gamma2, C2)
svm_clfs = []
for gamma, C in hyperparams:
    rbf_kernel_svm_clf = Pipeline([
            ("scaler", StandardScaler()),
            ("svm_clf", SVC(kernel="rbf", gamma=gamma, C=C))
        ])
    rbf_kernel_svm_clf.fit(X, y)
    svm_clfs.append(rbf_kernel_svm_clf)
plt.figure(figsize=(11, 7))
for i, svm_clf in enumerate(svm_clfs):
    plt.subplot(221 + i)
    plot_predictions(svm_clf, [-1.5, 2.5, -1, 1.5])
    plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
    gamma, C = hyperparams[i]
    plt.title(r"$\gamma = {}, C = {}$".format(gamma, C), fontsize=16)
plt.tight_layout()
plt.show()

  result:

3, Reference articles

Using linear LDA, k-means and SVM algorithms in Python for binary-classification visual analysis of the iris and moon datasets, Whitewater blog, CSDN blog

PyCharm: successfully solving ModuleNotFoundError: No module named 'sklearn.datasets.samples_generator', Escape delay island! blog, CSDN blog
