The accompanying video is as follows:

The Complete Theory and Practice of Machine Learning Algorithms: Logistic Regression

Video address: https://www.bilibili.com/video/av95806420/

The previous post, [Machine Learning] Andrew Ng's Machine Learning Exercises: Logistic Regression for Binary Classification, Part 1, described how to use logistic regression for binary classification. Now let's look at logistic regression on a more complex binary classification task.

# Binary Classification, Part 2

The program in this file is based on the logistic regression exercise from Andrew Ng's machine learning videos. It applies regularized logistic regression to the ex2data2.txt data. The task is to predict whether microchips from a fabrication plant pass quality assurance (QA).

## 1 Load data and view data

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# jupyter magic function so plots display without calling plt.show()
%matplotlib inline
```

```python
# The source data has no header; one sample per row. The first two columns
# are the two test scores and the last column is whether the chip was accepted.
# Columns are separated by ",", similar to csv format.
# Read the data into pandas and set column names
pdData = pd.read_csv("ex2data2.txt", header=None, names=['Test1', 'Test2', 'Admitted'])
pdData.head()  # View the first five rows of data
"""
      Test1    Test2  Admitted
0  0.051267  0.69956         1
1 -0.092742  0.68494         1
2 -0.213710  0.69225         1
3 -0.375000  0.50219         1
4 -0.513250  0.46564         1
"""
```

```python
pdData.info()  # View data information
"""
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 118 entries, 0 to 117
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Test1     118 non-null    float64
 1   Test2     118 non-null    float64
 2   Admitted  118 non-null    int64
dtypes: float64(2), int64(1)
memory usage: 2.9 KB
"""
```

Data Visualization

```python
positive = pdData[pdData['Admitted'] == 1]  # Samples that were accepted
negative = pdData[pdData['Admitted'] == 0]  # Samples that were not accepted
fig, ax = plt.subplots(figsize=(5, 5))  # Get the drawing objects
# Scatter plot of the accepted samples by their two test scores
ax.scatter(positive['Test1'], positive['Test2'], s=30, c='b', marker='o', label='Admitted')
# Scatter plot of the rejected samples by their two test scores
ax.scatter(negative['Test1'], negative['Test2'], s=30, c='r', marker='x', label='Not Admitted')
ax.legend()  # Add the legend
# Set the x and y axis labels
ax.set_xlabel('Microchip Test 1')
ax.set_ylabel('Microchip Test 2')
```

## 2 Feature Mapping

From the plot above we can see that the data cannot be separated by a straight line. As in polynomial regression, we create more features from the existing ones (feature mapping). Here the polynomial degree is six, which yields 1 + 2 + … + 7 = 28 features.

```python
def featureMapping(x1, x2, power):
    # x1: feature 1
    # x2: feature 2
    # power: highest polynomial degree
    data = {"f_{}{}".format(i - j, j): np.power(x1, i - j) * np.power(x2, j)
            for i in range(power + 1)
            for j in range(i + 1)}
    return pd.DataFrame(data)
```

```python
x1 = pdData['Test1']
x2 = pdData['Test2']
data = featureMapping(x1, x2, 6)
print(data.shape)
"""
(118, 28)
"""
data.head(2)
```
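Why 28 columns? For degree p, the mapping creates one feature x1^(i−j)·x2^j for every pair 0 ≤ j ≤ i ≤ p, which is 1 + 2 + … + (p + 1) features. A quick check of that count (`num_mapped_features` is a helper introduced here just for illustration):

```python
def num_mapped_features(power):
    # One feature x1^(i-j) * x2^j for each pair 0 <= j <= i <= power
    return sum(i + 1 for i in range(power + 1))

print(num_mapped_features(6))  # 1 + 2 + ... + 7 = 28
```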

```python
data.describe()  # View the data description
```

## 3 Model Functions

To complete the model function: the linear function of the features is

$$z = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n$$

where $x_0 = 1$ is the constant term.

Logistic regression model functions are as follows:

$$h_\theta(x) = g(\theta^T x)$$

Where the g(z) expression is as follows:

$$g(z) = \frac{1}{1+e^{-z}}$$

Using the model, we predict:

When $h_\theta(x) \ge 0.5$, predict $y=1$;

when $h_\theta(x) < 0.5$, predict $y=0$.

Since $g$ is monotonic and $g(0) = 0.5$, this is equivalent to predicting $y=1$ exactly when $\theta^T x \ge 0$.

```python
# Define the sigmoid function
def sigmoid(z):
    # z can be a number or a numpy array
    return 1.0 / (1 + np.exp(-z))

def model(X, theta):
    # A single sample is a row vector with n features; theta is a row vector of length n
    # When X has m rows and n columns, returns an m-by-1 array
    theta = theta.reshape(1, -1)  # Reshape so the matrix multiplication conforms
    return sigmoid(X.dot(theta.T))
```
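A quick sanity check of the decision rule above, with hypothetical samples and parameters (the two functions are repeated so the snippet runs on its own):

```python
import numpy as np

# Repeated here so the snippet is self-contained
def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

def model(X, theta):
    theta = theta.reshape(1, -1)
    return sigmoid(X.dot(theta.T))

# g(0) = 0.5, so predicting h_theta(x) >= 0.5 is the same as theta^T x >= 0
print(sigmoid(0))  # 0.5
X_demo = np.array([[1.0, 2.0], [1.0, -2.0]])  # hypothetical samples (constant term first)
theta_demo = np.array([0.0, 1.0])             # hypothetical parameters
print(model(X_demo, theta_demo).flatten())    # first entry > 0.5, second < 0.5
```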

## 4 Regularized loss function

$$J(\theta)=\frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)} \log \left(h_{\theta}\left(x^{(i)}\right)\right)-\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right]+\frac{\lambda}{2 m} \sum_{j=1}^{n} \theta_{j}^{2}$$

Compared with the ordinary loss function, there is one extra term, $\frac{\lambda}{2 m} \sum_{j=1}^{n} \theta_{j}^{2}$. Note that this $J$ is just the previous loss function with the regularization term added, and the constant parameter $\theta_0$ is not regularized.

```python
# Loss function
def cost(theta, X, y, l=1):
    # X: feature dataset; each sample is a row vector, shape m x n
    # y: label dataset, a column vector of shape m x 1
    # theta: row vector of shape 1 x n
    # l: regularization parameter, 1 by default
    # Returns a scalar
    left = -y * np.log(model(X, theta))
    right = (1 - y) * np.log(1 - model(X, theta))
    return np.sum(left - right) / len(X) + l * np.power(theta[1:], 2).sum() / (2 * X.shape[0])
```

```python
X = np.array(data)  # The mapped features already include the constant term
y = np.array(pdData['Admitted']).reshape(-1, 1)
theta = np.zeros(X.shape[1])
X.shape, y.shape, theta.shape
"""
((118, 28), (118, 1), (28,))
"""
```

```python
cost(theta, X, y)  # Compute the initial loss
"""
0.6931471805599454
"""
```
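This initial value is no accident: with $\theta = 0$ every prediction is $g(0) = 0.5$ and the regularization term vanishes, so each sample contributes $-\ln(0.5) = \ln 2$ to the loss regardless of its label. A quick check:

```python
import numpy as np

# With theta = 0: J = -y*ln(0.5) - (1-y)*ln(0.5) = ln 2 for every sample
print(np.log(2))  # 0.6931471805599453
```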

## 5 Calculate Gradient

The regularized gradient formula is divided into two parts:

$$\frac{\partial J(\theta)}{\partial \theta_{0}}=\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{0}^{(i)} \quad \text { for } j = 0$$

$$\frac{\partial J(\theta)}{\partial \theta_{j}}=\left(\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}\right)+\frac{\lambda}{m} \theta_{j} \quad \text { for } j \geq 1$$

```python
# Gradient of the parameters
def gradient(theta, X, y, l=1):
    # X: feature dataset, m x n
    # y: label dataset, m x 1
    # theta: row vector, 1 x n
    # l: regularization parameter, 1 by default
    grad = ((model(X, theta) - y).T @ X).flatten() / len(X)
    grad[1:] = grad[1:] + l * theta[1:] / X.shape[0]  # Add the regularization term to the non-constant parameters
    # Returns a row vector
    return grad
```

```python
gradient(theta, X, y)  # View the gradient under the current parameters
"""
array([8.47457627e-03, 1.87880932e-02, 7.77711864e-05, 5.03446395e-02,
       1.15013308e-02, 3.76648474e-02, 1.83559872e-02, 7.32393391e-03,
       8.19244468e-03, 2.34764889e-02, 3.93486234e-02, 2.23923907e-03,
       1.28600503e-02, 3.09593720e-03, 3.93028171e-02, 1.99707467e-02,
       4.32983232e-03, 3.38643902e-03, 5.83822078e-03, 4.47629067e-03,
       3.10079849e-02, 3.10312442e-02, 1.09740238e-03, 6.31570797e-03,
       4.08503006e-04, 7.26504316e-03, 1.37646175e-03, 3.87936363e-02])
"""
```
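Before handing the gradient to an optimizer, it is worth verifying it numerically. Below is a sketch of a gradient check on a tiny random problem: the analytic gradient is compared against central finite differences of the loss. The `*_chk` functions are self-contained copies of `cost` and `gradient` above (renamed so they do not clobber the notebook's definitions), and the random data is purely illustrative:

```python
import numpy as np

def sigmoid_chk(z):
    return 1.0 / (1 + np.exp(-z))

def cost_chk(theta, X, y, l=1):
    # Same regularized loss as cost() above
    h = sigmoid_chk(X @ theta.reshape(-1, 1))
    reg = l * np.sum(theta[1:] ** 2) / (2 * len(X))
    return float(np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h)) + reg)

def gradient_chk(theta, X, y, l=1):
    # Same regularized gradient as gradient() above
    h = sigmoid_chk(X @ theta.reshape(-1, 1))
    grad = ((h - y).T @ X).flatten() / len(X)
    grad[1:] += l * theta[1:] / len(X)
    return grad

# Tiny random dataset: 6 samples, constant term plus 2 features
rng = np.random.default_rng(0)
X_chk = np.hstack([np.ones((6, 1)), rng.normal(size=(6, 2))])
y_chk = rng.integers(0, 2, size=(6, 1)).astype(float)
t_chk = rng.normal(size=3)

# Central finite differences along each coordinate direction
eps = 1e-6
num_grad = np.array([
    (cost_chk(t_chk + eps * e, X_chk, y_chk) - cost_chk(t_chk - eps * e, X_chk, y_chk)) / (2 * eps)
    for e in np.eye(3)
])
print(np.max(np.abs(num_grad - gradient_chk(t_chk, X_chk, y_chk))))  # a tiny value confirms the analytic gradient
```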

## 6 Use an optimization function to compute the parameters

Use the `optimize` module from scipy:

```python
import scipy.optimize as opt
res = opt.minimize(fun=cost, x0=theta, args=(X, y, 1), method='Newton-CG', jac=gradient)
res
```
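To make the calling pattern clearer, here is the same `opt.minimize` signature on a toy quadratic (a hypothetical function, unrelated to the chip data): `fun` returns the scalar loss, `jac` its gradient, and extra data such as `(X, y, l)` would go through `args`.

```python
import numpy as np
import scipy.optimize as opt

# Minimize f(x) = ||x - c||^2, whose gradient is 2(x - c)
c = np.array([1.0, -2.0, 3.0])
res_demo = opt.minimize(fun=lambda x: float(np.sum((x - c) ** 2)),
                        x0=np.zeros(3),
                        method='Newton-CG',
                        jac=lambda x: 2 * (x - c))
print(res_demo.x)  # approaches [1, -2, 3]
```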

```python
thete_pre = res.x  # Get the fitted parameter values
```

## 7 Compute accuracy with the trained parameters

```python
# Predict labels for the data
def predict(X, theta):
    # X: training data
    # theta: parameters
    return [1 if y_pre >= 0.5 else 0 for y_pre in model(X, theta).flatten()]
```

```python
predictions = predict(X, thete_pre)
correct = [1 if ((a == 1 and b == 1) or (a == 0 and b == 0)) else 0
           for (a, b) in zip(predictions, y)]
accuracy = 100 * (sum(map(int, correct)) / len(correct))  # Compute the accuracy
print('accuracy = {0}%'.format(accuracy))
"""
accuracy = 83.05084745762711%
"""
```
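The list-comprehension accuracy computation above can also be written as one vectorized NumPy expression. A self-contained sketch, checked on a toy perfectly-separable problem (`accuracy_vec`, `X_demo`, and `y_demo` are names introduced here for illustration):

```python
import numpy as np

def accuracy_vec(theta, X, y):
    # Vectorized version of the accuracy computation above
    probs = 1.0 / (1 + np.exp(-(X @ theta.reshape(-1, 1))))
    preds = (probs >= 0.5).astype(int)
    return float(np.mean(preds == y.reshape(-1, 1)) * 100)

# Toy check: a perfectly separable 1-D problem (constant term in the first column)
X_demo = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y_demo = np.array([0, 0, 1, 1])
print(accuracy_vec(np.array([0.0, 5.0]), X_demo, y_demo))  # 100.0
```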

## 8 Draw the decision boundary

This step can also be seen as encapsulating the whole model.

The idea: sample many points with Test1 in [-1, 1.5] and Test2 in [-1, 1.5]. A point where $|\theta^T x|$ is below a small threshold lies very close to the curve $\theta^T x = 0$; plotting all such points traces out the decision boundary.

```python
# Get the decision boundary data
def getDecisionData(data, density, power, thresh, l):
    # density: sample density; the larger it is, the denser the plotted curve
    t1 = np.linspace(-1, 1.5, density)  # density sample points between -1 and 1.5
    t2 = np.linspace(-1, 1.5, density)
    x_or = np.array(featureMapping(data['Test1'], data['Test2'], power))
    y_or = np.array(data['Admitted']).reshape(-1, 1)
    theta = np.zeros(x_or.shape[1])
    theta = opt.minimize(fun=cost, x0=theta, args=(x_or, y_or, l),
                         method='Newton-CG', jac=gradient).x
    coordinate = [(x, y) for x in t1 for y in t2]  # Build the grid coordinates
    x_cord, y_cord = zip(*coordinate)  # * unpacks the list of pairs
    mapped_cord = featureMapping(x_cord, y_cord, power)
    Y = np.array(mapped_cord) @ theta
    res_data = mapped_cord[np.abs(Y) < thresh]  # Points with |theta^T x - 0| < thresh lie near the curve
    # Print the prediction results
    predictions = predict(x_or, theta)
    correct = [1 if ((a == 1 and b == 1) or (a == 0 and b == 0)) else 0
               for (a, b) in zip(predictions, y_or)]
    accuracy = 100 * (sum(map(int, correct)) / len(correct))  # Compute the accuracy
    print('The regularization parameter is: {}, model accuracy is: accuracy = {}%'.format(l, accuracy))
    return res_data.f_10, res_data.f_01  # f_10 is x1, f_01 is x2
```

```python
def drawDecisionBound(pdData, density, thresh, l):
    x_cor, y_cor = getDecisionData(pdData, density, 6, thresh, l)
    positive = pdData[pdData['Admitted'] == 1]  # Samples that were accepted
    negative = pdData[pdData['Admitted'] == 0]  # Samples that were not accepted
    fig, ax = plt.subplots(figsize=(5, 5))  # Get the drawing objects
    # Scatter plots of the two classes by their test scores
    ax.scatter(positive['Test1'], positive['Test2'], s=30, c='b', marker='o', label='Admitted')
    ax.scatter(negative['Test1'], negative['Test2'], s=30, c='r', marker='x', label='Not Admitted')
    ax.legend()  # Add the legend
    # Set the x and y axis labels
    ax.set_xlabel('Microchip Test 1')
    ax.set_ylabel('Microchip Test 2')
    ax.scatter(x_cor, y_cor, label="fitted curve", s=1)  # s sets the point size
    plt.title("origin data vs fitted curve")
```
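An alternative to thresholding $|\theta^T x|$: evaluate $\theta^T \cdot$ featureMapping over a mesh grid and let `plt.contour` trace the level set $\theta^T x = 0$ directly. A self-contained sketch with hypothetical parameters (a unit circle as the boundary, not the fitted theta; `feature_mapping_arr` is an array-returning variant of `featureMapping`):

```python
import numpy as np
import matplotlib.pyplot as plt

def feature_mapping_arr(x1, x2, power):
    # Same mapping as featureMapping above, but returns a plain ndarray
    cols = [np.power(x1, i - j) * np.power(x2, j)
            for i in range(power + 1) for j in range(i + 1)]
    return np.stack(cols, axis=-1)

# Hypothetical parameters: theta^T x = x1^2 + x2^2 - 1, a unit circle boundary
theta_circle = np.zeros(28)
theta_circle[0] = -1.0  # constant term f_00
theta_circle[3] = 1.0   # f_20 = x1^2
theta_circle[5] = 1.0   # f_02 = x2^2

t = np.linspace(-1.5, 1.5, 200)
xx, yy = np.meshgrid(t, t)
Z = (feature_mapping_arr(xx.ravel(), yy.ravel(), 6) @ theta_circle).reshape(xx.shape)

fig, ax = plt.subplots(figsize=(5, 5))
ax.contour(xx, yy, Z, levels=[0])  # traces the level set theta^T x = 0
```

The contour approach draws a smooth curve at any resolution, with no threshold parameter to tune.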

Test 1: data density 1000, point-to-boundary distance threshold 0.05, regularization parameter 1

```python
drawDecisionBound(pdData, 1000, 0.05, 1)
"""
The regularization parameter is: 1, model accuracy is: accuracy = 83.05084745762711%
"""
```

Test 2: data density 1000, point-to-boundary distance threshold 0.01, regularization parameter 0.8

```python
drawDecisionBound(pdData, 1000, 0.01, 0.8)
"""
The regularization parameter is: 0.8, model accuracy is: accuracy = 83.05084745762711%
"""
```

When the regularization parameter changes only slightly, the accuracy barely changes (at least on a small dataset like this one).

Test 3: data density 1000, point-to-boundary distance threshold 0.01, regularization parameter 0.005

```python
drawDecisionBound(pdData, 1000, 0.01, 0.005)
"""
The regularization parameter is: 0.005, model accuracy is: accuracy = 83.89830508474576%
"""
```

When the regularization parameter is small, approaching no regularization at all, the model fits the training data more closely and the training accuracy rises.

Test 4: data density 1000, point-to-boundary distance threshold 0.01, regularization parameter 0

```python
drawDecisionBound(pdData, 1000, 0.01, 0.)
"""
The regularization parameter is: 0.0, model accuracy is: accuracy = 87.28813559322035%
"""
```

With the regularization term set to zero, the model overfits.

Test 5: data density 1000, point-to-boundary distance threshold 0.01, regularization parameter 10

```python
drawDecisionBound(pdData, 1000, 0.01, 10.)
"""
The regularization parameter is: 10.0, model accuracy is: accuracy = 74.57627118644068%
"""
```

When the regularization parameter is too large, the model underfits and the accuracy is at its lowest.

## Summary

This post and the previous one used logistic regression to classify data into two classes. In practice there are often more than two classes; a later post will describe how to use logistic regression for multi-class classification. The program source code can be obtained by replying "Machine Learning" in the subscription account AIAS Programming Channel.