# [Machine Learning] Wu Enda's Machine Learning Video Assignment: Logistic Regression Classification II

The video instructions are as follows:

Complete the Theory and Practice of Machine Learning Algorithms--Logistic Regression

The previous post, [Machine Learning] Wu Enda Machine Learning Video Assignment: Logistic Regression Binary Classification 1, described how to use logistic regression for binary classification. Now let's look at logistic regression for a more complex binary classification task.

# Binary Classification, Part 2

This program is based on the logistic regression assignment from Wu Enda's machine learning videos. It applies regularized logistic regression to the ex2data2.txt data. The background: predict whether microchips from a fabrication plant pass quality assurance (QA).

## 1 Load data and view data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter magic function: display figures inline without calling plt.show()
%matplotlib inline

# The source data has no header; each row is one sample. The first two columns are the two test results, and the last column is whether the chip was accepted. Columns are separated by ",", like csv data.
# Read the data into pandas and set the column names
pdData = pd.read_csv('ex2data2.txt', header=None, names=['Test1', 'Test2', 'Admitted'])
pdData.head()            # View the first five rows of data
"""
0	0.051267	0.69956	1
1	-0.092742	0.68494	1
2	-0.213710	0.69225	1
3	-0.375000	0.50219	1
4	-0.513250	0.46564	1
"""

pdData.info()    # View data information
"""
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 118 entries, 0 to 117
Data columns (total 3 columns):
#   Column    Non-Null Count  Dtype
---  ------    --------------  -----
0   Test1     118 non-null    float64
1   Test2     118 non-null    float64
2   Admitted  118 non-null    int64
dtypes: float64(2), int64(1)
memory usage: 2.9 KB
"""


Data Visualization

positive = pdData[pdData['Admitted'] == 1]   # Select the accepted samples
negative = pdData[pdData['Admitted'] == 0]   # Select the rejected samples
fig, ax = plt.subplots(figsize=(5, 5))      # Getting Drawing Objects
# Draw scatterplot of the data according to the two test results
ax.scatter(positive['Test1'], positive['Test2'], s=30, c='b', marker='o', label='Admitted')
# Scatter charts based on two exam results for data not accepted
ax.scatter(negative['Test1'], negative['Test2'], s=30, c='r', marker='x', label='Not Admitted')
ax.legend()
# Set the name of the x,y axis
ax.set_xlabel('Microchip Test 1')
ax.set_ylabel('Microchip Test 2')


## 2 Feature Mapping

From the plot above, the data cannot be separated by a straight line. As with polynomial regression, we create more features from the existing ones (feature mapping). Here the polynomial degree is 6, so the mapped data has 1+2+...+7 = 28 features.

def featureMapping(x1, x2, power):
    # x1: feature 1
    # x2: feature 2
    # power: maximum polynomial degree
    data = {"f_{}{}".format(i - j, j):
            np.power(x1, i - j)*np.power(x2, j)
            for i in range(power + 1) for j in range(i + 1)
            }
    return pd.DataFrame(data)

x1 = pdData['Test1']
x2 = pdData['Test2']
data = featureMapping(x1, x2, 6)
print(data.shape)
"""
(118, 28)
"""


data.describe()  # View data description
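As a quick sanity check (a standalone snippet that duplicates the mapping above so it runs on its own): for degree $d$ the mapping produces $(d+1)(d+2)/2$ columns, so $d = 6$ gives 28, matching the shape printed above.

```python
import numpy as np
import pandas as pd

def feature_mapping(x1, x2, power):
    # Same mapping as featureMapping above: all monomials x1^(i-j) * x2^j
    # with total degree i running from 0 to power.
    data = {"f_{}{}".format(i - j, j): np.power(x1, i - j) * np.power(x2, j)
            for i in range(power + 1) for j in range(i + 1)}
    return pd.DataFrame(data)

x1 = np.array([0.5, -0.2])
x2 = np.array([0.7, 0.3])
mapped = feature_mapping(x1, x2, 6)
print(mapped.shape)             # (2, 28) -- the triangular number (6+1)(6+2)/2
print(mapped["f_00"].tolist())  # [1.0, 1.0] -- the constant (bias) column
```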


## 3 Model Functions

To complete the model, the linear function of the features is:
$$z = \theta^T x$$
The logistic regression model function is:
$$h_\theta(x) = g(\theta^T x)$$
where $g(z)$ is the sigmoid function:
$$g(z) = \frac{1}{1+e^{-z}}$$
Using the model, we predict:

when $h_\theta(x) \ge 0.5$, predict $y=1$;

when $h_\theta(x) < 0.5$, predict $y=0$.

# Define the sigmoid function
def sigmoid(z):
    # z can be a number or a numpy array
    return 1.0/(1+np.exp(-z))

def model(X, theta):
    # A single sample is a row vector with n features; theta is a row vector of length n
    # When X has m rows and n columns, returns an m x 1 array
    theta = theta.reshape(1, -1)  # Reshape to a matrix for the calculation
    return sigmoid(X.dot(theta.T))


## 4 Regularized loss function

$$J(\theta)=\frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)} \log \left(h_{\theta}\left(x^{(i)}\right)\right)-\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right]+\frac{\lambda}{2 m} \sum_{j=1}^{n} \theta_{j}^{2}$$
Compared with the ordinary loss function there is an extra term $\frac{\lambda}{2 m} \sum_{j=1}^{n} \theta_{j}^{2}$. Note: $J$ here can be built on top of the loss function from the previous post.

# Loss function
def cost(theta, X, y, l=1):
    # X is the feature dataset; a single sample is a row vector, shape m x n
    # y is the label dataset, a column vector of shape m x 1
    # theta is a row vector of shape 1 x n
    # l is the regularization parameter, 1 by default
    # Returns a scalar
    left = -y*np.log(model(X, theta))
    right = (1-y)*np.log(1 - model(X, theta))
    return np.sum(left - right)/len(X) + l*np.power(theta[1:], 2).sum()/(2*X.shape[0])

X = np.array(data)               # There is already a constant term
y = np.array(pdData['Admitted']).reshape(-1, 1)
theta = np.zeros(X.shape[1])
X.shape, y.shape, theta.shape
"""
((118, 28), (118, 1), (28,))
"""

cost(theta, X, y)  #  Calculate initial loss
"""
0.6931471805599454
"""


## 5 Calculate Gradient

The regularized gradient formula has two parts (the constant term $\theta_0$ is not regularized):
$$\frac{\partial J(\theta)}{\partial \theta_{j}}=\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)} \quad \text{for } j = 0$$

$$\frac{\partial J(\theta)}{\partial \theta_{j}}=\left(\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}\right)+\frac{\lambda}{m} \theta_{j} \quad \text{for } j \geq 1$$

# Gradient of the parameters
def gradient(theta, X, y, l=1):
    # X is the feature dataset, m x n
    # y is the label dataset, m x 1
    # l is the regularization parameter, 1 by default
    # theta is a row vector, 1 x n
    grad = ((model(X, theta) - y).T@X).flatten()/len(X)
    grad[1:] = grad[1:] + l*theta[1:]/X.shape[0]   # Add the regularization term for the non-constant parameters
    # Returns a row vector
    return grad

gradient(theta, X, y)   # View gradient values under current parameters
"""
array([8.47457627e-03, 1.87880932e-02, 7.77711864e-05, 5.03446395e-02,
1.15013308e-02, 3.76648474e-02, 1.83559872e-02, 7.32393391e-03,
8.19244468e-03, 2.34764889e-02, 3.93486234e-02, 2.23923907e-03,
1.28600503e-02, 3.09593720e-03, 3.93028171e-02, 1.99707467e-02,
4.32983232e-03, 3.38643902e-03, 5.83822078e-03, 4.47629067e-03,
3.10079849e-02, 3.10312442e-02, 1.09740238e-03, 6.31570797e-03,
4.08503006e-04, 7.26504316e-03, 1.37646175e-03, 3.87936363e-02])
"""


## 7 Use an optimization function to compute the parameters

Use the optimize module from scipy:

import scipy.optimize as opt
res = opt.minimize(fun=cost, x0=theta, args=(X,y,1), method='Newton-CG', jac=gradient)
res



theta_pre = res.x   # Get the fitted parameter values


## 8 Calculate accuracy using the trained parameters

# Make predictions
def predict(X, theta):
    # X: training data
    # theta: parameters
    return [1 if y_pre >= 0.5 else 0 for y_pre in model(X, theta).flatten()]

predictions = predict(X, theta_pre)
correct = [1 if ((a == 1 and b == 1) or (a == 0 and b == 0)) else 0 for (a, b) in zip(predictions, y)]
accuracy = 100*(sum(map(int, correct))/len(correct))       # Calculate the accuracy
# View the accuracy
print('accuracy = {0}%'.format(accuracy))
"""
accuracy = 83.05084745762711%
"""


## 9 Draw the decision boundary

This step can also be seen as encapsulating the whole model.
The idea: within the data range, sample many points with Test1 in [-1, 1.5] and Test2 in [-1, 1.5]. A point lies very close to the curve $\theta^T x = 0$ when $|\theta^T x|$ falls below a small threshold, so collecting those points traces out the decision boundary.

# Get the decision boundary data
def getDecisionData(data, density, power, thresh, l):
    # density: sample density; the larger it is, the thicker the plotted line
    t1 = np.linspace(-1, 1.5, density)  # density sample points between -1 and 1.5
    t2 = np.linspace(-1, 1.5, density)
    x_or = np.array(featureMapping(data['Test1'], data['Test2'], power))
    y_or = np.array(data['Admitted']).reshape(-1, 1)
    theta = np.zeros(x_or.shape[1])
    theta = opt.minimize(fun=cost, x0=theta, args=(x_or, y_or, l), method='Newton-CG', jac=gradient).x
    cordinate = [(x, y) for x in t1 for y in t2]  # Build the grid coordinates
    x_cord, y_cord = zip(*cordinate)  # * unpacks the list of pairs
    mapped_cord = featureMapping(x_cord, y_cord, power)
    Y = np.array(mapped_cord) @ theta
    res_data = mapped_cord[np.abs(Y) < thresh]  # Points with |theta^T x| < thresh are near the curve

    # Print the prediction results
    predictions = predict(x_or, theta)
    correct = [1 if ((a == 1 and b == 1) or (a == 0 and b == 0)) else 0 for (a, b) in zip(predictions, y_or)]
    accuracy = 100*(sum(map(int, correct))/len(correct))       # Calculate the accuracy
    # View the accuracy
    print('The regularization parameter is: {}, model accuracy is: {}%'.format(l, accuracy))

    return res_data.f_10, res_data.f_01   # f_10 = x1, f_01 = x2: the x, y coordinates

def drawDecisionBound(pdData, density, thresh, l):
    x_cor, y_cor = getDecisionData(pdData, density, 6, thresh, l)
    positive = pdData[pdData['Admitted'] == 1]   # Select the accepted samples
    negative = pdData[pdData['Admitted'] == 0]   # Select the rejected samples
    fig, ax = plt.subplots(figsize=(5, 5))      # Get the drawing objects
    # Scatter plot of the accepted samples by the two test results
    ax.scatter(positive['Test1'], positive['Test2'], s=30, c='b', marker='o', label='Admitted')
    # Scatter plot of the rejected samples by the two test results
    ax.scatter(negative['Test1'], negative['Test2'], s=30, c='r', marker='x', label='Not Admitted')
    ax.legend()
    # Set the x and y axis labels
    ax.set_xlabel('Microchip Test 1')
    ax.set_ylabel('Microchip Test 2')
    ax.scatter(x_cor, y_cor, label="fitted curve", s=1)  # s sets the point size
    plt.title("origin data vs fitted curve")
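An alternative way to draw the boundary (a common trick, not from the original assignment) is to evaluate $\theta^T x$ on a grid and let matplotlib trace the zero level set with `contour`, instead of scanning for points with $|\theta^T x| <$ thresh. The sketch below is self-contained and uses placeholder parameter values; in the notebook you would pass the fitted theta instead.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs anywhere
import matplotlib.pyplot as plt

def feature_mapping(x1, x2, power):
    # Same degree-6 feature mapping as featureMapping above
    data = {"f_{}{}".format(i - j, j): np.power(x1, i - j) * np.power(x2, j)
            for i in range(power + 1) for j in range(i + 1)}
    return pd.DataFrame(data)

# Placeholder parameters (28 values for a degree-6 mapping); substitute the
# fitted theta from opt.minimize in the real notebook.
theta = np.zeros(28)
theta[0], theta[1] = 1.0, -2.0  # boundary 1 - 2*x1 = 0, i.e. the line x1 = 0.5

t = np.linspace(-1, 1.5, 200)
xx, yy = np.meshgrid(t, t)
zz = (np.array(feature_mapping(xx.ravel(), yy.ravel(), 6)) @ theta).reshape(xx.shape)

fig, ax = plt.subplots(figsize=(5, 5))
ax.contour(xx, yy, zz, levels=[0], colors="g")  # zero level set = decision boundary
ax.set_xlabel("Microchip Test 1")
ax.set_ylabel("Microchip Test 2")
```

This avoids tuning the `thresh` and `density` knobs by hand: `contour` interpolates the exact zero crossing between grid cells.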


Test 1: data density 1000, point-to-boundary distance threshold 0.05, regularization parameter 1

drawDecisionBound(pdData, 1000, 0.05, 1)
"""
The regularization parameters are: 1, and the model accuracy is: accuracy = 83.05084745762711%
"""


Test 2: data density 1000, point-to-boundary distance threshold 0.01, regularization parameter 0.8

drawDecisionBound(pdData, 1000, 0.01, 0.8)
"""
The regularization parameter is 0.8, and the model accuracy is 83.05084745762711%
"""


With this small dataset, the accuracy does not change much when the regularization parameter changes only slightly.
Test 3: data density 1000, point-to-boundary distance threshold 0.01, regularization parameter 0.005

drawDecisionBound(pdData, 1000, 0.01, 0.005)
"""
The regularization parameter is 0.005, and the model accuracy is 83.89830508474576%
"""


When the regularization parameter is small, regularization has little effect, so the model fits the training data more closely and the training accuracy rises.
Test 4: data density 1000, point-to-boundary distance threshold 0.01, regularization parameter 0

drawDecisionBound(pdData, 1000, 0.01, 0.)
"""
The regularization parameter is 0.0, and the model accuracy is 87.28813559322035%
"""


With the regularization term at zero, the model overfits.
Test 5: data density 1000, point-to-boundary distance threshold 0.01, regularization parameter 10

drawDecisionBound(pdData, 1000, 0.01, 10.)
"""
The regularization parameter is 10.0, and the model accuracy is 74.57627118644068%
"""


When the regularization parameter is too large, the model underfits and the accuracy drops.

## Summary

This post and the previous one used logistic regression to classify data into two classes. In practice there are also multiclass problems; a later post will describe how to use logistic regression for multiclass classification. The program source code can be obtained by replying "Machine Learning" in the subscription account AIAS Programming Channel.


Posted on Sat, 14 Mar 2020 20:17:16 -0400 by todd-imc