[Machine Learning] Wu Enda Machine Learning Video Assignment - Logistic Regression Classification II

The video instructions are as follows:

Complete the Theory and Practice of Machine Learning Algorithms--Logistic Regression

Video address: https://www.bilibili.com/video/av95806420/

The previous post described how to use logistic regression for binary classification: [Machine Learning] Wu Enda Machine Learning Video Assignment - Logistic Regression Binary Classification I. Let's now look at logistic regression on a more complex binary classification problem.

Binary Classification II

This program is based on the logistic regression assignment from Wu Enda's machine learning videos. It uses regularized logistic regression on the ex2data2.txt data. The task is to predict whether microchips from a fabrication plant pass quality assurance (QA).

1 Load data and view data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter magic function so figures are displayed inline without calling plt.show()
%matplotlib inline
# The source data has no header; each row is one sample: the first two columns are the two test results,
# and the last column is whether the chip was accepted. Columns are comma-separated, like CSV.
# Read the data with pandas and set the column names
pdData = pd.read_csv("ex2data2.txt", header=None, names=['Test1', 'Test2', 'Admitted'])
pdData.head()            # View the first five rows of data
"""
Test1	Test2	Admitted
0	0.051267	0.69956	1
1	-0.092742	0.68494	1
2	-0.213710	0.69225	1
3	-0.375000	0.50219	1
4	-0.513250	0.46564	1
"""
pdData.info()    # View data information
"""
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 118 entries, 0 to 117
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Test1     118 non-null    float64
 1   Test2     118 non-null    float64
 2   Admitted  118 non-null    int64  
dtypes: float64(2), int64(1)
memory usage: 2.9 KB
"""

Data Visualization

positive = pdData[pdData['Admitted'] == 1]   # Select the accepted samples
negative = pdData[pdData['Admitted'] == 0]   # Select the rejected samples
fig, ax = plt.subplots(figsize=(5, 5))       # Create the figure and axes
# Scatter plot of the accepted samples against the two test results
ax.scatter(positive['Test1'], positive['Test2'], s=30, c='b', marker='o', label='Admitted')
# Scatter plot of the rejected samples against the two test results
ax.scatter(negative['Test1'], negative['Test2'], s=30, c='r', marker='x', label='Not Admitted')
# Add Legend
ax.legend()
# Set the name of the x,y axis
ax.set_xlabel('Microchip Test 1')
ax.set_ylabel('Microchip Test 2')

2 Feature Mapping

From the plot above we can see that the data cannot be separated by a straight line. Similar to polynomial regression, we create additional features from the existing ones (feature mapping), using polynomial terms up to degree 6, which yields 1 + 2 + ... + 7 = 28 features.
def featureMapping(x1, x2, power):
    # x1: feature 1, x2: feature 2
    # Build all terms x1^(i-j) * x2^j for 0 <= j <= i <= power,
    # named f_{i-j}{j}; f_00 is the constant (bias) column of ones
    data = {"f_{}{}".format(i - j, j): np.power(x1, i - j) * np.power(x2, j)
            for i in range(power + 1) for j in range(i + 1)}
    return pd.DataFrame(data)
x1 = pdData['Test1']
x2 = pdData['Test2']
data = featureMapping(x1, x2, 6)
print(data.shape)
"""
(118, 28)
"""
data.head(2)

data.describe()  # View summary statistics of the data
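
As a quick check on the feature count (a small sketch added here, not part of the original assignment): a degree-power mapping produces 1 + 2 + ... + (power + 1) = (power + 1)(power + 2)/2 columns, which for power = 6 is 28 and should match data.shape above.

power = 6
expected_cols = (power + 1) * (power + 2) // 2   # 1 + 2 + ... + (power + 1)
print(expected_cols, data.shape[1])              # both print 28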

3 Model Functions

Combined with the mapped features, the linear function of the features is as follows:
$\theta^T x = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1 x_2 + \theta_5 x_2^2 + \cdots + \theta_{27} x_2^6$
Logistic regression model functions are as follows:
$h_\theta(x) = g(\theta^T x)$
Where the g(z) expression is as follows:
$g(z) = \frac{1}{1+e^{-z}}$
Using the model, we predict:

When $h_\theta(x) \ge 0.5$, predict $y = 1$.

When $h_\theta(x) < 0.5$, predict $y = 0$.

# Define the sigmoid function
def sigmoid(z):
    # z can be a scalar or a NumPy array
    return 1.0/(1 + np.exp(-z))
def model(X, theta):
    # Each sample is a row vector with n features; theta is a row vector of length n
    # When X has m rows and n columns, the result has m rows and 1 column
    theta = theta.reshape(1, -1)  # Reshape theta to 1 x n so the matrix product works
    return sigmoid(X.dot(theta.T))
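
As a quick sanity check (not part of the original assignment): since $g(0) = 0.5$, a model with all parameters set to zero should predict 0.5 for every sample.

# Sanity check: sigmoid(0) = 0.5, so a zero theta predicts 0.5 for every row
print(sigmoid(0))                           # 0.5
print(model(np.ones((3, 4)), np.zeros(4)))  # a 3 x 1 column of 0.5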

4 Regularized loss function

$J(\theta)=\frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)} \log\left(h_{\theta}(x^{(i)})\right)-\left(1-y^{(i)}\right) \log\left(1-h_{\theta}(x^{(i)})\right)\right]+\frac{\lambda}{2 m} \sum_{j=1}^{n} \theta_{j}^{2}$
Compared with the ordinary loss function there is an extra regularization term $\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$. Note that $J$ can be obtained by extending the previous loss function with this term, and that the bias parameter $\theta_0$ is not regularized.

# Regularized loss function
def cost(theta, X, y, l=1):
    # X is the feature matrix; each sample is a row vector, shape m x n
    # y is the label vector, shape m x 1
    # theta is a row vector, shape 1 x n
    # l is the regularization parameter, 1 by default
    # Returns a scalar
    left = -y*np.log(model(X, theta))
    right = (1-y)*np.log(1 - model(X, theta))
    return np.sum(left - right)/len(X) + l*np.power(theta[1:], 2).sum()/(2*X.shape[0])
X = np.array(data)               # The mapped features already contain the constant column f_00 = 1
y = np.array(pdData['Admitted']).reshape(-1, 1)
theta = np.zeros(X.shape[1])
X.shape, y.shape, theta.shape
"""
((118, 28), (118, 1), (28,))
"""
cost(theta, X, y)  #  Calculate initial loss
"""
0.6931471805599454
"""

5 Calculate Gradient

The regularized gradient formula is divided into two parts:
$\frac{\partial J(\theta)}{\partial \theta_{0}}=\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) x_{0}^{(i)} \quad \text{for } j = 0$

$\frac{\partial J(\theta)}{\partial \theta_{j}}=\left(\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) x_{j}^{(i)}\right)+\frac{\lambda}{m} \theta_{j} \quad \text{for } j \geq 1$

# Gradient of the parameters
def gradient(theta, X, y, l=1):
    # X is the feature matrix, shape m x n
    # y is the label vector, shape m x 1
    # l is the regularization parameter, 1 by default
    # theta is a row vector, shape 1 x n
    grad = ((model(X, theta) - y).T @ X).flatten() / len(X)
    grad[1:] = grad[1:] + l*theta[1:]/X.shape[0]   # Add the regularization term to the non-bias parameters
    # Returns a row vector
    return grad
gradient(theta, X, y)   # View gradient values under current parameters
"""
array([8.47457627e-03, 1.87880932e-02, 7.77711864e-05, 5.03446395e-02,
       1.15013308e-02, 3.76648474e-02, 1.83559872e-02, 7.32393391e-03,
       8.19244468e-03, 2.34764889e-02, 3.93486234e-02, 2.23923907e-03,
       1.28600503e-02, 3.09593720e-03, 3.93028171e-02, 1.99707467e-02,
       4.32983232e-03, 3.38643902e-03, 5.83822078e-03, 4.47629067e-03,
       3.10079849e-02, 3.10312442e-02, 1.09740238e-03, 6.31570797e-03,
       4.08503006e-04, 7.26504316e-03, 1.37646175e-03, 3.87936363e-02])
"""

6 Use an optimization function to compute the parameters

We use the optimize module from scipy.

import scipy.optimize as opt
res = opt.minimize(fun=cost, x0=theta, args=(X,y,1), method='Newton-CG', jac=gradient)
res

theta_pre = res.x   # Get the trained parameter values
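
Before using the parameters it is worth confirming that the optimizer converged; success and fun are standard fields of scipy's OptimizeResult (a quick check, not in the original post).

print(res.success)   # True if the optimizer converged
print(res.fun)       # final value of the regularized loss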

7 Calculate accuracy using the trained parameters

# Predict labels for the data
def predict(X, theta):
    # X: feature matrix
    # theta: trained parameters
    return [1 if y_pre >= 0.5 else 0 for y_pre in model(X, theta).flatten()]
predictions = predict(X, theta_pre)
correct = [1 if ((a == 1 and b == 1) or (a == 0 and b == 0)) else 0 for (a, b) in zip(predictions, y)]
accuracy = 100*(sum(map(int, correct))/len(correct))       # Compute the accuracy
# Print the accuracy
print('accuracy = {0}%'.format(accuracy))
"""
accuracy = 83.05084745762711%
"""

8 Draw the decision boundary

This step can also be thought of as wrapping up the entire model.
The idea is to sample many points within the data range, with Test1 in [-1, 1.5] and Test2 in [-1, 1.5]. Points that lie very close to the curve $\theta^T x = 0$, i.e. within a small threshold, are kept and plotted, and together they trace out the decision boundary.

# Compute points lying on the decision boundary
def getDecisionData(data, density, power, thresh, l):
    # density: number of sample points along each axis; the larger, the denser (thicker) the plotted boundary
    t1 = np.linspace(-1, 1.5, density)  # density sample points between -1 and 1.5
    t2 = np.linspace(-1, 1.5, density)
    x_or = np.array(featureMapping(data['Test1'], data['Test2'], power))
    y_or = np.array(data['Admitted']).reshape(-1, 1)
    theta = np.zeros(x_or.shape[1])
    theta = opt.minimize(fun=cost, x0=theta, args=(x_or, y_or, l), method='Newton-CG', jac=gradient).x
    coordinates = [(x, y) for x in t1 for y in t2]  # Build the grid coordinates
    x_cord, y_cord = zip(*coordinates)  # Unpack into separate x and y sequences
    mapped_cord = featureMapping(x_cord, y_cord, power)
    Y = np.array(mapped_cord) @ theta
    res_data = mapped_cord[np.abs(Y) < thresh]  # Keep points with |theta^T x| < thresh, i.e. close to the boundary
    
    # Print the training accuracy for this regularization parameter
    predictions = predict(x_or, theta)
    correct = [1 if ((a == 1 and b == 1) or (a == 0 and b == 0)) else 0 for (a, b) in zip(predictions, y_or)]
    accuracy = 100*(sum(map(int, correct))/len(correct))       # Compute the accuracy
    # Print the accuracy
    print('The regularization parameters are:{}, Model accuracy is: accuracy = {}%'.format(l, accuracy))
    
    return res_data.f_10, res_data.f_01   # f_10 = x1 and f_01 = x2 are the boundary point coordinates
def drawDecisionBound(pdData, density, thresh, l):
    x_cor, y_cor = getDecisionData(pdData, density, 6, thresh, l)  # Polynomial degree fixed at 6
    positive = pdData[pdData['Admitted'] == 1]   # Select the accepted samples
    negative = pdData[pdData['Admitted'] == 0]   # Select the rejected samples
    fig, ax = plt.subplots(figsize=(5, 5))       # Create the figure and axes
    # Scatter plot of the accepted samples against the two test results
    ax.scatter(positive['Test1'], positive['Test2'], s=30, c='b', marker='o', label='Admitted')
    # Scatter plot of the rejected samples against the two test results
    ax.scatter(negative['Test1'], negative['Test2'], s=30, c='r', marker='x', label='Not Admitted')
    # Add the legend
    ax.legend()
    # Set the names of the x and y axes
    ax.set_xlabel('Microchip Test 1')
    ax.set_ylabel('Microchip Test 2')
    ax.scatter(x_cor, y_cor, label="fitted curve", s=1)  # s sets the point size
    plt.title("origin data vs fitted curve")
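
As an alternative to scattering near-boundary points, the boundary can be drawn as the zero contour of $\theta^T x$ over a grid, which is closer to the approach of the original Octave/MATLAB exercise. The sketch below assumes the featureMapping, cost, and gradient functions defined above; drawContourBound is a hypothetical helper, not part of the original post.

# Alternative: draw the decision boundary as the zero contour of theta^T x
def drawContourBound(pdData, l=1, power=6, density=200):
    # Refit the regularized model
    x_or = np.array(featureMapping(pdData['Test1'], pdData['Test2'], power))
    y_or = np.array(pdData['Admitted']).reshape(-1, 1)
    theta = opt.minimize(fun=cost, x0=np.zeros(x_or.shape[1]), args=(x_or, y_or, l),
                         method='Newton-CG', jac=gradient).x
    # Evaluate theta^T x on a grid of mapped features
    t = np.linspace(-1, 1.5, density)
    xx, yy = np.meshgrid(t, t)
    zz = (featureMapping(xx.ravel(), yy.ravel(), power).values @ theta).reshape(xx.shape)
    # Plot the data and the contour theta^T x = 0
    positive = pdData[pdData['Admitted'] == 1]
    negative = pdData[pdData['Admitted'] == 0]
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.scatter(positive['Test1'], positive['Test2'], s=30, c='b', marker='o', label='Admitted')
    ax.scatter(negative['Test1'], negative['Test2'], s=30, c='r', marker='x', label='Not Admitted')
    ax.contour(xx, yy, zz, levels=[0], colors='g')
    ax.legend()
    ax.set_xlabel('Microchip Test 1')
    ax.set_ylabel('Microchip Test 2')
drawContourBound(pdData, l=1)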

Test 1: data density 1000, point-to-boundary distance threshold 0.05, regularization parameter 1

drawDecisionBound(pdData, 1000, 0.05, 1)
"""
The regularization parameters are: 1, and the model accuracy is: accuracy = 83.05084745762711%
"""

Test 2: data density 1000, point-to-boundary distance threshold 0.01, regularization parameter 0.8

drawDecisionBound(pdData, 1000, 0.01, 0.8)
"""
The regularization parameter is 0.8, and the model accuracy is 83.05084745762711%
"""


When the regularization parameter changes only slightly, the accuracy does not change significantly (at least on this small dataset).
Test 3: data density 1000, point-to-boundary distance threshold 0.01, regularization parameter 0.005

drawDecisionBound(pdData, 1000, 0.01, 0.005)
"""
The regularization parameter is 0.005, and the model accuracy is 83.89830508474576%
"""


When the regularization parameter is small, the solution is close to the unregularized one; the model fits the training data more closely and the training accuracy is higher.
Test 4: data density 1000, point-to-boundary distance threshold 0.01, regularization parameter 0

drawDecisionBound(pdData, 1000, 0.01, 0.)
"""
The regularization parameter is 0.0, and the model accuracy is 87.28813559322035%
"""


With the regularization parameter set to zero, the model overfits.
Test 5: data density 1000, point-to-boundary distance threshold 0.01, regularization parameter 10

drawDecisionBound(pdData, 1000, 0.01, 10.)
"""
The regularization parameter is 10.0, and the model accuracy is 74.57627118644068%
"""


When the regularization parameter is too large, the model underfits and the accuracy drops to its lowest value.
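
To see the effect of the regularization parameter more systematically, the training accuracy can be recomputed for a range of λ values (a small sketch using the functions defined above; each value refits the model).

# Training accuracy for several regularization strengths
for lam in [0., 0.005, 0.8, 1., 10., 100.]:
    t = opt.minimize(fun=cost, x0=np.zeros(X.shape[1]), args=(X, y, lam),
                     method='Newton-CG', jac=gradient).x
    acc = 100 * np.mean(np.array(predict(X, t)) == y.flatten())
    print('lambda = {:<7} training accuracy = {:.2f}%'.format(lam, acc))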

Summary

The previous post and this one used logistic regression for binary classification. In practice there are often more than two classes; a later post will describe how to use logistic regression for multi-class classification. The program source code can be obtained by replying "Machine Learning" to the AIAS Programming Channel subscription account.
