Machine learning -- the principle and basic implementation of logistic regression

summary

Logistic Regression (LR) has a somewhat misleading name: although it contains the word "regression", it is actually best suited to classification problems.
An LR classifier applies to a wide range of classification tasks, such as positive/negative sentiment analysis of review text (binary classification), predicting whether a user will click (binary classification), predicting user default (binary classification), spam detection (binary classification), disease prediction (binary classification), and user tier classification (multi-class classification). Here we mainly discuss the binary classification problem.

1, Logistic regression

Both logistic regression and linear regression essentially produce a straight line, but for different purposes. The line in linear regression tries to fit the distribution of the training data as closely as possible, minimizing the distance from the sample points to the line; the line in logistic regression tries to fit a decision boundary, separating the two classes of training samples as cleanly as possible.
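
To make the contrast concrete, here is a minimal sketch (not from the original post) that fits both models to the same toy one-dimensional data set; variable names such as toy_x are made up for illustration. Linear regression returns a fitted line through the points, while logistic regression returns a decision boundary where the predicted probability crosses 0.5.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy 1-D data: one feature per sample and a binary label (illustrative only)
toy_x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
toy_y = np.array([0, 0, 0, 1, 1, 1])

# Linear regression: fit a line through the (x, y) points
lin = LinearRegression().fit(toy_x, toy_y)
print("fitted line: y = %.2f * x + %.2f" % (lin.coef_[0], lin.intercept_))

# Logistic regression: fit a decision boundary separating the two classes
clf = LogisticRegression().fit(toy_x, toy_y)
# The boundary lies where w * x + b = 0, i.e. where the predicted probability is 0.5
boundary = -clf.intercept_[0] / clf.coef_[0][0]
print("decision boundary at x = %.2f" % boundary)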

  • Sigmoid function
  • Understanding the Sigmoid function through code
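
For reference, the function plotted in the code below is the sigmoid (logistic) function, which squashes any real-valued input into the interval (0, 1) and can therefore be read as a probability:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
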
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Sigmoid over a narrow range: the S-shaped curve is clearly visible
X = np.linspace(-5, 5, 200)
y = 1 / (1 + np.exp(-X))

plt.plot(X, y)
plt.show()

# Sigmoid over a wide range: the curve saturates and looks almost like a step function
X = np.linspace(-60, 60, 200)
y = 1 / (1 + np.exp(-X))

plt.plot(X, y)
plt.show()

  • loss function (see the cross-entropy sketch after this list)



  • Logistic regression and linear regression
    Logistic regression and linear regression are two different kinds of models: logistic regression is a classification model, while linear regression is a regression model.
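
For reference, logistic regression is usually trained by minimizing the binary cross-entropy (log loss), where $h_\theta(x_i) = \sigma(\theta^T x_i)$ is the sigmoid output for sample $x_i$ and $m$ is the number of training samples:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[\, y_i \log h_\theta(x_i) + (1 - y_i) \log\big(1 - h_\theta(x_i)\big) \Big]$$

A minimal numpy sketch of this loss, assuming y_true holds the 0/1 labels and y_prob the predicted probabilities (the function name binary_cross_entropy is my own, not from the original post):

import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    # Clip predicted probabilities away from 0 and 1 to avoid log(0)
    y_prob = np.clip(y_prob, eps, 1 - eps)
    # Average the per-sample log loss over all m samples
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))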

2, A small example

Dataset used: ./data/info.txt

# Import related packages
import numpy as np
# Import the LogisticRegression class from sklearn
from sklearn.linear_model import LogisticRegression
# Import the helper that splits data into training and test sets
from sklearn.model_selection import train_test_split
import os
# Read the data file
if not os.path.exists('./data/info.txt'):
    print('File does not exist!')
else:
    data = np.loadtxt('./data/info.txt', delimiter=",")  # delimiter="," tells loadtxt the fields are comma-separated
    print(data)

[[2.697e+03 6.254e+03 1.000e+00]
[1.872e+03 2.014e+03 0.000e+00]
[2.312e+03 8.120e+02 0.000e+00]
[1.983e+03 4.990e+03 1.000e+00]
[9.320e+02 3.920e+03 0.000e+00]
[1.321e+03 5.583e+03 1.000e+00]
[2.215e+03 1.560e+03 0.000e+00]
[1.659e+03 2.932e+03 0.000e+00]
[8.650e+02 7.316e+03 1.000e+00]
[1.685e+03 4.763e+03 0.000e+00]
[1.786e+03 2.523e+03 1.000e+00]]

# Split the data into training and test sets
train_x, test_x, train_y, test_y = train_test_split(data[:, 0:2], data[:, 2], test_size=0.3)
# data[:, 0:2] -> feature values in the first two columns
# data[:, 2]   -> label values in the last column
# test_size=0.3 -> 30% of the rows are held out as the test set
# Note: the split is random, so the exact rows shown below will vary between runs
train_x

array([[1659., 2932.],
[2215., 1560.],
[2312., 812.],
[ 865., 7316.],
[ 932., 3920.],
[2697., 6254.],
[1786., 2523.]])

test_x

array([[1983., 4990.],
[1872., 2014.],
[1321., 5583.],
[1685., 4763.]])

train_y

array([0., 0., 0., 1., 0., 1., 1.])

test_y

array([1., 0., 1., 0.])

# Create the LogisticRegression model from sklearn
model = LogisticRegression()
# Train the model on the training set
model.fit(train_x, train_y)
# Test the trained model on the test set and print the results
pred_y = model.predict(test_x)
# Check element-wise whether the predicted values equal the true values
print(pred_y == test_y)
# Print the accuracy on the test set
print(model.score(test_x, test_y))

[ True True True False]
0.75

pred_y

array([1., 0., 1., 1.])

test_y

array([1., 0., 1., 0.])

model.score(test_x, test_y) predicts on test_x and compares the predictions (pred_y) with the true labels (test_y), returning the accuracy.
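
For a classifier in sklearn, score returns the mean accuracy, i.e. the fraction of test samples predicted correctly. A short sketch, reusing the model, pred_y and test_y variables from the case above, that reproduces the same number by hand and inspects the learned parameters:

# Accuracy computed manually: the fraction of predictions that match the true labels
print((pred_y == test_y).mean())   # same value as model.score(test_x, test_y)

# The fitted decision boundary is w1*x1 + w2*x2 + b = 0
print(model.coef_, model.intercept_)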

summary

  • numpy's loadtxt() method

Tags: Python, Machine Learning, logistic regression

Posted on Wed, 24 Nov 2021 15:48:04 -0500 by Yves