Overview
Logistic Regression (LR) has a rather misleading name: although it contains the word "regression", it is a method for classification problems.
The LR classifier can be applied to many classification tasks, for example sentiment analysis of comments (positive vs. negative, binary classification), click-through-rate prediction (binary), user default prediction (binary), spam detection (binary), disease prediction (binary), and user-level classification (multi-class). This article mainly discusses binary classification.
1. Logistic regression
Both logistic regression and linear regression essentially produce a straight line (more generally, a hyperplane w·x + b). The difference lies in what that line is for. The line of linear regression fits the training samples as closely as possible, minimizing the (squared) differences between each sample's actual value and the value the line predicts for it. The line of logistic regression is a decision boundary: it is placed so that the training samples of the two classes are separated as well as possible. The two methods therefore have different goals.
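To make the contrast concrete, here is a minimal sketch using scikit-learn's `LinearRegression` and `LogisticRegression` on small made-up toy datasets (the numbers are illustrative assumptions, not this article's data). Linear regression returns a line that predicts a continuous value; logistic regression returns weights `w` and a bias `b`, and the line `w·x + b = 0` acts as the boundary between the two classes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Linear regression: the line predicts a continuous target y from x
x = np.array([[1.0], [2.0], [3.0], [4.0]])
y_cont = np.array([2.1, 3.9, 6.2, 7.8])  # toy continuous targets
lin = LinearRegression().fit(x, y_cont)
print("slope:", lin.coef_[0], "intercept:", lin.intercept_)

# Logistic regression: the line w1*x1 + w2*x2 + b = 0 is a decision boundary
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
              [6.0, 5.0], [7.0, 6.5], [6.5, 7.0]])  # toy 2-D points
y_cls = np.array([0, 0, 0, 1, 1, 1])                # toy class labels
clf = LogisticRegression().fit(X, y_cls)
w, b = clf.coef_[0], clf.intercept_[0]

# Points where w.x + b > 0 lie on the class-1 side of the boundary
side = X @ w + b
print("on class-1 side:", side > 0)
```

The least-squares line is judged by how close it passes to the targets, while the logistic boundary is judged only by which side of it each point falls on.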
- Sigmoid function
- Understanding the Sigmoid function through code
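import numpy as np
import matplotlib.pyplot as plt
# Jupyter/IPython only: show plots inside the notebook
%matplotlib inline

# Sigmoid over [-5, 5]: the characteristic S-shaped curve
X = np.linspace(-5, 5, 200)
y = 1 / (1 + np.exp(-X))
plt.plot(X, y)
plt.show()

# Sigmoid over [-60, 60]: at this scale it looks almost like a step function
X = np.linspace(-60, 60, 200)
y = 1 / (1 + np.exp(-X))
plt.plot(X, y)
plt.show()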
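The curve plotted above is the Sigmoid function σ(z) = 1 / (1 + e^(-z)). In logistic regression its input is the linear score z = w·x + b, and its output, a number strictly between 0 and 1, is read as the probability that the sample belongs to class 1. The sketch below (the name `sigmoid` is just an illustrative helper) checks a few properties visible in the plots: σ(0) = 0.5, σ(-z) = 1 - σ(z), and for large |z| the output is extremely close to 0 or 1, which is why the curve over [-60, 60] looks almost like a step.

```python
import numpy as np

def sigmoid(z):
    """Illustrative helper: map any real number or array into (0, 1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))                         # midpoint of the S-curve
print(sigmoid(2) + sigmoid(-2))           # symmetry: sigma(z) + sigma(-z) = 1
print(sigmoid(np.array([-60.0, 60.0])))   # practically 0 and practically 1
```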
- Loss function
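Logistic regression is usually trained by minimizing the cross-entropy (log) loss J(w, b) = -(1/m) Σ [ y_i·log(p_i) + (1 - y_i)·log(1 - p_i) ], where p_i = σ(w·x_i + b) is the predicted probability of class 1 for sample i and m is the number of samples. The loss is small when a confident prediction is correct and grows without bound when a confident prediction is wrong. Below is a minimal NumPy sketch of that formula; the helper name `log_loss_manual` and the sample probabilities are illustrative assumptions, and the result is cross-checked against scikit-learn's `sklearn.metrics.log_loss`.

```python
import numpy as np
from sklearn.metrics import log_loss

def log_loss_manual(y_true, p_pred, eps=1e-15):
    """Illustrative helper: mean binary cross-entropy."""
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log(0) never occurs
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y_true = [1, 0, 1, 0]
p_pred = [0.9, 0.2, 0.6, 0.4]  # made-up predicted probabilities of class 1

print(log_loss_manual(y_true, p_pred))
print(log_loss(y_true, p_pred))  # scikit-learn's implementation, for comparison

# A confident wrong prediction costs far more than a confident right one
print(log_loss_manual([1], [0.99]), log_loss_manual([1], [0.01]))
```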
- Logistic regression and linear regression
Logistic regression and linear regression are different kinds of models: logistic regression is a classification model, whereas linear regression is a regression model.
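One way to see how the two models are related: logistic regression first computes the same kind of linear score as linear regression, z = w·x + b, and then passes it through the Sigmoid function to get a probability. For a binary scikit-learn `LogisticRegression`, `decision_function` returns z, `predict_proba` returns the Sigmoid of z, and `predict` picks class 1 when that probability exceeds 0.5 (equivalently, when z > 0). The toy data below are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D data (made up for illustration)
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 1, 0, 1, 1])

clf = LogisticRegression().fit(X, y)

z = clf.decision_function(X)                        # linear score w*x + b
z_manual = X @ clf.coef_[0] + clf.intercept_[0]     # the same score computed by hand
p_manual = 1.0 / (1.0 + np.exp(-z))                 # Sigmoid of the linear score
p_sklearn = clf.predict_proba(X)[:, 1]              # probability of class 1

print(np.allclose(z, z_manual))
print(np.allclose(p_manual, p_sklearn))
print(np.array_equal(clf.predict(X), (z > 0).astype(int)))
```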
- References:
[ML] The relationship and differences between linear regression and logistic regression: gives an accessible, intuitive explanation of maximum likelihood estimation.
On machine learning: linear regression & logistic regression: briefly derives how logistic regression is obtained from linear regression, and explains the difference between probability and odds.
A comprehensive analysis and implementation of logistic regression (Python): analyzes the LR model from the perspective of the model, the learning objective, and the optimization algorithm, and implements LR training and prediction from scratch in Python.
2. A small example
Dataset used: ./data/info.txt, a comma-separated text file in which each row holds two feature values followed by a 0/1 label.
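If the download is not available, the file can be recreated from the array that `print(data)` displays in this section, which shows all 11 rows (7 end up in the training set and 4 in the test set). The sketch below assumes the file is plain comma-separated numbers with two feature columns followed by a 0/1 label, which is consistent with `np.loadtxt(..., delimiter=",")`; the original file's exact number formatting is not known.

```python
import os
import numpy as np

# The 11 rows shown by print(data): feature 1, feature 2, label
rows = np.array([
    [2697, 6254, 1],
    [1872, 2014, 0],
    [2312,  812, 0],
    [1983, 4990, 1],
    [ 932, 3920, 0],
    [1321, 5583, 1],
    [2215, 1560, 0],
    [1659, 2932, 0],
    [ 865, 7316, 1],
    [1685, 4763, 0],
    [1786, 2523, 1],
])

os.makedirs('./data', exist_ok=True)
# Write integers separated by commas, one sample per line
np.savetxt('./data/info.txt', rows, fmt='%d', delimiter=',')
```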
# Import related packages
import os

import numpy as np
# LogisticRegression classifier from scikit-learn
from sklearn.linear_model import LogisticRegression
# Helper that splits data into a training set and a test set
from sklearn.model_selection import train_test_split
# Read the file
if not os.path.exists('./data/info.txt'):
    print('File does not exist!')
else:
    # delimiter is the character that separates the values on each line
    data = np.loadtxt('./data/info.txt', delimiter=",")
    print(data)
[[2.697e+03 6.254e+03 1.000e+00]
[1.872e+03 2.014e+03 0.000e+00]
[2.312e+03 8.120e+02 0.000e+00]
[1.983e+03 4.990e+03 1.000e+00]
[9.320e+02 3.920e+03 0.000e+00]
[1.321e+03 5.583e+03 1.000e+00]
[2.215e+03 1.560e+03 0.000e+00]
[1.659e+03 2.932e+03 0.000e+00]
[8.650e+02 7.316e+03 1.000e+00]
[1.685e+03 4.763e+03 0.000e+00]
[1.786e+03 2.523e+03 1.000e+00]]
# Split the data into a training set and a test set
# data[:, 0:2]: feature values (first two columns)
# data[:, 2]: label values (last column)
# test_size=0.3: fraction of samples held out for testing
# Without random_state, the split (and therefore the results below) changes on every run
train_x, test_x, train_y, test_y = train_test_split(data[:, 0:2], data[:, 2], test_size=0.3)
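With 11 samples and `test_size=0.3`, scikit-learn puts ceil(0.3 × 11) = 4 samples in the test set and the remaining 7 in the training set, which matches the arrays printed in this section. Because the split is random, each run produces a different partition (and possibly a different score); passing `random_state` makes it repeatable. A minimal sketch using the rows printed earlier, with an arbitrary seed of 42 (an assumption; the article does not set one):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# The 11 rows printed earlier: two features and a 0/1 label
data = np.array([
    [2697, 6254, 1], [1872, 2014, 0], [2312, 812, 0], [1983, 4990, 1],
    [932, 3920, 0], [1321, 5583, 1], [2215, 1560, 0], [1659, 2932, 0],
    [865, 7316, 1], [1685, 4763, 0], [1786, 2523, 1],
], dtype=float)

# random_state fixes the shuffle, and therefore the split
train_x, test_x, train_y, test_y = train_test_split(
    data[:, 0:2], data[:, 2], test_size=0.3, random_state=42)

print(train_x.shape, test_x.shape)  # 7 training rows, 4 test rows

# The same seed gives the same split on a second call
again = train_test_split(data[:, 0:2], data[:, 2], test_size=0.3, random_state=42)
print(np.array_equal(test_x, again[1]))
```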
train_x
array([[1659., 2932.],
[2215., 1560.],
[2312., 812.],
[ 865., 7316.],
[ 932., 3920.],
[2697., 6254.],
[1786., 2523.]])
test_x
array([[1983., 4990.],
[1872., 2014.],
[1321., 5583.],
[1685., 4763.]])
train_y
array([0., 0., 0., 1., 0., 1., 1.])
test_y
array([1., 0., 1., 0.])
# Create the scikit-learn model
model = LogisticRegression()
# Train the model on the training set
model.fit(train_x, train_y)
# Predict labels for the test set
pred_y = model.predict(test_x)
# Check element-wise whether each prediction equals the true label
print(pred_y == test_y)
# Mean accuracy on the test set
print(model.score(test_x, test_y))
[ True True True False]
0.75
pred_y
array([1., 0., 1., 1.])
test_y
array([1., 0., 1., 0.])
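model.score(test_x, test_y) predicts labels for test_x and compares them with test_y; the result is the fraction of correct predictions. Here 3 of the 4 test samples are classified correctly, so the score is 0.75.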
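For classifiers, `score` returns the mean accuracy. Using the `pred_y` and `test_y` values printed above (copied in by hand, so the sketch runs without the trained model), the same number can be obtained with `np.mean` over the element-wise comparison or with `sklearn.metrics.accuracy_score`:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Values printed above
pred_y = np.array([1., 0., 1., 1.])
test_y = np.array([1., 0., 1., 0.])

print(pred_y == test_y)                # element-wise comparison
print(np.mean(pred_y == test_y))       # 3 correct out of 4
print(accuracy_score(test_y, pred_y))  # same value via scikit-learn
```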
Summary
- The loadtxt() method of NumPy
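`np.loadtxt` reads a text file of numbers into a NumPy array; `delimiter` tells it which character separates the values (whitespace by default), and `usecols` selects a subset of columns. It accepts a file name or any file-like object, so the sketch below uses `io.StringIO` with two made-up lines in place of `./data/info.txt`:

```python
import io
import numpy as np

# Two lines in the same layout as info.txt (values made up for illustration)
text = io.StringIO("1000,2000,1\n1500,500,0\n")

data = np.loadtxt(text, delimiter=",")  # float array, one row per line
print(data.shape)

features = data[:, 0:2]  # first two columns
labels = data[:, 2]      # last column

# usecols reads only the listed column(s)
labels_only = np.loadtxt(io.StringIO("1000,2000,1\n1500,500,0\n"),
                         delimiter=",", usecols=2)
print(labels_only)
```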