Compared with the k-nearest neighbor and decision tree algorithms, logistic regression is one of the most fundamental machine learning algorithms; even modern deep learning builds on ideas that appear in logistic regression. Logistic regression is closely related to linear regression and log-linear regression. This experiment introduces the principle of the logistic regression algorithm and evaluates its performance as a classification algorithm.
1. Logistic Regression Algorithm
1.1. Linear Regression
Suppose we have a dataset D = {(x₁, y₁), (x₂, y₂), …, (xₘ, yₘ)}. Our goal is to find a linear function

f(x) = w·x + b  such that  f(xᵢ) ≈ yᵢ
For example, if we want to find a linear regression function for house prices, and we assume the price depends only on the area of the house, then f(area) = w·area + b.
The core goal is to find w and b such that the estimate f(x) is as close as possible to the corresponding y: the smaller the difference between f(x) and y, the better. We therefore judge the fit by least squares, choosing (w, b) to minimize Σᵢ (f(xᵢ) − yᵢ)².
We already learned the least-squares formula in high school; here is the complete solution for the single-feature case. Taking the derivatives of the squared error with respect to w and b and setting them to zero gives:

w = Σᵢ yᵢ(xᵢ − x̄) / (Σᵢ xᵢ² − (1/m)(Σᵢ xᵢ)²)

b = (1/m) Σᵢ (yᵢ − w·xᵢ)

where x̄ = (1/m) Σᵢ xᵢ.
This is the solution for w and b when there is only one feature. So what does w mean? w determines the slope of the regression function: each weight can be read as the contribution of one attribute of an instance, and multiple weights together determine the predicted value. Similarly, for multiple weights we take the partial derivative with respect to each one, set them all to zero, and solve the resulting system of equations to obtain the multivariate linear regression function we want.
The graph shows the relationship between the regression function and x in the case of two w.
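The closed-form solution above can be checked numerically. A minimal sketch with made-up house-area/price numbers (all values are hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical data: house area vs. price.
x = np.array([50.0, 80.0, 100.0, 120.0, 150.0])
y = np.array([150.0, 240.0, 310.0, 355.0, 450.0])

# Closed-form least squares, obtained by setting the derivatives
# of the squared error with respect to w and b to zero:
m = len(x)
w = np.sum(y * (x - x.mean())) / (np.sum(x**2) - x.sum()**2 / m)
b = np.mean(y - w * x)
```

The same (w, b) is returned by any standard least-squares routine, e.g. `np.polyfit(x, y, 1)`.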
1.2. Log-Linear Regression
Although linear regression equations look good, in practice few phenomena in life can be described by a linear function alone. In such cases we can use log-linear regression, which lets us model nonlinear, more complex relationships while preserving the characteristics of the linear regression model.
Linear regression model:

y = wᵀx + b

It can be extended to the log-linear model:

ln y = wᵀx + b
In fact, this kind of link function can be nested repeatedly to obtain more complex compositions. As a simple example, ln y = wᵀx + b is equivalent to y = e^(wᵀx + b). This is the basic log-linear regression that logistic regression builds on and refines.
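One way to see the connection: the log-linear model ln y = w·x + b can be fitted by ordinary linear regression on log-transformed targets. A small sketch with synthetic data generated exactly from the model (the values w = 0.8 and b = 0.5 are made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.exp(0.8 * x + 0.5)      # synthetic data: ln(y) = 0.8*x + 0.5 exactly

z = np.log(y)                  # transform: z = w*x + b is now linear in x
w, b = np.polyfit(x, z, 1)     # ordinary least squares on (x, z)
```

Because the data follow the model exactly, the fit recovers w ≈ 0.8 and b ≈ 0.5.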
Logistic regression is generally used to solve classification problems. Although it does not restrict the scale of the problems it can handle, the problem it suits best is binary classification. For a linear model, the ideal binary classifier would be the unit-step function that maps the prediction to y = 0 on one side of a threshold and y = 1 on the other.
Then the problem is obvious: the step function is discontinuous and non-differentiable, so how do we decide which class a prediction belongs to when it falls exactly at the boundary?
The logistic (sigmoid) function solves this problem:

y = 1 / (1 + e^(−z))

Clearly, as the magnitude of z grows, y gets very close to 0 or 1, which is why the sigmoid is well suited to binary classification.
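A quick sketch of the saturation behavior described above:

```python
import math

def sigmoid(z):
    # Logistic function: maps any real z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# As |z| grows, the output approaches 0 or 1, giving a smooth,
# differentiable substitute for the unit-step function.
for z in (-10, -2, 0, 2, 10):
    print(z, round(sigmoid(z), 4))
```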
For w and b we can use maximum likelihood estimation. Suppose we have a dataset {(xᵢ, yᵢ)}, i = 1…m, with yᵢ ∈ {0, 1}, and write β = (w; b) and x̂ = (x; 1), so that wᵀx + b = βᵀx̂. We want to model

the probability that x belongs to class y = 1:  p(y = 1 | x) = e^(βᵀx̂) / (1 + e^(βᵀx̂))

the probability that x belongs to class y = 0:  p(y = 0 | x) = 1 / (1 + e^(βᵀx̂))

Then the log-likelihood of the whole dataset is

ℓ(β) = Σᵢ ln p(yᵢ | xᵢ; β)

where, for each sample, the probability of the actual label combines the two cases according to the true value:

p(yᵢ | xᵢ; β) = yᵢ·p(y = 1 | x̂ᵢ; β) + (1 − yᵢ)·p(y = 0 | x̂ᵢ; β)

Substituting the sigmoid expressions above, the likelihood can be rewritten so that maximizing ℓ(β) finally reduces to minimizing

ℓ(β) = Σᵢ ( −yᵢ·βᵀx̂ᵢ + ln(1 + e^(βᵀx̂ᵢ)) )
Clearly this function has no closed-form solution, so we can only adjust the parameters iteratively, for example by Newton's method, so that the computed values approach the optimum. Newton's update formula for the (t+1)-th iteration is:

β^(t+1) = β^t − (∂²ℓ/∂β∂βᵀ)⁻¹ · (∂ℓ/∂β)
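The update can be sketched directly with NumPy. The function name, the tiny 1-D dataset, and the iteration count below are illustrative assumptions, not the experiment's actual code; the gradient and Hessian are the standard closed forms for the logistic log-likelihood:

```python
import numpy as np

def newton_logistic(X, y, n_iter=10):
    """Newton's method for logistic regression (sketch).
    X has a trailing column of 1s so that beta = (w; b); y is in {0, 1}."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # p(y=1 | x) per sample
        grad = X.T @ (p - y)                        # first derivative
        hess = X.T @ np.diag(p * (1 - p)) @ X       # second derivative
        beta = beta - np.linalg.solve(hess, grad)   # beta_{t+1} = beta_t - H^{-1} g
    return beta

# Tiny made-up 1-D dataset (second column is the bias term):
X = np.array([[0.5, 1], [1.0, 1], [1.5, 1], [3.0, 1], [3.5, 1], [4.0, 1]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
beta = newton_logistic(X, y)
```

Each step solves a small linear system instead of forming the inverse Hessian explicitly, which is the usual way to implement the update in practice.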
2. Code Analysis of the Logistic Regression Algorithm
```python
from numpy import *

# Sigmoid function
def sigmoid(inX):
    return 1.0 / (1.0 + exp(-inX))

# Classify as 0 or 1 from the predicted probability
def classifyVector(inX, weights):
    prob = sigmoid(sum(inX * weights))
    if prob > 0.5:
        return 1.0
    else:
        return 0.0

# Iteratively compute the weights by stochastic gradient ascent
def stocGradAscent1(dataMatrix, classLabels, numIter=150):
    m, n = shape(dataMatrix)
    weights = ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))
        for i in range(m):
            # Step size decays over iterations but never reaches 0
            alpha = 4 / (1.0 + j + i) + 0.0001
            # Pick a random remaining sample for this update
            randIndex = int(random.uniform(0, len(dataIndex)))
            h = sigmoid(sum(dataMatrix[randIndex] * weights))
            error = classLabels[randIndex] - h
            weights = weights + alpha * error * dataMatrix[randIndex]
            del(dataIndex[randIndex])
    return weights

# Train, then return the error rate on the test set
def colicTest():
    # Training set
    frTrain = open('D:/vscode/python/.vscode/test_data.txt')
    # Test set
    frTest = open('D:/vscode/python/.vscode/train_data.txt')
    trainingSet = []
    trainingLabels = []
    # Parse the training set
    for line in frTrain.readlines():
        currLine = line.strip().split('\t')
        lineArr = []
        for i in range(21):
            lineArr.append(float(currLine[i]))
        trainingSet.append(lineArr)
        trainingLabels.append(float(currLine[21]))
    trainWeights = stocGradAscent1(array(trainingSet), trainingLabels, 500)
    errorCount = 0
    numTestVec = 0.0
    # Evaluate on the test set
    for line in frTest.readlines():
        numTestVec += 1.0
        # Split the attributes
        currLine = line.strip().split('\t')
        lineArr = []
        for i in range(21):
            lineArr.append(float(currLine[i]))
        # Check whether the prediction matches the true label
        if int(classifyVector(array(lineArr), trainWeights)) != int(float(currLine[21])):
            errorCount += 1
    # Compute the error rate
    errorRate = float(errorCount) / numTestVec
    print("The error rate of this training is %f" % errorRate)
    return errorRate

# Call colicTest multiple times to get the average error rate
def multiTest():
    numTests = 100
    errorSum = 0.0
    for k in range(numTests):
        errorSum += colicTest()
    print("Average error rate over %d runs: %f" % (numTests, errorSum / float(numTests)))
```
3. Logistic Regression Algorithm Experiment
This experiment uses an article-quality dataset: whether an article is good is judged from 20 attributes such as clicks and reads. The results of 100 runs are as follows:
Logistic Regression Advantages
1. There is no need to assume a data distribution beforehand.
2. Approximate probability predictions for the classes can be obtained (the probability values are also useful for downstream applications).
3. Existing numerical optimization algorithms (such as Newton's method) can be applied directly to find the optimal solution, which is fast and efficient.
Trying this on a number of datasets, we found that when the attribute values lie in the range −1 to 1 it is difficult to classify accurately; rescaling the dataset so that the numeric range is wider significantly improves the results.
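One way to act on that observation is a simple rescaling step before training; a sketch with made-up attribute values (the target range [−10, 10] is an arbitrary choice for illustration):

```python
import numpy as np

# Made-up attribute matrix with values in roughly [-1, 1]:
X = np.array([[ 0.2, -0.7],
              [ 0.9,  0.1],
              [-0.5,  0.6]])

# Min-max rescaling of each column to the wider range [-10, 10]:
lo, hi = X.min(axis=0), X.max(axis=0)
X_scaled = (X - lo) / (hi - lo) * 20.0 - 10.0
```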