11 Classification Problems
11.1 Logistic Regression
11.1.1 Generalized Linear Regression
- Course review
- Linear regression: the relationship between the independent and dependent variables is represented by a linear model; future or unknown data are estimated from known sample data

Log-linear regression
  ln y = wx + b, i.e., y = e^(wx+b)
Linear regression
  y = wx + b
General form
  g(y) = wx + b, i.e., y = g^(-1)(wx + b)
Generalized linear model
- g(·): the link function, any monotonic differentiable function
- High-dimensional model: y = g^(-1)(W^T X)
  W = (w0, w1, ..., wm)^T
  X = (x0, x1, ..., xm)^T
  x0 = 1
11.1.2 Logistic Regression
11.1.2.1 Classification Problems
- Classification problems: spam detection, image classification, disease diagnosis
- Classifier: automatically classifies input data
- Input: features; Output: discrete values (class labels)
11.1.2.2 Implementing a Classifier
- Prepare training samples
- Train the classifier
- Classify new samples
- Unit step function
  - Not smooth
  - Discontinuous at z = 0
- Binary classification problem: output 1/0 for the positive/negative class
- Log-odds (logistic) function: y = 1/(1+e^(-z))
  - Monotonically increasing, continuous, smooth
  - Differentiable to any order
- Log-odds regression / logistic regression: y = 1/(1+e^(-(wx+b)))
- Sigmoid function: an S-shaped function that maps values from an unbounded range into the interval (0, 1)
  y = g^(-1)(z) = σ(z) = σ(wx+b) = 1/(1+e^(-z)) = 1/(1+e^(-(wx+b)))
- Multivariate model (a NumPy sketch follows)
  y = 1/(1+e^(-W^T X))
  W = (w0, w1, ..., wm)^T
  X = (x0, x1, ..., xm)^T
  x0 = 1
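To make the multivariate form concrete, here is a minimal NumPy sketch; the sample values and weight values are made up for illustration:

import numpy as np

# Two samples with m = 2 attributes each (values are illustrative)
X_raw = np.array([[137.97, 3.0],
                  [ 53.75, 1.0]])
# Prepend x0 = 1 to each sample so that the bias is carried by w0
X = np.column_stack([np.ones(len(X_raw)), X_raw])
# W = (w0, w1, w2)^T, illustrative values
W = np.array([0.5, 0.02, -0.1])

z = X @ W                  # z = W^T X for each sample
y = 1/(1 + np.exp(-z))     # the sigmoid squashes z into (0, 1)
print(y)                   # one probability per sample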
11.1.3 Cross-Entropy Loss Function
- Cross-entropy loss function
  - Every error term is non-negative
  - The function value changes consistently with the error
  - Convex function
  - The partial derivatives with respect to the model parameters contain no σ'(·) term
- Average cross-entropy loss function (both are written out below)
- The value of the partial derivative is affected only by the deviation between the label value and the predicted value
- The cross-entropy loss function is convex, so the minimum found by gradient descent is the global minimum
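Written out, with n samples, label values y_i, and predicted probabilities ŷ_i = σ(wx_i + b), the loss above is (consistent with the code in 11.2.2):

L = -Σ_{i=1..n} [ y_i ln(ŷ_i) + (1 - y_i) ln(1 - ŷ_i) ],    L_avg = L / n

and its gradient simplifies to

∂L/∂w = Σ_{i=1..n} (ŷ_i - y_i) x_i,    ∂L/∂b = Σ_{i=1..n} (ŷ_i - y_i)

so no σ'(·) factor appears, and each term depends only on the deviation between the predicted value and the label value, as the bullet points above state.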
11.1.4 Accuracy
- Accuracy can be used to evaluate a classifier's performance
- Accuracy = number of correctly classified samples / total number of samples
- Accuracy alone is not sufficient to fully evaluate a classifier; for example, on a dataset with 99% negative samples, a classifier that always predicts "negative" still reaches 99% accuracy
11.1.5 Manual derivation for a three-class problem: cross-entropy loss function and weight-update formula
11.2 Example: Implementing Univariate Logistic Regression
11.2.1 sigmoid() function: code implementation
y = 1/(1 + e^(-(wx+b)))
>>> import tensorflow as tf
>>> import numpy as np
>>> x = np.array([1.,2.,3.,4.])
>>> w = tf.Variable(1.)
>>> b = tf.Variable(1.)
>>> y = 1/(1+tf.exp(-(w*x+b)))
>>> y
<tf.Tensor: id=46, shape=(4,), dtype=float32, numpy=array([0.880797 , 0.95257413, 0.98201376, 0.9933072 ], dtype=float32)>
- Note that tf.exp() requires a floating-point argument; otherwise an error is raised
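TensorFlow also provides a built-in sigmoid, so the formula does not have to be written by hand; continuing the session above:

>>> # Built-in sigmoid; produces the same values as computed above
>>> tf.math.sigmoid(w*x+b)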
11.2.2 Cross-Entropy Loss Function: Code Implementation
>>> import tensorflow as tf
>>> import numpy as np
>>> y = np.array([0,0,1,1])
>>> pred = np.array([0.1,0.2,0.8,0.49])
>>> # Cross-entropy loss function
>>> -tf.reduce_sum(y*tf.math.log(pred)+(1-y)*tf.math.log(1-pred))
<tf.Tensor: id=58, shape=(), dtype=float64, numpy=1.2649975061637104>
>>> # Average cross-entropy loss function
>>> -tf.reduce_mean(y*tf.math.log(pred)+(1-y)*tf.math.log(1-pred))
<tf.Tensor: id=70, shape=(), dtype=float64, numpy=0.3162493765409276>
11.2.3 Accuracy: Code Implementation
11.2.3.1 Judgment threshold of 0.5: the tf.round() function
- Accuracy = number of correctly classified samples / total number of samples
>>> import tensorflow as tf
>>> import numpy as np
>>> y = np.array([0,0,1,1])
>>> pred = np.array([0.1,0.2,0.8,0.49])
>>> # With the threshold set to 0.5, the rounding function round() converts predictions to 0 or 1
>>> tf.round(pred)
<tf.Tensor: id=83, shape=(4,), dtype=float64, numpy=array([0., 0., 1., 0.])>
>>> # equal() then compares predicted and label values element by element; the result is a
>>> # boolean one-dimensional tensor of the same shape as y
>>> tf.equal(tf.round(pred),y)
<tf.Tensor: id=87, shape=(4,), dtype=bool, numpy=array([ True,  True,  True, False])>
>>> # cast() converts this result to integers
>>> tf.cast(tf.equal(tf.round(pred),y),tf.int8)
<tf.Tensor: id=92, shape=(4,), dtype=int8, numpy=array([1, 1, 1, 0], dtype=int8)>
>>> # Averaging all the elements gives the proportion of correctly classified samples;
>>> # note that the mean of an int8 tensor is also int8, so 3/4 truncates to 0
>>> tf.reduce_mean(tf.cast(tf.equal(tf.round(pred),y),tf.int8))
<tf.Tensor: id=99, shape=(), dtype=int8, numpy=0>
>>> # Remember: round() applied to exactly 0.5 returns 0 (rounds to the nearest even value)
>>> tf.round(0.5)
<tf.Tensor: id=101, shape=(), dtype=float32, numpy=0.0>
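Casting the comparison result to a floating-point type before averaging avoids the integer truncation; continuing the session above:

>>> # Cast to float32 before averaging; the result is the expected 0.75
>>> tf.reduce_mean(tf.cast(tf.equal(tf.round(pred),y),tf.float32))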
11.2.3.2 Judgment threshold other than 0.5: the where(condition, a, b) function
where(condition, a, b)
- Returns the value of a or b based on the condition
- Where an element of condition is True, the corresponding position returns a; otherwise b
>>> import tensorflow as tf
>>> import numpy as np
>>> y = np.array([0,0,1,1])
>>> pred = np.array([0.1,0.2,0.8,0.49])
>>> tf.where(pred<0.5,0,1)
<tf.Tensor: id=105, shape=(4,), dtype=int32, numpy=array([0, 0, 1, 0])>
>>> pred<0.5
array([ True,  True, False,  True])
>>> tf.where(pred<0.4,0,1)
- Parameters a and b can also be arrays or tensors; in that case a and b must have the same shape, and their first dimension must match the shape of condition
>>> import tensorflow as tf
>>> import numpy as np
>>> pred = np.array([0.1,0.2,0.8,0.49])
>>> a = np.array([1,2,3,4])
>>> b = np.array([10,20,30,40])
>>> # Where an element of pred is less than 0.5, the element at the corresponding
>>> # position in a is returned; otherwise the element from b is returned
>>> tf.where(pred<0.5,a,b)
<tf.Tensor: id=117, shape=(4,), dtype=int32, numpy=array([ 1,  2, 30,  4])>
>>> # Elements 1, 2, and 4 of pred are less than 0.5, so the elements of a are taken;
>>> # the third element is greater than 0.5, so the element of b is taken
- Parameters a and b can also be omitted; in that case the indices of the elements of pred that satisfy the condition (here, greater than or equal to 0.5) are returned as a two-dimensional tensor
>>> tf.where(pred>=0.5)
<tf.Tensor: id=119, shape=(1, 1), dtype=int64, numpy=array([[2]], dtype=int64)>
Using where() to compute the accuracy:
>>> import tensorflow as tf
>>> import numpy as np
>>> y = np.array([0,0,1,1])
>>> pred = np.array([0.1,0.2,0.8,0.49])
>>> tf.reduce_mean(tf.cast(tf.equal(tf.where(pred<0.5,0,1),y),tf.float32))
<tf.Tensor: id=136, shape=(), dtype=float32, numpy=0.75>
11.2.4 Univariate Logistic Regression: Home Sales Records Example
- 0: ordinary residence; 1: premium residence
11.2.4.1 Loading data
# 1 Load data
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Floor area
x = np.array([137.97,104.50,100.00,124.32,79.20,99.00,124.00,114.00,106.69,138.05,53.75,46.91,68.00,63.02,81.26,86.21])
# Type
y = np.array([1,1,0,1,0,1,1,0,0,1,0,0,0,0,0,0])
plt.scatter(x,y)
plt.show()
The output is:
- Categories are only 0 and 1
11.2.4.2 Data Processing
# 2 Data processing
# The sigmoid function is centered at 0, so center the data
x_train = x - np.mean(x)
y_train = y
plt.scatter(x_train,y_train)
plt.show()
The output is:
- As you can see, the points are translated as a whole; their relative positions are unchanged
11.2.4.3 Setting Hyperparameters
# 3 Set hyperparameters
learn_rate = 0.005
iter = 5
display_step = 1
11.2.4.4 Setting model variable initial values
# 4 Set the initial values of the model variables
np.random.seed(612)
w = tf.Variable(np.random.randn())
b = tf.Variable(np.random.randn())
11.2.4.5 Training Model
# 5 Train the model
cross_train = []
acc_train = []
for i in range(0,iter+1):
    with tf.GradientTape() as tape:
        pred_train = 1/(1+tf.exp(-(w*x_train+b)))
        Loss_train = -tf.reduce_mean(y_train*tf.math.log(pred_train)+(1-y_train)*tf.math.log(1-pred_train))
        Accuracy_train = tf.reduce_mean(tf.cast(tf.equal(tf.where(pred_train<0.5,0,1),y_train),tf.float32))
    cross_train.append(Loss_train)
    acc_train.append(Accuracy_train)
    dL_dw,dL_db = tape.gradient(Loss_train,[w,b])
    w.assign_sub(learn_rate*dL_dw)
    b.assign_sub(learn_rate*dL_db)
    if i % display_step == 0:
        print("i: %i, Train Loss: %f, Accuracy: %f" % (i,Loss_train,Accuracy_train))
The output is:
i: 0, Train Loss: 1.140986, Accuracy: 0.375000
i: 1, Train Loss: 0.703207, Accuracy: 0.625000
i: 2, Train Loss: 0.648479, Accuracy: 0.625000
i: 3, Train Loss: 0.631729, Accuracy: 0.687500
i: 4, Train Loss: 0.624276, Accuracy: 0.687500
i: 5, Train Loss: 0.620331, Accuracy: 0.750000
11.2.4.6 Adding Sigmoid Curve Visualization Output
# 1 Load data
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Floor area
x = np.array([137.97,104.50,100.00,124.32,79.20,99.00,124.00,114.00,106.69,138.05,53.75,46.91,68.00,63.02,81.26,86.21])
# Type
y = np.array([1,1,0,1,0,1,1,0,0,1,0,0,0,0,0,0])

# 2 Data processing
# The sigmoid function is centered at 0, so center the data
x_train = x - np.mean(x)
y_train = y

# 3 Set hyperparameters
learn_rate = 0.005
iter = 5
display_step = 1

# 4 Set the initial values of the model variables
np.random.seed(612)
w = tf.Variable(np.random.randn())
b = tf.Variable(np.random.randn())

x_ = range(-80,80)
y_ = 1/(1+tf.exp(-(w*x_+b)))
# Scatter plot of the data before iteration
plt.scatter(x_train,y_train)
# Sigmoid curve with the initial parameters
plt.plot(x_,y_,c='r',linewidth=3)

# 5 Train the model
cross_train = []
acc_train = []
for i in range(0,iter+1):
    with tf.GradientTape() as tape:
        pred_train = 1/(1+tf.exp(-(w*x_train+b)))
        Loss_train = -tf.reduce_mean(y_train*tf.math.log(pred_train)+(1-y_train)*tf.math.log(1-pred_train))
        Accuracy_train = tf.reduce_mean(tf.cast(tf.equal(tf.where(pred_train<0.5,0,1),y_train),tf.float32))
    cross_train.append(Loss_train)
    acc_train.append(Accuracy_train)
    dL_dw,dL_db = tape.gradient(Loss_train,[w,b])
    w.assign_sub(learn_rate*dL_dw)
    b.assign_sub(learn_rate*dL_db)
    if i % display_step == 0:
        print("i: %i, Train Loss: %f, Accuracy: %f" % (i,Loss_train,Accuracy_train))
        # Sigmoid curve with the current weights
        y_ = 1/(1+tf.exp(-(w*x_+b)))
        plt.plot(x_,y_)
plt.show()
The output is:
- Red is the sigmoid curve with the initial parameters; although it looks far off, with these parameters only 6 of the 16 samples are classified correctly, which matches the accuracy of 0.375 printed for i: 0
- Blue is the result of the first iteration
- Purple is the result of the last iteration
- Between 0 and 20 the two classes overlap, so points in this region may be misclassified and the accuracy cannot reach 100%
- Although the classification is already fairly accurate after the first iterations, the curve obtained after more iterations is more reasonable overall
11.2.4.7 Validating the Model
x_test = [128.15,45.00,141.43,106.27,99.00,53.84,85.36,70.00,162.00,114.60]
pred_test = 1/(1+tf.exp(-(w*(x_test-np.mean(x))+b)))
y_test = tf.where(pred_test<0.5,0,1)
for i in range(len(x_test)):
    print(x_test[i],"\t",pred_test[i].numpy(),"\t",y_test[i].numpy(),"\t")
The output is:
128.15    0.84752923     1
45.0      0.003775026    0
141.43    0.94683856     1
106.27    0.4493811      0
99.0      0.30140132     0
53.84     0.008159011    0
85.36     0.11540905     0
70.0      0.032816157    0
162.0     0.99083763     1
114.6     0.62883806     1
11.2.4.8 Sigmoid Curve and Scatter Plot of the Predictions
# 1 Load data
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Floor area
x = np.array([137.97,104.50,100.00,124.32,79.20,99.00,124.00,114.00,106.69,138.05,53.75,46.91,68.00,63.02,81.26,86.21])
# Type
y = np.array([1,1,0,1,0,1,1,0,0,1,0,0,0,0,0,0])

# 2 Data processing
# The sigmoid function is centered at 0, so center the data
x_train = x - np.mean(x)
y_train = y

# 3 Set hyperparameters
learn_rate = 0.005
iter = 5
display_step = 1

# 4 Set the initial values of the model variables
np.random.seed(612)
w = tf.Variable(np.random.randn())
b = tf.Variable(np.random.randn())

#x_ = range(-80,80)
#y_ = 1/(1+tf.exp(-(w*x_+b)))
# Scatter plot of the data before iteration
#plt.scatter(x_train,y_train)
# Sigmoid curve with the initial parameters
#plt.plot(x_,y_,c='r',linewidth=3)

# 5 Train the model
cross_train = []
acc_train = []
for i in range(0,iter+1):
    with tf.GradientTape() as tape:
        pred_train = 1/(1+tf.exp(-(w*x_train+b)))
        Loss_train = -tf.reduce_mean(y_train*tf.math.log(pred_train)+(1-y_train)*tf.math.log(1-pred_train))
        Accuracy_train = tf.reduce_mean(tf.cast(tf.equal(tf.where(pred_train<0.5,0,1),y_train),tf.float32))
    cross_train.append(Loss_train)
    acc_train.append(Accuracy_train)
    dL_dw,dL_db = tape.gradient(Loss_train,[w,b])
    w.assign_sub(learn_rate*dL_dw)
    b.assign_sub(learn_rate*dL_db)
    if i % display_step == 0:
        print("i: %i, Train Loss: %f, Accuracy: %f" % (i,Loss_train,Accuracy_train))
        #y_ = 1/(1+tf.exp(-(w*x_+b)))
        #plt.plot(x_,y_)

# Although the name x_test is used, this is not a true test set: a test set has labeled data
x_test = [128.15,45.00,141.43,106.27,99.00,53.84,85.36,70.00,162.00,114.60]
pred_test = 1/(1+tf.exp(-(w*(x_test-np.mean(x))+b)))
y_test = tf.where(pred_test<0.5,0,1)
for i in range(len(x_test)):
    print(x_test[i],"\t",pred_test[i].numpy(),"\t",y_test[i].numpy(),"\t")
plt.scatter(x_test,y_test)
x_ = range(-80,80)
y_ = 1/(1+tf.exp(-(w*x_+b)))
# Shift the curve back to the original (uncentered) x scale
plt.plot(x_+np.mean(x),y_)
plt.show()
The output is:
11.3 Linear Classifier
11.3.1 Decision Boundary: Linear Separability
- A dataset in two-dimensional space is called linearly separable if it can be divided into two classes by a straight line; a model that separates it this way is a linear classifier
- In three-dimensional space, a linearly separable dataset is one that can be divided into two classes by a plane
- In one-dimensional space, all points lie on a line, and linear separability can be understood as the classes being separable by a single point
- The separating line, plane, or point is called the decision boundary
- In general, in m-dimensional space, if a hyperplane can divide the dataset in two, then the dataset is linearly separable and the hyperplane is the decision boundary (see the note below)
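Tying this back to 11.1.2: a logistic regression model thresholded at 0.5 is a linear classifier, because

σ(W^T X) = 1/(1 + e^(-W^T X)) = 0.5 exactly when W^T X = 0

so its decision boundary is the hyperplane W^T X = 0.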
11.3.2 Linear Inseparability
- Linearly inseparable: the samples cannot be separated by a single straight line; separating them requires two straight lines or a curve
11.3.3 Logical Operations
11.3.3.1 Linearly separable: AND, OR, NOT

- In logical operations, AND (&), OR (|), and NOT (!) are linearly separable
- Since they are linearly separable, the corresponding classifier can be obtained by training, and convergence is guaranteed (see the training sketch after this list)
- AND operation
- OR operation
- NOT operation
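As referenced in the list above, a minimal sketch that trains a logistic-regression classifier on the AND truth table, reusing the gradient-descent pattern from 11.2 (the learning rate 0.5 and 2000 iterations are illustrative choices):

import tensorflow as tf
import numpy as np

# AND truth table
x1 = np.array([0.,0.,1.,1.])
x2 = np.array([0.,1.,0.,1.])
y  = np.array([0.,0.,0.,1.])

np.random.seed(612)
w1 = tf.Variable(np.random.randn())
w2 = tf.Variable(np.random.randn())
b  = tf.Variable(np.random.randn())

for i in range(2000):
    with tf.GradientTape() as tape:
        pred = 1/(1+tf.exp(-(w1*x1+w2*x2+b)))
        # Average cross-entropy loss, as in 11.2.2
        loss = -tf.reduce_mean(y*tf.math.log(pred)+(1-y)*tf.math.log(1-pred))
    dw1,dw2,db = tape.gradient(loss,[w1,w2,b])
    w1.assign_sub(0.5*dw1)
    w2.assign_sub(0.5*dw2)
    b.assign_sub(0.5*db)

# The learned decision boundary w1*x1 + w2*x2 + b = 0 separates (1,1) from the other inputs
print(tf.where(pred<0.5,0,1).numpy())   # expected: [0 0 0 1]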
11.3.3.2 Linearly Inseparable: XOR
- The XOR operation can also be viewed as addition modulo 2
- Clearly, at least two straight lines are needed to separate its two classes
11.4 Example: Implementing Multivariate Logistic Regression
11.4.1 The Iris Dataset (iris) Revisited
- Here, irises are classified using logistic regression
- Iris dataset
  - 150 samples
  - Four attributes: sepal length (Sepal Length), sepal width (Sepal Width), petal length (Petal Length), and petal width (Petal Width)
  - 1 label with three classes: Setosa, Versicolour, and Virginica
- The result of visualizing the attributes pairwise:
  Setosa (blue), Versicolour (red), Virginica (green)

Setosa (blue) differs so much from the other two classes that it can be distinguished with any pair of attributes. So the two classes Setosa (blue) and Versicolour (red) and the two attributes sepal length and sepal width are selected, and logistic regression is used to classify them (a loading sketch follows at the end of this section).

For reference on how to load and visualize the Iris dataset, see Section 6.5 (Matplotlib Data Visualization) of the MOOC course [Neural Network and Deep Learning - TensorFlow Practice] (6).
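As referenced above, a minimal loading sketch; it uses sklearn.datasets for brevity (an assumption for this example), whereas the course itself obtains the data as described in Section 6.5:

import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
# Keep only Setosa (label 0) and Versicolour (label 1)
mask = iris.target < 2
# Keep only the first two attributes: sepal length and sepal width
x = iris.data[mask][:, :2]
y = iris.target[mask]
print(x.shape, y.shape)   # (100, 2) (100,)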