[Neural Networks and Deep Learning - TensorFlow Practice] Chinese University MOOC Course Notes (Classification Problems)

11 Classification Problems

11.1 Logistic Regression

11.1.1 Generalized Linear Regression

  • Course Review
  1. Linear regression: the relationship between the independent and dependent variables is represented by a linear model, which is used to estimate future or unknown data from known sample data
  • log-linear regression
    y = wx + b  (ordinary linear regression)
    ln y = wx + b, i.e., y = e^(wx+b)  (log-linear regression)
    g(y) = wx + b, i.e., y = g^(-1)(wx + b)  (generalized form, for a monotonic differentiable g)

  • generalized linear model

  1. g(·): the link function, any monotonic differentiable function
  • High-dimensional model: y = g^(-1)(W^T X)
    W = (w0, w1, ..., wm)^T
    X = (x0, x1, ..., xm)^T
    x0 = 1
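As a quick illustration of the high-dimensional model y = g^(-1)(W^T X) (a minimal sketch, not from the course; the weights and feature values below are made up), taking the sigmoid as the inverse link function g^(-1):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Parameters W = (w0, w1, ..., wm)^T; w0 plays the role of the bias b
W = np.array([0.5, 1.0, -2.0])

# One sample X = (x0, x1, ..., xm)^T with x0 = 1
X = np.array([1.0, 0.3, 0.8])

y = sigmoid(W @ X)   # y = g^(-1)(W^T X), a value between 0 and 1
print(y)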

11.1.2 Logistic Regression

11.1.2.1 Classification Problems

  • Classification problems: spam detection, image classification, disease diagnosis
  • Classifier: a model that automatically classifies input data
    Input: features; Output: discrete values

11.1.2.2 Implementing a Classifier

  • Preparing training samples
  • Training Classifier
  • Classification of new samples
  • Unit-step function
    Not smooth
    Discontinuous
  • Binary classification problem: 1/0 for the positive and negative classes
  • Log-odds function (logistic function): y = 1/(1+e^(-z))
    Monotonically increasing, continuous, smooth
    Differentiable to any order
  • Log-odds regression (logistic regression): y = 1/(1+e^(-(wx+b)))
  • Sigmoid function: an S-shaped function that maps values from an unbounded range into the interval (0, 1)
    y = g^(-1)(z) = σ(z) = σ(wx+b) = 1/(1+e^(-z)) = 1/(1+e^(-(wx+b)))
  • Multivariate model
    y = 1/(1+e^(-(W^T X)))
    W = (w0, w1, ..., wm)^T
    X = (x0, x1, ..., xm)^T
    x0 = 1

11.1.3 Cross-Entropy Loss Function

  • Cross-entropy loss function
  1. Each error term is non-negative
  2. The function value changes consistently with the error
  3. It is a convex function
  4. The partial derivative with respect to the model parameters contains no σ'(·) term
  • Average cross-entropy loss function (both formulas are written out below)

  • The value of the partial derivative is affected only by the deviation between the label value and the predicted value
  • The cross-entropy loss function is convex, so the minimum found by gradient descent is the global minimum
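In the notation used in this section, with pred = σ(wx+b) the predicted value for a sample with label y, the loss, its average over n samples (the form implemented in section 11.2.2), and its partial derivative are:

    Cross-entropy loss for one sample:           L = -[ y*ln(pred) + (1-y)*ln(1-pred) ]
    Average cross-entropy loss over n samples:   L = -(1/n) * Σ_i [ y_i*ln(pred_i) + (1-y_i)*ln(1-pred_i) ]
    Partial derivative with respect to w:        ∂L/∂w = (1/n) * Σ_i (pred_i - y_i) * x_i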

11.1.4 Accuracy

  • Accuracy can be used to evaluate a classifier's performance
  • Accuracy = number of samples correctly classified / total number of samples
  • Accuracy alone is not enough to fully evaluate a classifier; for example, if 99% of the samples are negative, a classifier that always predicts the negative class reaches 99% accuracy while never detecting a positive sample

11.1.5 Three-Class Problem Worked by Hand - Cross-Entropy Loss Function and Weight-Update Formula
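The hand-worked example itself is not reproduced in these notes. As a hedged sketch of the formulas the heading refers to (assuming a softmax output over the K = 3 classes, with one-hot labels y_k and predictions pred_k):

    Softmax output:        pred_k = e^(z_k) / Σ_j e^(z_j),  where z_k = W_k^T X
    Cross-entropy loss:    L = -Σ_k y_k * ln(pred_k)
    Gradient:              ∂L/∂W_k = (pred_k - y_k) * X
    Weight update:         W_k ← W_k - η * (pred_k - y_k) * X   (η: learning rate)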


11.2 Example: Implementing Univariate Logistic Regression

11.2.1 sigmoid() Function-Code Implementation

y = 1 / (1 + e^(-(wx+b)))

>>> import tensorflow as tf
>>> import numpy as np
>>> x = np.array([1.,2.,3.,4.])
>>> w = tf.Variable(1.)
>>> b = tf.Variable(1.)
>>> y = 1/(1+tf.exp(-(w*x+b)))
>>> y
<tf.Tensor: id=46, shape=(4,), dtype=float32, numpy=array([0.880797  , 0.95257413, 0.98201376, 0.9933072 ], dtype=float32)>
  • Note that tf.exp() requires a floating-point argument; passing an integer will raise an error
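If the input were an integer array, it could be cast to a float type first (a hedged example, not from the course):

>>> xi = np.array([1, 2, 3, 4])
>>> 1/(1+tf.exp(-(w*tf.cast(xi, tf.float32)+b)))   # same values as above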

11.2.2 Cross-Entropy Loss Function-Code Implementation

>>> import tensorflow as tf
>>> import numpy as np

>>> y = np.array([0,0,1,1])
>>> pred=np.array([0.1,0.2,0.8,0.49])

>>> # Cross-Entropy Loss Function
>>> -tf.reduce_sum(y*tf.math.log(pred)+(1-y)*tf.math.log(1-pred))
<tf.Tensor: id=58, shape=(), dtype=float64, numpy=1.2649975061637104>

>>> # Average cross-entropy loss function
>>> -tf.reduce_mean(y*tf.math.log(pred)+(1-y)*tf.math.log(1-pred)) 
<tf.Tensor: id=70, shape=(), dtype=float64, numpy=0.3162493765409276>
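For comparison (a hedged alternative, not part of the course code), Keras' built-in loss function averages the per-sample cross-entropies over the last axis and should give roughly the same value:

>>> tf.keras.losses.binary_crossentropy(y.astype(np.float64), pred)   # expected: ~0.3162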

11.2.3 Accuracy-Code Implementation

11.2.3.1 Decision Threshold of 0.5-tf.round() Function

  • Accuracy = number of samples correctly classified / total number of samples
>>> import tensorflow as tf
>>> import numpy as np

>>> y = np.array([0,0,1,1])
>>> pred=np.array([0.1,0.2,0.8,0.49])

>>> # If you set the threshold to 0.5, you can use the rounding function round() to convert it to 0 or 1 
>>> tf.round(pred)
<tf.Tensor: id=83, shape=(4,), dtype=float64, numpy=array([0., 0., 1., 0.])>

>>> # Then use the equal() function to compare the predicted and label values element by element;
>>> # the result is a one-dimensional boolean tensor with the same shape as y
>>> tf.equal(tf.round(pred),y)
<tf.Tensor: id=87, shape=(4,), dtype=bool, numpy=array([ True,  True,  True, False])>

>>> # The cast() function is used below to convert this result to an integer
>>> tf.cast(tf.equal(tf.round(pred),y),tf.int8)
<tf.Tensor: id=92, shape=(4,), dtype=int8, numpy=array([1, 1, 1, 0], dtype=int8)>

>>> # Then average all the elements to get the proportion of correctly classified samples (see the note below about the integer dtype)
>>> tf.reduce_mean(tf.cast(tf.equal(tf.round(pred),y),tf.int8))    
<tf.Tensor: id=99, shape=(), dtype=int8, numpy=0>
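>>> # Note (addition to the original): with an integer dtype the mean is truncated to 0,
>>> # so cast to a float type instead to get the expected accuracy of 3/4
>>> tf.reduce_mean(tf.cast(tf.equal(tf.round(pred),y),tf.float32))   # expected: 0.75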

>>> # Note that tf.round() rounds halfway values to the nearest even number, so tf.round(0.5) returns 0
>>> tf.round(0.5)
<tf.Tensor: id=101, shape=(), dtype=float32, numpy=0.0>

11.2.3.2 Decision Threshold Other Than 0.5-where(condition,a,b) Function

where(condition,a,b)
  • Returns elements taken from a or b depending on the condition
  • Where an element of condition is True, the element at the corresponding position of a is returned; otherwise the element of b is returned
>>> import tensorflow as tf
>>> import numpy as np

>>> y = np.array([0,0,1,1])
>>> pred=np.array([0.1,0.2,0.8,0.49])

>>> tf.where(pred<0.5,0,1)
<tf.Tensor: id=105, shape=(4,), dtype=int32, numpy=array([0, 0, 1, 0])>

>>> pred<0.5
array([ True,  True, False,  True])

>>> # With a threshold of 0.4, the fourth sample (0.49) is now classified as 1
>>> tf.where(pred<0.4,0,1)
  • Parameters a and b can also be arrays or tensors; in that case a and b must have the same shape, and their first dimension must match the shape of condition
>>> import tensorflow as tf
>>> import numpy as np

>>> pred=np.array([0.1,0.2,0.8,0.49])

>>> a = np.array([1,2,3,4])
>>> b = np.array([10,20,30,40])

>>> # When the element in pred is less than 0.5, the element at the corresponding position in a is returned, otherwise the element at the corresponding position in b is returned.
>>> tf.where(pred<0.5,a,b)
<tf.Tensor: id=117, shape=(4,), dtype=int32, numpy=array([ 1,  2, 30,  4])>
>>> # Elements 1, 2, and 4 of pred are less than 0.5, so the corresponding elements of array a are taken;
>>> # the third element is greater than 0.5, so the element of b is taken
  • Parameters a and b can also be omitted; where(condition) then returns the indices of the elements for which the condition is true, as a two-dimensional tensor. Below, the index of the element of pred that is greater than or equal to 0.5 is returned
>>> tf.where(pred>=0.5)
<tf.Tensor: id=119, shape=(1, 1), dtype=int64, numpy=array([[2]], dtype=int64)>

  • Use where() below to calculate accuracy

>>> import tensorflow as tf
>>> import numpy as np
>>> y = np.array([0,0,1,1]) 
>>> pred=np.array([0.1,0.2,0.8,0.49])  

>>> tf.reduce_mean(tf.cast(tf.equal(tf.where(pred<0.5,0,1),y),tf.float32))
<tf.Tensor: id=136, shape=(), dtype=float32, numpy=0.75>

11.2.4 Univariate Logistic Regression - House Sales Records Example

  • 0: ordinary house; 1: high-grade house

11.2.4.1 Loading data

# 1 Load data
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# House area
x = np.array([137.97,104.50,100.00,124.32,79.20,99.00,124.00,114.00,106.69,138.05,53.75,46.91,68.00,63.02,81.26,86.21])
# House type (label)
y = np.array([1,1,0,1,0,1,1,0,0,1,0,0,0,0,0,0])

plt.scatter(x,y)
plt.show()

The output is:

  • Categories are only 0 and 1

11.2.4.2 Data Processing

# 2 Data processing
# The sigmoid function is centered at 0, so center the data by subtracting the mean
x_train = x - np.mean(x)
y_train = y

plt.scatter(x_train,y_train)
plt.show()

The output is:

  • As you can see, these points are translated as a whole, and their relative positions are unchanged

11.2.4.3 Setting Hyperparameters

# 3 Set Hyperparameters
learn_rate = 0.005
iter = 5
display_step = 1

11.2.4.4 Setting model variable initial values

# 4 Set model variable initial value
np.random.seed(612)
w = tf.Variable(np.random.randn())
b = tf.Variable(np.random.randn())

11.2.4.5 Training Model

# 5 Training Model
cross_train = []
acc_train = []

for i in range(0,iter+1):

    with tf.GradientTape() as tape:
        pred_train = 1/(1+tf.exp(-(w*x_train+b)))
        Loss_train = -tf.reduce_mean(y_train*tf.math.log(pred_train)+(1-y_train)*tf.math.log(1-pred_train))
        Accuracy_train = tf.reduce_mean(tf.cast(tf.equal(tf.where(pred_train<0.5,0,1),y_train),tf.float32))

    cross_train.append(Loss_train)
    acc_train.append(Accuracy_train)

    dL_dw,dL_db = tape.gradient(Loss_train,[w,b])

    w.assign_sub(learn_rate*dL_dw)
    b.assign_sub(learn_rate*dL_db)

    if i % display_step == 0:
        print("i: %i, Train Loss: %f, Accuracy: %f" % (i,Loss_train,Accuracy_train))

The output is:

i: 0, Train Loss: 1.140986, Accuracy: 0.375000
i: 1, Train Loss: 0.703207, Accuracy: 0.625000
i: 2, Train Loss: 0.648479, Accuracy: 0.625000
i: 3, Train Loss: 0.631729, Accuracy: 0.687500
i: 4, Train Loss: 0.624276, Accuracy: 0.687500
i: 5, Train Loss: 0.620331, Accuracy: 0.750000
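The lists cross_train and acc_train record the loss and accuracy at every step but are not plotted above. A minimal sketch (an addition to the course code) for visualizing them after the loop finishes:

# Plot the recorded loss and accuracy curves (assumes the training loop above has run)
plt.figure()

plt.subplot(1, 2, 1)
plt.plot(cross_train, label="loss")
plt.xlabel("iteration")
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(acc_train, label="accuracy")
plt.xlabel("iteration")
plt.legend()

plt.show()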

11.2.4.6 Adding Sigmoid Curve Visualization to the Output

# 1 Load data
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# House area
x = np.array([137.97,104.50,100.00,124.32,79.20,99.00,124.00,114.00,106.69,138.05,53.75,46.91,68.00,63.02,81.26,86.21])
# House type (label)
y = np.array([1,1,0,1,0,1,1,0,0,1,0,0,0,0,0,0])

# 2 Data processing
# The sigmoid function is centered at 0, so center the data by subtracting the mean
x_train = x - np.mean(x)
y_train = y

# 3 Set Hyperparameters
learn_rate = 0.005
iter = 5
display_step = 1

# 4 Set model variable initial value
np.random.seed(612)
w = tf.Variable(np.random.randn())
b = tf.Variable(np.random.randn())

x_ = range(-80,80)
y_ = 1/(1+tf.exp(-(w*x_+b)))

# Scatter plot of output data before iteration
plt.scatter(x_train,y_train)
# sigmoid curve with initialization parameters
plt.plot(x_,y_,c='r',linewidth=3)

# 5 Training Model
cross_train = []
acc_train = []

for i in range(0,iter+1):

    with tf.GradientTape() as tape:
        pred_train = 1/(1+tf.exp(-(w*x_train+b)))
        Loss_train = -tf.reduce_mean(y_train*tf.math.log(pred_train)+(1-y_train)*tf.math.log(1-pred_train))
        Accuracy_train = tf.reduce_mean(tf.cast(tf.equal(tf.where(pred_train<0.5,0,1),y_train),tf.float32))

    cross_train.append(Loss_train)
    acc_train.append(Accuracy_train)

    dL_dw,dL_db = tape.gradient(Loss_train,[w,b])

    w.assign_sub(learn_rate*dL_dw)
    b.assign_sub(learn_rate*dL_db)

    if i % display_step == 0:
        print("i: %i, Train Loss: %f, Accuracy: %f" % (i,Loss_train,Accuracy_train))
        # Plot the sigmoid curve for the current weights
        y_ = 1/(1+tf.exp(-(w*x_+b)))
        plt.plot(x_,y_)
plt.show()

The output is:

  • Red is the sigmoid curve with the initial parameters; although it looks far off, the 10 samples that happen to belong to ordinary houses get probabilities below 0.5, so the classification accuracy is 10/16
  • Blue is the result of the first iteration
  • Purple is the result of the last iteration
  • Between 0 and 20 (on the centered x-axis) the two classes overlap, so points in this region may be misclassified and the accuracy cannot reach 100%
  • Although the early curves already classify fairly accurately, the curve after more iterations is more reasonable overall

11.2.4.7 Validating the Model

x_test = [128.15,45.00,141.43,106.27,99.00,53.84,85.36,70.00,162.00,114.60]
pred_test = 1/(1+tf.exp(-(w*(x_test-np.mean(x))+b)))
y_test = tf.where(pred_test<0.5,0,1)
for i in range(len(x_test)):
    print(x_test[i],"\t",pred_test[i].numpy(),"\t",y_test[i].numpy(),"\t")

The output is:

128.15   0.84752923      1 
45.0     0.003775026     0
141.43   0.94683856      1
106.27   0.4493811       0
99.0     0.30140132      0
53.84    0.008159011     0
85.36    0.11540905      0
70.0     0.032816157     0
162.0    0.99083763      1
114.6    0.62883806      1

11.2.4.8 Sigmoid Curve and Scatter Plot of the Predictions

# 1 Load data
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# House area
x = np.array([137.97,104.50,100.00,124.32,79.20,99.00,124.00,114.00,106.69,138.05,53.75,46.91,68.00,63.02,81.26,86.21])
# House type (label)
y = np.array([1,1,0,1,0,1,1,0,0,1,0,0,0,0,0,0])

# 2 Data processing
# The sigmoid function is centered at 0, so center the data by subtracting the mean
x_train = x - np.mean(x)
y_train = y

# 3 Set Hyperparameters
learn_rate = 0.005
iter = 5
display_step = 1

# 4 Set model variable initial value
np.random.seed(612)
w = tf.Variable(np.random.randn())
b = tf.Variable(np.random.randn())

#x_ = range(-80,80)
#y_ = 1/(1+tf.exp(-(w*x_+b)))

# Scatter plot of output data before iteration
#plt.scatter(x_train,y_train)
# sigmoid curve with output parameters
#plt.plot(x_,y_,c='r',linewidth=3)

# 5 Training Model
cross_train = []
acc_train = []

for i in range(0,iter+1):

    with tf.GradientTape() as tape:
        pred_train = 1/(1+tf.exp(-(w*x_train+b)))
        Loss_train = -tf.reduce_mean(y_train*tf.math.log(pred_train)+(1-y_train)*tf.math.log(1-pred_train))
        Accuracy_train = tf.reduce_mean(tf.cast(tf.equal(tf.where(pred_train<0.5,0,1),y_train),tf.float32))

    cross_train.append(Loss_train)
    acc_train.append(Accuracy_train)

    dL_dw,dL_db = tape.gradient(Loss_train,[w,b])

    w.assign_sub(learn_rate*dL_dw)
    b.assign_sub(learn_rate*dL_db)

    if i % display_step == 0:
        print("i: %i, Train Loss: %f, Accuracy: %f" % (i,Loss_train,Accuracy_train))
        #y_ = 1/(1+tf.exp(-(w*x_+b)))
        #plt.plot(x_,y_)
# Although the variable is named x_test, this is not a true test set; a test set has labeled data
x_test = [128.15,45.00,141.43,106.27,99.00,53.84,85.36,70.00,162.00,114.60]
pred_test = 1/(1+tf.exp(-(w*(x_test-np.mean(x))+b)))
y_test = tf.where(pred_test<0.5,0,1)
for i in range(len(x_test)):
    print(x_test[i],"\t",pred_test[i].numpy(),"\t",y_test[i].numpy(),"\t")

plt.scatter(x_test,y_test)

x_ = range(-80,80)
y_ = 1/(1+tf.exp(-(w*x_+b)))
plt.plot(x_+np.mean(x),y_)

plt.show()

The output is:

11.3 Linear Classifier

11.3.1 Decision Boundary-Linear Separability

  • If a dataset in two-dimensional space can be divided into two classes by a straight line, it is called a linearly separable dataset, and the dividing model is a linear classifier
  • In three-dimensional space, a dataset is linearly separable if it can be divided into two classes by a plane
  • In one-dimensional space, all points lie on a straight line, and linear separability means the classes can be separated by a single point
  • The dividing straight line, plane, or point is called the decision boundary
  • In m-dimensional space, if a hyperplane can divide the dataset into two classes, the dataset is linearly separable and the hyperplane is the decision boundary (see the formula below)
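For the logistic-regression classifier above, the decision boundary is where the predicted probability equals 0.5, i.e. where the argument of the sigmoid is zero:

    σ(W^T X) = 0.5  ⇔  W^T X = 0

which is a point in one dimension, a straight line in two, a plane in three, and a hyperplane in m dimensions.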

11.3.2 Linear inseparability

  • Linearly inseparable: the samples can only be separated by two or more straight lines, or by a curve

11.3.3 Logical Operations

11.3.3.1 Linearly Separable-AND, OR, NOT

  • Among the logical operations, AND (&), OR (|), and NOT (!) are linearly separable

  • A corresponding classifier can therefore be obtained by training, and it is certain to converge

  • AND operation

  • OR operation

  • NOT operation

11.3.3.2 Linearly Inseparable-XOR

  • The XOR operation can be thought of as addition modulo 2
  • Clearly, at least two straight lines are needed to separate the two classes (see the sketch below)
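A minimal sketch (an addition, not from the course; the hand-picked weights are illustrative) showing that a single linear unit can realize AND, while no single straight line can produce the XOR labels:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The four input combinations of two boolean variables
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# AND is linearly separable: the line x1 + x2 = 1.5 is a valid decision boundary
w_and, b_and = np.array([1.0, 1.0]), -1.5
print((sigmoid(X @ w_and + b_and) >= 0.5).astype(int))   # [0 0 0 1]

# XOR labels: no single (w, b) yields [0 1 1 0], because the two positive points
# (0,1) and (1,0) lie on opposite sides of the two negative points (0,0) and (1,1)
y_xor = np.array([0, 1, 1, 0])
print(y_xor)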

11.4 Example: Implementing Multivariate Logistic Regression

11.4.1 The Iris Dataset Revisited

  • Here, logistic regression is used to classify irises
  • Iris dataset
  1. 150 samples
  2. Four attributes: sepal length, sepal width, petal length, and petal width
  3. 1 label with three classes: Setosa, Versicolor, Virginica
  4. The result of visualizing the attribute pairs is:

    Setosa (blue), Versicolor (red), Virginica (green)
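As a minimal sketch of preparing the data (an assumption: scikit-learn's bundled copy of the dataset is used here, while the course itself may load the data differently), two classes and two attributes can be selected for a binary logistic regression like the one in section 11.2:

import numpy as np
from sklearn.datasets import load_iris   # assumption: scikit-learn is available

iris = load_iris()
X, y = iris.data, iris.target            # X: (150, 4) attributes, y: labels 0, 1, 2

# Keep only Setosa (0) and Versicolor (1), and the two petal attributes
mask = y < 2
x_train = X[mask][:, 2:4]                # petal length and petal width
y_train = y[mask]

print(x_train.shape, y_train.shape)      # (100, 2) (100,)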

11.7 References

[1] Neural Network and Deep Learning--TensorFlow Practice
