# 11 Classification Problems

## 11.1 Logistic Regression

### 11.1.1 Generalized Linear Regression

• Course review
1. Linear regression: the relationship between the independent and dependent variables is represented by a linear model, which is used to estimate future or unknown data from known samples
• Log-linear regression (see the sketch at the end of this subsection)
ln y = wx + b, i.e., y = e^(wx+b)
• More generally, the basic linear model y = wx + b can be wrapped in a function:
g(y) = wx + b, i.e., y = g^(-1)(wx + b)

• Generalized linear model

1. g(·): a link function, which can be any monotonic differentiable function
• High-dimensional model: Y = g^(-1)(W^T X), where
W = (w0, w1, ..., wm)^T
X = (x0, x1, ..., xm)^T
x0 = 1
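• To make the link-function idea concrete, here is a minimal NumPy sketch of log-linear regression; the data and coefficients are made up for illustration. Applying the link g(y) = ln y turns an exponential relationship into a linear one that ordinary least squares can fit:

```
import numpy as np

# Made-up data that follows y = e^(0.5x + 1)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.exp(0.5 * x + 1.0)

# Apply the link function g(y) = ln y, then fit ln y = wx + b by least squares
w, b = np.polyfit(x, np.log(y), deg=1)

# Invert the link to predict on the original scale: y = e^(wx + b)
y_pred = np.exp(w * x + b)
print(w, b)  # recovers approximately 0.5 and 1.0
```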

### 11.1.2 Logistic Regression

#### 11.1.2.1 Classification Problems

• Typical classification problems: spam identification, image classification, disease diagnosis
• Classifier: a model that automatically assigns input data to a class
Input: features; Output: discrete values (class labels)

#### 11.1.2.2 Implementing a Classifier

• Preparing training samples
• Training Classifier
• Classification of new samples
• Unit-step function: not smooth, and discontinuous at 0, so it cannot be optimized with gradient methods
• Binary classification problem: labels 1/0 for the positive and negative classes
• Log-odds (logistic) function: y = 1/(1+e^(-z))
Monotonically increasing, continuous, smooth
Differentiable to any order
• Log-odds regression / logistic regression: y = 1/(1+e^(-(wx+b)))
• Sigmoid function: an S-shaped function that maps an input with unbounded range into the interval (0, 1).
y = g^(-1)(z) = σ(z) = 1/(1+e^(-z)); with z = wx+b, y = σ(wx+b) = 1/(1+e^(-(wx+b)))
• Multivariate model (see the sketch below)
y = 1/(1+e^(-(W^T X)))
W = (w0, w1, ..., wm)^T
X = (x0, x1, ..., xm)^T
x0 = 1
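• A minimal NumPy sketch of the multivariate model, with made-up weights and features; x0 = 1 forms the first column, so w0 plays the role of the bias b:

```
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Two samples with m = 2 features; the first column is x0 = 1 (the bias column)
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0]])
W = np.array([0.1, 0.2, -0.3])  # (w0, w1, w2), made-up values

print(sigmoid(X @ W))  # one probability per sample
```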

### 11.1.3 Cross-Entropy Loss Function

• Properties of the cross-entropy loss function
1. Each per-sample error is non-negative
2. The function value rises and falls together with the error
3. It is a convex function
4. Its partial derivatives with respect to the model parameters contain no σ'(·) term
• Average cross-entropy loss function: the cross-entropy loss averaged over all samples
• The value of the partial derivative is affected only by the deviation between the label value and the predicted value
• Because the cross-entropy loss function is convex, the minimum found by gradient descent is the global minimum
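• In symbols, with N samples, labels y(i), and predictions ŷ(i) = σ(wx(i)+b), the average cross-entropy loss and its partial derivative with respect to w are:

```
L = -\frac{1}{N}\sum_{i=1}^{N}\Bigl[y^{(i)}\ln\hat{y}^{(i)} + \bigl(1-y^{(i)}\bigr)\ln\bigl(1-\hat{y}^{(i)}\bigr)\Bigr],
\qquad
\frac{\partial L}{\partial w} = \frac{1}{N}\sum_{i=1}^{N}\bigl(\hat{y}^{(i)}-y^{(i)}\bigr)\,x^{(i)}
```

• The σ'(·) terms cancel because σ'(z) = σ(z)(1 - σ(z)), which is why the gradient depends only on the deviation ŷ - y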

### 11.1.4 Accuracy

• Accuracy can be used to evaluate a classifier's performance
• Accuracy = number of correctly classified samples / total number of samples
• Accuracy alone cannot fully characterize a classifier: for example, on a dataset that is 95% negative samples, a classifier that always predicts "negative" reaches 95% accuracy while learning nothing

### 11.1.5 Worked Example: Three-Class Problem - Cross-Entropy Loss and Weight-Update Formula

## 11.2 Example: Implementing Univariate Logistic Regression

### 11.2.1 The sigmoid() Function - Code Implementation

y = 1 / (1 + e^(-(wx+b)))

```
>>> import tensorflow as tf
>>> import numpy as np
>>> x = np.array([1.,2.,3.,4.])
>>> w = tf.Variable(1.)
>>> b = tf.Variable(1.)
>>> y = 1/(1+tf.exp(-(w*x+b)))
>>> y
<tf.Tensor: id=46, shape=(4,), dtype=float32, numpy=array([0.880797  , 0.95257413, 0.98201376, 0.9933072 ], dtype=float32)>
```
• Note that tf.exp() requires a floating-point argument; passing an integer tensor raises an error
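• For example (a minimal sketch), cast an integer tensor to a float type before calling tf.exp():

```
>>> import tensorflow as tf
>>> x_int = tf.constant([1, 2, 3])
>>> # tf.exp(x_int) would raise an error because exp is only defined for float types
>>> tf.exp(tf.cast(x_int, tf.float32))
```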

### 11.2.2 Cross-Entropy Loss Function - Code Implementation

```
>>> import tensorflow as tf
>>> import numpy as np

>>> y = np.array([0,0,1,1])
>>> pred=np.array([0.1,0.2,0.8,0.49])

>>> # Cross-Entropy Loss Function
>>> -tf.reduce_sum(y*tf.math.log(pred)+(1-y)*tf.math.log(1-pred))
<tf.Tensor: id=58, shape=(), dtype=float64, numpy=1.2649975061637104>

>>> # Average cross-entropy loss function
>>> -tf.reduce_mean(y*tf.math.log(pred)+(1-y)*tf.math.log(1-pred))
<tf.Tensor: id=70, shape=(), dtype=float64, numpy=0.3162493765409276>
```
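• The same average loss is also available as the built-in tf.keras.losses.binary_crossentropy() function. A minimal sketch (it averages the per-sample losses over the last axis, so up to a small internal clipping epsilon the result matches the tf.reduce_mean version above):

```
>>> import tensorflow as tf
>>> import numpy as np
>>> y = np.array([0.,0.,1.,1.])
>>> pred = np.array([0.1,0.2,0.8,0.49])
>>> # Averages -[y*log(pred) + (1-y)*log(1-pred)] over the last axis; approximately 0.3162 here
>>> tf.keras.losses.binary_crossentropy(y, pred)
```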

### 11.2.3 Accuracy - Code Implementation

#### 11.2.3.1 Threshold of 0.5 - the tf.round() Function

• Accuracy = number of samples correctly classified / total number of samples
```
>>> import tensorflow as tf
>>> import numpy as np

>>> y = np.array([0,0,1,1])
>>> pred=np.array([0.1,0.2,0.8,0.49])

>>> # With a threshold of 0.5, the rounding function round() converts each probability to 0 or 1
>>> tf.round(pred)
<tf.Tensor: id=83, shape=(4,), dtype=float64, numpy=array([0., 0., 1., 0.])>

>>> # Then use the equal() function to compare predictions and labels element by element;
>>> # the result is a one-dimensional boolean tensor with the same shape as y
>>> tf.equal(tf.round(pred),y)
<tf.Tensor: id=87, shape=(4,), dtype=bool, numpy=array([ True,  True,  True, False])>

>>> # The cast() function is used below to convert this result to an integer
>>> tf.cast(tf.equal(tf.round(pred),y),tf.int8)
<tf.Tensor: id=92, shape=(4,), dtype=int8, numpy=array([1, 1, 1, 0], dtype=int8)>

>>> # Then average the elements to get the fraction of correctly classified samples; note that
>>> # with an integer dtype the mean truncates (3/4 becomes 0), so cast to a float type in practice
>>> tf.reduce_mean(tf.cast(tf.equal(tf.round(pred),y),tf.int8))
<tf.Tensor: id=99, shape=(), dtype=int8, numpy=0>

>>> # Beware: tf.round() rounds halves to even, so an input of exactly 0.5 returns 0
>>> tf.round(0.5)
<tf.Tensor: id=101, shape=(), dtype=float32, numpy=0.0>
```
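• Alternatively, the built-in tf.keras.metrics.BinaryAccuracy metric applies the threshold internally (0.5 by default); a minimal sketch:

```
>>> import tensorflow as tf
>>> import numpy as np
>>> y = np.array([0.,0.,1.,1.])
>>> pred = np.array([0.1,0.2,0.8,0.49])
>>> m = tf.keras.metrics.BinaryAccuracy(threshold=0.5)
>>> m.update_state(y, pred)
>>> m.result()  # 0.75
```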

#### 11.2.3.2 Thresholds Other Than 0.5 - the where(condition, a, b) Function

```
where(condition, a, b)
```
• Returns elements taken from a or b depending on the condition
• Where an element of the condition is True, the element of a at the corresponding position is returned; otherwise the element of b
```
>>> import tensorflow as tf
>>> import numpy as np

>>> y = np.array([0,0,1,1])
>>> pred=np.array([0.1,0.2,0.8,0.49])

>>> tf.where(pred<0.5,0,1)
<tf.Tensor: id=105, shape=(4,), dtype=int32, numpy=array([0, 0, 1, 0])>

>>> pred<0.5
array([ True,  True, False,  True])

>>> # With a threshold of 0.4, the sample with probability 0.49 is now classified as 1
>>> tf.where(pred<0.4,0,1)
```
• Parameters a and b can also be arrays or tensors; in that case a and b must have the same shape, and their first dimension must match the shape of the condition
```
>>> import tensorflow as tf
>>> import numpy as np

>>> pred=np.array([0.1,0.2,0.8,0.49])

>>> a = np.array([1,2,3,4])
>>> b = np.array([10,20,30,40])

>>> # When the element in pred is less than 0.5, the element at the corresponding position in a is returned, otherwise the element at the corresponding position in b is returned.
>>> tf.where(pred<0.5,a,b)
<tf.Tensor: id=117, shape=(4,), dtype=int32, numpy=array([ 1,  2, 30,  4])>
>>> # Elements 1, 2, and 4 of pred are less than 0.5, so the corresponding elements of a are taken;
>>> # the third element is greater than 0.5, so the element of b is taken
```
• Parameters a and b can also be omitted; tf.where() then returns the indices at which the condition holds in pred, as a two-dimensional tensor
```
>>> tf.where(pred>=0.5)
<tf.Tensor: id=119, shape=(1, 1), dtype=int64, numpy=array([[2]], dtype=int64)>
```

• Use tf.where() below to calculate accuracy

```
>>> import tensorflow as tf
>>> import numpy as np
>>> y = np.array([0,0,1,1])
>>> pred=np.array([0.1,0.2,0.8,0.49])

>>> tf.reduce_mean(tf.cast(tf.equal(tf.where(pred<0.5,0,1),y),tf.float32))
<tf.Tensor: id=136, shape=(), dtype=float32, numpy=0.75>
```

### 11.2.4 Univariate Logistic Regression - Example: Home Sales Records

• Labels: 0 = ordinary residence; 1 = premium residence

#### 11.2.4.1 Loading Data

```
# 1 Load data
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Area of each house
x = np.array([137.97,104.50,100.00,124.32,79.20,99.00,124.00,114.00,106.69,138.05,53.75,46.91,68.00,63.02,81.26,86.21])
# Type label (0 = ordinary, 1 = premium)
y = np.array([1,1,0,1,0,1,1,0,0,1,0,0,0,0,0,0])

plt.scatter(x,y)
plt.show()
```

The output is a scatter plot of the data; the categories take only the values 0 and 1.

#### 11.2.4.2 Data Processing

```
# 2 Data processing
# The sigmoid function is centered at 0, so center the data by subtracting the mean
x_train = x - np.mean(x)
y_train = y

plt.scatter(x_train,y_train)
plt.show()
```

The output shows that the points are translated as a whole; their relative positions are unchanged.

#### 11.2.4.3 Setting Hyperparameters

```
# 3 Set Hyperparameters
learn_rate = 0.005
iter = 5
display_step = 1

```

#### 11.2.4.4 Setting model variable initial values

```
# 4 Set model variable initial value
np.random.seed(612)
w = tf.Variable(np.random.randn())
b = tf.Variable(np.random.randn())
```

#### 11.2.4.5 Training Model

```
# 5 Training Model
cross_train = []
acc_train = []

for i in range(0, iter+1):

    with tf.GradientTape() as tape:
        pred_train = 1/(1+tf.exp(-(w*x_train+b)))
        Loss_train = -tf.reduce_mean(y_train*tf.math.log(pred_train)+(1-y_train)*tf.math.log(1-pred_train))
    Accuracy_train = tf.reduce_mean(tf.cast(tf.equal(tf.where(pred_train<0.5,0,1),y_train),tf.float32))

    cross_train.append(Loss_train)
    acc_train.append(Accuracy_train)

    # Gradients of the loss with respect to w and b
    dL_dw, dL_db = tape.gradient(Loss_train, [w, b])
    w.assign_sub(learn_rate*dL_dw)
    b.assign_sub(learn_rate*dL_db)

    if i % display_step == 0:
        print("i: %i, Train Loss: %f, Accuracy: %f" % (i, Loss_train, Accuracy_train))
```

The output is:

```
i: 0, Train Loss: 1.140986, Accuracy: 0.375000
i: 1, Train Loss: 0.703207, Accuracy: 0.625000
i: 2, Train Loss: 0.648479, Accuracy: 0.625000
i: 3, Train Loss: 0.631729, Accuracy: 0.687500
i: 4, Train Loss: 0.624276, Accuracy: 0.687500
i: 5, Train Loss: 0.620331, Accuracy: 0.750000
```
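• The cross_train and acc_train lists filled during training are not used above; a minimal sketch of plotting them (assuming the training loop has just run):

```
import matplotlib.pyplot as plt

# Loss and accuracy recorded at every iteration of the training loop
plt.subplot(1, 2, 1)
plt.plot(cross_train, color='blue', label='loss')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(acc_train, color='red', label='accuracy')
plt.legend()
plt.show()
```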

#### 11.2.4.6 Adding Sigmoid-Curve Visualization to the Output

```
# 1 Load data
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Area of each house
x = np.array([137.97,104.50,100.00,124.32,79.20,99.00,124.00,114.00,106.69,138.05,53.75,46.91,68.00,63.02,81.26,86.21])
# Type label (0 = ordinary, 1 = premium)
y = np.array([1,1,0,1,0,1,1,0,0,1,0,0,0,0,0,0])

# 2 Data processing
# The sigmoid function is centered at 0, so center the data by subtracting the mean
x_train = x - np.mean(x)
y_train = y

# 3 Set Hyperparameters
learn_rate = 0.005
iter = 5
display_step = 1

# 4 Set model variable initial value
np.random.seed(612)
w = tf.Variable(np.random.randn())
b = tf.Variable(np.random.randn())

# Points for drawing the sigmoid curve (float32 to match w and b)
x_ = np.arange(-80, 80, dtype=np.float32)
y_ = 1/(1+tf.exp(-(w*x_+b)))

# Scatter plot of output data before iteration
plt.scatter(x_train,y_train)
# sigmoid curve with initialization parameters
plt.plot(x_,y_,c='r',linewidth=3)

# 5 Training Model
cross_train = []
acc_train = []

for i in range(0, iter+1):

    with tf.GradientTape() as tape:
        pred_train = 1/(1+tf.exp(-(w*x_train+b)))
        Loss_train = -tf.reduce_mean(y_train*tf.math.log(pred_train)+(1-y_train)*tf.math.log(1-pred_train))
    Accuracy_train = tf.reduce_mean(tf.cast(tf.equal(tf.where(pred_train<0.5,0,1),y_train),tf.float32))

    cross_train.append(Loss_train)
    acc_train.append(Accuracy_train)

    # Gradients of the loss with respect to w and b
    dL_dw, dL_db = tape.gradient(Loss_train, [w, b])
    w.assign_sub(learn_rate*dL_dw)
    b.assign_sub(learn_rate*dL_db)

    if i % display_step == 0:
        print("i: %i, Train Loss: %f, Accuracy: %f" % (i, Loss_train, Accuracy_train))
        # sigmoid curve with the current weights
        y_ = 1/(1+tf.exp(-(w*x_+b)))
        plt.plot(x_, y_)

plt.show()
```

The output is:
• The red curve is the sigmoid with the initial parameters; although it looks far off, the 10 samples belonging to ordinary houses happen to receive probabilities under 0.5, so the classification accuracy is 10/16
• Blue is the result of the first iteration
• Purple is the result of the last iteration
• Between 0 and 20 the two classes overlap, so points in this region may be misclassified and the accuracy cannot reach 100%
• Although the first iteration already classifies fairly accurately, the curve after more iterations is more reasonable overall

#### 11.2.4.7 Validating the Model

```
x_test = [128.15,45.00,141.43,106.27,99.00,53.84,85.36,70.00,162.00,114.60]
pred_test = 1/(1+tf.exp(-(w*(x_test-np.mean(x))+b)))
y_test = tf.where(pred_test<0.5,0,1)
for i in range(len(x_test)):
    print(x_test[i],"\t",pred_test[i].numpy(),"\t",y_test[i].numpy(),"\t")
```

The output is:

```
128.15   0.84752923      1
45.0     0.003775026     0
141.43   0.94683856      1
106.27   0.4493811       0
99.0     0.30140132      0
53.84    0.008159011     0
85.36    0.11540905      0
70.0     0.032816157     0
162.0    0.99083763      1
114.6    0.62883806      1
```

#### 11.2.4.8 Sigmoid Curve and Scatter Plot of the Predictions

```
# 1 Load data
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Area of each house
x = np.array([137.97,104.50,100.00,124.32,79.20,99.00,124.00,114.00,106.69,138.05,53.75,46.91,68.00,63.02,81.26,86.21])
# Type label (0 = ordinary, 1 = premium)
y = np.array([1,1,0,1,0,1,1,0,0,1,0,0,0,0,0,0])

# 2 Data processing
# The sigmoid function is centered at 0, so center the data by subtracting the mean
x_train = x - np.mean(x)
y_train = y

# 3 Set Hyperparameters
learn_rate = 0.005
iter = 5
display_step = 1

# 4 Set model variable initial value
np.random.seed(612)
w = tf.Variable(np.random.randn())
b = tf.Variable(np.random.randn())

#x_ = range(-80,80)
#y_ = 1/(1+tf.exp(-(w*x_+b)))

# Scatter plot of output data before iteration
#plt.scatter(x_train,y_train)
# sigmoid curve with the initial parameters
#plt.plot(x_,y_,c='r',linewidth=3)

# 5 Training Model
cross_train = []
acc_train = []

for i in range(0, iter+1):

    with tf.GradientTape() as tape:
        pred_train = 1/(1+tf.exp(-(w*x_train+b)))
        Loss_train = -tf.reduce_mean(y_train*tf.math.log(pred_train)+(1-y_train)*tf.math.log(1-pred_train))
    Accuracy_train = tf.reduce_mean(tf.cast(tf.equal(tf.where(pred_train<0.5,0,1),y_train),tf.float32))

    cross_train.append(Loss_train)
    acc_train.append(Accuracy_train)

    # Gradients of the loss with respect to w and b
    dL_dw, dL_db = tape.gradient(Loss_train, [w, b])
    w.assign_sub(learn_rate*dL_dw)
    b.assign_sub(learn_rate*dL_db)

    if i % display_step == 0:
        print("i: %i, Train Loss: %f, Accuracy: %f" % (i, Loss_train, Accuracy_train))
        #y_ = 1/(1+tf.exp(-(w*x_+b)))
        #plt.plot(x_,y_)
# Although the name x_test is used below, this is not a true test set: a test set would come with labels
x_test = [128.15,45.00,141.43,106.27,99.00,53.84,85.36,70.00,162.00,114.60]
pred_test = 1/(1+tf.exp(-(w*(x_test-np.mean(x))+b)))
y_test = tf.where(pred_test<0.5,0,1)
for i in range(len(x_test)):
    print(x_test[i],"\t",pred_test[i].numpy(),"\t",y_test[i].numpy(),"\t")

plt.scatter(x_test,y_test)

# Points for drawing the sigmoid curve (float32 to match w and b)
x_ = np.arange(-80, 80, dtype=np.float32)
y_ = 1/(1+tf.exp(-(w*x_+b)))
plt.plot(x_+np.mean(x),y_)

plt.show()
```

The output is the fitted sigmoid curve plotted over a scatter plot of the predicted points.

## 11.3 Linear Classifier

### 11.3.1 Decision Boundary-Linear Separability

• If a dataset in two-dimensional space can be divided into two classes by a straight line, we call it a linearly separable dataset, and the line is a linear classifier
• In three-dimensional space, a linearly separable dataset is one that can be divided into two classes by a plane
• In one-dimensional space, all points lie on a line, and linear separability means the two classes can be separated by a single point
• The separating point, line, or plane is called the decision boundary
• In m-dimensional space, if a hyperplane can divide the dataset into two classes, the dataset is linearly separable and the hyperplane is the decision boundary

### 11.3.2 Linear Inseparability

• Linear inseparability: the samples can only be separated by two (or more) straight lines, or by a curve

### 11.3.3 Logical Operations

#### 11.3.3.1 Linearly Separable: AND, OR, NOT

• Among the logical operations, AND (&), OR (|), and NOT (!) are linearly separable

• A corresponding classifier can therefore be obtained by training, and convergence is guaranteed

• AND operation, OR operation, NOT operation: in each case the truth table can be separated by a single line

#### 11.3.3.2 Linearly Nonseparable: XOR

• The XOR operation can be viewed as addition modulo 2
• Clearly, at least two straight lines are needed to separate its classes; the sketch below demonstrates the difference
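• A minimal sketch reusing the logistic-regression training pattern from section 11.2 (the learning rate, step count, and zero initialization are arbitrary choices): trained on the AND truth table the classifier reaches 100% accuracy, while on XOR no single line can classify all four rows correctly.

```
import tensorflow as tf
import numpy as np

# The four rows of the two-input truth table
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y_and = np.array([0., 0., 0., 1.])  # AND: linearly separable
y_xor = np.array([0., 1., 1., 0.])  # XOR: not linearly separable

def train(y, steps=2000, lr=0.5):
    w = tf.Variable(tf.zeros(2, dtype=tf.float64))
    b = tf.Variable(0.0, dtype=tf.float64)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            pred = 1 / (1 + tf.exp(-(tf.linalg.matvec(X, w) + b)))
            loss = -tf.reduce_mean(y*tf.math.log(pred) + (1-y)*tf.math.log(1-pred))
        dw, db = tape.gradient(loss, [w, b])
        w.assign_sub(lr*dw)
        b.assign_sub(lr*db)
    labels = tf.cast(pred >= 0.5, tf.float64)
    return tf.reduce_mean(tf.cast(tf.equal(labels, y), tf.float64)).numpy()

print("AND accuracy:", train(y_and))  # converges to 1.0
print("XOR accuracy:", train(y_xor))  # stuck at 0.5 here; no line gets all four right
```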

## 11.4 Example: Implementing multiple logistic regression

### 11.4.1 The Iris Dataset (iris), Revisited

• Here, iris is classified using logistic regression
• The iris dataset (a loading sketch follows below)
1. 150 samples
2. Four attributes: sepal length, sepal width, petal length, and petal width
3. One label with three classes: Setosa, Versicolour, and Virginica
4. Visualizing pairs of attributes shows the three classes: Setosa (blue), Versicolour (red), and Virginica (green)
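• A hedged sketch of loading the dataset, assuming scikit-learn is available (the course material may obtain iris by other means, e.g., downloading a CSV with tf.keras.utils.get_file):

```
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data    # shape (150, 4): sepal length, sepal width, petal length, petal width
y = iris.target  # shape (150,): 0 = Setosa, 1 = Versicolour, 2 = Virginica
print(X.shape, y.shape, np.unique(y))
```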
