Artificial intelligence, day 3: when to convert data, TensorFlow, and common machine learning algorithms

When is data conversion (feature preprocessing) needed, and when can it happen?


     Pre-training conversion
        -> Suited to static models: the model does not change as user data changes
         Advantages:
             Converting the data does not affect use of the model
             If the conversion algorithm changes, the model does not need to be retrained
         Disadvantages:
             Cannot handle the case where the model must change with user data
             If the conversion algorithm changes, all data must be converted again
     Training-time conversion
        -> Suited to dynamic models: the model changes as user data changes
         Advantages:
             Handles the case where the model changes with user data
             If the conversion algorithm changes, the existing data does not need to be converted again
         Disadvantages:
             Data conversion takes time, so use of the model is slowed down

     Offline model -------- static model --------> offline training
     Online model -------- dynamic model --------> online training
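As a rough illustration of the two strategies (a minimal sketch, not from the original text; the `fit_scaler` and `transform` names are made up here), pre-training conversion fixes the transform's parameters once before training, while training-time conversion recomputes them whenever new user data arrives:

```python
import numpy as np

def fit_scaler(data):
    # Pre-training conversion: learn the min/max once, offline.
    return data.min(axis=0), data.max(axis=0)

def transform(data, lo, hi):
    # Apply a fixed min-max scaling with previously fitted parameters.
    return (data - lo) / (hi - lo)

# Static model: fit once on the historical data, then reuse forever.
train = np.array([[80.0, 10.0], [100.0, 15.0], [350.0, 30.0]])
lo, hi = fit_scaler(train)
scaled = transform(train, lo, hi)

# Dynamic model: refit every time a new batch arrives (training-time
# conversion) -- slower, but the scaling tracks the changing data.
new_batch = np.vstack([train, [[400.0, 35.0]]])
lo2, hi2 = fit_scaler(new_batch)
scaled2 = transform(new_batch, lo2, hi2)
```

Note the trade-off matches the lists above: the static version never pays the fitting cost at prediction time, but a changed conversion algorithm forces reconverting everything.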

TensorFlow


     An open-source AI / deep learning SDK provided by Google
             TensorFlow -------------------> training
             TensorFlow Lite ------------------> inference / deployment on devices

     Installation of tensorflow:
    pip install tensorflow==1.8.0 -i https://mirrors.aliyun.com/pypi/simple/

     Use of tensorflow:
         All data handled by tensorflow are tensors (multi-dimensional arrays).

example:    

            import tensorflow as tf

            # tf.constant() builds a tensor from a Python value
            hello = tf.constant("Hello, TensorFlow")
            print(hello)           # prints the tensor object, not its value
            sess = tf.Session()    # TensorFlow 1.x: values are computed inside a session
            print(sess.run(hello))


         To build a tensor in tensorflow, use the tf.constant() method.
         Moreover, tensors are only evaluated inside a tensorflow session:
         to get a tensor's value, call sess.run(tensor_name)

         Use tensorflow to add a vector and an integer (broadcast addition):

import tensorflow as tf

#tf.constant(): construct a tensor according to a constant
a = tf.constant([3,4,5,6,7])
b = tf.constant(4)
c = tf.add(a,b)
print(c)
'''
with tf.Session() as sess:
    print(sess.run(a+b))
    print(sess.run(a-b))
'''

print(a)

sess = tf.Session()
print(sess.run(c))
mytest = tf.summary.FileWriter("log",sess.graph)
sess.close()


         To record the computation graph, add the following statement inside the tensorflow session:
            summary_writer = tf.summary.FileWriter("log", sess.graph)
         Then view the execution process of tensorflow with:
            tensorboard --logdir="log"
         After it starts, a URL is printed; open that URL in a browser to view the graph

Common machine learning algorithms


    1.KNN


         K-nearest neighbors: a classification algorithm (most commonly used for binary classification)

         Algorithm idea: find the K (odd) training points closest to the test point, then count the categories among those K points. The category with the most votes is the category of the test point.

         Intuitively: "he who stays near vermilion turns red, he who stays near ink turns black" -- a point is assumed to belong with its neighbors

     How to distinguish black-eyed Susan (Rudbeckia hirta) from sunflower:

                             Rudbeckia hirta     Sunflower
         Plant height        80-100cm            100-350cm
         Flower diameter     10-15cm             10-30cm

        hxj = [
            [88, 13],
            [90, 12],
            [82, 15],
            [93, 10],
            [95, 11],
            [99, 13],
            [83, 10]
            ]

        xrk = [
            [102, 13],
            [190, 18],
            [120, 15],
            [140, 20],
            [180, 18],
            [320, 27],
            [210, 22]
            ]

     How to calculate the distance:
         Compute the distance from the test point to every point in the training set.
             Distance between two points in the plane coordinate system:
                a: x1,y1
                b: x2,y2

                 Distance between a and b:
                    sqrt((x1-x2)**2 + (y1-y2)**2)

                for x in train_data:
                    sqrt((x[0]-t[0])**2 + (x[1]-t[1])**2)
     Sort all the distances
     Take the indices of the K (odd) closest points
     Get the categories of those K points
     Count the categories; the most frequent one wins
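The distance-and-vote steps above can be sketched compactly with NumPy (a minimal sketch; the variable names and the tiny five-point training set are made up for illustration):

```python
import numpy as np
from collections import Counter

train = np.array([[88, 13], [90, 12], [120, 15], [140, 20], [190, 18]])
labels = np.array([0, 0, 1, 1, 1])       # 0 = Rudbeckia, 1 = sunflower
test = np.array([120, 17])

# Euclidean distance from the test point to every training point
dists = np.sqrt(np.sum((train - test) ** 2, axis=1))

K = 3
nearest = np.argsort(dists)[:K]          # indices of the K closest points
vote = Counter(labels[nearest]).most_common(1)[0][0]
print(vote)                              # -> 1 (sunflower)
```

np.argsort returns indices rather than sorted values, which is exactly what the "obtain the subscripts" step needs: the same indices then select the labels.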

     KNN process:
         1. Obtain the (converted) training set and the test point
         2. Compute the distance from the test point to every point in the training set
         3. Find the K (odd) nearest points
         4. Count the labels of those K points and how often each occurs
         5. Return the label with the highest count

import numpy as np
import matplotlib.pyplot as plt
#Counter is used to count label occurrences
from collections import Counter

#Training set with 14 samples: the first 7 are black-eyed Susan, the last 7 are sunflower
#Each sample is (plant height (cm), flower diameter (cm))
Train_data = [
    [88, 13],
    [90, 12],
    [82, 15],
    [93, 10],
    [95, 11],
    [99, 13],
    [83, 10],
    [102, 13],
    [190, 18],
    [120, 15],
    [140, 20],
    [180, 18],
    [320, 27],
    [210, 22]
]
#Labels of the training set (0 = black-eyed Susan, 1 = sunflower)
train_label = [0,0,0,0,0,0,0,1,1,1,1,1,1,1]

#In order to display the data characteristics on matplotlib, you need to convert it to np.ndarray
mytrain_data = np.array(Train_data)
mylb = np.array(train_label)

#print(mytrain_data[mylb==0,0])
#Draw data characteristic diagram
plt.title("myknn")
#Use a scatter chart to display the black-eyed Susan data
    #mytrain_data[mylb==0,0]: boolean-mask indexing: column 0 of the rows whose label in mylb equals 0
plt.scatter(mytrain_data[mylb==0,0],mytrain_data[mylb==0,1],label="hxj")
#Use a scatter chart to display sunflower data
plt.scatter(mytrain_data[mylb==1,0],mytrain_data[mylb==1,1],label="xrk")
#Show test points (test sets)
x_test = [120,17]
plt.scatter(x_test[0],x_test[1],label="test-point")
plt.legend()

mydist = []
#Calculate the Euclidean distance from the test point to all points
for x in mytrain_data:
    mydist.append(np.sqrt(np.sum((x-x_test)**2)))
#Convert the calculated distances to np.ndarray
mydist = np.array(mydist)

#argsort returns the indices that would sort the distances in ascending order
myret = np.argsort(mydist)
print(mydist)
print(myret)
K = 5
#Get the labels of the nearest K elements
myTop = mylb[myret[:K]]
print(myTop)
#Count label occurrences in myTop
ret = Counter(myTop)
#most_common() returns (label, count) pairs sorted by count, descending
print(ret.most_common(),type(ret.most_common()))
#Output the predicted class of the test point
print(x_test,"class is:",ret.most_common()[0][0])
plt.show()


     Use of KNN: often used for simple binary classification (the feature dimension of the data cannot be too high)

     Drawbacks of KNN:
         1. The resulting model is hard to interpret
         2. Prone to the curse of dimensionality
         3. Computationally expensive at prediction time (distance to every training point)

     Features of KNN:
         There is no real training process; "training" just means storing the training data and labels
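The script above can be wrapped into a reusable function (a sketch; the name `knn_predict` is made up here, not from the original), which makes the "no training, just store the data" point concrete: the function takes the raw training set directly at prediction time.

```python
import numpy as np
from collections import Counter

def knn_predict(train_data, train_label, test_point, K=5):
    """Return the majority label among the K nearest training points."""
    train = np.asarray(train_data, dtype=float)
    labels = np.asarray(train_label)
    # Euclidean distance from the test point to every training point
    dists = np.sqrt(np.sum((train - np.asarray(test_point, dtype=float)) ** 2, axis=1))
    nearest = np.argsort(dists)[:K]      # indices of the K closest points
    return Counter(labels[nearest]).most_common(1)[0][0]

# Same data as the script above: 0 = black-eyed Susan, 1 = sunflower
Train_data = [[88, 13], [90, 12], [82, 15], [93, 10], [95, 11], [99, 13], [83, 10],
              [102, 13], [190, 18], [120, 15], [140, 20], [180, 18], [320, 27], [210, 22]]
train_label = [0] * 7 + [1] * 7
print(knn_predict(Train_data, train_label, [120, 17]))   # -> 1 (sunflower)
```

Keeping K odd avoids ties in the binary case; for larger K the vote simply covers more neighbors.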


2. Linear regression


     The problem of predicting a continuous value.
     That is, we study a linear problem: the features and the predicted value have an approximately linear relationship, and the task is to infer that relationship.
     In other words, we fit the closest possible line through the pattern of the discrete points.

     Algorithm idea of univariate linear regression:
         Assume the data follows a line, y = k * x + b
         The task is to find the optimal k and b; by least squares:
             k = sum((x - xmean) * (y - ymean)) / sum((x - xmean)**2)
             b = ymean - k * xmean

import numpy as np
import matplotlib.pyplot as plt

#y = 3x + 4

k = 3
b = 4

tr_x = np.random.uniform(0,40,100)
tr_y = tr_x * k + b
#print(tr_x)
#print(tr_y)

train_data = []
for i in range(tr_x.size):
    train_data.append([np.random.normal(tr_x[i],2),np.random.normal(tr_y[i],2)])

#print(train_data)
mytrain_data = np.array(train_data).reshape(-1,2)

#Find xmean, ymean
xmean = np.mean(mytrain_data[:,0])
ymean = np.mean(mytrain_data[:,1])
print(xmean,ymean)
#Compute k (least squares)
    #Numerator of k
fz = np.sum((mytrain_data[:,0] - xmean)*(mytrain_data[:,1] - ymean))
print(fz)
    #Denominator of k
fm = np.sum((mytrain_data[:,0] - xmean)**2)
print(fm)
k = fz / fm
print(k)

#Compute b
b = ymean - k * xmean
print(b)

xp = np.arange(0,40)
yp = k * xp + b
#print(mytrain_data)
plt.title("MY-Linear")
plt.scatter(mytrain_data[:,0],mytrain_data[:,1],label="train_point")
plt.plot(tr_x,tr_y,color="red",label="true-line")
plt.plot(xp,yp,color="yellow",label="model-line")
plt.legend()
plt.show()
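To sanity-check the closed-form solution above, the fitted k and b can be compared against np.polyfit (a sketch; the noise-free data here is an assumption chosen so the least-squares fit is exact):

```python
import numpy as np

# Noise-free points on y = 3x + 4, so the least-squares fit should recover
# the slope and intercept exactly
x = np.arange(0, 40, dtype=float)
y = 3 * x + 4

# Closed-form least squares, same formulas as the script above
xmean, ymean = x.mean(), y.mean()
k = np.sum((x - xmean) * (y - ymean)) / np.sum((x - xmean) ** 2)
b = ymean - k * xmean

# np.polyfit(x, y, 1) fits a degree-1 polynomial and returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(k, b)
```

With the noisy training data generated in the script, the two methods would agree as well, since np.polyfit with degree 1 solves the same least-squares problem.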

 

Tags: AI neural networks NLP

Posted on Fri, 17 Sep 2021 06:24:25 -0400 by peterg0123