# Python machine learning

A 3-day quick start to machine learning in Python, 2018 [Dark Horse Programmer]

## Machine learning (4) KNN algorithm

### KNN algorithm (also called K-nearest neighbor algorithm)

#### A small example to explain what the KNN algorithm is

Suppose we have a map. We only know where five characters are located and how far each of them is from us. We now use the KNN algorithm to infer which region we are in.

We are in the red circle, so it is obvious that we are most likely in the same region as the nearest character, the blue one.

Core idea: infer our category from the categories of our nearest neighbors.

#### Definition

If most of the k samples most similar to a given sample in feature space (i.e. its k nearest neighbors) belong to a certain category, then that sample also belongs to that category.
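The definition above can be turned directly into code. Below is a minimal pure-Python sketch (not the sklearn estimator used later in this article), assuming Euclidean distance and a simple majority vote:

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample
    distances = [math.dist(p, query) for p in train_points]
    # Indices of the k closest samples
    nearest = sorted(range(len(train_points)), key=lambda i: distances[i])[:k]
    # Majority vote among their labels
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Tiny 2-feature example: two clusters of three points each
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ['A', 'A', 'A', 'B', 'B', 'B']
print(knn_predict(points, labels, (2, 2), k=3))  # the 3 nearest neighbors are all 'A'
```

The sklearn `KNeighborsClassifier` used later does the same thing, just with optimized neighbor search and more options (distance metrics, weighting).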

#### Second example: Film

The known training set consists of a few films: 6 samples with 2 features each, and the target values are romance films and action films. Given its features, we need to judge which kind of film an unknown film belongs to.

Figure 2 shows the distance between each of these six films and the unknown film. The situation we discussed earlier corresponds to k = 1; here:

- When k = 1, the unknown film is classified as a romance film
- When k = 2, it is still a romance film
- When k = 6, the category cannot be determined (all six samples vote, and the two categories tie)
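The figure with the actual distances is not reproduced here, so the feature values below (number of fight scenes and kiss scenes, the features usually used in this classic film example) are made up; they are chosen only so that the k = 1, k = 2, and k = 6 outcomes above can be reproduced in code:

```python
from collections import Counter
import math

# Hypothetical training set: (fight scenes, kiss scenes) -> category
films = [
    ((3, 104), 'romance'),
    ((2, 100), 'romance'),
    ((1, 81),  'romance'),
    ((101, 10), 'action'),
    ((99, 5),  'action'),
    ((98, 2),  'action'),
]
unknown = (18, 90)  # the film whose category we want to infer

# Distances from the unknown film to every training sample, nearest first
dists = sorted((math.dist(features, unknown), label) for features, label in films)


def knn_vote(k):
    """Majority vote among the k nearest films; report ties explicitly."""
    votes = Counter(label for _, label in dists[:k])
    top = votes.most_common()
    if len(top) > 1 and top[0][1] == top[1][1]:
        return 'tie'
    return top[0][0]


for k in (1, 2, 6):
    print(k, knn_vote(k))  # k=1 and k=2 give 'romance'; k=6 is a 3-3 tie
```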

Therefore, the classification result depends heavily on the choice of k: if k is too small, the result is easily affected by outliers.

Because the algorithm is based on distances, the data needs dimensionless processing, i.e. standardization, so that features on different scales contribute comparably.
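A small sketch of why this matters for a distance-based method: with raw features, a feature with a large numeric range dominates the Euclidean distance; after z-score standardization (what `StandardScaler` does: subtract the mean, divide by the standard deviation), every feature contributes on a comparable scale. The feature values below are made up for illustration:

```python
import math
import statistics

# Two features on very different scales (made-up values)
heights_cm = [150.0, 160.0, 170.0, 180.0]           # feature 1: range ~30
incomes = [30000.0, 50000.0, 70000.0, 90000.0]      # feature 2: range ~60000

# Raw distance between sample 0 and sample 1 is dominated by income:
# the 10 cm height difference is invisible next to the 20000 income difference
raw = math.dist((heights_cm[0], incomes[0]), (heights_cm[1], incomes[1]))
print(raw)


def zscore(values):
    """Standardize: subtract the mean, divide by the (population) std dev."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


h = zscore(heights_cm)
inc = zscore(incomes)
# After scaling, both features contribute comparably to the distance
scaled = math.dist((h[0], inc[0]), (h[1], inc[1]))
print(scaled)
```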

#### Case: prediction of Iris species

##### 1. Import the iris data set

```python
from sklearn.datasets import load_iris

iris = load_iris()
```

##### 2. Split the data set

```python
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=6)
```

##### 3. Feature engineering: standardization

```python
from sklearn.preprocessing import StandardScaler

transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
```

##### 4. Create the KNN estimator

```python
from sklearn.neighbors import KNeighborsClassifier

estimator = KNeighborsClassifier(n_neighbors=3)  # take k = 3 here
estimator.fit(x_train, y_train)
```

##### 5. Model evaluation

###### 5.1 Compare the real and predicted values directly

```python
y_predict = estimator.predict(x_test)
print('y_predict:\n', y_predict)
print('Direct comparison of real and predicted values:\n', y_test == y_predict)
```

```
y_predict:
 [0 2 0 0 2 1 1 0 2 1 2 1 2 2 1 1 2 1 1 0 0 2 0 0 1 1 1 2 0 1 0 1 0 0 1 2 1 2]
Direct comparison of real and predicted values:
 [ True  True  True  True  True  True False  True  True  True  True  True
  True  True  True False  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True False  True
  True  True]
```

###### 5.2 Compute the accuracy

```python
score = estimator.score(x_test, y_test)
print('Accuracy rate is:\n', score)
```

The accuracy is 0.9210526315789473, i.e. 35 of the 38 test samples were predicted correctly.

Putting the steps together, the complete program:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier


def knn_iris():
    """Classify iris flowers with the KNN algorithm."""
    # 1) Get the data
    iris = load_iris()
    # 2) Split the data set
    x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=6)
    # 3) Feature engineering: standardization
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)
    # 4) KNN estimator
    estimator = KNeighborsClassifier(n_neighbors=3)
    estimator.fit(x_train, y_train)
    # 5) Model evaluation
    # Method 1: compare the real and predicted values directly
    y_predict = estimator.predict(x_test)
    print('y_predict:\n', y_predict)
    print('Direct comparison of real and predicted values:\n', y_test == y_predict)
    # Method 2: compute the accuracy
    score = estimator.score(x_test, y_test)
    print('Accuracy rate is:\n', score)


if __name__ == '__main__':
    # Example 1: classify iris flowers with the KNN algorithm
    knn_iris()
```