# Python machine learning

A 3-day quick start to machine learning with Python, 2018 [Dark Horse Programmer]

## Machine learning (4) KNN algorithm

### KNN algorithm (also called K-nearest neighbor algorithm)

#### A small example to illustrate the KNN algorithm

Suppose we have a map. We only know where five people are and how far each of them is from us, and we want to use the KNN algorithm to infer which region we are in.

We are in the red circle, so it is obvious that the nearest person (the blue one) is most likely in our region.

Core idea: infer our category from our neighbors.
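To make the "infer from neighbors" idea concrete, here is a minimal sketch of a 1-nearest-neighbor lookup. The coordinates and region labels are made up purely for illustration; they are not from the map in the course material:

```python
import math

# Hypothetical 2-D positions and region labels of the five known people
neighbors = [
    ((1.0, 2.0), 'region A'),
    ((1.5, 1.8), 'region A'),
    ((5.0, 8.0), 'region B'),
    ((6.0, 9.0), 'region B'),
    ((8.0, 1.0), 'region C'),
]

def nearest_label(point):
    """Return the label of the single closest neighbor (k = 1)."""
    distances = [(math.dist(point, pos), label) for pos, label in neighbors]
    return min(distances)[1]  # smallest distance wins

print(nearest_label((1.2, 2.1)))  # a point close to the 'region A' cluster
```

With k = 1 the answer is simply the label of the closest known point; larger k takes a majority vote among the k closest points.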

#### Definition

If the majority of the k most similar samples (i.e., the nearest neighbors in feature space) belong to a certain category, then the sample also belongs to that category.

#### Second example: films

The known training set consists of a few films: 6 samples with 2 features each. The target values are "romance film" and "action film". Given its features, we need to judge which kind of film an unknown film belongs to.
Figure 2 shows the distance between these six films and the unknown film. Earlier we discussed the case k = 1; here:

- When k = 1, it is a romance film
- When k = 2, it is a romance film
- When k = 6, the category cannot be determined
Therefore, our result depends heavily on k: if k is too small, the prediction is easily affected by abnormal values (outliers); if k is too large, it is affected by sample imbalance.

Because KNN relies on distances between samples, the data also need dimensionless processing, i.e., standardization.

#### Case: prediction of iris species
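As a quick sketch of how k affects the prediction, we can vary `n_neighbors` on a toy data set in the spirit of the film example. The feature values and the query film below are made up for illustration:

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy training set: feature = [fight scenes, kiss scenes]
# label: 0 = action film, 1 = romance film
X = [[100, 5], [95, 3], [90, 2], [8, 90], [5, 95], [2, 98]]
y = [0, 0, 0, 1, 1, 1]

for k in (1, 3, 5):
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X, y)
    # An unknown film with few fight scenes and many kiss scenes
    print(k, clf.predict([[10, 88]]))
```

Here every k gives the romance label, but on noisier data the vote can flip as k grows, which is why k is usually tuned (e.g., with cross-validation).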

##### 1. Import iris data set first
```
from sklearn.datasets import load_iris

iris = load_iris()  # makes iris.data and iris.target available below
```
##### 2. Partition data set
```
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=6)
```
##### 3. Characteristic Engineering: Standardization
```
from sklearn.preprocessing import StandardScaler

transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
```
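Note that the scaler is fitted only on the training set (`fit_transform`) and then reused on the test set (`transform`), so the test data is scaled with the training statistics rather than its own. A quick sketch with made-up numbers:

```python
from sklearn.preprocessing import StandardScaler

train = [[1.0], [2.0], [3.0]]  # toy training feature
test = [[2.0]]                 # toy test feature

scaler = StandardScaler()
scaler.fit(train)                  # learns mean and std from the training set only
print(scaler.transform(test))      # test value scaled with *training* statistics
```

Fitting a second scaler on the test set would leak test-set statistics into preprocessing and make train and test features inconsistent.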
##### 4. Create KNN algorithm predictor
```
from sklearn.neighbors import KNeighborsClassifier

estimator = KNeighborsClassifier(n_neighbors=3) # Take k=3 here.
estimator.fit(x_train, y_train)
```
##### 5. Model evaluation
###### 5.1 direct comparison of real value and predicted value
```
y_predict = estimator.predict(x_test)
print('y_predict:\n', y_predict)
print('Direct comparison of real and predicted values:\n', y_test == y_predict)

```
```
y_predict:
[0 2 0 0 2 1 1 0 2 1 2 1 2 2 1 1 2 1 1 0 0 2 0 0 1 1 1 2 0 1 0 1 0 0 1 2 1
2]
Direct comparison of real and predicted values:
[ True  True  True  True  True  True False  True  True  True  True  True
True  True  True False  True  True  True  True  True  True  True  True
True  True  True  True  True  True  True  True  True  True False  True
True  True]
```
###### 5.2 Calculate the accuracy
```
score = estimator.score(x_test, y_test)
print('The accuracy is:\n', score)
```
```
The accuracy is:
0.9210526315789473
```
The complete code for this case:

```
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier


def knn_iris():
    '''
    Classification of iris by the KNN algorithm
    :return:
    '''
    # 1) Get data
    iris = load_iris()
    # 2) Partition data set
    x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=6)
    # 3) Feature engineering: standardization
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)
    # 4) KNN algorithm predictor
    estimator = KNeighborsClassifier(n_neighbors=3)
    estimator.fit(x_train, y_train)

    # 5) Model evaluation
    # Method 1: direct comparison between the real values and the predicted values
    y_predict = estimator.predict(x_test)
    print('y_predict:\n', y_predict)
    print('Direct comparison of real and predicted values:\n', y_test == y_predict)

    # Method 2: calculate the accuracy
    score = estimator.score(x_test, y_test)
    print('The accuracy is:\n', score)

    return None


if __name__ == '__main__':
    # Code 1: classification of iris by the KNN algorithm
    knn_iris()
```

Tags: Python

Posted on Mon, 03 Feb 2020 09:28:55 -0500 by MisterWebz