3-day quick start to Python machine learning, 2018 [Dark Horse Programmer]
Machine learning (4) KNN algorithm
KNN algorithm (also called the k-nearest neighbors algorithm)
A small example shows what the KNN algorithm does.
We have a map. We only know where five other people are and how far each of them is from us. We want to use the KNN algorithm to infer which region we are in.
We are at the red circle. The nearest person is the blue one, so we are most likely in the same region as that person.
Core idea: infer a sample's category from the categories of its neighbors.
If most of the k most similar samples (that is, the k nearest neighbors in the feature space) belong to a certain category, then the sample is assigned to that category as well.
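The core idea can be sketched in a few lines of plain Python. This is only a minimal illustration, not the scikit-learn implementation: the function name `knn_predict`, the choice of Euclidean distance, and the toy points and labels are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query to every training sample (an assumed metric).
    distances = [
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(x, query))), label)
        for x, label in zip(train_x, train_y)
    ]
    # Keep the labels of the k closest samples.
    nearest_labels = [label for _, label in sorted(distances, key=lambda d: d[0])[:k]]
    # The most common label among them is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two features per sample, two categories.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3), (4.0, 4.2), (4.3, 3.9), (3.8, 4.1)]
train_y = ["blue", "blue", "blue", "green", "green", "green"]

print(knn_predict(train_x, train_y, (1.1, 1.0), k=3))
print(knn_predict(train_x, train_y, (4.1, 4.0), k=3))
```

A query close to the first group is assigned that group's label, and likewise for the second group. There is no training step beyond storing the samples.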
**The known training set is a handful of films: 6 samples with 2 features each, and the target value is either love film or action film. We need to decide, from its features, which kind of film the unknown film "?" is.
Figure 2 shows the distance between each of these six films and the "?" film. The map example discussed earlier corresponds to k = 1. Here:
When k = 1, "?" is a love film
When k = 2, "?" is a love film
When k = 6, the category of "?" cannot be determined
**
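Therefore, the result depends heavily on k. If k is too small, the prediction is easily affected by outliers. If k is too large, as with k = 6 above, distant samples vote as well and the nearest neighbors no longer decide the result.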
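The effect of k can be seen on a small made-up example. The points, labels, and the helper `vote` below are invented for illustration. One sample labeled "action" sits inside the "love" cluster and plays the role of the abnormal value.

```python
import math
from collections import Counter


def vote(samples, query, k):
    """Return the majority label among the k nearest samples, or None on a tie."""
    ranked = sorted(
        samples,
        key=lambda s: math.hypot(s[0][0] - query[0], s[0][1] - query[1]),
    )
    counts = Counter(label for _, label in ranked[:k]).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no clear majority among the k neighbors
    return counts[0][0]


# Hypothetical samples: ((feature 1, feature 2), label).
samples = [
    ((1.0, 1.0), "love"),
    ((1.5, 1.2), "love"),
    ((0.8, 1.4), "love"),
    ((1.2, 1.1), "action"),  # outlier sitting inside the "love" cluster
    ((5.0, 5.0), "action"),
    ((5.5, 4.8), "action"),
]
query = (1.25, 1.15)

for k in (1, 3, 6):
    print(k, vote(samples, query, k))
```

With k = 1 the single nearest neighbor is the outlier, so the query is labeled "action". With k = 3 the two nearby "love" samples outvote it. With k = 6 every sample votes, the count is three to three, and the helper returns `None`.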
Because KNN relies on distances between samples, features measured on different scales must first be made dimensionless, that is, the data must be standardized.
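Standardization rescales each feature to zero mean and unit variance: each value becomes (value - mean) / standard deviation. Below is a minimal sketch in plain Python. The helper names `fit_scaler` and `apply_scaler` and the sample numbers are assumptions made for the example. The mean and standard deviation are computed on the training data only and then reused for the test data, which is the same fit-then-transform pattern as `StandardScaler`.

```python
import statistics


def fit_scaler(rows):
    """Compute the per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds


def apply_scaler(rows, means, stds):
    """Rescale every value to (value - mean) / std, column by column."""
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical data: the second feature is about 1000 times larger than the
# first and would dominate any distance computed on the raw values.
train = [[1.0, 2000.0], [2.0, 1000.0], [3.0, 3000.0]]
test = [[2.5, 1500.0]]

means, stds = fit_scaler(train)  # statistics come from the training set only
train_scaled = apply_scaler(train, means, stds)
test_scaled = apply_scaler(test, means, stds)  # reuse the training statistics

print(train_scaled)
print(test_scaled)
```

After scaling, both columns of the training data have mean 0 and standard deviation 1, so neither feature dominates the distance.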
1. Get data

    from sklearn.datasets import load_iris

    iris = load_iris()

2. Partition data set

    from sklearn.model_selection import train_test_split

    x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=6)

3. Feature engineering: standardization

    from sklearn.preprocessing import StandardScaler

    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

4. Create KNN algorithm predictor

    from sklearn.neighbors import KNeighborsClassifier

    estimator = KNeighborsClassifier(n_neighbors=3)  # Take k=3 here.
    estimator.fit(x_train, y_train)

5. Model evaluation
5.1 Direct comparison of real value and predicted value

    y_predict = estimator.predict(x_test)
    print('y_predict:\n', y_predict)
    print('Direct comparison of real and predicted values:\n', y_test == y_predict)

Output:

    y_predict:
     [0 2 0 0 2 1 1 0 2 1 2 1 2 2 1 1 2 1 1 0 0 2 0 0 1 1 1 2 0 1 0 1 0 0 1 2 1 2]
    Direct comparison of real and predicted values:
     [ True True True True True True False True True True True True True True True False True True True True True True True True True True True True True True True True True True False True True True]

5.2 Calculating accuracy

    score = estimator.score(x_test, y_test)
    print('Accuracy rate is:\n', score)

Output:

    Accuracy rate is:
     0.9210526315789473
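For a classifier, the value returned by `estimator.score` is the accuracy: the fraction of test samples whose prediction matches the real value. The sketch below shows that calculation. The function name `accuracy` and the toy labels are assumptions; the counts 38 and 3 are read off the comparison output above.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the real value."""
    matches = [t == p for t, p in zip(y_true, y_pred)]
    return sum(matches) / len(matches)


# Hypothetical toy labels: 3 of the 4 predictions are correct.
print(accuracy([0, 1, 2, 1], [0, 1, 2, 2]))

# The comparison printed above has 38 entries, 3 of them False.
print((38 - 3) / 38)
```

The second value agrees with the reported score of about 0.921, so methods 5.1 and 5.2 describe the same result in two ways.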
Complete code:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.neighbors import KNeighborsClassifier


    def knn_iris():
        '''
        Classification of iris by KNN algorithm
        :return:
        '''
        # 1) Get data
        iris = load_iris()

        # 2) Partition data set
        x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=6)

        # 3) Feature engineering: standardization
        transfer = StandardScaler()
        x_train = transfer.fit_transform(x_train)
        x_test = transfer.transform(x_test)

        # 4) KNN algorithm predictor
        estimator = KNeighborsClassifier(n_neighbors=3)
        estimator.fit(x_train, y_train)

        # 5) Model evaluation
        # Method 1: direct comparison between the real value and the predicted value
        y_predict = estimator.predict(x_test)
        print('y_predict:\n', y_predict)
        print('Direct comparison of real and predicted values:\n', y_test == y_predict)

        # Method 2: calculation accuracy
        score = estimator.score(x_test, y_test)
        print('Accuracy rate is:\n', score)

        return None


    if __name__ == '__main__':
        # Code 1: classification of iris by KNN algorithm
        knn_iris()

Jocker_Tong