# Summary

SVM was first proposed by Vladimir N. Vapnik and Alexey Ya. Chervonenkis in 1963. The current standard version (the soft-margin SVM) was proposed by Corinna Cortes and Vapnik in 1993 and published in 1995. Before the rise of deep learning around 2012, SVM was widely regarded as one of the most successful and best-performing machine learning algorithms of the preceding decade.

A support vector machine (SVM) is a binary classification model. With suitable extensions, it can also handle multi-class problems. Its basic form is a linear classifier that maximizes the margin in the feature space, and the kernel technique turns it into a nonlinear classifier, so SVM supports both linear and nonlinear classification. The main idea is to find a hyperplane that separates the samples of the two classes while maximizing the margin, that is, the distance from the hyperplane to the nearest samples of either class. In general, many lines (or hyperplanes) can separate the two classes; we look for the one whose distance to the closest samples is largest. We call this line or hyperplane the best linear classifier, and the closest samples that determine it are called support vectors.
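To make the maximum-margin idea concrete, here is a minimal sketch on a small, hypothetical, linearly separable toy dataset (the points are made up for illustration). It approximates a hard-margin linear SVM with scikit-learn's `svm.SVC(kernel="linear")` and a very large `C`, then reads the hyperplane $w^T x + b = 0$ from `coef_` and `intercept_`. The distance from the hyperplane to the nearest samples is $1/\|w\|$, so the width between the two margin lines is $2/\|w\|$; for these points the best separator is the vertical line $x_1 = 1$, giving a width of about 2.

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data: class -1 at x1 = 0, class +1 at x1 = 2,
# plus one extra point per class lying farther from the boundary
X = np.array([[0, 0], [0, 1], [-1, 0.5], [2, 0], [2, 1], [3, 0.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin (maximum-margin) SVM
clf = svm.SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the hyperplane w^T x + b = 0
b = clf.intercept_[0]   # offset b
margin = 2 / np.linalg.norm(w)  # distance between the margin lines w^T x + b = -1 and +1

print("w =", w, "b =", b)
print("margin width =", margin)
print("support vector indices:", clf.support_)
```

Only the samples closest to the boundary end up as support vectors; moving the two outer points further away would not change the hyperplane.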

This chapter introduces support vector machines for data that is not linearly separable (nonlinear SVMs).

# Experimental steps

## 1 Install and import the required libraries

```python
# Install the pinned library versions (notebook shell commands)
!pip install numpy==1.16.0
!pip install pandas==0.25.0
!pip install scikit-learn==0.22.1
!pip install matplotlib==3.1.0

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
%matplotlib inline
from mpl_toolkits.mplot3d import Axes3D
from sklearn.model_selection import GridSearchCV
```

## 2 Create the dataset

Before discussing the details of nonlinear SVM, we first create a dataset that is not linearly separable (an XOR pattern) to see the effect. The code is as follows:

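```python
# Fix the random seed so the dataset is reproducible
np.random.seed(0)
# 200 points drawn from a 2D standard normal distribution
X_xor = np.random.randn(200, 2)
# XOR labels: True when exactly one of the two coordinates is positive
y_xor = np.logical_xor(X_xor[:, 0] > 0, X_xor[:, 1] > 0)
# Map True/False to the class labels 1/-1
y_xor = np.where(y_xor, 1, -1)

plt.scatter(X_xor[y_xor == 1, 0], X_xor[y_xor == 1, 1],
            c='b', marker='x', label='1')
plt.scatter(X_xor[y_xor == -1, 0], X_xor[y_xor == -1, 1],
            c='r', marker='s', label='-1')
plt.ylim(bottom=-3.0)  # set the lower y limit to -3.0
plt.legend()
plt.show()
```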
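As a quick numerical check that this XOR data cannot be handled by a straight line, a minimal sketch fits two scikit-learn `SVC` models on the same data, one with `kernel="linear"` and one with the default RBF kernel (default `C` and `gamma`, chosen here only for illustration), and compares their training accuracy. The linear model stays far below the RBF model.

```python
import numpy as np
from sklearn import svm

# Recreate the XOR dataset from the step above
np.random.seed(0)
X_xor = np.random.randn(200, 2)
y_xor = np.where(np.logical_xor(X_xor[:, 0] > 0, X_xor[:, 1] > 0), 1, -1)

# Linear-kernel SVM: restricted to a straight-line decision boundary
linear_clf = svm.SVC(kernel="linear").fit(X_xor, y_xor)
# RBF-kernel SVM with default hyperparameters: nonlinear decision boundary
rbf_clf = svm.SVC(kernel="rbf").fit(X_xor, y_xor)

linear_acc = linear_clf.score(X_xor, y_xor)  # training accuracy
rbf_acc = rbf_clf.score(X_xor, y_xor)
print("linear kernel training accuracy: %.2f" % linear_acc)
print("RBF kernel training accuracy:    %.2f" % rbf_acc)
```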

## 3 Kernel functions

Clearly, no linear hyperplane can separate the positive and negative classes here. For nonlinear cases like this, SVM maps the input space into a higher-dimensional feature space in which the originally non-separable data can become linearly separable, and then constructs the optimal separating hyperplane in that feature space. That hyperplane corresponds to a nonlinear boundary in the original plane, which separates data that cannot be split by a straight line. The kernel function makes this practical: it computes inner products in the high-dimensional feature space directly from the low-dimensional inputs, $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$, so the mapping never has to be computed explicitly. Let $\phi(x)$ denote the feature vector obtained by mapping $x$; the separating hyperplane in the feature space can then be written as:

$$f(x) = w^T \phi(x) + b$$
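A small worked example of the kernel trick, as a sketch: for 2D inputs, the explicit degree-2 map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ (chosen here for illustration) satisfies $\phi(x)^T \phi(z) = (x^T z)^2$. The code computes the inner product once after mapping to 3D and once directly in 2D with the kernel. For $x = (1, 2)$ and $z = (3, 4)$ both equal $11^2 = 121$, up to floating-point rounding.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel K(x, z) = (x^T z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # inner product after mapping to 3D
via_kernel = poly2_kernel(x, z)    # same quantity computed in the original 2D space
print(explicit, via_kernel)
```

The kernel version never builds the 3D vectors, which is what makes very high (or infinite) dimensional feature spaces, such as the one behind the Gaussian kernel, usable.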

### Several common kernel functions

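#### 1. Linear Kernel

$$K(x_i, x_j) = x_i^T x_j$$

#### 2. Polynomial Kernel

$$K(x_i, x_j) = (\gamma\, x_i^T x_j + r)^d$$

where $d \ge 1$ is the degree; with $\gamma = 1$ and $r = 0$ this reduces to $(x_i^T x_j)^d$.

#### 3. Gaussian Kernel

$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$

where $\sigma > 0$ is the bandwidth. scikit-learn calls this the RBF kernel and writes it as $\exp(-\gamma \|x_i - x_j\|^2)$, so $\gamma = 1/(2\sigma^2)$: a larger `gamma` gives a narrower kernel.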
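The three kernels can be written directly in NumPy. The sketch below computes each kernel matrix by hand on small random samples and checks it against `linear_kernel`, `polynomial_kernel` and `rbf_kernel` from `sklearn.metrics.pairwise`. The parameter values (`gamma=0.5`, `r=1.0`, `d=3`) are arbitrary choices for illustration, and `sigma` is derived from `gamma` through $\gamma = 1/(2\sigma^2)$.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

rng = np.random.RandomState(0)
X = rng.randn(5, 2)   # 5 illustrative 2D samples
Y = rng.randn(4, 2)   # 4 more samples to pair them with

gamma, r, d = 0.5, 1.0, 3            # hypothetical kernel parameters
sigma = np.sqrt(1 / (2 * gamma))     # bandwidth matching gamma = 1 / (2 sigma^2)

# Linear kernel: x_i^T x_j for every pair (rows of X against rows of Y)
K_lin = X @ Y.T
# Polynomial kernel: (gamma * x_i^T x_j + r)^d
K_poly = (gamma * (X @ Y.T) + r) ** d
# Gaussian kernel: exp(-||x_i - x_j||^2 / (2 sigma^2)); broadcasting gives all pairwise squared distances
sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
K_rbf = np.exp(-sq_dist / (2 * sigma ** 2))

print(np.allclose(K_lin, linear_kernel(X, Y)))
print(np.allclose(K_poly, polynomial_kernel(X, Y, degree=d, gamma=gamma, coef0=r)))
print(np.allclose(K_rbf, rbf_kernel(X, Y, gamma=gamma)))
```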

## 4 Plot the starting data in three dimensions

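```python
fig = plt.figure('3D scatter plot')
# Create 3D axes (works on both old and new matplotlib versions)
ax = fig.add_subplot(111, projection='3d')
# Only x and y are given, so every point is drawn at z = 0;
# the positive class is shifted by +6 along x, as in the original
ax.scatter(X_xor[y_xor == 1, 0] + 6, X_xor[y_xor == 1, 1],
           c='b', marker='x', label='1', s=30)
ax.scatter(X_xor[y_xor == -1, 0], X_xor[y_xor == -1, 1],
           c='r', marker='s', label='-1', s=30)
ax.view_init(elev=30, azim=30)  # camera elevation and azimuth in degrees
ax.legend()
plt.show()
```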
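The 3D plot above keeps the data in its original two coordinates. To illustrate what lifting the data into a higher dimension can achieve, a minimal sketch adds a hypothetical third feature $z = x_1 x_2$ (an illustrative choice, not the mapping used by the RBF kernel). Because an XOR label is 1 exactly when $x_1$ and $x_2$ have opposite signs, the plane $z = 0$ separates the two classes, and a plain linear-kernel SVM trained on the lifted features performs well.

```python
import numpy as np
from sklearn import svm

# Recreate the XOR dataset used above
np.random.seed(0)
X_xor = np.random.randn(200, 2)
y_xor = np.where(np.logical_xor(X_xor[:, 0] > 0, X_xor[:, 1] > 0), 1, -1)

# Hypothetical third feature: the product x1 * x2 (negative when the signs differ)
z = X_xor[:, 0] * X_xor[:, 1]
X_lifted = np.column_stack([X_xor, z])

# In the lifted 3D space, the plane z = 0 separates the two classes exactly
separable = np.all(np.where(z < 0, 1, -1) == y_xor)
print("separable by the plane z = 0:", separable)

# A linear-kernel SVM trained on the 3D features
lin_clf = svm.SVC(kernel="linear", C=10)
lin_clf.fit(X_lifted, y_xor)
print("training accuracy in 3D: %.2f" % lin_clf.score(X_lifted, y_xor))
```

A kernel SVM reaches the same goal implicitly: it never builds `X_lifted`, but works with inner products in the mapped space.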

## 5 RBF-kernel SVM classification with sklearn

With the kernel function in place, we now train a kernel SVM to see whether it can correctly classify the linearly non-separable dataset. `svm.SVC` uses the RBF (Gaussian) kernel by default. To choose the hyperparameters, we run a grid search over the 9 combinations formed by C = (0.1, 1, 10) and gamma = (1, 0.1, 0.01), scoring each combination with 4-fold cross-validation. This is only an example; in practice, you may need to search more parameter combinations.

```python
# SVC uses the RBF kernel by default; try all 3 x 3 = 9 (C, gamma) pairs with 4-fold CV
grid = GridSearchCV(svm.SVC(),
                    param_grid={"C": [0.1, 1, 10], "gamma": [1, 0.1, 0.01]},
                    cv=4)
grid.fit(X_xor, y_xor)
print("The best parameters are %s with a score of %0.2f"
      % (grid.best_params_, grid.best_score_))
```

In other words, among the 9 hyperparameter combinations we supplied, the grid search found that C=1 and gamma=1 achieve the highest cross-validated score, so they are our final parameter choice.
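To make clear what the grid search computes, a minimal sketch reproduces it by hand: for each (C, gamma) pair it calls `cross_val_score` with `cv=4` and averages the four fold accuracies. With the same data these means should agree with `grid.cv_results_["mean_test_score"]`, and the largest of them with `grid.best_score_`. The sketch assumes both functions use scikit-learn's default splitter for classifiers with `cv=4` (stratified 4-fold without shuffling) and default accuracy scoring.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import GridSearchCV, cross_val_score

# Recreate the XOR dataset
np.random.seed(0)
X_xor = np.random.randn(200, 2)
y_xor = np.where(np.logical_xor(X_xor[:, 0] > 0, X_xor[:, 1] > 0), 1, -1)

Cs = [0.1, 1, 10]
gammas = [1, 0.1, 0.01]

# Manual grid search: mean 4-fold CV accuracy for every (C, gamma) pair
manual_scores = []
for C in Cs:
    for gamma in gammas:
        fold_scores = cross_val_score(svm.SVC(C=C, gamma=gamma), X_xor, y_xor, cv=4)
        manual_scores.append(fold_scores.mean())
        print("C=%s gamma=%s mean CV accuracy=%.3f" % (C, gamma, fold_scores.mean()))

# The same search done by GridSearchCV
grid = GridSearchCV(svm.SVC(), param_grid={"C": Cs, "gamma": gammas}, cv=4)
grid.fit(X_xor, y_xor)
print("best manual score: %.3f, GridSearchCV best_score_: %.3f"
      % (max(manual_scores), grid.best_score_))
```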

To compare all nine combinations visually, we train an SVM for each one, predict the class of every point on a dense grid covering the data, and color the grid by the predicted class. The code is as follows:

```python
# Build a dense grid covering the data with a 1-unit border
x_min, x_max = X_xor[:, 0].min() - 1, X_xor[:, 0].max() + 1
y_min, y_max = X_xor[:, 1].min() - 1, X_xor[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))

plt.figure(figsize=(12, 12))
for i, C in enumerate((0.1, 1, 10)):
    for j, gamma in enumerate((1, 0.1, 0.01)):
        # One panel per (C, gamma) pair in a 3 x 3 layout
        plt.subplot(3, 3, i * 3 + j + 1)
        clf = svm.SVC(C=C, gamma=gamma)
        clf.fit(X_xor, y_xor)
        # Predict the class of every grid point
        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
        # Put the result into a color plot
        Z = Z.reshape(xx.shape)
        plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
        # Plot also the training points
        plt.scatter(X_xor[:, 0], X_xor[:, 1], c=y_xor, cmap=plt.cm.coolwarm)
        plt.xlim(xx.min(), xx.max())
        plt.ylim(yy.min(), yy.max())
        plt.xticks(())
        plt.yticks(())
        plt.xlabel("gamma=" + str(gamma) + " C=" + str(C))
plt.show()
```
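To connect the trained model back to the formulas, a minimal sketch inspects the SVC for the combination the grid search reported (C=1, gamma=1). It reads the support vectors, then rebuilds the decision function $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$ by hand. In scikit-learn's `SVC`, `dual_coef_` holds the products $\alpha_i y_i$ for the support vectors and `intercept_` holds $b$; each $|\alpha_i y_i|$ is bounded by `C`.

```python
import numpy as np
from sklearn import svm
from sklearn.metrics.pairwise import rbf_kernel

# Recreate the XOR dataset
np.random.seed(0)
X_xor = np.random.randn(200, 2)
y_xor = np.where(np.logical_xor(X_xor[:, 0] > 0, X_xor[:, 1] > 0), 1, -1)

C, gamma = 1, 1  # the combination chosen by the grid search
clf = svm.SVC(C=C, gamma=gamma).fit(X_xor, y_xor)

print("support vectors per class:", clf.n_support_)
print("total support vectors:", len(clf.support_vectors_))

# f(x) = sum_i (alpha_i * y_i) * K(x_i, x) + b over the support vectors x_i
K = rbf_kernel(X_xor, clf.support_vectors_, gamma=gamma)   # shape (200, n_SV)
manual_decision = K @ clf.dual_coef_.ravel() + clf.intercept_[0]
print("matches decision_function:",
      np.allclose(manual_decision, clf.decision_function(X_xor)))
```

Only the support vectors enter the sum, which is why the model can be described by them alone.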