# SVM with Python | Support Vector Machines (SVM) in Machine Learning | KGP Talkie

## What is a Support Vector Machine (SVM)?

We will start with a brief introduction to `SVM`. A `Support Vector Machine` (SVM) is a supervised `binary` classification algorithm. Given a set of points of two types in an `N-dimensional` space, SVM generates an `(N−1)-dimensional` hyperplane that separates those points into two groups.

An `SVM` classifier attempts to draw a `straight line` separating the `two sets` of data, and thereby creates a `model` for `classification`. For `two-dimensional` data like that shown here, this is a task we could do by hand. But immediately we see a problem: there is `more than one` possible dividing line that can perfectly `discriminate` between the two classes.

To choose among these lines, SVM relies on three concepts:

• Support Vectors
• Hyperplane
• Margin

#### Support Vectors

`Support vectors` are the data points closest to the `hyperplane`. The `margins` are calculated from these points, so they determine where the `separating line` lies. This makes them the most relevant points for the `construction` of the `classifier`.
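To make this concrete, here is a minimal sketch that fits a linear SVM on a small made-up data set and inspects which points it kept as support vectors. The toy points, the variable names and the large `C` value (used to approximate a hard margin) are assumptions for illustration only.

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data: two linearly separable groups in 2-D.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [-1.0, 0.5],
                  [2.0, 0.0], [2.0, 1.0], [3.0, 0.5]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard-margin SVM on separable data.
toy_clf = svm.SVC(kernel='linear', C=1e6)
toy_clf.fit(X_toy, y_toy)

# The points the fitted model kept as support vectors.
print(toy_clf.support_vectors_)
# Number of support vectors found for each class.
print(toy_clf.n_support_)
# Weights and intercept of the separating line w.x + b = 0.
print(toy_clf.coef_, toy_clf.intercept_)
```

The points `(-1, 0.5)` and `(3, 0.5)` lie farther from the boundary than the others, so they should not appear among the support vectors.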

#### Hyperplane

A `hyperplane` is a decision boundary that `separates` a set of objects having `different class` memberships.
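As a small illustration, a hyperplane can be written as `w.x + b = 0`, and the side a point falls on is given by the sign of `w.x + b`. The weights, offset and points below are made up for this sketch.

```python
def side_of_hyperplane(w, b, x):
    """Return +1 or -1 depending on which side of w.x + b = 0 the point x lies."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical 2-D example: the line x1 + x2 - 3 = 0.
w, b = [1.0, 1.0], -3.0
points = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [4.0, 1.0]]

sides = [side_of_hyperplane(w, b, p) for p in points]
print(sides)  # [-1, -1, 1, 1]
```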

#### Margin

A `margin` is the gap between the two lines that pass through the closest `class points` on either side of the hyperplane. It is calculated as the `perpendicular distance` from the line to the `support vectors` or closest points. A larger `margin` between the `classes` is considered a `good margin`; a smaller margin is a `bad margin`.
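The perpendicular distance from a point `x` to the hyperplane `w.x + b = 0` is `|w.x + b| / ||w||`. The sketch below computes it for a made-up line and a handful of made-up points, then reads off the margin. It assumes the line sits midway between the two classes.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w

# Hypothetical separating line x1 = 1, i.e. w = (1, 0) and b = -1.
w, b = [1.0, 0.0], -1.0
points = [[0.0, 0.0], [0.0, 1.0], [-1.0, 0.5],
          [2.0, 0.0], [2.0, 1.0], [3.0, 0.5]]

distances = [distance_to_hyperplane(w, b, p) for p in points]
closest = min(distances)

# The closest points play the role of support vectors.
support_like = [p for p, d in zip(points, distances) if math.isclose(d, closest)]

# The gap between the two classes is twice the closest distance,
# because the line is assumed to sit midway between them.
margin_width = 2 * closest
print(support_like, margin_width)
```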

## How does SVM work?

• Generate `hyperplanes` that segregate the classes as well as possible. The left-hand figure shows `three hyperplanes` in black, blue and orange. Here, the blue and orange ones have a higher `classification error`, while the black one separates the two classes correctly.
• Select the `hyperplane` with the maximum `segregation` from the nearest data points on either side, as shown in the right-hand figure.
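The two steps above can be sketched as a brute-force comparison of a few candidate lines. This is only an illustration with made-up points and candidates. A real SVM does not enumerate hyperplanes; it solves an optimization problem to find the maximum-margin one.

```python
import math

def signed_distance(w, b, x):
    """Signed perpendicular distance from x to the hyperplane w.x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w

# Hypothetical toy data with labels -1 and +1.
points = [[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]]
labels = [-1, -1, 1, 1]

# Three hypothetical candidate lines, each given as (w, b).
candidates = {
    "A": ([1.0, 0.0], -1.0),  # x1 = 1, midway between the classes
    "B": ([1.0, 0.0], -0.5),  # x1 = 0.5, closer to the negative class
    "C": ([0.0, 1.0], -0.5),  # x2 = 0.5, cuts through both classes
}

def worst_case_margin(w, b):
    """Smallest label-signed distance; positive only if every point is on its correct side."""
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))

scores = {name: worst_case_margin(w, b) for name, (w, b) in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Candidate `C` gets a negative score because it misclassifies points, and `A` beats `B` because its nearest points are farther away.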

## Separation Planes

• Linear
• Non-Linear

### Dealing with non-linear and inseparable planes

`SVM` uses the `kernel` trick to transform the input space to a `higher dimensional` space, where classes that are not linearly separable in the original space may become separable.

#### Beauty of Kernel

`Kernels` allow us to work in `infinite dimensions`. Sometimes going to a `higher dimension` is not just computationally `expensive`, but also `impossible`: the feature function can be a `mapping` from `n dimensions` to `infinite dimensions`, which we may have little idea how to deal with. The kernel then gives us a wonderful shortcut: it computes the dot product in the higher dimensional space directly from the original inputs, without ever building the mapping.
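A small worked example may help here. For 2-D inputs, the degree-2 polynomial kernel `(x.z)^2` gives the same number as a dot product in a 3-D feature space, without ever constructing that space. The feature map `phi` and the sample vectors below are illustrative choices, not part of any library.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative choice)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: (x . z) ** 2."""
    return dot(x, z) ** 2

x, z = [1.0, 2.0], [3.0, 4.0]
explicit = dot(phi(x), phi(z))  # dot product computed in the 3-D feature space
implicit = poly2_kernel(x, z)   # same quantity computed in the original 2-D space
print(explicit, implicit)
```

Both values equal 121 (up to floating-point rounding), since `x.z = 11`. For the RBF kernel the corresponding feature space is infinite dimensional, so only the kernel side of this equality can be computed.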

#### SVM Kernels

• Linear
• Polynomial
• Radial basis function (RBF)

In practice, the `SVM` algorithm is implemented using a `kernel`. Choosing a kernel that suits the data helps you build a more `accurate` classifier.

• A `linear kernel` is the ordinary `dot product` of any two given observations. The product between `two vectors` is the sum of the `multiplication` of each pair of `input values`. Training an SVM with a linear kernel is typically faster than with the other kernels.
• A `polynomial kernel` is a more generalized form of the `linear kernel`. The polynomial kernel can separate classes with curved or nonlinear boundaries in the `input space`.
• The `Radial basis function (RBF)` kernel is a popular `kernel function` commonly used in `Support Vector Machine` classification. `RBF` can map an input space into an `infinite dimensional` space.
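The three kernels described above can be written as plain functions. This is a sketch for intuition only. The parameter names `degree`, `gamma` and `coef0` are chosen to mirror the `SVC` parameters of the same names, while the default values used here are illustrative assumptions rather than sklearn's defaults.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def linear_kernel(x, z):
    """Plain dot product of the two observations."""
    return dot(x, z)

def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=0.0):
    """(gamma * x.z + coef0) ** degree"""
    return (gamma * dot(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=1.0):
    """exp(-gamma * ||x - z||^2)"""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, z))                                   # 1*3 + 2*4
print(polynomial_kernel(x, z, degree=2, gamma=1.0, coef0=1.0))
print(rbf_kernel(x, z, gamma=0.5))
```

With `degree=1`, `gamma=1` and `coef0=0` the polynomial kernel reduces to the linear kernel, which is the sense in which it generalizes it.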

## Let’s Build a Model in sklearn

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
```
```python
cancer = datasets.load_breast_cancer()
cancer.keys()
```
`dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename'])`
```python
print(cancer.DESCR)
```
```
.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

:Number of Instances: 569

:Number of Attributes: 30 numeric, predictive attributes and the class

:Attribute Information:
- radius (mean of distances from center to points on the perimeter)
- texture (standard deviation of gray-scale values)
- perimeter
- area
- smoothness (local variation in radius lengths)
- compactness (perimeter^2 / area - 1.0)
- concavity (severity of concave portions of the contour)
- concave points (number of concave portions of the contour)
- symmetry
- fractal dimension ("coastline approximation" - 1)

The mean, standard error, and "worst" or largest (mean of the three
worst/largest values) of these features were computed for each image,
resulting in 30 features.  For instance, field 0 is Mean Radius, field

- class:
- WDBC-Malignant
- WDBC-Benign

:Summary Statistics:

===================================== ====== ======
                                      Min    Max
===================================== ====== ======
texture (mean):                       9.71   39.28
perimeter (mean):                     43.79  188.5
area (mean):                          143.5  2501.0
smoothness (mean):                    0.053  0.163
compactness (mean):                   0.019  0.345
concavity (mean):                     0.0    0.427
concave points (mean):                0.0    0.201
symmetry (mean):                      0.106  0.304
fractal dimension (mean):             0.05   0.097
texture (standard error):             0.36   4.885
perimeter (standard error):           0.757  21.98
area (standard error):                6.802  542.2
smoothness (standard error):          0.002  0.031
compactness (standard error):         0.002  0.135
concavity (standard error):           0.0    0.396
concave points (standard error):      0.0    0.053
symmetry (standard error):            0.008  0.079
fractal dimension (standard error):   0.001  0.03
texture (worst):                      12.02  49.54
perimeter (worst):                    50.41  251.2
area (worst):                         185.2  4254.0
smoothness (worst):                   0.071  0.223
compactness (worst):                  0.027  1.058
concavity (worst):                    0.0    1.252
concave points (worst):               0.0    0.291
symmetry (worst):                     0.156  0.664
fractal dimension (worst):            0.055  0.208
===================================== ====== ======

:Missing Attribute Values: None

:Class Distribution: 212 - Malignant, 357 - Benign

```
```python
cancer.target_names
```
`array(['malignant', 'benign'], dtype='<U9')`
```python
cancer.feature_names[:5]
```
```
array(['mean radius', 'mean texture', 'mean perimeter', 'mean area',
       'mean smoothness'], dtype='<U23')
```
```python
cancer.feature_names.shape
```
`(30,)`
```python
X = cancer.data
y = cancer.target
X.shape, y.shape
```
`((569, 30), (569,))`

Let’s look at the first two rows of `X` and the first ten entries of `y`:

```python
X[:2]
```
```
array([[1.799e+01, 1.038e+01, 1.228e+02, 1.001e+03, 1.184e-01, 2.776e-01,
        3.001e-01, 1.471e-01, 2.419e-01, 7.871e-02, 1.095e+00, 9.053e-01,
        8.589e+00, 1.534e+02, 6.399e-03, 4.904e-02, 5.373e-02, 1.587e-02,
        3.003e-02, 6.193e-03, 2.538e+01, 1.733e+01, 1.846e+02, 2.019e+03,
        1.622e-01, 6.656e-01, 7.119e-01, 2.654e-01, 4.601e-01, 1.189e-01],
       [2.057e+01, 1.777e+01, 1.329e+02, 1.326e+03, 8.474e-02, 7.864e-02,
        8.690e-02, 7.017e-02, 1.812e-01, 5.667e-02, 5.435e-01, 7.339e-01,
        3.398e+00, 7.408e+01, 5.225e-03, 1.308e-02, 1.860e-02, 1.340e-02,
        1.389e-02, 3.532e-03, 2.499e+01, 2.341e+01, 1.588e+02, 1.956e+03,
        1.238e-01, 1.866e-01, 2.416e-01, 1.860e-01, 2.750e-01, 8.902e-02]])
```
```python
y[:10]
```
`array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])`

## Standardization

`Standardization` of a dataset is a common requirement for many `machine learning` estimators: they might behave badly if the individual features do not more or less look like standard `normally distributed` data (e.g. Gaussian with `0` mean and `unit` variance).
`StandardScaler()` transforms each feature so that its `distribution` has a `mean` of `0` and a `standard deviation` of `1`.
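The transformation itself is a z-score: subtract the feature's mean and divide by its standard deviation. Here is a minimal sketch on one made-up column. The values and the helper name `standardize` are hypothetical, and the population standard deviation is used, which is what `StandardScaler` computes.

```python
import statistics

def standardize(column):
    """Z-score a list of numbers: subtract the mean, divide by the population std."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(value - mean) / std for value in column]

# Hypothetical feature column measured on a large scale.
areas = [200.0, 400.0, 600.0, 800.0]
scaled = standardize(areas)

print(scaled)
print(statistics.fmean(scaled), statistics.pstdev(scaled))
```

After scaling, the column has mean 0 and standard deviation 1 (up to floating-point rounding), whatever its original units were.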

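```python
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Note: the slice 2:2 is empty (start equals stop), hence the empty output below;
# use X_scaled[:2] to view the first two scaled rows.
X_scaled[2:2]
```
`array([], shape=(0, 30), dtype=float64)`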

## Split the data and build the model

```python
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=1, stratify=y
)
```
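The walkthrough scales the full data set before splitting it. A common alternative, sketched below, is to split first and let a pipeline fit the scaler on the training split only, so that no statistics from the test split are used during training. The variable names are hypothetical, and this is one possible arrangement rather than the approach used in the rest of this article.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = datasets.load_breast_cancer()

# Split the raw (unscaled) features first.
X_raw_train, X_raw_test, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.2, random_state=1, stratify=data.target
)

# The pipeline fits the scaler on the training split only, then applies
# the same transformation when scoring the test split.
pipe = make_pipeline(StandardScaler(), svm.SVC(kernel='linear'))
pipe.fit(X_raw_train, y_tr)

pipe_accuracy = pipe.score(X_raw_test, y_te)
print('Accuracy: ', pipe_accuracy)
```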

### Linear kernel

Let’s create a `Linear Kernel` SVM using the `sklearn` library of Python. A linear kernel is used when the data is `linearly` separable, that is, when it can be separated by a `single line` (a hyperplane in higher dimensions). It is one of the most commonly used kernels, and it is mostly used when a data set has a large number of features.

```python
from sklearn import svm
```
```python
# Train a linear-kernel SVM on the scaled training data
clf = svm.SVC(kernel='linear')
clf.fit(X_train, y_train)
y_predict = clf.predict(X_test)

print('Accuracy: ', metrics.accuracy_score(y_test, y_predict))

# Precision and recall treat label 1 (benign) as the positive class
print('Precision: ', metrics.precision_score(y_test, y_predict))
print('Recall: ', metrics.recall_score(y_test, y_predict))

print('Confusion Matrix')

# Rows are true labels, columns are predicted labels
mat = metrics.confusion_matrix(y_test, y_predict)
sns.heatmap(mat, square=True, annot=True, fmt='d', cbar=False,
            xticklabels=cancer.target_names,
            yticklabels=cancer.target_names)

plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()
```
```
Accuracy:  0.9649122807017544
Precision:  0.9594594594594594
Recall:  0.9861111111111112
Confusion Matrix
```
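To see how these three scores relate to the confusion matrix, the sketch below recomputes them from four counts. The counts are not printed in the output above. They are inferred here as the only whole numbers consistent with the reported scores, assuming the test set holds 72 benign and 42 malignant samples (the class counts obtained with `np.unique` in this walkthrough) and that benign (label 1) is the positive class.

```python
# Inferred counts (an assumption, see the paragraph above).
tp, fn = 71, 1   # benign samples predicted as benign / as malignant
fp, tn = 3, 39   # malignant samples predicted as benign / as malignant

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted benign that are benign
recall = tp / (tp + fn)                     # share of actual benign that were found

print('Accuracy: ', accuracy)
print('Precision: ', precision)
print('Recall: ', recall)
```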

### np.unique()

This function returns an `array of unique elements` of the input array. It can also return a `tuple` containing the `array of unique values` together with arrays of associated indices or counts. Which extra arrays are returned depends on the `return_*` parameters passed in the function call. Here we use `return_counts` to count how many test samples belong to each class.
Let’s see the following code:

```python
element, count = np.unique(y_test, return_counts=True)
element, count
```
`(array([0, 1]), array([42, 72], dtype=int64))`
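For reference, here is a small sketch of the other optional return values of `np.unique` on a made-up label array: `return_index` gives the position of the first occurrence of each unique value, and `return_inverse` gives the indices that rebuild the original array from the unique values.

```python
import numpy as np

# Hypothetical label array.
labels = np.array([1, 0, 1, 1, 0, 1])

values, first_index, inverse, counts = np.unique(
    labels, return_index=True, return_inverse=True, return_counts=True
)

print(values)       # sorted unique values
print(first_index)  # index of the first occurrence of each unique value
print(inverse)      # values[inverse] reconstructs the original array
print(counts)       # number of occurrences of each unique value
```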
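For comparison, let’s split the unscaled features `X` in the same way and retrain the linear kernel. Note that `X_train` and `X_test` now hold unscaled data, and the kernels in the following sections are trained on this split:

```python
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

clf = svm.SVC(kernel='linear')
clf.fit(X_train, y_train)
y_predict = clf.predict(X_test)

print('Accuracy: ', metrics.accuracy_score(y_test, y_predict))
```
```
Accuracy:  0.9649122807017544
```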

### Polynomial Kernel

The `Polynomial kernel` is a `non-stationary` kernel. Polynomial kernels are well suited for problems where all the training data is `normalized`. With this kernel, you also pass a value for the `degree` parameter of the `SVC` class, which is the `degree` of the polynomial.
Take a look at how we can use a `polynomial` kernel to implement kernel SVM:

```python
# Polynomial kernel of degree 5; X_train is the unscaled split created above
clf = svm.SVC(kernel='poly', degree=5, gamma=100)
clf.fit(X_train, y_train)
y_predict = clf.predict(X_test)

print('Accuracy: ', metrics.accuracy_score(y_test, y_predict))

print('Precision: ', metrics.precision_score(y_test, y_predict))
print('Recall: ', metrics.recall_score(y_test, y_predict))

print('Confusion Matrix')

mat = metrics.confusion_matrix(y_test, y_predict)
sns.heatmap(mat, square=True, annot=True, fmt='d', cbar=False,
            xticklabels=cancer.target_names,
            yticklabels=cancer.target_names)

plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()
```
```
Accuracy:  0.631578947368421
Precision:  0.631578947368421
Recall:  1.0
Confusion Matrix
```
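These scores are worth a second look. A recall of 1.0 with precision equal to accuracy is exactly what a classifier gets when it labels every test sample as benign. The short calculation below reproduces the numbers from the test-set class counts alone (42 malignant, 72 benign). One plausible cause, offered here as an assumption rather than a verified diagnosis, is the combination of unscaled features with a very large `gamma`.

```python
# Test-set class counts reported by np.unique earlier in the walkthrough.
n_malignant, n_benign = 42, 72

# A classifier that predicts benign (label 1) for every sample:
tp, fp = n_benign, n_malignant  # every benign is a hit, every malignant a false alarm
fn, tn = 0, 0                   # nothing is ever predicted malignant

accuracy = (tp + tn) / (n_benign + n_malignant)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print('Accuracy: ', accuracy)
print('Precision: ', precision)
print('Recall: ', recall)
```

In other words, an accuracy of about 0.63 is the majority-class baseline for this split, not evidence that the model has learned a useful boundary.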

### Sigmoid Kernel

Finally, let’s use a `sigmoid kernel` to implement kernel SVM. The `sigmoid kernel` was quite popular for support vector machines due to its origin in `neural networks`. To use the `sigmoid kernel`, specify `'sigmoid'` as the value of the `kernel` parameter of the `SVC` class.
Take a look at the following script.

```python
# Sigmoid kernel; X_train is still the unscaled split created above
clf = svm.SVC(kernel='sigmoid', gamma=200, C=10000)
clf.fit(X_train, y_train)
y_predict = clf.predict(X_test)

print('Accuracy: ', metrics.accuracy_score(y_test, y_predict))

print('Precision: ', metrics.precision_score(y_test, y_predict))
print('Recall: ', metrics.recall_score(y_test, y_predict))

print('Confusion Matrix')

mat = metrics.confusion_matrix(y_test, y_predict)
sns.heatmap(mat, square=True, annot=True, fmt='d', cbar=False,
            xticklabels=cancer.target_names,
            yticklabels=cancer.target_names)

plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()
```
```
Accuracy:  0.631578947368421
Precision:  0.631578947368421
Recall:  1.0
Confusion Matrix
```
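The polynomial and sigmoid runs above were trained on unscaled features with hand-picked `gamma` and `C` values. As a possible follow-up experiment, the sketch below compares the four built-in kernels on standardized features with `SVC`'s default settings, so that only the kernel changes. The variable names are hypothetical and no particular ranking of the kernels is claimed here. Run it to see the scores for this split.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = datasets.load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.2, random_state=1, stratify=data.target
)

kernel_scores = {}
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    # Default C, degree and gamma; the scaler is fitted on the training split only.
    model = make_pipeline(StandardScaler(), svm.SVC(kernel=kernel))
    model.fit(X_tr, y_tr)
    kernel_scores[kernel] = model.score(X_te, y_te)

for kernel, score in kernel_scores.items():
    print(f'{kernel:8s} accuracy: {score:.3f}')
```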
