### CompartiMOSS - 50 Issues

Yesterday the CompartiMOSS magazine reached its 50th issue. For this, the

A support vector machine (SVM) is a supervised machine learning model that uses classification algorithms for two-group classification problems. Compared to newer algorithms like neural networks, SVMs have two main advantages: higher speed and better performance with a limited number of samples (in the thousands).

How does SVM work?

We can imagine we have two tags: *red* and *blue*, and our data has two features: *x* and *y*. We want a classifier that, given a pair of *(x,y)* coordinates, outputs whether it is *red* or *blue*. We plot our already labeled training data on a plane.

A support vector machine takes these data points and outputs the hyperplane (which in two dimensions is simply a line) that best separates the tags. This line is the **decision boundary**: anything that falls on one side of it we will classify as *blue*, and anything that falls on the other as *red*.

But, what exactly is *the best* hyperplane? For SVM, it’s the one that maximizes the margins from both tags. In other words: the hyperplane (remember it's a line in this case) whose distance to the nearest element of each tag is the largest.
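The idea of a maximum-margin line can be checked numerically. The following is a minimal sketch, assuming scikit-learn and NumPy are installed; the six toy points and the large `C` value (used here to approximate a hard margin) are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn import svm

# Hypothetical, linearly separable toy data: three "red" and three "blue" points
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],   # red  -> label 0
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])  # blue -> label 1
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves almost no room for margin violations (near hard margin)
clf = svm.SVC(kernel="linear", C=1e6)
clf.fit(X, y)

# For a linear kernel the decision boundary is w . x + b = 0
w = clf.coef_[0]
b = clf.intercept_[0]

# Width of the margin band between the two classes
margin_width = 2.0 / np.linalg.norm(w)

# Perpendicular distance of every training point to the boundary
distances = np.abs(X @ w + b) / np.linalg.norm(w)

print("w =", w, "b =", b)
print("margin width:", margin_width)
print("distance of the nearest point:", distances.min())
```

The nearest points on each side sit at about half the margin width from the line; these are the support vectors that give the method its name.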

**Nonlinear Data**

Now this example was easy, since the data was clearly linearly separable: we could draw a straight line to separate *red* and *blue*. Sadly, things usually aren’t that simple. Take a look at this case:

It’s pretty clear that there’s not a linear decision boundary (a single straight line that separates both tags). However, the vectors are very clearly segregated and it looks as though it should be easy to separate them.

So here’s what we’ll do: we will add a third dimension. Up until now we had two dimensions: *x* and *y*. We create a new *z* dimension, and we rule that it be calculated a certain way that is convenient for us: *z = x² + y²* (you’ll notice that’s the equation for a circle).

This will give us a three-dimensional space. Taking a slice of that space, it looks like this:

What can SVM do with this? Let’s see:

That’s great! Note that since we are in three dimensions now, the hyperplane is a plane parallel to the *xy* plane at a certain *z* (let’s say *z = 1*).

What’s left is mapping it back to two dimensions:

And there we go! Our decision boundary is a circle of radius 1, which separates both tags using SVM.

The explanation above is taken from: https://monkeylearn.com/blog/introduction-to-support-vector-machines-svm/
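The trick of adding a third dimension described in this section can be tried out with a few lines of code. This is only a sketch under stated assumptions: the synthetic data (an inner disk and an outer ring), the random seed, and the variable names are all made up for illustration.

```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)

# Hypothetical data: "red" points inside radius 0.5, "blue" points in a ring 1.5..2.0
n = 100
angles = rng.uniform(0, 2 * np.pi, size=2 * n)
radii = np.concatenate([rng.uniform(0.0, 0.5, size=n),
                        rng.uniform(1.5, 2.0, size=n)])
x = radii * np.cos(angles)
y = radii * np.sin(angles)
labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# Original two-dimensional features
X2 = np.column_stack([x, y])

# Lifted features: add the new dimension z = x^2 + y^2
z = x**2 + y**2
X3 = np.column_stack([x, y, z])

# The same linear SVM, trained on each representation
linear_2d = svm.SVC(kernel="linear").fit(X2, labels)
linear_3d = svm.SVC(kernel="linear").fit(X3, labels)

acc_2d = linear_2d.score(X2, labels)
acc_3d = linear_3d.score(X3, labels)
print("Training accuracy with (x, y):   ", acc_2d)
print("Training accuracy with (x, y, z):", acc_3d)
```

With only *x* and *y*, no straight line can separate the disk from the ring, so the linear model struggles. Once *z* is added, a flat plane is enough.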

**Show me the code**

Now, let's try SVM in Python.

Let's first load the dataset you will use.

```
from sklearn import datasets
#Load dataset
cancer = datasets.load_breast_cancer()
```

After you have loaded the dataset, you might want to know a little bit more about it. You can check feature and target names.

```
# print the names of the features
print("Features: ", cancer.feature_names)
# print the label type of cancer('malignant' 'benign')
print("Labels: ", cancer.target_names)
```

```
Features: ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
'mean smoothness' 'mean compactness' 'mean concavity'
'mean concave points' 'mean symmetry' 'mean fractal dimension'
'radius error' 'texture error' 'perimeter error' 'area error'
'smoothness error' 'compactness error' 'concavity error'
'concave points error' 'symmetry error' 'fractal dimension error'
'worst radius' 'worst texture' 'worst perimeter' 'worst area'
'worst smoothness' 'worst compactness' 'worst concavity'
'worst concave points' 'worst symmetry' 'worst fractal dimension']
Labels: ['malignant' 'benign']
```

Split the dataset using the function `train_test_split()`. You need to pass 3 parameters: features, target, and test set size. Additionally, you can set `random_state` to seed the random shuffling so that the same records are selected on every run.

```
from sklearn.model_selection import train_test_split
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.3,random_state=109) # 70% training and 30% test
```
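To see what the split produces and what `random_state` buys you, here is a small sketch that repeats the split twice with the same seed and compares the results. The names `split_a`, `split_b`, and `same` are introduced only for this illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

cancer = datasets.load_breast_cancer()

# Two independent calls with the same seed
split_a = train_test_split(cancer.data, cancer.target, test_size=0.3, random_state=109)
split_b = train_test_split(cancer.data, cancer.target, test_size=0.3, random_state=109)

X_train, X_test, y_train, y_test = split_a

# Roughly 70% of the rows end up in the training set, 30% in the test set
print("Train shape:", X_train.shape)
print("Test shape: ", X_test.shape)

# With a fixed random_state, both calls select exactly the same records
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print("Identical splits:", same)
```

Fixing the seed matters when you want the accuracy figures further down to be reproducible from run to run.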

Let's build the support vector machine model. First, import the SVM module and create a support vector classifier object by passing the argument `kernel='linear'` to the `SVC()` function. Then, fit your model on the training set using `fit()` and perform prediction on the test set using `predict()`.

```
from sklearn import svm
#SVM Classifier
clf = svm.SVC(kernel='linear') # Linear Kernel
#Train the model
clf.fit(X_train, y_train)
#Predict the response for test dataset
y_pred = clf.predict(X_test)
```
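Once the model is fitted, it can be useful to look at what it actually learned. The sketch below repeats the loading and splitting steps so that it runs on its own, then prints a few attributes of the fitted `SVC` object; which attributes to inspect is my own choice for illustration.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Same data and split as in the steps above
cancer = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.3, random_state=109)

clf = svm.SVC(kernel="linear")
clf.fit(X_train, y_train)

# Number of support vectors found for each class (malignant, benign)
print("Support vectors per class:", clf.n_support_)

# The support vectors themselves: one row per vector, one column per feature
print("Support vector matrix shape:", clf.support_vectors_.shape)

# With a linear kernel, the hyperplane weights are available as coef_
print("Weight vector shape:", clf.coef_.shape)
```

Only the support vectors determine the decision boundary, which is usually a subset of the training records.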

Let's estimate how accurately the classifier can predict breast cancer in patients.

Accuracy can be computed by comparing actual test set values and predicted values.

```
from sklearn import metrics
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))
```

`Accuracy: 0.9649122807017544`

You can also check the precision and recall of the model:

```
print("Precision:",metrics.precision_score(y_test, y_pred))
print("Recall:",metrics.recall_score(y_test, y_pred))
```

```
Precision: 0.9811320754716981
Recall: 0.9629629629629629
```
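To make these three numbers less abstract, the following sketch computes them by hand from a confusion matrix and compares them with the scikit-learn functions. The ten labels in `y_true` and `y_hat` are invented for illustration and are not taken from the breast cancer run above. Note that `precision_score` and `recall_score` treat label `1` as the positive class by default, which in this dataset corresponds to 'benign'.

```python
import numpy as np
from sklearn import metrics

# Hypothetical ground truth and predictions (1 = positive class, 0 = negative class)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 1])
y_hat = np.array([1, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# For binary labels the flattened confusion matrix is ordered tn, fp, fn, tp
tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # share of predicted positives that are real
recall = tp / (tp + fn)                      # share of real positives that were found

print("Manual: ", accuracy, precision, recall)
print("sklearn:", metrics.accuracy_score(y_true, y_hat),
      metrics.precision_score(y_true, y_hat),
      metrics.recall_score(y_true, y_hat))
```

Both lines should print the same three values, since the library functions apply the same definitions.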

I also suggest reading the following post, where it's explained in a different way