Support Vector Machines (SVM): Separating Classes Accurately
Combining simplicity and power, SVMs provide an elegant solution to many classification problems.
Hello!!
I'm really excited to continue our series "Top 8 Machine Learning Algorithms", as today it's time to dive into SVM - Support Vector Machines! ✨
Support Vector Machines are one of the most powerful and versatile techniques in the field of machine learning. They are widely used in classification and regression tasks due to their ability to handle complex problems, even in high-dimensional spaces.
In this article, we'll explore what SVMs are, how they work, and their main applications.
You can find the code on Colab at: https://exploringartificialintelligence.substack.com/p/notebooks
What are Support Vector Machines?
SVMs are supervised algorithms that aim to find a hyperplane that best separates the data into different classes.
A hyperplane is a line (in 2D), a plane (in 3D), or a higher-dimensional structure that divides the data in such a way that points from different classes are on opposite sides.
The main goal of the algorithm is to maximize the margin, i.e., the distance between the hyperplane and the closest data points from each class, known as support vectors. These support vectors are crucial for building the SVM model, as they define the decision boundary between the classes.
A larger margin typically implies better generalization to new data.
In other words, the SVM algorithm seeks to find a separating hyperplane that maximizes this margin, making it a maximum-margin classifier.
The image above illustrates the basic functioning of a support vector machine in a binary classification problem (with two classes).
The blue and green points represent training data belonging to two different classes, and the SVM's task is to separate these two classes in such a way that each is on a distinct side.
The hyperplane is the red line that optimally separates the two classes. In a two-dimensional problem (as shown here), the hyperplane is a straight line. For higher dimensions, it could be a plane or a more complex structure.
The margin (yellow region) is the distance between the central hyperplane and the parallel lines that pass through the support vectors of each class (blue and green). The goal of the SVM is to maximize this margin, ensuring a more robust separation between the classes.
The parallel lines (w·x - b = 1 and w·x - b = -1) are the margin boundaries, passing through the data points of each class that lie closest to the hyperplane. These points are known as support vectors.
Support vectors are the points that lie on the parallel lines (one line for each class) and determine the position and orientation of the hyperplane. Only these points enter the calculation of the hyperplane, which makes the model efficient.
The arrow w indicates the normal vector to the hyperplane. It is perpendicular to the separating line and defines its direction.
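To make this concrete, here is a small sketch of my own (not from the original article, using made-up toy data) showing how w, b, the support vectors, and the margin width 2/||w|| can be inspected after fitting a linear SVM with scikit-learn:

import numpy as np
from sklearn.svm import SVC

# Toy 2D data: two small, linearly separable clusters (illustrative values only)
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class 0
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.5]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1.0)
clf.fit(X, y)

w = clf.coef_[0]           # normal vector w of the hyperplane
b = -clf.intercept_[0]     # so the hyperplane is w·x - b = 0
print("w:", w, "b:", b)
print("support vectors:", clf.support_vectors_)        # the points lying on the margin lines
print("margin width (2/||w||):", 2 / np.linalg.norm(w))

# The side of the hyperplane each point falls on is given by the sign of w·x - b
print("sides:", np.sign(X @ w - b))

Maximizing the margin 2/||w|| is the same as minimizing ||w||, which is how the optimization problem behind the SVM is usually stated.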
How do SVMs work?
The basic functioning of an SVM can be broken down into several key steps:
Selection of the Hyperplane:
The algorithm identifies the hyperplane that optimally separates the data. In the case of a linearly separable dataset, the hyperplane is defined by the largest possible margin.
Support Vectors:
These are the data points that are closest to the hyperplane. They are essential for defining the position and orientation of the hyperplane.
Kernel Trick:
When the data is not linearly separable, SVMs use kernel functions to map the data into a higher-dimensional space, where a linear hyperplane can be found. Examples of kernels include linear, polynomial, Gaussian (RBF), and sigmoid (see the short sketch below).
Regularization:
A regularization parameter (C) controls the balance between maximizing the margin and minimizing classification error on the training data. High values of C place more importance on correctly classifying the data, while low values prioritize a larger margin.
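As a quick illustration of these last two points (a sketch of mine, not code from the article; the parameter values are arbitrary), this is how the kernel and C are specified when creating the model in scikit-learn:

from sklearn.svm import SVC

# Linear kernel with a large C: puts more weight on classifying training points correctly
svm_strict = SVC(kernel='linear', C=100.0)

# RBF (Gaussian) kernel with a small C: tolerates some training errors in favor of a wider margin;
# gamma controls how far the influence of each training point reaches
svm_flexible = SVC(kernel='rbf', C=0.1, gamma='scale')

# Polynomial kernel of degree 3: another common non-linear option
svm_poly = SVC(kernel='poly', degree=3, C=1.0)

# All of them are trained the same way, e.g. svm_flexible.fit(X_train, y_train)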
Advantages and Disadvantages of SVM
Advantages:
Effective in high-dimensional spaces, capable of handling situations where the number of dimensions is greater than the number of samples.
Handles non-linearly separable data robustly through kernels, solving complex problems that a purely linear separating hyperplane could not.
Disadvantages:
Can be computationally intensive, especially with large datasets.
Choosing the kernel and tuning parameters requires experimentation and can be challenging.
Not inherently probabilistic, which makes it harder to interpret the outputs directly as class probabilities (see the brief note below).
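On that last point, a short note of mine (not from the original article): scikit-learn can attach probability estimates to an SVC, but only through an extra calibration step (Platt scaling) enabled with probability=True, which makes training slower:

from sklearn.svm import SVC

# By default an SVC exposes only decision_function (signed distances to the hyperplane),
# not class probabilities.
svm_prob = SVC(kernel='linear', probability=True)  # fits an internal calibration model
# After svm_prob.fit(X_train, y_train), svm_prob.predict_proba(X_test) becomes available.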
Applications of SVM
Support Vector Machines have various practical applications, such as:
Pattern Recognition: Image recognition, facial recognition, and text classification.
Anomaly Detection: Fraud analysis in financial transactions and failure detection in systems.
Bioinformatics: Gene classification, disease prediction, and protein analysis.
Recommendation Systems: User segmentation and prediction of preferences.
Practical Example in Python
Now it’s time to get hands-on! Let’s code.
Below is a code snippet that classifies emails as spam or not spam. Create a Jupyter notebook and paste the code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import PCA
# Example emails (fictional data)
emails = [
"Buy now and get a 50% discount!",
"Your invoice is due tomorrow. Pay now.",
"You have been selected to win a prize!",
"Meeting scheduled for Monday at 2 PM.",
"Click here to redeem your exclusive reward.",
"The report has been sent for review.",
"Last chance to participate in the promotion!",
"Congratulations! You've won an exclusive discount!",
"Limited offer! Buy now and save!",
"Take advantage of the promotion, buy one and get two!",
"Your account is locked, click here to reactivate it.",
"Win a 500 reais prize! Click to claim it!",
"Don't miss your chance to win an iPhone!",
"Your payment was successfully received.",
"New product on our website! Check it out now!",
"Your balance has been updated. Access your account.",
"Sign up for our free digital marketing course!",
"Your subscription was automatically renewed.",
"Last opportunity! 70% discount promotion!",
"Reminder: The meeting is tomorrow at 10 AM.",
"Your package has been shipped. Track your delivery.",
"Attention! Important update about your account.",
"Click here for more information about our promotion.",
"Stay up to date with the latest news on our blog.",
"Your order has been successfully confirmed!",
"Congratulations, you have been selected in the giveaway!",
"Unmissable promotion for new customers. Enjoy!",
"Your payment is pending. Pay now!",
"Don't miss your chance to win concert tickets!",
"Exclusive offer for VIP members, access now!"
]
# Labels (1 = spam, 0 = not spam)
labels = [1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1]
# Convert emails into a word count matrix
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails).toarray()
# Dimensionality reduction to 2 features using PCA
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_reduced, labels, test_size=0.3, random_state=42)
# Train the SVM model
svm = SVC(kernel='linear')
svm.fit(X_train, y_train)
# Visualize the hyperplane with the top 2 components
x_min, x_max = X_reduced[:, 0].min() - 1, X_reduced[:, 0].max() + 1
y_min, y_max = X_reduced[:, 1].min() - 1, X_reduced[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100), np.linspace(y_min, y_max, 100))
# Predict the class of each point on the grid
Z = svm.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
# Plot the data points and the hyperplane
plt.contourf(xx, yy, Z, alpha=0.75, cmap=plt.cm.coolwarm)
plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=labels, edgecolors='k', cmap=plt.cm.coolwarm, marker='o')
plt.title("SVM - Spam vs Not Spam (With PCA)")
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.show()
Result:
Our code trains an SVM model to identify emails as spam or not. First, we transform the emails into a numerical matrix where each word becomes a variable that can be analyzed by the model. This is done with CountVectorizer, which converts the emails into word counts.
Since this matrix can have many columns (one for each word), we use PCA (Principal Component Analysis) to reduce the data to two principal components, which lets us show the separation between spam and non-spam in a 2D plot.
Next, the data is split into 70% for training and 30% for testing. We then train the model and generate a plot showing the separation between spam (red) and non-spam (blue).
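The snippet above stops at the plot and never scores the model on the 30% held out for testing; if you want that number, a minimal addition (reusing the variable names from the code above) would be:

from sklearn.metrics import accuracy_score

# Evaluate the trained SVM on the test split created earlier
y_pred = svm.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))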
You can find the code on Colab at: https://exploringartificialintelligence.substack.com/p/notebooks
Conclusion
Support Vector Machines are a powerful tool in the machine learning arsenal. Their ability to handle complex problems and high-dimensional spaces makes them a popular choice for various applications.
However, like any technique, they have limitations and require careful tuning to achieve optimal performance. With a proper understanding of the fundamentals and efficient use of kernels, SVMs can be an effective solution to many challenging problems.
In the next post, we will dive deeper into K-Nearest Neighbors (KNN). ❤️