Support Vector Machine Complexity

SVM can be slow

1. Quadratic Complexity

Training a kernel Support Vector Machine has roughly quadratic time complexity in the number of training samples N, written O(N^2) in Big-O notation. Quadratic complexity scales poorly: doubling the number of rows roughly quadruples the training time. Most modern computers will handle small and medium datasets without any problems, but SVM is not expected to work well with big data, which may involve millions or billions of rows. For more detailed data size considerations with SVMs, please see the Data Size section below.

However, LinearSVC is a separate class in scikit-learn that performs relatively well with larger datasets. It is built on the liblinear library rather than libsvm, supports only a linear kernel, and handles multiclass problems with a one-vs-rest reduction. This can be a good alternative when the data is linearly separable, or close to it.

Also, depending on the kernel, the dataset, and how efficiently the solver's kernel cache is used, SVM training complexity can go up to O(N^3).
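To make the LinearSVC alternative concrete, here is a minimal sketch that fits both LinearSVC and SVC with a linear kernel on the same data. The synthetic dataset, its size and the parameter values are illustrative assumptions, not settings taken from this article.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

# Synthetic, roughly linearly separable data (all sizes are illustrative assumptions).
X, y = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=5,
    n_clusters_per_class=1,
    class_sep=2.0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# LinearSVC is backed by liblinear and only learns a linear decision boundary.
linear_svc = LinearSVC(C=1.0, max_iter=10000, random_state=0).fit(X_train, y_train)

# SVC with a linear kernel is backed by libsvm, which tends to be slower as N grows.
kernel_svc = SVC(kernel="linear", C=1.0).fit(X_train, y_train)

linear_acc = linear_svc.score(X_test, y_test)
kernel_acc = kernel_svc.score(X_test, y_test)
print(f"LinearSVC accuracy:            {linear_acc:.3f}")
print(f"SVC(kernel='linear') accuracy: {kernel_acc:.3f}")
```

On data like this the two models usually reach similar accuracy, so the choice between them mostly comes down to training time.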

100+ years old algorithmic complexity notation

2. What is Big O?

Big O notation is commonly used to describe the time complexity or space complexity of algorithms. It gives an upper bound on how fast the cost grows with the input size, and it is most often quoted for the worst-case scenario. Big O arose from the work of the German mathematicians Paul Bachmann, who introduced it in 1894, and Edmund Landau, who popularized it in the early 20th century. “O” in “Big O” stands for “Ordnung”, the German word for “order”, as in the order of growth of a function.

From fastest to slowest, or from least to most computationally expensive, the common Big O complexity classes can be listed as:

O(1): Constant Complexity
O(log n): Logarithmic Complexity
O(n): Linear Complexity
O(n log n): Log-linear Complexity
O(n^2): Quadratic Complexity. This is also the complexity of Support Vector Machines.
O(2^n): Exponential Complexity
O(n!): Factorial Complexity
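To get a feel for how differently these classes grow, the short sketch below prints a nominal operation count for each class at two input sizes. The function name growth_table is a hypothetical helper introduced only for this illustration.

```python
import math


def growth_table(n):
    """Return (label, nominal operation count) pairs for input size n."""
    return [
        ("O(1)", 1),
        ("O(log n)", math.log2(n)),
        ("O(n)", n),
        ("O(n log n)", n * math.log2(n)),
        ("O(n^2)", n ** 2),
        ("O(2^n)", 2 ** n),
        ("O(n!)", math.factorial(n)),
    ]


for n in (10, 20):
    print(f"n = {n}")
    for label, ops in growth_table(n):
        # Thousands separators make the jump between classes easier to see.
        print(f"  {label:<11} {ops:,.0f}")
```

Doubling n from 10 to 20 doubles the O(n) count and quadruples the O(n^2) count, while the exponential and factorial counts grow by many orders of magnitude.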

3. Data Size

When working with Support Vector Machines, roughly 100K-500K rows is a practical upper limit. The appropriate data size also depends on your computational resources and on parameter choices such as the kernel.

If your model is simple enough, you will likely be able to train and predict on 250K rows in under 10 minutes. With LinearSVC, the model will be much faster: you can analyze data with millions of rows in a matter of minutes.

4. Feature Size

How about feature size? Is a Support Vector Machine capable of handling hundreds of feature columns? What about thousands? The answer is that the more dimensions the dataset has, the more the drawbacks of SVM’s quadratic complexity will be felt, because every kernel evaluation between two samples also gets more expensive as features are added.

Feature scaling is also an important operation to consider when working with kernels. Scikit-learn’s MinMaxScaler from the sklearn.preprocessing module can be very handy for this purpose. For a dataset with large feature values, transforming the features to the [0, 1] range with MinMaxScaler can cut runtime by 10X or more.
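The scaling step can be sketched as follows. The inflated feature values and the dataset size are assumptions chosen only to mimic an unscaled dataset. Putting MinMaxScaler in a pipeline is one common way to make sure the same transformation is applied at both training and prediction time.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = make_classification(
    n_samples=1000, n_features=5, class_sep=2.0, random_state=0
)

# Inflate the feature values to mimic an unscaled dataset (illustrative assumption).
X_large = X * 1000.0 + 5000.0

# Standalone use: every column is mapped to the [0, 1] range.
X_scaled = MinMaxScaler().fit_transform(X_large)
print(f"scaled range: [{X_scaled.min():.2f}, {X_scaled.max():.2f}]")

# Pipeline use: scaling is learned on the data passed to fit() and reused in predict().
model = make_pipeline(MinMaxScaler(), SVC(kernel="rbf"))
model.fit(X_large, y)
train_acc = model.score(X_large, y)
print(f"training accuracy: {train_acc:.3f}")
```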

5. Runtime Performance Tests

Our tests with SVC and LinearSVC on a simple classification problem yielded the following computation times. (SVC models used the default rbf kernel; all other parameters were left at their defaults.)

LinearSVC (75K): 2 seconds
LinearSVC (150K): 8 seconds
LinearSVC (750K): 85.5 seconds
LinearSVC (1M): 125 seconds

SVC (75K): ~1 minute
SVC (150K): ~3 minutes
SVC (300K): ~12 minutes

Please note: the tests were done with low-dimensional data (only 2 features) on a fairly fast laptop (i7 8th Gen processor, 16GB RAM).
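A comparable benchmark can be sketched as below. This is not the script used for the figures above: the synthetic dataset, the helper name time_fit and the much smaller sample sizes are assumptions, chosen so that the sketch finishes in seconds.

```python
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC


def time_fit(estimator, n_samples):
    """Return the wall-clock seconds needed to fit estimator on n_samples rows."""
    # Two informative features, mirroring the low-dimensional setup described above.
    X, y = make_classification(
        n_samples=n_samples,
        n_features=2,
        n_informative=2,
        n_redundant=0,
        random_state=0,
    )
    start = time.perf_counter()
    estimator.fit(X, y)
    return time.perf_counter() - start


sizes = [1000, 2000, 4000]  # increase toward 75K+ to approach the tests above
results = {}
for n in sizes:
    results[("LinearSVC", n)] = time_fit(LinearSVC(max_iter=10000), n)
    results[("SVC", n)] = time_fit(SVC(), n)  # default rbf kernel

for (name, n), seconds in results.items():
    print(f"{name:<10} ({n:>5} rows): {seconds:.3f} s")
```

Absolute timings depend heavily on hardware and data, so the useful signal is how each model's time grows as the row count doubles.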

Of course, many different factors can affect these computation results with Support Vector Machines. However, we believe they still provide useful guidance regarding the performance of SVC, SVR, LinearSVC and LinearSVR. (NuSVC and NuSVR would be expected to yield very similar results to SVC and SVR, since they differ mainly in using a nu parameter to control the number of support vectors.) You can read more about different Support Vector Machine implementations below:

Alternatively, you can turn to other machine learning algorithms with much more favorable runtime performance, such as:

Random Forest Algorithm (classification and regression)
Naive Bayes Algorithm (classification only)
Logistic Regression Algorithm (classification only)

custom kernel example

https://scikit-learn.org/stable/auto_examples/svm/plot_custom_kernel.html
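As a minimal sketch of the idea, SVC accepts a Python callable as its kernel argument. The callable receives two sample matrices and must return their Gram matrix. The kernel below, scaled_linear_kernel, is a hypothetical example made up for illustration and is not the kernel used in the linked scikit-learn example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC


def scaled_linear_kernel(X, Y):
    """Hypothetical custom kernel: a linear kernel scaled by a constant factor.

    X has shape (n_samples_X, n_features) and Y has shape (n_samples_Y, n_features);
    the result has shape (n_samples_X, n_samples_Y).
    """
    return 0.5 * np.dot(X, Y.T)


X, y = make_classification(n_samples=300, n_features=4, random_state=0)

# Pass the function itself; SVC calls it to build the kernel matrix.
clf = SVC(kernel=scaled_linear_kernel).fit(X, y)

gram = scaled_linear_kernel(X[:5], X[:3])
predictions = clf.predict(X[:10])
print(gram.shape, predictions.shape)
```

Note that a callable kernel is evaluated in Python over the full training set, so it can be noticeably slower than the built-in kernels on large datasets.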
