Fast SVM |
Non-linear and multi-class classifications |
Shared memory parallelism |
Quickly removes non-SV samples in the early steps of optimization and approximates the kernel matrix with block-diagonal matrices |
- Runtime scales linearly with the number of samples and classes
- Good scalability compared to Libsvm, SVMLight, and SVMTorch
- Reduces problem size
- Efficient computation of the kernel matrix
- Efficient memory usage
- Feature reduction
- Efficient parameter tuning
|
- Slightly degrades the accuracy in some cases
- Benefits not fully clear for a large number of SVs
|
Dong et al. [2005] |
PS-SVM |
Non-linear and binary classifications |
Shared memory parallelism |
Implements an iteratively re-weighted least-squares SVM instead of quadratic programming and parallelizes the matrix inversion (a minimal sketch of the underlying least-squares formulation appears after this table) |
- Roughly twofold speedup when the number of cores is doubled
- Efficient matrix inversion in parallel
- An efficient approximation of a large matrix using a sparse greedy method
|
- Does not perform well for small-scale problems or problems with few SVs
- Efficiency deteriorates when using 8 cores
- Limited memory
- Benefits not fully clear for a large number of cores
|
Morales et al. [2011] |
CPA SVM |
Non-linear and multi-class classifications |
Shared memory parallelism |
Implements a modular approximate cutting-plane algorithm that generates a series of successive approximations to the original problem and uses structural kernels |
- Good parallelizability
- Reduces computationally expensive kernel evaluations
- Scales almost linearly with the number of CPUs
- Efficiently solves class imbalanced problems
|
- Limited scalability in terms of the number of training data
- Benefits are not fully clear for other kernels
|
Severyn and Moschitti [2011] |
FPGA SVM |
Non-linear and binary classifications |
FPGA* |
Uses a hybrid working-set algorithm in which the working-set size is increased to reduce the number of iterations |
- Speedup (25 times) over a single-threaded implementation
- Speedups (15 and 23 times) over Libsvm and SVMLight
- Converges fast in a small number of iterations
|
- Dedicated hardware for kernel computations
- May suffer from limited RAM blocks
|
Venkateshan et al. [2015] |
Massively parallel SVM |
Non-linear and binary classifications |
FPGA |
Uses fixed-point arithmetic for the computationally expensive operations (a fixed-point sketch appears after this table) |
- Fixed-point arithmetic accelerates the computations
- Builds compact circuits for reducing power dissipation
- An order of magnitude higher performance than similar FPGA implementations
|
- Trades accuracy for faster computation
- Dedicated hardware for kernel computations
|
Graf et al. [2009] |
Comparison of GPUs and FPGAs |
Non-linear and binary classifications |
GPUs and FPGA |
Implements the Gilbert algorithm for solving SVMs, which finds the point of a convex hull closest to the origin (a minimal sketch appears after this table) |
- Outperforms GPU-based parallelism for data that fit in FPGA RAM blocks
- Good parallelizability
- Explainable results
|
- Does not perform well for data that do not fit in FPGA RAM blocks
- Limited FPGA RAM blocks
- Speedups remain constant as the training data size increases
- Limited scalability
|
Papadonikolakis [2009] |
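The PS-SVM row above replaces the quadratic program with an iteratively re-weighted least-squares SVM whose dominant cost is a large matrix solve. As a point of reference, the sketch below shows only the plain (unweighted) least-squares SVM system that such methods build on, with an assumed RBF kernel, assumed parameter values, and NumPy's serial dense solver standing in for the parallel matrix inversion; the re-weighting loop of the actual method is omitted.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    # Assumed RBF kernel; the surveyed method may use a different kernel.
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def lssvm_train(X, y, C=10.0, gamma=0.5):
    # Least-squares SVM: the QP becomes one (n+1)x(n+1) linear system.
    # This dense solve is the step a parallel variant distributes.
    n = X.shape[0]
    Omega = (y[:, None] * y[None, :]) * rbf_kernel(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / C
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # alpha coefficients, bias

def lssvm_predict(X_train, y_train, alpha, b, X_test, gamma=0.5):
    K = rbf_kernel(X_test, X_train, gamma)
    return np.sign(K @ (alpha * y_train) + b)
```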
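The fixed-point arithmetic credited to the massively parallel design can be illustrated with an integer dot product, the core operation of kernel evaluations. The Q16.16 word length, function names, and example vectors below are assumptions for illustration only; the published hardware chooses its own precision.

```python
import numpy as np

FRAC_BITS = 16                    # assumed Q16.16 format: 16 fractional bits
SCALE = 1 << FRAC_BITS

def to_fixed(x):
    # Quantize floating-point values to signed fixed-point integers.
    return np.round(np.asarray(x, dtype=float) * SCALE).astype(np.int64)

def fixed_dot(a_fx, b_fx):
    # Multiply-accumulate in integers, then rescale once at the end.
    # On an FPGA the products map onto integer DSP blocks; int64
    # accumulation merely emulates that behaviour here.
    return int((a_fx * b_fx).sum() >> FRAC_BITS)

def to_float(x_fx):
    return x_fx / SCALE

a = [0.5, -1.25, 2.0]
b = [1.0, 0.75, -0.5]
print(to_float(fixed_dot(to_fixed(a), to_fixed(b))))   # fixed-point result
print(float(np.dot(a, b)))                             # floating-point reference
```

The rounding error introduced by the quantization step is the accuracy/speed trade-off noted in the disadvantages column of that row.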
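The Gilbert algorithm named in the last row iteratively approximates the point of a convex hull closest to the origin, the geometric subproblem behind nearest-point formulations of SVM training. The NumPy sketch below is a minimal serial version for illustration only; the surveyed GPU and FPGA designs parallelize the dot products that dominate each iteration.

```python
import numpy as np

def gilbert_nearest_point(points, n_iter=200):
    # Approximate argmin_{w in conv(points)} ||w||^2 (Gilbert's iteration).
    w = points[0].astype(float)                  # start at an arbitrary vertex
    for _ in range(n_iter):
        s = points[np.argmin(points @ w)]        # support point in direction -w
        d = s - w
        denom = d @ d
        if denom == 0.0:                         # w already coincides with s
            break
        t = np.clip(-(w @ d) / denom, 0.0, 1.0)  # closest point on segment [w, s]
        w = w + t * d
    return w

# Toy usage with assumed random data shifted away from the origin:
# the returned point approximates the hull point nearest to the origin.
pts = np.random.default_rng(0).normal(size=(50, 3)) + 2.0
print(np.linalg.norm(gilbert_nearest_point(pts)))
```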