Adaptive shrinking (Narasimhan et al. [2014])
Classification: non-linear, binary. Parallelism: distributed HPC architecture.
Approach: Eliminates insignificant data in an adaptive and parallel manner (a minimal sketch of the shrinking idea follows this entry).
Advantages:
- A fast and efficient method of reducing the data
- Fast convergence
- Robust to data sparsity
- Dedicated support for sparse data
- Up to 26 times computational speedup
Disadvantages:
- Not robust across different data types and hyperparameter settings
- Incorrect elimination of samples can lead to incorrect decision boundaries
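
Shrinking heuristics of this kind are usually implemented by tracking, per sample, how long its multiplier has stayed at a bound while already satisfying the optimality conditions, and then removing such samples from the active set. The following Python sketch illustrates only that general idea; the function and parameters (`shrink_active_set`, `counters`, `patience`) are illustrative and not taken from Narasimhan et al. [2014].

```python
import numpy as np

def shrink_active_set(alpha, decision, y, C, counters, active, tol=1e-3, patience=5):
    """Mark samples as inactive once they have looked insignificant for
    `patience` consecutive checks (illustrative sketch, not the paper's code).

    A sample is a shrinking candidate when its multiplier sits at a bound and
    it already satisfies the KKT conditions within `tol`:
        alpha_i == 0  and  y_i * f(x_i) >= 1 - tol, or
        alpha_i == C  and  y_i * f(x_i) <= 1 + tol.
    """
    margin = y * decision                      # y_i * f(x_i) for every sample
    at_lower = (alpha <= 1e-12) & (margin >= 1.0 - tol)
    at_upper = (alpha >= C - 1e-12) & (margin <= 1.0 + tol)
    candidate = at_lower | at_upper
    counters[candidate] += 1                   # stayed a candidate one more check
    counters[~candidate] = 0                   # reset as soon as the sample matters
    active &= counters < patience              # drop long-standing candidates
    return active
```

If the elimination test is too aggressive (large `tol`, small `patience`), samples that would later become support vectors are removed, which is exactly the incorrect-boundary risk listed above.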

Scalable approach with adaptive shrinking (Vishnu et al. [2015])
Classification: non-linear, binary. Parallelism: distributed HPC architecture.
Approach: Uses adaptive shrinking to eliminate insignificant samples on large-scale systems (a simplified synchronization sketch follows this entry).
Advantages:
- Good scalability in terms of the number of processors (up to 4,096 cores)
- Scales well to large numbers of training samples
- Maintains accuracy using synchronized data structures
- Reduces space complexity
- Adaptive reconstitution of data structures
- Intuitive heuristics for adaptive shrinking
Disadvantages:
- Limited parallelizability of SMO
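
On distributed-memory systems the shrinking bookkeeping has to stay consistent across processes. As a deliberately simplified illustration (not the synchronization scheme of Vishnu et al. [2015]), the sketch below assumes every rank keeps a replicated shrinking mask and removes a sample only when all ranks agree, using an mpi4py allreduce.

```python
import numpy as np
from mpi4py import MPI

def synchronized_shrink(local_candidate, comm=MPI.COMM_WORLD):
    """Keep a replicated shrinking mask identical on every rank: a sample is
    shrunk only if every rank voted for it (unanimous MIN over 0/1 votes).
    Illustrative simplification of a synchronized data structure."""
    votes = local_candidate.astype(np.int32)   # 1 = this rank would shrink it
    agreed = np.empty_like(votes)
    comm.Allreduce(votes, agreed, op=MPI.MIN)  # unanimity across all ranks
    return agreed.astype(bool)
```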

Batch-mode SVM (Yom-Tov [2004])
Classification: non-linear, binary. Parallelism: distributed HPC architecture.
Approach: Combines decomposition techniques with a batch method.
Advantages:
- Overcomes convergence problems
- Handles large sets of support vectors that do not fit in memory
- Solves large classification and regression problems
Disadvantages:
- Unclear scalability in terms of the number of processors

Parallel SVM (Brugger [2006])
Classification and regression: non-linear, binary. Parallelism: distributed HPC architecture.
Approach: Parallelizes SVM training efficiently through decomposition to reach high speedup.
Advantages:
- Supports different SVM formulations for classification and regression problems
- Supports sparse data representations
- Memory management with caching
Disadvantages:
- Benefits not fully clear in terms of speedup in some cases
- The data type impacts the speedup

PDS (Hazan et al. [2008])
Classification: non-linear, binary. Parallelism: distributed HPC.
Approach: Formulates the SVM optimization problem through Fenchel duality rather than the usual Lagrangian duality (the standard dual is recalled after this entry for comparison).
Advantages:
- Makes the problem formulation more efficient and better suited to distributed parallelism
- Supports distributed computing on clusters with independent nodes and memory
- k/2 speedup using k processors
Disadvantages:
- Inefficient for easy problems with few support vectors
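
For reference, the baseline that PDS departs from is the usual Lagrangian dual of the soft-margin SVM, the problem that SMO-style solvers in the other entries optimize. Hazan et al. [2008] instead derive the optimization through Fenchel duality, which is what makes it decompose across independent nodes.

```latex
% Soft-margin SVM primal:
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_{i}
\quad \text{s.t.}\quad y_{i}\bigl(w^{\top}\phi(x_{i})+b\bigr)\ \ge\ 1-\xi_{i},
\qquad \xi_{i}\ \ge\ 0.

% Its Lagrangian (Wolfe) dual, solved by SMO-style methods:
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_{i}
  - \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j}\,y_{i}y_{j}\,K(x_{i},x_{j})
\quad \text{s.t.}\quad 0 \le \alpha_{i} \le C,
\qquad \sum_{i=1}^{n}\alpha_{i}y_{i} = 0.
```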

MRPsvm (Zhao et al. [2011])
Classification: non-linear, binary. Parallelism: map-reduce based.
Approach: Implements a decomposition method that uses SMO as an inner solver on multi-core systems (a minimal SMO update step is sketched after this entry).
Advantages:
- Reduces memory requirements
- Handles data sparsity
- Higher speedup compared to LIBSVM
Disadvantages:
- Converts the sparse format back to a dense format to compute dot products
- Needs more efficient memory usage
- Only a slight runtime improvement when going from 4 to 8 cores
- Limited parallelizability of SMO
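
The inner SMO solver repeatedly picks a pair of multipliers and solves the two-variable sub-problem analytically. The sketch below shows only that analytic update (Platt-style, with the bias handled implicitly through the error difference); working-set selection, the outer decomposition loop, and MRPsvm's parallel layer are omitted.

```python
import numpy as np

def smo_pair_update(K, y, alpha, C, i, j, eps=1e-12):
    """One analytic SMO step: optimize the dual over alpha_i and alpha_j while
    every other multiplier stays fixed. Returns True if the pair changed."""
    E = K @ (alpha * y) - y                       # bias-free prediction errors
    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]       # curvature along the pair
    if eta <= eps:
        return False
    # The equality constraint sum(alpha * y) = 0 confines alpha_j to [L, H].
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    if L >= H:
        return False
    a_j = float(np.clip(alpha[j] + y[j] * (E[i] - E[j]) / eta, L, H))
    if abs(a_j - alpha[j]) < eps:
        return False
    alpha[i] += y[i] * y[j] * (alpha[j] - a_j)    # keep the constraint satisfied
    alpha[j] = a_j
    return True
```

Because each step touches only two multipliers, the step itself is inherently sequential; this is the "limited parallelizability of SMO" cited as a drawback throughout this table, and why parallel SMO variants typically concentrate on parallelizing the kernel evaluations around each step.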

Ontology-based parallel SMO (Caruana et al. [2011])
Classification: linear, binary. Parallelism: distributed big data architecture using map-reduce.
Approach: Distributes the training data across multiple nodes and uses an ontology-based enhancement strategy as feedback to limit the resulting loss of accuracy (a generic partition-train-merge sketch follows this entry).
Advantages:
- Improves accuracy
- Good scalability in terms of the number of training samples
- Easy to implement
Disadvantages:
- Accuracy depends on the quality of the knowledge provided by the end user
- Limited parallelizability of SMO
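
A common map-reduce pattern for this kind of distributed SVM training is to fit a local model on each partition in the map phase and merge the resulting support vectors in the reduce phase. The sketch below shows only that generic pattern, with scikit-learn's SVC as a stand-in binary trainer; the ontology-based feedback loop of Caruana et al. [2011] is not reproduced here.

```python
import numpy as np
from sklearn.svm import SVC

def map_train(partition, C=1.0, gamma=0.1):
    # Map phase: each node trains a local SVM on its own data partition and
    # emits only the support vectors it found (a much smaller set).
    X_part, y_part = partition
    model = SVC(C=C, kernel="rbf", gamma=gamma).fit(X_part, y_part)
    return X_part[model.support_], y_part[model.support_]

def reduce_merge(mapped, C=1.0, gamma=0.1):
    # Reduce phase: pool the support vectors from all partitions and train a
    # final model on the pooled set.
    X_sv = np.vstack([xs for xs, _ in mapped])
    y_sv = np.concatenate([ys for _, ys in mapped])
    return SVC(C=C, kernel="rbf", gamma=gamma).fit(X_sv, y_sv)

# Usage with per-node partitions [(X_0, y_0), (X_1, y_1), ...]:
#     final_model = reduce_merge([map_train(p) for p in partitions])
```

The accuracy deterioration mentioned in the entry comes largely from the map phase discarding samples that a global solver would have kept; the ontology-based feedback is what tries to compensate for it.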

RAMSMO (Alham et al. [2014])
Classification: multi-class. Parallelism: distributed big data architecture using map-reduce.
Approach: Implements a resource-aware multi-class classification algorithm for image annotation that partitions the data into smaller binary chunks (see the task-decomposition sketch after this entry).
Advantages:
- Load balancing of imbalanced data
- Fast convergence
- Reduces overheads
- Solves multi-class classification problems
- Runs on heterogeneous computing environments
- Improves runtime while keeping high accuracy
- Good scalability
Disadvantages:
- Unclear impact of class sizes on the classification accuracy
- Complex load-balancing strategy
- Limited parallelizability of SMO
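
One way to obtain such smaller binary chunks is a one-versus-one decomposition: each pair of classes defines an independent binary task that can be assigned to whichever node has spare resources. The sketch below is purely illustrative; RAMSMO's actual resource-aware scheduling and load balancing are more elaborate than the size-based ordering shown here.

```python
import numpy as np
from itertools import combinations

def binary_chunks(X, labels):
    """Split a multi-class dataset into independent one-versus-one binary
    tasks (chunks), largest first, so a scheduler can hand out the heavy
    chunks before the light ones (a simple, illustrative balancing rule)."""
    tasks = []
    for a, b in combinations(np.unique(labels), 2):
        mask = (labels == a) | (labels == b)
        tasks.append({"classes": (a, b),
                      "X": X[mask],
                      "y": np.where(labels[mask] == a, 1, -1)})
    return sorted(tasks, key=lambda t: len(t["y"]), reverse=True)
```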

OHD-SVM (Vanek et al. [2017])
Classification: non-linear, binary. Parallelism: GPU-based.
Approach: Implements a hierarchical decomposition algorithm that better fits the GPU architecture.
Advantages:
- Publicly available code
- Robust to both sparse and dense data formats
- Higher speedup than the fastest previously published GPU-based implementations
Disadvantages:
- Limited shared memory
- Limited parallelizability of SMO

gpuSVM (Catanzaro et al. [2008])
Classification: linear and non-linear, binary. Parallelism: GPU-based.
Approach: Implements fast SVMs with adaptive first- and second-order working set selection heuristics.
Advantages:
- Open-source implementation
- Robust to data sparsity
- Supports multiple kernel functions
- Improves runtime through adaptive heuristics
- Higher speedup (5 to 35 times) compared to LIBSVM
- One of the best GPU-based SVM implementations in terms of runtime and accuracy
Disadvantages:
- Limited arithmetic precision
- Limited parallelizability of SMO

cuSVM (Carpenter [2009])
Classification: non-linear, binary. Parallelism: GPU-based.
Approach: Implements a modified SMO decomposition method with kernel caching.
Advantages:
- Open-source implementation
- Improves the accuracy and runtime of SMO compared to LIBSVM
- Solves classification and regression problems
- Uses mixed-precision floating-point arithmetic
Disadvantages:
- Unclear robustness to data sparsity
- Slower than gpuSVM in most cases
- No support for a cross-validation strategy
- Doubling the arithmetic precision does not yield a corresponding gain in accuracy
- Limited parallelizability of SMO

P2SMO (Herrero-Lopez [2011])
Classification: linear and non-linear, multi-class. Parallelism: GPU-based.
Approach: The first GPU-based SVM implementation of one-versus-all multi-class classification (a minimal one-versus-all sketch follows this entry).
Advantages:
- Robust to dense data formats
- Supports multiple kernel functions
Disadvantages:
- Task-controlling overhead
- Creates a large number of classifiers
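
One-versus-all training builds one binary classifier per class and predicts with the largest decision value; because the per-class problems are independent, they map naturally onto parallel hardware. A minimal CPU sketch, using scikit-learn's SVC as a placeholder binary trainer rather than the P2SMO solver:

```python
import numpy as np
from sklearn.svm import SVC   # placeholder binary trainer

def train_one_vs_all(X, labels, **svm_params):
    # One independent binary problem per class: class c versus the rest.
    return {c: SVC(**svm_params).fit(X, np.where(labels == c, 1, -1))
            for c in np.unique(labels)}

def predict_one_vs_all(models, X):
    # Pick the class whose classifier reports the largest decision value.
    classes = sorted(models)
    scores = np.column_stack([models[c].decision_function(X) for c in classes])
    return np.array(classes)[scores.argmax(axis=1)]
```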

SVM for high-throughput screening data (Liao et al. [2009])
Classification: non-linear. Parallelism: GPU-based.
Approach: Modifies SMO to use a large number of working set variables together with caching.
Advantages:
- Improves accuracy and runtime compared to SVMLight
- Improves accuracy and runtime through data movement between CPUs and GPUs
- Reduces the complexity of the SMO algorithm
Disadvantages:
- The kernel function used may not be applicable to other problems
- Restricted, narrow application domain

GPU-assisted LIBSVM (Athanasopoulos et al. [2011])
Classification: non-linear. Parallelism: GPU-based.
Approach: Implements a hybrid CPU-GPU parallel version of a modified LIBSVM (a kernel-caching sketch for cross-validation follows this entry).
Advantages:
- Implements a cross-validation scheme that yields an n-fold speedup over the standard grid search
- Avoids re-computing kernel elements at every iteration of the cross-validation
- Hybrid parallelism yields one order of magnitude speedup over CPU-only parallelism
- Good scalability in terms of input data size
Disadvantages:
- Requires loading the whole input dataset into GPU memory
- Limited GPU memory
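
The key saving here is computing kernel values once and reusing them across the cross-validation runs instead of recomputing them for every fold and every hyperparameter. A CPU-side sketch of that idea (not the authors' GPU code), using a precomputed RBF kernel with scikit-learn:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def cv_with_cached_kernel(X, y, gamma, C_grid, n_folds=5):
    """Compute the RBF kernel matrix once and reuse its sub-blocks for every
    fold and every value of C, instead of recomputing kernel elements per run."""
    K = rbf_kernel(X, X, gamma=gamma)            # cached once for all folds
    scores = {}
    for C in C_grid:
        fold_acc = []
        for train, test in KFold(n_splits=n_folds).split(X):
            model = SVC(C=C, kernel="precomputed")
            model.fit(K[np.ix_(train, train)], y[train])
            fold_acc.append(model.score(K[np.ix_(test, train)], y[test]))
        scores[C] = float(np.mean(fold_acc))
    return scores
```

Note that the cache is only valid for a fixed kernel parameter (here `gamma`); a grid over kernel parameters still needs one kernel matrix per value.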

SVM with sparse matrix formats (Lin and Chien [2010])
Classification: non-linear, multi-class. Parallelism: GPU-based.
Approach: Implements efficient sparse matrix operations in the SVM with caching (a sparse kernel-evaluation sketch follows this entry).
Advantages:
- Reduces task-controlling overhead
- Higher speedup (134 times) compared to LIBSVM
- Robust to data sparsity
- Creates fewer classifiers (N(N-1)/2) than comparable approaches
- Improves runtime through the sparse matrix format
Disadvantages:
- Benefits not fully clear for dense matrix formats
- Benefits not fully clear in some cases
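
The benefit of a sparse format is that kernel evaluations reduce to sparse dot products, so zero features are never materialized. A small SciPy sketch of an RBF kernel matrix computed directly from CSR data (illustrative only, not the authors' GPU kernels):

```python
import numpy as np
import scipy.sparse as sp

def rbf_kernel_csr(X, gamma):
    """RBF kernel matrix from a CSR matrix: the expensive part, X @ X.T,
    uses sparse dot products instead of dense arithmetic on zero entries."""
    X = sp.csr_matrix(X)
    sq_norms = np.asarray(X.multiply(X).sum(axis=1)).ravel()   # ||x_i||^2
    gram = (X @ X.T).toarray()                                 # sparse dot products
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram    # squared distances
    return np.exp(-gamma * np.maximum(d2, 0.0))
```

When the input data is mostly dense, the sparse dot products no longer save work, which matches the entry's caveat about dense matrix formats.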

GSVM (Zhang et al. [2014])
Classification: non-linear, binary. Parallelism: GPU-based.
Approach: Implements parallel training and testing for large-scale intrusion detection problems (a scatter-parallel-reduce-gather sketch follows this entry).
Advantages:
- Fast data access
- Reduces computation time using a scatter-parallel-reduce-gather strategy
- Real-time processing capabilities for large-scale problems
- Higher speedup than comparable approaches
- Good scalability in terms of input data size
- Improves runtime without loss of classification accuracy compared to LIBSVM
- Explainable results
Disadvantages:
- Limited shared memory on GPUs
- Limited parallelizability of SMO
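
A scatter-parallel-reduce-gather evaluation splits the support vectors over workers, lets each worker compute its partial kernel sums in parallel, and then sums (reduces) the partials into the final decision values. The sketch below mimics that data flow with CPU processes only, as an illustration rather than the GPU implementation; all names are illustrative.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def _partial_scores(sv_chunk, coef_chunk, gamma, X_test):
    # Parallel stage: one worker handles one chunk of support vectors and
    # returns its partial contribution to every test sample's decision value.
    d2 = ((X_test[:, None, :] - sv_chunk[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2) @ coef_chunk

def decision_scatter_reduce(support_vectors, dual_coef, gamma, bias, X_test, workers=4):
    # Scatter: split the support vectors (and their coefficients) over workers.
    sv_chunks = np.array_split(support_vectors, workers)
    coef_chunks = np.array_split(dual_coef, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial(_partial_scores, gamma=gamma, X_test=X_test),
                            sv_chunks, coef_chunks)
        # Reduce and gather: sum the partial scores and add the bias term.
        return sum(partials) + bias

# Call decision_scatter_reduce(...) from an `if __name__ == "__main__":` block
# so the worker processes can be spawned safely on all platforms.
```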

CUDA-accelerated SVM for celestial object classification (Peng et al. [2011])
Classification: non-linear, binary. Parallelism: GPU-based.
Approach: Accelerates an SVMLight-based classifier, improving training time (10 times) and prediction time (364 times) for celestial object classification.
Advantages:
- Higher speedup of both training and prediction
- Improves runtime compared to SVMLight
- A fast and reliable classifier
- Efficient selection of the working set size
Disadvantages:
- Long runtimes for small datasets because the GPU I/O latency cannot be hidden
- Limited shared memory on GPUs