A comparison of articles and methods that apply decomposition techniques in parallel. Here, we highlight the articles in the survey with notable findings. Each entry lists the following fields:

Algorithm, Problem type, Parallelism type, Main idea, Pros, Cons, and Reference.

Adaptive shrinking
Problem type: Non-linear and binary classifications
Parallelism type: Distributed HPC architecture
Main idea: Eliminates insignificant data in an adaptive and parallel manner (a generic shrinking step is sketched below).
Pros:
  • A fast and efficient method of reducing data
  • Fast convergence
  • Robust to data sparsity
  • Special support for sparse data
  • Up to 26 times computation speedup
Cons:
  • Not robust across different types of data and different hyperparameter settings
  • Incorrect elimination of samples, leading to incorrect decision boundaries
Reference: Narasimhan et al. [2014]
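
The adaptive elimination described above follows the general shrinking idea used by decomposition solvers: samples whose multipliers are stuck at a bound are removed from the active set and only revisited near convergence. Below is a minimal, hypothetical Python sketch of one such shrinking step; it is not Narasimhan et al.'s implementation, and real solvers use gradient-based KKT tests rather than the simple patience counter assumed here.

    import numpy as np

    def shrink_step(active_idx, alpha, stuck_count, C, patience=10):
        """One illustrative shrinking step: drop samples whose multiplier has
        sat at a bound (0 or C) for `patience` consecutive checks."""
        at_bound = (alpha[active_idx] <= 0.0) | (alpha[active_idx] >= C)
        stuck_count = np.where(at_bound, stuck_count + 1, 0)
        keep = stuck_count < patience
        return active_idx[keep], stuck_count[keep]

    # Toy usage: samples whose alpha stays at a bound are eventually removed.
    active = np.arange(6)
    alpha = np.array([0.0, 0.3, 1.0, 0.0, 0.7, 1.0])
    counts = np.zeros(6, dtype=int)
    for _ in range(12):
        active, counts = shrink_step(active, alpha, counts, C=1.0)
    print(active)   # -> [1 4]: only the samples strictly inside (0, C) remain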

Scalable approach and adaptive shrinking
Problem type: Non-linear and binary classifications
Parallelism type: Distributed HPC architecture
Main idea: Uses adaptive shrinking to eliminate insignificant samples on large-scale systems.
Pros:
  • Good scalability in terms of the number of processors (up to 4096 cores)
  • Scales well to a large number of training samples
  • Maintains accuracy using synchronized data structures
  • Space complexity reduction
  • Adaptive reconstitution of data structures
  • Intuitive heuristics for adaptive shrinking
Cons:
  • Limited parallelizability of SMO
Reference: Vishnu et al. [2015]

Batch mode SVM
Problem type: Non-linear and binary classifications
Parallelism type: Distributed HPC architecture
Main idea: Uses decomposition techniques with a batch method.
Pros:
  • Overcomes convergence problems
  • Handles a large set of SVs that does not fit in memory
  • Solves large classification and regression problems
Cons:
  • Unclear scalability in terms of the number of processors
Reference: Yom-Tov [2004]

Parallel SVM
Problem type: Non-linear and binary classifications and regression
Parallelism type: Distributed HPC architecture
Main idea: Efficiently parallelizes SVMs to reach high speedup using decomposition.
Pros:
  • Supports different formulations of SVMs for classification and regression problems
  • Supports sparse data representations
  • Memory management with caching
Cons:
  • Benefits not fully clear in terms of speedup in some cases
  • The data type impacts the speedup
Reference: Brugger [2006]

PDS
Problem type: Non-linear and binary classifications
Parallelism type: Distributed HPC parallelism
Main idea: Formalizes the SVM optimization problem using a different duality, called Fenchel duality.
Pros:
  • Increases the efficiency and suitability of the problem for distributed parallelism
  • Distributed computing with clusters of independent nodes and memory
  • k/2 speedup using k processors
Cons:
  • Inefficient for easy problems with a small number of SVs
Reference: Hazan et al. [2008]

MRPsvm
Problem type: Non-linear and binary classifications
Parallelism type: Map-reduce based parallelism
Main idea: Implements a decomposition method and uses SMO as an inner solver for multi-core systems (the generic map-reduce partition-train-merge pattern is sketched below).
Pros:
  • Reduces memory requirements
  • Handles data sparsity
  • Higher speedup compared to Libsvm
Cons:
  • Converts the sparse format back to a dense format to compute dot products
  • Needs more efficient memory usage
  • Only slight improvement in runtime when going from 4 to 8 cores
  • Limited parallelizability of SMO
Reference: Zhao et al. [2011]
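
MRPsvm above, like the other map-reduce entries in this comparison, builds on a partition-train-merge pattern: each mapper trains a sub-SVM on its chunk of the data, and the reducer combines the resulting support vectors into a final model. The sketch below shows only this generic pattern, run sequentially with scikit-learn's SVC standing in for the inner solver; it is not the MRPsvm pipeline, and the function names are hypothetical.

    import numpy as np
    from sklearn.svm import SVC

    def map_train(chunk):
        """'Map' step: train a sub-SVM on one chunk and emit only its SVs."""
        X, y = chunk
        clf = SVC(kernel="rbf", C=1.0).fit(X, y)
        return X[clf.support_], y[clf.support_]

    def reduce_merge(sv_chunks):
        """'Reduce' step: retrain a single SVM on the union of all SVs."""
        Xs = np.vstack([X for X, _ in sv_chunks])
        ys = np.concatenate([y for _, y in sv_chunks])
        return SVC(kernel="rbf", C=1.0).fit(Xs, ys)

    # Toy usage with four partitions standing in for four mappers.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 5))
    y = np.where(X[:, 0] + 0.1 * rng.normal(size=400) > 0, 1, -1)
    chunks = [(X[i::4], y[i::4]) for i in range(4)]
    model = reduce_merge([map_train(c) for c in chunks])
    print(model.score(X, y))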

Ontology-based Parallel SMO
Problem type: Linear and binary classifications
Parallelism type: Distributed big data architecture using map-reduce
Main idea: Distributes training data across multiple nodes and uses an ontology-based enhancement strategy as feedback to minimize the impact of accuracy deterioration.
Pros:
  • Improves accuracy
  • Good scalability in terms of the number of training samples
  • Easy implementation
Cons:
  • Accuracy depends on the quality of the intelligence provided by the end-user
  • Limited parallelizability of SMO
Reference: Caruana et al. [2011]

RAMSMO
Problem type: Multi-class classification
Parallelism type: Distributed big data architecture using map-reduce
Main idea: Implements a resource-aware multi-class classification algorithm for image annotation that distributes data into smaller binary chunks.
Pros:
  • Load balancing of imbalanced data
  • Fast convergence
  • Reduces overheads
  • Solves multi-class classifications
  • Runs on heterogeneous computing environments
  • Improves runtime while keeping high accuracy
  • Good scalability
Cons:
  • Unclear impact of class sizes on the classification accuracy
  • Complex load-balancing strategy
  • Limited parallelizability of SMO
Reference: Alham et al. [2014]

OHD-SVM
Problem type: Non-linear and binary classifications
Parallelism type: GPU-based parallelism
Main idea: Implements a hierarchical decomposition algorithm that better fits the GPU architecture.
Pros:
  • Publicly available code
  • Robust to sparse and dense data formats
  • Higher speedup compared to the fastest published GPU-based implementations
Cons:
  • Limited shared memory
  • Limited parallelizability of SMO
Reference: Vanek et al. [2017]

gpuSVM
Problem type: Linear, non-linear, and binary classifications
Parallelism type: GPU-based parallelism
Main idea: Implements fast SVMs with adaptive first and second order working set selection heuristics (the first-order rule is sketched below).
Pros:
  • An open source implementation
  • Robust to data sparsity
  • Supports multiple kernel functions
  • Improves the performance in terms of time using adaptive heuristics
  • Higher speedup (5 to 35 times) compared to Libsvm
  • One of the best GPU-based SVM implementations in terms of time and accuracy
Cons:
  • Limited arithmetic precision
  • Limited parallelizability of SMO
Reference: Catanzaro et al. [2008]
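
gpuSVM's heuristics refine the basic SMO rule of choosing two variables per iteration. The sketch below shows the standard first-order (maximal violating pair) selection for the SVM dual, given the current gradient and labels in {-1, +1}; the adaptive and second-order refinements of Catanzaro et al. are not shown. Because only the chosen pair is then updated analytically while all other multipliers stay fixed, the outer loop is inherently sequential, which is the "limited parallelizability of SMO" listed as a drawback throughout this comparison.

    import numpy as np

    def select_violating_pair(alpha, grad, y, C):
        """First-order working-set selection: return the maximal violating
        pair (i, j). `grad` is the gradient of the dual objective."""
        up  = ((alpha < C) & (y == 1)) | ((alpha > 0) & (y == -1))
        low = ((alpha < C) & (y == -1)) | ((alpha > 0) & (y == 1))
        score = -y * grad
        up_idx, low_idx = np.flatnonzero(up), np.flatnonzero(low)
        i = up_idx[np.argmax(score[up_idx])]
        j = low_idx[np.argmin(score[low_idx])]
        return int(i), int(j)   # only these two multipliers change this iteration

    # Toy usage
    alpha = np.array([0.0, 0.5, 1.0, 0.2])
    grad  = np.array([-1.0, -0.2, 0.4, 0.1])
    y     = np.array([1, -1, 1, -1])
    print(select_violating_pair(alpha, grad, y, C=1.0))   # -> (0, 2)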

cuSVM
Problem type: Non-linear and binary classifications
Parallelism type: GPU-based parallelism
Main idea: Implements a modified SMO decomposition method using kernel caching.
Pros:
  • An open source implementation
  • Improves the performance of SMO in terms of accuracy and time compared to Libsvm
  • Solves classification and regression problems
  • Uses mixed-precision floating point arithmetic
Cons:
  • Unclear robustness to data sparsity
  • Slower than gpuSVM in most cases
  • No support for a cross-validation strategy
  • Doubling the precision did not double the accuracy
  • Limited parallelizability of SMO
Reference: Carpenter [2009]

P2SMO
Problem type: Linear, non-linear, and multi-class classifications
Parallelism type: GPU-based parallelism
Main idea: The first GPU-based SVM implementation of one-versus-all multi-class classification (the scheme is sketched below).
Pros:
  • Robust to dense data format
  • Supports multiple kernel functions
Cons:
  • Task-controlling overhead
  • Creates too many classifiers
Reference: Herrero-Lopez [2011]
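
The one-versus-all scheme behind P2SMO trains one binary SVM per class and predicts the class with the most confident decision value, so N classes require N classifiers (versus N(N-1)/2 for the one-versus-one scheme used by Lin and Chien below). A minimal CPU sketch with scikit-learn, not the GPU implementation of Herrero-Lopez:

    import numpy as np
    from sklearn.svm import SVC

    def train_one_vs_all(X, y, C=1.0):
        """One binary SVM per class: class c against the rest."""
        return {c: SVC(kernel="rbf", C=C).fit(X, np.where(y == c, 1, -1))
                for c in np.unique(y)}

    def predict_one_vs_all(models, X):
        """Pick the class whose binary classifier is most confident."""
        classes = list(models)
        scores = np.column_stack([models[c].decision_function(X) for c in classes])
        return np.asarray(classes)[np.argmax(scores, axis=1)]

    # Toy usage with three classes
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(loc=m, size=(30, 2)) for m in (-3, 0, 3)])
    y = np.repeat([0, 1, 2], 30)
    models = train_one_vs_all(X, y)
    print(predict_one_vs_all(models, X[:5]))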

SVM for high-throughput screening data
Problem type: Non-linear classifications
Parallelism type: GPU-based parallelism
Main idea: Modifies SMO using a large number of working set variables and caching.
Pros:
  • Improves the performance in terms of accuracy and time compared to SVMLight
  • Improves the performance in terms of accuracy and time using data movement between CPUs and GPUs
  • Reduces the complexity of the SMO algorithm
Cons:
  • The kernel function used may not be applicable in other cases
  • Restricted and limited application domain
Reference: Liao et al. [2009]

GPU-assisted LibSVM
Problem type: Non-linear classifications
Parallelism type: GPU-based parallelism
Main idea: Implements a hybrid CPU-GPU parallel version of a modified Libsvm (the kernel-reuse idea behind its cross-validation speedup is sketched below).
Pros:
  • Implements a cross-validation that leads to n-fold speedup compared to the standard grid search
  • Avoids re-computation of kernel elements at every iteration of the cross-validation
  • Hybrid parallelism leads to one order of magnitude speedup compared to CPU-only parallelism
  • Good scalability in terms of the amount of input data
Cons:
  • Needs to load the whole input data into the GPU memory
  • Limited GPU memory
Reference: Athanasopoulos et al. [2011]
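
The cross-validation speedup of the GPU-assisted LibSVM comes from computing kernel elements once and reusing them across folds instead of recomputing them for every fold and grid-search point. The CPU sketch below illustrates only that reuse idea, with a precomputed Gram matrix sliced per fold via scikit-learn; it is not the code of Athanasopoulos et al.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.metrics.pairwise import rbf_kernel
    from sklearn.model_selection import KFold

    def cv_with_cached_kernel(X, y, gamma=0.5, C=1.0, folds=3):
        """Compute the kernel matrix once, then reuse its sub-blocks in every
        fold rather than recomputing kernel elements per fold."""
        K = rbf_kernel(X, X, gamma=gamma)              # computed once, cached
        scores = []
        for tr, te in KFold(folds, shuffle=True, random_state=0).split(X):
            clf = SVC(kernel="precomputed", C=C).fit(K[np.ix_(tr, tr)], y[tr])
            scores.append(clf.score(K[np.ix_(te, tr)], y[te]))
        return float(np.mean(scores))

    # Toy usage
    rng = np.random.default_rng(2)
    X = rng.normal(size=(90, 4))
    y = np.where(X[:, 0] > 0, 1, -1)
    print(cv_with_cached_kernel(X, y))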

SVM with sparse matrix formats
Problem type: Non-linear and multi-class classifications
Parallelism type: GPU-based parallelism
Main idea: Implements efficient sparse matrix operations in SVM using caching (a sparse kernel evaluation is sketched below).
Pros:
  • Reduces task-controlling overhead
  • Higher speedup (134 times) compared to Libsvm
  • Robust to data sparsity
  • Creates fewer classifiers (N(N-1)/2) compared to similar approaches
  • Improves the performance in terms of time using a sparse matrix format
Cons:
  • Benefits not fully clear for dense matrix formats
  • Benefits not fully clear in some cases
Reference: Lin and Chien [2010]
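
The gain reported by Lin and Chien comes from evaluating kernel values directly on the sparse representation, avoiding the sparse-to-dense conversion flagged as a drawback of MRPsvm above. Below is a minimal SciPy sketch of an RBF kernel row computed straight from a CSR matrix; the helper name is hypothetical and this is not the authors' GPU code.

    import numpy as np
    import scipy.sparse as sp

    def rbf_kernel_row(X_csr, i, gamma=0.5):
        """K(x_i, x_t) for all t, computed directly from CSR data using
        ||x_i - x_t||^2 = ||x_i||^2 + ||x_t||^2 - 2 <x_i, x_t>."""
        sq_norms = np.asarray(X_csr.multiply(X_csr).sum(axis=1)).ravel()
        dots = (X_csr @ X_csr.getrow(i).T).toarray().ravel()   # sparse dot products
        d2 = sq_norms + sq_norms[i] - 2.0 * dots
        return np.exp(-gamma * d2)

    # Toy usage on a random sparse matrix
    X = sp.random(6, 20, density=0.2, format="csr", random_state=0)
    print(rbf_kernel_row(X, 0))   # row 0 of the kernel matrix; entry 0 is 1.0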

GSVM
Problem type: Non-linear and binary classifications
Parallelism type: GPU-based parallelism
Main idea: Implements parallel testing and training for large-scale intruder-detection problems.
Pros:
  • Fast data access
  • Reduces computational time using a scatter-parallel-reduce-gather strategy
  • Real-time processing capabilities for large-scale problems
  • Higher speedup compared to similar approaches
  • Good scalability in terms of the amount of input data
  • Improves the performance in terms of time without loss of classification accuracy compared to Libsvm
  • Explainable results
Cons:
  • Limited shared memory on GPUs
  • Limited parallelizability of SMO
Reference: Zhang et al. [2014]

CUDA-accelerated SVM for celestial object classification
Problem type: Binary and non-linear classifications
Parallelism type: GPU-based parallelism
Main idea: Implements SVMLight-based accelerated classification, improving training time (10 times) and prediction time (364 times) for celestial object classification.
Pros:
  • Higher speedup of training and prediction processes
  • Improves the performance in terms of time compared to SVMLight
  • A fast and reliable classifier
  • Efficient selection of the working set size
Cons:
  • Long runtime for small data sets because GPU I/O latency cannot be hidden
  • Limited shared memory on GPUs
Reference: Peng et al. [2011]