A comparison of articles and methods that apply decomposition techniques in parallel. Here, we highlight the articles in the survey with notable findings. Each entry lists the following fields:

Algorithm, Problem type, Parallelism type, Main idea, Pros, Cons, and Reference.

Adaptive shrinking
Problem type: Non-linear and binary classifications
Parallelism type: Distributed HPC architecture
Main idea: Eliminates insignificant data in an adaptive and parallel manner (a generic shrinking step is sketched below).
Pros:
  • A fast and efficient method of reducing data
  • Fast convergence
  • Robust to data sparsity
  • Special support for sparse data
  • Up to 26 times computation speedup
Cons:
  • Not robust across different types of data and different hyperparameter settings
  • Incorrect elimination of samples, leading to incorrect decision boundaries
Reference: Narasimhan et al. [2014]
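
The adaptive elimination described above follows the general shrinking idea used by decomposition solvers: samples whose multipliers are stuck at a bound are removed from the active set and only revisited near convergence. Below is a minimal, hypothetical Python sketch of one such shrinking step; it is not Narasimhan et al.'s implementation, and real solvers use gradient-based KKT tests rather than the simple patience counter assumed here.

    import numpy as np

    def shrink_step(active_idx, alpha, stuck_count, C, patience=10):
        """One illustrative shrinking step: drop samples whose multiplier has
        sat at a bound (0 or C) for `patience` consecutive checks."""
        at_bound = (alpha[active_idx] <= 0.0) | (alpha[active_idx] >= C)
        stuck_count = np.where(at_bound, stuck_count + 1, 0)
        keep = stuck_count < patience
        return active_idx[keep], stuck_count[keep]

    # Toy usage: samples whose alpha stays at a bound are eventually removed.
    active = np.arange(6)
    alpha = np.array([0.0, 0.3, 1.0, 0.0, 0.7, 1.0])
    counts = np.zeros(6, dtype=int)
    for _ in range(12):
        active, counts = shrink_step(active, alpha, counts, C=1.0)
    print(active)   # -> [1 4]: only the samples strictly inside (0, C) remain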

Scalable approach and adaptive shrinking
Problem type: Non-linear and binary classifications
Parallelism type: Distributed HPC architecture
Main idea: Uses adaptive shrinking to eliminate insignificant samples on large-scale systems.
Pros:
  • Good scalability in terms of the number of processors (up to 4096 cores)
  • Scales well to a large number of training samples
  • Maintains accuracy using synchronized data structures
  • Space complexity reduction
  • Adaptive reconstitution of data structures
  • Intuitive heuristics for adaptive shrinking
Cons:
  • Limited parallelizability of SMO
Reference: Vishnu et al. [2015]

Batch mode SVM
Problem type: Non-linear and binary classifications
Parallelism type: Distributed HPC architecture
Main idea: Uses decomposition techniques with a batch method.
Pros:
  • Overcomes convergence problems
  • Handles a large set of SVs that does not fit in memory
  • Solves large classification and regression problems
Cons:
  • Unclear scalability in terms of the number of processors
Reference: Yom-Tov [2004]

Parallel SVM
Problem type: Non-linear and binary classifications and regression
Parallelism type: Distributed HPC architecture
Main idea: Efficiently parallelizes SVMs to reach high speedup using decomposition.
Pros:
  • Supports different formulations of SVMs for classification and regression problems
  • Supports sparse data representations
  • Memory management with caching
Cons:
  • Benefits not fully clear in terms of speedup in some cases
  • The data type impacts the speedup
Reference: Brugger [2006]

PDS
Problem type: Non-linear and binary classifications
Parallelism type: Distributed HPC parallelism
Main idea: Formalizes the SVM optimization problem using a different duality, called Fenchel duality.
Pros:
  • Increases the efficiency and suitability of the problem for distributed parallelism
  • Distributed computing with clusters of independent nodes and memory
  • k/2 speedup using k processors
Cons:
  • Inefficient for easy problems with a small number of SVs
Reference: Hazan et al. [2008]

MRPsvm
Problem type: Non-linear and binary classifications
Parallelism type: Map-reduce based parallelism
Main idea: Implements a decomposition method and uses SMO as an inner solver for multi-core systems (the generic map-reduce partition-train-merge pattern is sketched below).
Pros:
  • Reduces memory requirements
  • Handles data sparsity
  • Higher speedup compared to Libsvm
Cons:
  • Converts the sparse format back to a dense format to compute dot products
  • Needs more efficient memory usage
  • Only slight improvement in runtime when going from 4 to 8 cores
  • Limited parallelizability of SMO
Reference: Zhao et al. [2011]
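
MRPsvm above, like the other map-reduce entries in this comparison, builds on a partition-train-merge pattern: each mapper trains a sub-SVM on its chunk of the data, and the reducer combines the resulting support vectors into a final model. The sketch below shows only this generic pattern, run sequentially with scikit-learn's SVC standing in for the inner solver; it is not the MRPsvm pipeline, and the function names are hypothetical.

    import numpy as np
    from sklearn.svm import SVC

    def map_train(chunk):
        """'Map' step: train a sub-SVM on one chunk and emit only its SVs."""
        X, y = chunk
        clf = SVC(kernel="rbf", C=1.0).fit(X, y)
        return X[clf.support_], y[clf.support_]

    def reduce_merge(sv_chunks):
        """'Reduce' step: retrain a single SVM on the union of all SVs."""
        Xs = np.vstack([X for X, _ in sv_chunks])
        ys = np.concatenate([y for _, y in sv_chunks])
        return SVC(kernel="rbf", C=1.0).fit(Xs, ys)

    # Toy usage with four partitions standing in for four mappers.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 5))
    y = np.where(X[:, 0] + 0.1 * rng.normal(size=400) > 0, 1, -1)
    chunks = [(X[i::4], y[i::4]) for i in range(4)]
    model = reduce_merge([map_train(c) for c in chunks])
    print(model.score(X, y))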

Ontology-based Parallel SMO
Problem type: Linear and binary classifications
Parallelism type: Distributed big data architecture using map-reduce
Main idea: Distributes training data across multiple nodes and uses an ontology-based enhancement strategy as feedback to minimize the impact of accuracy deterioration.
Pros:
  • Improves accuracy
  • Good scalability in terms of the number of training samples
  • Easy implementation
Cons:
  • Accuracy depends on the quality of the intelligence provided by the end-user
  • Limited parallelizability of SMO
Reference: Caruana et al. [2011]

RAMSMO
Problem type: Multi-class classification
Parallelism type: Distributed big data architecture using map-reduce
Main idea: Implements a resource-aware multi-class classification algorithm for image annotation that distributes data into smaller binary chunks.
Pros:
  • Load balancing of imbalanced data
  • Fast convergence
  • Reduces overheads
  • Solves multi-class classifications
  • Runs on heterogeneous computing environments
  • Improves runtime while keeping high accuracy
  • Good scalability
Cons:
  • Unclear impact of class sizes on the classification accuracy
  • Complex load-balancing strategy
  • Limited parallelizability of SMO
Reference: Alham et al. [2014]

OHD-SVM
Problem type: Non-linear and binary classifications
Parallelism type: GPU-based parallelism
Main idea: Implements a hierarchical decomposition algorithm that better fits the GPU architecture.
Pros:
  • Publicly available code
  • Robust to sparse and dense data formats
  • Higher speedup compared to the fastest published GPU-based implementations
Cons:
  • Limited shared memory
  • Limited parallelizability of SMO
Reference: Vanek et al. [2017]

gpuSVM
Problem type: Linear, non-linear, and binary classifications
Parallelism type: GPU-based parallelism
Main idea: Implements fast SVMs with adaptive first and second order working set selection heuristics (the first-order rule is sketched below).
Pros:
  • An open source implementation
  • Robust to data sparsity
  • Supports multiple kernel functions
  • Improves the performance in terms of time using adaptive heuristics
  • Higher speedup (5 to 35 times) compared to Libsvm
  • One of the best GPU-based SVM implementations in terms of time and accuracy
Cons:
  • Limited arithmetic precision
  • Limited parallelizability of SMO
Reference: Catanzaro et al. [2008]
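
gpuSVM's heuristics refine the basic SMO rule of choosing two variables per iteration. The sketch below shows the standard first-order (maximal violating pair) selection for the SVM dual, given the current gradient and labels in {-1, +1}; the adaptive and second-order refinements of Catanzaro et al. are not shown. Because only the chosen pair is then updated analytically while all other multipliers stay fixed, the outer loop is inherently sequential, which is the "limited parallelizability of SMO" listed as a drawback throughout this comparison.

    import numpy as np

    def select_violating_pair(alpha, grad, y, C):
        """First-order working-set selection: return the maximal violating
        pair (i, j). `grad` is the gradient of the dual objective."""
        up  = ((alpha < C) & (y == 1)) | ((alpha > 0) & (y == -1))
        low = ((alpha < C) & (y == -1)) | ((alpha > 0) & (y == 1))
        score = -y * grad
        up_idx, low_idx = np.flatnonzero(up), np.flatnonzero(low)
        i = up_idx[np.argmax(score[up_idx])]
        j = low_idx[np.argmin(score[low_idx])]
        return int(i), int(j)   # only these two multipliers change this iteration

    # Toy usage
    alpha = np.array([0.0, 0.5, 1.0, 0.2])
    grad  = np.array([-1.0, -0.2, 0.4, 0.1])
    y     = np.array([1, -1, 1, -1])
    print(select_violating_pair(alpha, grad, y, C=1.0))   # -> (0, 2)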

cuSVM
Problem type: Non-linear and binary classifications
Parallelism type: GPU-based parallelism
Main idea: Implements a modified SMO decomposition method using kernel caching.
Pros:
  • An open source implementation
  • Improves the performance of SMO in terms of accuracy and time compared to Libsvm
  • Solves classification and regression problems
  • Uses mixed-precision floating point arithmetic
Cons:
  • Unclear robustness to data sparsity
  • Slower than gpuSVM in most cases
  • No support for a cross-validation strategy
  • Doubling the precision did not double the accuracy
  • Limited parallelizability of SMO
Reference: Carpenter [2009]

P2SMO
Problem type: Linear, non-linear, and multi-class classifications
Parallelism type: GPU-based parallelism
Main idea: The first GPU-based SVM implementation of one-versus-all multi-class classification (the scheme is sketched below).
Pros:
  • Robust to dense data format
  • Supports multiple kernel functions
Cons:
  • Task-controlling overhead
  • Creates too many classifiers
Reference: Herrero-Lopez [2011]
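
The one-versus-all scheme behind P2SMO trains one binary SVM per class and predicts the class with the most confident decision value, so N classes require N classifiers (versus N(N-1)/2 for the one-versus-one scheme used by Lin and Chien below). A minimal CPU sketch with scikit-learn, not the GPU implementation of Herrero-Lopez:

    import numpy as np
    from sklearn.svm import SVC

    def train_one_vs_all(X, y, C=1.0):
        """One binary SVM per class: class c against the rest."""
        return {c: SVC(kernel="rbf", C=C).fit(X, np.where(y == c, 1, -1))
                for c in np.unique(y)}

    def predict_one_vs_all(models, X):
        """Pick the class whose binary classifier is most confident."""
        classes = list(models)
        scores = np.column_stack([models[c].decision_function(X) for c in classes])
        return np.asarray(classes)[np.argmax(scores, axis=1)]

    # Toy usage with three classes
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(loc=m, size=(30, 2)) for m in (-3, 0, 3)])
    y = np.repeat([0, 1, 2], 30)
    models = train_one_vs_all(X, y)
    print(predict_one_vs_all(models, X[:5]))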

SVM for high-throughput screening data
Problem type: Non-linear classifications
Parallelism type: GPU-based parallelism
Main idea: Modifies SMO using a large number of working set variables and caching.
Pros:
  • Improves the performance in terms of accuracy and time compared to SVMLight
  • Improves the performance in terms of accuracy and time using data movement between CPUs and GPUs
  • Reduces the complexity of the SMO algorithm
Cons:
  • The kernel function used may not be applicable in other cases
  • Restricted and limited application domain
Reference: Liao et al. [2009]

GPU-assisted LibSVM
Problem type: Non-linear classifications
Parallelism type: GPU-based parallelism
Main idea: Implements a hybrid CPU-GPU parallel version of a modified Libsvm (the kernel-reuse idea behind its cross-validation speedup is sketched below).
Pros:
  • Implements a cross-validation that leads to n-fold speedup compared to the standard grid search
  • Avoids re-computation of kernel elements at every iteration of the cross-validation
  • Hybrid parallelism leads to one order of magnitude speedup compared to CPU-only parallelism
  • Good scalability in terms of the amount of input data
Cons:
  • Needs to load the whole input data into the GPU memory
  • Limited GPU memory
Reference: Athanasopoulos et al. [2011]
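
The cross-validation speedup of the GPU-assisted LibSVM comes from computing kernel elements once and reusing them across folds instead of recomputing them for every fold and grid-search point. The CPU sketch below illustrates only that reuse idea, with a precomputed Gram matrix sliced per fold via scikit-learn; it is not the code of Athanasopoulos et al.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.metrics.pairwise import rbf_kernel
    from sklearn.model_selection import KFold

    def cv_with_cached_kernel(X, y, gamma=0.5, C=1.0, folds=3):
        """Compute the kernel matrix once, then reuse its sub-blocks in every
        fold rather than recomputing kernel elements per fold."""
        K = rbf_kernel(X, X, gamma=gamma)              # computed once, cached
        scores = []
        for tr, te in KFold(folds, shuffle=True, random_state=0).split(X):
            clf = SVC(kernel="precomputed", C=C).fit(K[np.ix_(tr, tr)], y[tr])
            scores.append(clf.score(K[np.ix_(te, tr)], y[te]))
        return float(np.mean(scores))

    # Toy usage
    rng = np.random.default_rng(2)
    X = rng.normal(size=(90, 4))
    y = np.where(X[:, 0] > 0, 1, -1)
    print(cv_with_cached_kernel(X, y))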

SVM with sparse matrix formats
Problem type: Non-linear and multi-class classifications
Parallelism type: GPU-based parallelism
Main idea: Implements efficient sparse matrix operations in SVM using caching (a sparse kernel evaluation is sketched below).
Pros:
  • Reduces task-controlling overhead
  • Higher speedup (134 times) compared to Libsvm
  • Robust to data sparsity
  • Creates fewer classifiers (N(N-1)/2) compared to similar approaches
  • Improves the performance in terms of time using a sparse matrix format
Cons:
  • Benefits not fully clear for dense matrix formats
  • Benefits not fully clear in some cases
Reference: Lin and Chien [2010]
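
The gain reported by Lin and Chien comes from evaluating kernel values directly on the sparse representation, avoiding the sparse-to-dense conversion flagged as a drawback of MRPsvm above. Below is a minimal SciPy sketch of an RBF kernel row computed straight from a CSR matrix; the helper name is hypothetical and this is not the authors' GPU code.

    import numpy as np
    import scipy.sparse as sp

    def rbf_kernel_row(X_csr, i, gamma=0.5):
        """K(x_i, x_t) for all t, computed directly from CSR data using
        ||x_i - x_t||^2 = ||x_i||^2 + ||x_t||^2 - 2 <x_i, x_t>."""
        sq_norms = np.asarray(X_csr.multiply(X_csr).sum(axis=1)).ravel()
        dots = (X_csr @ X_csr.getrow(i).T).toarray().ravel()   # sparse dot products
        d2 = sq_norms + sq_norms[i] - 2.0 * dots
        return np.exp(-gamma * d2)

    # Toy usage on a random sparse matrix
    X = sp.random(6, 20, density=0.2, format="csr", random_state=0)
    print(rbf_kernel_row(X, 0))   # row 0 of the kernel matrix; entry 0 is 1.0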

GSVM
Problem type: Non-linear and binary classifications
Parallelism type: GPU-based parallelism
Main idea: Implements parallel testing and training for large-scale intruder-detection problems.
Pros:
  • Fast data access
  • Reduces computational time using a scatter-parallel-reduce-gather strategy
  • Real-time processing capabilities for large-scale problems
  • Higher speedup compared to similar approaches
  • Good scalability in terms of the amount of input data
  • Improves the performance in terms of time without loss of classification accuracy compared to Libsvm
  • Explainable results
Cons:
  • Limited shared memory on GPUs
  • Limited parallelizability of SMO
Reference: Zhang et al. [2014]

CUDA-accelerated SVM for celestial object classification
Problem type: Binary and non-linear classifications
Parallelism type: GPU-based parallelism
Main idea: Implements SVMLight-based accelerated classification, improving training time (10 times) and prediction time (364 times) for celestial object classification.
Pros:
  • Higher speedup of training and prediction processes
  • Improves the performance in terms of time compared to SVMLight
  • A fast and reliable classifier
  • Efficient selection of the working set size
Cons:
  • Long runtime for small data sets because GPU I/O latency cannot be hidden
  • Limited shared memory on GPUs
Reference: Peng et al. [2011]