This table compares articles and methods that use distributed and parallel algorithms for training SVMs; here we highlight the surveyed articles with the most interesting findings. The columns are described as follows:

  • "Algorithm" indicates the corresponding algorithm's name or approach.
  • "Problem Type" shows whether the algorithm solves binary, multi-class, linear or/and non-linear classifications.
  • "Parallelism Type" highlights whether the algorithm uses the shared memory parallelism, distributed big data architectures, distributed HPC architecture, or/and GPU parallelism.
  • "Main Idea" simply shows the main contribution of the author(s).
  • "Pros" explains the remarks and findings of the algorithm. Besides, it highlights the advantages of the algorithm compared to the similar approaches, if possible.
  • "Cons" describes the shortcomings or the place for further improvements of the corresponding algorithm. Besides, it shows the weak points of the algorithm compared to the similar approaches, if possible.


Algorithm: Automatic parallel SVM
Problem Type: Non-linear and binary classification
Parallelism Type: GPU-based parallelism
Main Idea: Develops a directive-based GPU technique that solves the optimization problem with gradient ascent, and performs a novel feature extraction in which pixel values are converted to a high-dimensional input vector (a brief sketch follows this entry).
Pros:
  • A 3-fold speedup from a simple change of the memory access pattern
  • A 10-fold speedup from changing the input data structure
  • Higher memory throughput
  • The fastest GPU-based SVM for datasets with a high-dimensional input space
  • Reduces the dimension of the problem
  • Maintains high accuracy
  • Efficient parameter tuning
Cons:
  • Scalability benefits are not fully clear
Reference: Codreanu et al. [2014]

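The gradient-ascent optimizer mentioned in this entry can be illustrated with a minimal NumPy sketch of projected gradient ascent on a bias-free SVM dual. The RBF kernel, step size, and function names below are assumptions made for illustration; the directive-based GPU offloading and the feature-extraction step of Codreanu et al. [2014] are not reproduced.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    # Pairwise squared distances, then the Gaussian (RBF) kernel matrix.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def dual_gradient_ascent(X, y, C=1.0, lr=1e-3, n_iter=500, gamma=0.5):
    """Projected gradient ascent on the bias-free SVM dual:
       max  sum(alpha) - 0.5 * alpha^T Q alpha,  s.t. 0 <= alpha <= C,
       where Q[i, j] = y_i * y_j * K(x_i, x_j) and y_i in {-1, +1}."""
    K = rbf_kernel(X, gamma)
    Q = (y[:, None] * y[None, :]) * K
    alpha = np.zeros(len(y))
    for _ in range(n_iter):
        grad = 1.0 - Q @ alpha                       # gradient of the dual objective
        alpha = np.clip(alpha + lr * grad, 0.0, C)   # ascent step + box projection
    return alpha, K

# Prediction for a point x: sign(sum_j alpha_j * y_j * K(x_j, x)).
```

Every component of the gradient can be updated independently, which is the kind of data-parallel loop that directive-based GPU programming (e.g., compiler pragmas such as OpenACC) is designed to offload.
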
Algorithm: D-ADMM
Problem Type: Linear and binary classification
Parallelism Type: Distributed HPC architectures
Main Idea: Uses a coloring scheme that requires fewer communications between nodes than state-of-the-art algorithms.
Pros:
  • Reduces communication between nodes
  • Asynchronous computations
  • Good parallelizability
Cons:
  • Lacks a clear theoretical explanation of why it is more efficient than similar algorithms
  • Supports only linear classification
  • Slow convergence
Reference: Mota et al. [2013]

Algorithm: DSVM
Problem Type: Linear and binary classification
Parallelism Type: Distributed big data architectures using MapReduce
Main Idea: Performs block minimizations and obtains the final result using averaging and line-search strategies (a brief sketch follows this entry).
Pros:
  • Scales well in terms of the number of training and testing samples
  • Communication complexity is independent of the number of training samples
  • Fast training of large-scale problems (over 80 million samples in less than 11 minutes)
Cons:
  • Supports only linear classification
  • No theoretical proof of global convergence
Reference: Pechyony et al. [2011]

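The block-minimization-and-averaging idea can be sketched as a small map-reduce pattern: each block trains a local linear SVM and the reducer averages the resulting weight vectors. The Pegasos-style stochastic subgradient solver, block count, and regularization constant below are stand-in assumptions, and the line-search refinement described in the paper is omitted.

```python
import numpy as np

def train_block(Xb, yb, lam=0.01, n_epochs=20):
    """Stand-in local solver: Pegasos-style stochastic subgradient descent on one
    data block (labels in {-1, +1}); the paper's block solver is not reproduced."""
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(n_epochs):
        for i in np.random.permutation(len(yb)):
            t += 1
            eta = 1.0 / (lam * t)
            if yb[i] * (Xb[i] @ w) < 1.0:
                # Margin violated: regularizer shrink plus hinge subgradient step.
                w = (1.0 - eta * lam) * w + eta * yb[i] * Xb[i]
            else:
                # Margin satisfied: only the regularizer shrink.
                w = (1.0 - eta * lam) * w
    return w

def dsvm_average(X, y, n_blocks=8):
    """Map step: train one linear SVM per data block (done by the mappers in
    parallel); reduce step: average the block weight vectors."""
    blocks = np.array_split(np.random.permutation(len(y)), n_blocks)
    ws = [train_block(X[idx], y[idx]) for idx in blocks]
    return np.mean(ws, axis=0)
```
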
Algorithm: Ensemble SVMs
Problem Type: Non-linear and multi-class classification
Parallelism Type: Shared-memory parallelism
Main Idea: Uses ensemble learning to train the data and simple aggregation schemes to obtain the global result (a brief sketch follows this entry).
Pros:
  • Significant speedup over LIBSVM with comparable accuracy
  • Reduces training complexity
  • Avoids duplicate storage
  • High classification accuracy
  • Good parallelizability
  • An intuitive programming framework
Cons:
  • Needs more complex aggregation schemes for further performance improvement
  • Accuracy deteriorates on low-dimensional problems
  • Limited memory
Reference: Claesen et al. [2014]

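The simple aggregation scheme can be illustrated by majority voting over independently trained base SVMs. Using scikit-learn's SVC as the base learner, the random subsampling ratio, and unweighted voting are assumptions made for this sketch rather than the exact procedure of Claesen et al. [2014].

```python
import numpy as np
from sklearn.svm import SVC

def train_ensemble(X, y, n_models=10, subsample=0.2, seed=0):
    """Train independent base SVMs on random subsets of the data; the base
    models are independent, so this loop parallelizes trivially."""
    rng = np.random.default_rng(seed)
    n_sub = max(1, int(subsample * len(y)))
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(y), size=n_sub, replace=False)
        models.append(SVC(kernel="rbf", C=1.0).fit(X[idx], y[idx]))
    return models

def predict_majority(models, X):
    """Simple aggregation: unweighted majority vote over the base models
    (assumes non-negative integer class labels)."""
    votes = np.stack([m.predict(X) for m in models])   # shape (n_models, n_samples)
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)
```
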
Algorithm: euroCAT
Problem Type: Linear and binary classification
Parallelism Type: Distributed HPC architectures
Main Idea: Implements multi-centric distributed learning in which data privacy is a top priority, with applications in personalized medicine.
Pros:
  • Preserves data privacy
  • Higher speedup while maintaining good accuracy
  • Model quality is similar to that of centralized models
  • Asynchronous communications
  • Efficient parameter tuning
Cons:
  • Supports only linear classification
  • Slow convergence
Reference: Deist et al. [2017]

Algorithm: GaBP SVM
Problem Type: Non-linear and binary classification
Parallelism Type: Distributed HPC architectures
Main Idea: Reformulates the standard SVM problem from the algebraic domain into a probabilistic one using Gaussian distributions.
Pros:
  • Reduces overhead by passing only a small number of messages
  • Good scalability in terms of the number of cores (up to 1024 cores)
  • Accuracy comparable to SVMLight
  • The largest parallel implementation of Gaussian belief propagation reported at the time
  • A linear speedup using 256 CPUs
  • Robust to sparse and dense matrix operations
Cons:
  • Lacks support for asynchronous communications in heterogeneous computing environments
  • Slow convergence for large numbers of input data and large networks
  • Communication overhead
  • Computes the full kernel matrix
Reference: Bickson et al. [2008]

Algorithm: GADMM
Problem Type: Linear and binary classification
Parallelism Type: Distributed HPC architectures
Main Idea: Implements a group-based ADMM in which the slave nodes are divided into groups and a weighted average gathers the results from all groups to update the global variable (a brief sketch follows this entry).
Pros:
  • Fast convergence
  • Higher speedup (about 30%) compared to non-grouped ADMM
  • Reduces the number of ADMM iterations
  • Improves the global consensus
  • Good scalability in terms of the number of training samples
Cons:
  • Supports only linear classification
  • Slightly degrades accuracy
Reference: Wang et al. [2017]

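The group-based aggregation can be sketched as one consensus-ADMM round: every group performs a local update, and the master forms the new global variable as a weighted average of the group results before the dual variables are corrected. The placeholder local solver (a few gradient steps on a squared hinge loss), the size-proportional weights, and all names below are assumptions standing in for the subproblem solver and weighting rule of Wang et al. [2017].

```python
import numpy as np

def local_update(w_g, z, u_g, X_g, y_g, rho=1.0, lr=0.01, n_steps=50):
    """Placeholder local solver: gradient steps on a squared hinge loss plus the
    ADMM proximal term (rho/2) * ||w - z + u||^2; labels are in {-1, +1}."""
    w = w_g.copy()
    for _ in range(n_steps):
        margin = y_g * (X_g @ w)
        act = margin < 1.0                                     # margin violators
        grad_loss = -(y_g[act, None] * X_g[act]
                      * (1.0 - margin[act])[:, None]).sum(axis=0)
        w -= lr * (grad_loss / len(y_g) + rho * (w - z + u_g))
    return w

def gadmm_round(groups, z, state, rho=1.0):
    """One round: local updates per group, then a weighted average (weights
    proportional to group size here) forms the global variable, followed by
    the usual consensus-ADMM dual update."""
    sizes = np.array([len(y_g) for _, y_g in groups], dtype=float)
    weights = sizes / sizes.sum()
    gathered = []
    for g, (X_g, y_g) in enumerate(groups):
        state["w"][g] = local_update(state["w"][g], z, state["u"][g], X_g, y_g, rho)
        gathered.append(state["w"][g] + state["u"][g])
    z_new = np.sum(weights[:, None] * np.array(gathered), axis=0)
    for g in range(len(groups)):
        state["u"][g] += state["w"][g] - z_new
    return z_new

# Illustrative initialization for d features and a list of (X_g, y_g) groups:
# z = np.zeros(d)
# state = {"w": [np.zeros(d) for _ in groups], "u": [np.zeros(d) for _ in groups]}
```
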
Algorithm: GPU-tailored kernelized SVM
Problem Type: Non-linear, binary, and multi-class classification
Parallelism Type: GPU-based parallelism
Main Idea: Handles sparse data using sparsity clustering, which groups training samples with similar sparsity patterns on the GPU (a brief sketch follows this entry).
Pros:
  • Solves problems with sparse datasets that do not fit the GPU's preferred memory access pattern
  • Robust to both sparse and dense data formats
  • Efficiently handles sparse data
  • Publicly available implementation
  • Overcomes some of the limited-memory issues on GPUs
  • Reduces runtime
  • Efficient memory access pattern
  • Easy configuration
  • Orders of magnitude faster than existing CPU and GPU implementations
Cons:
  • Benefits not fully clear for other data structures
  • May still suffer from limited memory
  • Initial clustering of the data can be computationally expensive
Reference: Cotter et al. [2011]

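The sparsity-clustering idea can be illustrated with a toy grouping of CSR rows by their nonzero-column pattern. Grouping only identical patterns and the SciPy-based representation are simplifications assumed here; the actual method clusters rows with similar (not just identical) patterns so that each group maps to a regular, coalesced memory layout on the GPU.

```python
import numpy as np
from collections import defaultdict
from scipy.sparse import csr_matrix

def group_by_sparsity_pattern(X_csr):
    """Group rows of a CSR matrix whose nonzero column patterns are identical,
    a crude stand-in for sparsity clustering."""
    groups = defaultdict(list)
    for i in range(X_csr.shape[0]):
        start, end = X_csr.indptr[i], X_csr.indptr[i + 1]
        signature = tuple(X_csr.indices[start:end])   # columns holding nonzeros
        groups[signature].append(i)
    return list(groups.values())

# Rows 0 and 2 share a sparsity pattern; row 1 stands alone.
X = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0],
                         [4.0, 0.0, 5.0]]))
print(group_by_sparsity_pattern(X))   # [[0, 2], [1]]
```
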
Algorithm: Initial clustering
Problem Type: Unclear
Parallelism Type: Shared-memory parallelism
Main Idea: Minimizes the number of training examples that represent the original large-scale problem using k-means clustering (a brief sketch follows this entry).
Pros:
  • Reduces the number of training examples using clustering
  • Robust to unbalanced data
  • Easy to implement
  • A 3-fold speedup compared to the sequential version
Cons:
  • Possible deterioration of accuracy
  • Limited explainability of the results
  • Benefits not fully clear for non-linear and multi-class classification
  • Limited memory
Reference: Shrivastava et al. [2011]

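The reduction step can be sketched with a per-class k-means whose centroids replace the original training points, so the SVM is trained on far fewer representatives. scikit-learn, the per-class clustering, and the cluster count are assumptions for illustration; the exact clustering and labeling scheme of Shrivastava et al. [2011] may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def reduce_with_kmeans(X, y, clusters_per_class=50, seed=0):
    """Replace each class by the centroids of a per-class k-means run, shrinking
    the training set to a small representative subset."""
    X_red, y_red = [], []
    for label in np.unique(y):
        Xc = X[y == label]
        k = min(clusters_per_class, len(Xc))
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Xc)
        X_red.append(km.cluster_centers_)
        y_red.append(np.full(k, label))
    return np.vstack(X_red), np.concatenate(y_red)

# Train on the reduced set instead of the full data:
# X_small, y_small = reduce_with_kmeans(X_train, y_train)
# clf = SVC(kernel="rbf").fit(X_small, y_small)
```
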
Algorithm: kSVM
Problem Type: Non-linear and multi-class classification
Parallelism Type: Shared-memory parallelism
Main Idea: Distributes the data into clusters, each of which trains its local data in parallel (a brief sketch follows this entry).
Pros:
  • Scales well with the size of the training data
  • Maintains classification accuracy
  • Good scalability in terms of the number of training samples
Cons:
  • There is a trade-off between computation time and accuracy
  • Finds local support vectors (SVs) instead of global ones
Reference: Do et al. [2015]

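A shared-memory sketch of the cluster-then-train-locally idea: k-means partitions the data, one SVM is fit per cluster in parallel, and a test point is routed to the model of its nearest cluster. The routing rule, the joblib/scikit-learn usage, and the assumption that every cluster contains more than one class are illustrative choices rather than the exact kSVM procedure of Do et al. [2015].

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def _fit_local(Xc, yc):
    # Local solver for one cluster; assumes the cluster contains more than one class.
    return SVC(kernel="rbf", C=1.0).fit(Xc, yc)

def train_ksvm(X, y, k=8, n_jobs=-1, seed=0):
    """Partition the data with k-means, then fit one local SVM per cluster,
    training all clusters in parallel on a shared-memory machine."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    labels = km.labels_
    models = Parallel(n_jobs=n_jobs)(
        delayed(_fit_local)(X[labels == c], y[labels == c]) for c in range(k))
    return km, models

def predict_ksvm(km, models, X):
    """Route each test point to the local model of its nearest cluster; only
    local support vectors are used, mirroring the trade-off noted above."""
    cluster_of = km.predict(X)
    preds = np.empty(len(X), dtype=object)
    for c, model in enumerate(models):
        mask = cluster_of == c
        if mask.any():
            preds[mask] = model.predict(X[mask])
    return preds
```
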