This table compares articles and methods that implement parallel SVM algorithms other than those covered in the previous tables; we highlight the articles in the survey with interesting findings. Each entry lists the algorithm, problem type, parallelism type, main idea, pros, cons, and reference.

PIRWLS and PSIRWLS
Problem type: Non-linear, binary, and multi-class classification
Parallelism type: Shared-memory parallelism
Main idea: Implements two parallel iterative re-weighted least squares (IRWLS) SVMs, one solving the full SVM and one a semi-parametric SVM model. Parallelizes a Cholesky decomposition to solve the linear system arising in each iteration (sketched after this entry).
Pros:
  • Reduces computationally expensive kernel evaluations
  • Avoids computing a computationally expensive matrix inversion
  • Outperforms LIBIRWLS proposed by Morales and Vazquez [2017]
  • Higher speedup compared to the standard SVM libraries
  • Similar accuracy to the standard SVM libraries, but in less time
Cons:
  • A large number of SVs in some cases
  • Slower than the standard libraries when using one core
  • Benefits not fully clear for a large number of cores
  • May suffer from limited memory
Reference: Morales and Vazquez [2016]
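
The following is a minimal, single-machine sketch of the IRWLS step that PIRWLS parallelizes: each iteration re-weights the samples from the current hinge-loss residuals and solves a small positive-definite linear system through a Cholesky factorization. It assumes an RBF kernel and a bias-free formulation; the function and parameter names (`irwls_svm`, `eps`, etc.) are illustrative and not taken from the library.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    # Pairwise RBF kernel between the rows of X and Z.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def irwls_svm(X, y, C=1.0, iters=10, eps=1e-3):
    """Kernel SVM via iterative re-weighted least squares (simplified sketch)."""
    n = len(y)
    K = rbf_kernel(X, X)
    a = np.full(n, C)                      # initial sample weights
    beta = np.zeros(n)
    for _ in range(iters):
        S = np.where(a > 0)[0]             # samples with non-zero weight
        if len(S) == 0:
            break
        A = K[np.ix_(S, S)] + np.diag(1.0 / a[S])
        L = np.linalg.cholesky(A)          # the step PIRWLS parallelizes
        beta = np.zeros(n)
        beta[S] = np.linalg.solve(L.T, np.linalg.solve(L, y[S]))
        u = 1.0 - y * (K @ beta)           # hinge-loss arguments
        a = np.where(u > eps, C / np.maximum(u, eps), 0.0)  # re-weighting
    return beta

# Smoke test on a separable toy problem.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])
beta = irwls_svm(X, y)
print("train accuracy:", np.mean(np.sign(rbf_kernel(X, X) @ beta) == y))
```
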
LIBIRWLS
Problem type: Non-linear and binary classification
Parallelism type: Shared-memory parallelism
Main idea: Implements a parallel library that trains SVMs with iterative re-weighted least squares instead of quadratic programming, and approximates the kernel matrix using a sparse greedy approximation (sketched after this entry).
Pros:
  • Efficiently controls the number of SVs
  • Reduces the number of training samples using random sampling
Cons:
  • Does not perform better than PSIRWLS proposed by Morales and Vazquez [2016] in terms of accuracy
  • May suffer from limited memory
  • Benefits not fully clear in terms of scalability
Reference: Morales and Vazquez [2017]
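
One common way to realize a sparse greedy kernel approximation is pivoted incomplete Cholesky: basis points are picked greedily by the largest remaining residual, and only the selected kernel columns are ever formed. The sketch below shows that idea in isolation, assuming an RBF kernel; it is not necessarily LIBIRWLS's exact variant, and the names (`greedy_lowrank_kernel`, `rank`) are illustrative.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def greedy_lowrank_kernel(X, rank, gamma=0.5):
    """Pivoted incomplete Cholesky: K ~= G @ G.T with greedily chosen pivots."""
    n = X.shape[0]
    d = np.ones(n)                      # residual diagonal (RBF diagonal is 1)
    G = np.zeros((n, rank))
    basis = []
    for j in range(rank):
        i = int(np.argmax(d))           # greedy pick: largest residual
        if d[i] < 1e-12:                # kernel matrix is effectively exhausted
            G = G[:, :j]
            break
        basis.append(i)
        k_i = rbf_kernel(X, X[i:i + 1], gamma).ravel()   # only one kernel column
        G[:, j] = (k_i - G[:, :j] @ G[i, :j]) / np.sqrt(d[i])
        d = np.maximum(d - G[:, j] ** 2, 0.0)
    return G, basis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
G, basis = greedy_lowrank_kernel(X, rank=30)
K = rbf_kernel(X, X)                    # built here only to measure the error
print("relative approximation error:",
      np.linalg.norm(K - G @ G.T) / np.linalg.norm(K))
```
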
P-packSVM
Problem type: Non-linear and multi-class classification
Parallelism type: Distributed HPC architectures
Main idea: Approximates the gradient stochastically with stochastic gradient descent, so that a single training sample is used per step instead of the whole training set, and packs many consecutive gradient descent steps into one pack (sketched after this entry).
Pros:
  • Supports any arbitrary kernel function
  • Good scalability in terms of the number of training samples and cores
  • Outperforms similar state-of-the-art algorithms
  • Reduces communication overhead
  • Reduces the number of iterations
  • Outperforms IPM- and SMO-based algorithms
  • Reduces the problem size
  • Trains large-scale problems in much shorter time, with comparable accuracy to similar approaches
Cons:
  • Lacks a convergence criterion
  • Sub-linear speedup in some cases
Reference: Zhu et al. [2009]
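
The sketch below illustrates the packing idea on a single machine, assuming a Pegasos-style stochastic subgradient for the kernel hinge loss: every step touches one random sample, and the coefficient updates are accumulated locally and only merged into the shared model once per pack, which is where the distributed algorithm would place its single communication round. It is a simplified stand-in, not the authors' implementation; `pack_size` and the function names are illustrative.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def packed_kernel_sgd(X, y, lam=0.01, epochs=5, pack_size=16, gamma=0.5, seed=0):
    """Kernelized Pegasos-style SGD with updates grouped into packs."""
    rng = np.random.default_rng(seed)
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    alpha = np.zeros(n)                 # shared model: per-sample update counts
    pending = np.zeros(n)               # updates accumulated in the current pack
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            # Decision value under the current local view of the model.
            f_i = (K[i] @ ((alpha + pending) * y)) / (lam * t)
            if y[i] * f_i < 1.0:        # hinge-loss subgradient is non-zero
                pending[i] += 1.0
            if t % pack_size == 0:      # end of pack: merge the local updates
                alpha += pending        # (one broadcast per pack when distributed)
                pending[:] = 0.0
    alpha += pending
    return alpha / (lam * t)            # final kernel-expansion coefficients

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.5, 1, (40, 2)), rng.normal(1.5, 1, (40, 2))])
y = np.hstack([-np.ones(40), np.ones(40)])
coef = packed_kernel_sgd(X, y)
pred = np.sign(rbf_kernel(X, X) @ (coef * y))
print("train accuracy:", np.mean(pred == y))
```
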
PmSVM
Problem type: Non-linear and multi-class classification
Parallelism type: Hybrid shared and distributed HPC architectures
Main idea: Builds balanced bagging classifiers using an under-sampling strategy, so that each classifier is trained on only a fraction of the training data (sketched after this entry).
Pros:
  • Fast convergence
  • Scales well with the number of training samples
  • Avoids training on the full training set
  • Handles class-imbalance problems using an under-sampling strategy
  • Higher speedup (286 times) compared to the sequential version and (1000 times) compared to Libsvm and LIBLINEAR
Cons:
  • Suffers from limited memory
  • Requires loading the whole training data into memory
  • Does not support heterogeneous computing environments
  • Benefits not fully clear for a large number of cores
Reference: Nghi et al. [2015]
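
The following sketch shows balanced bagging with under-sampling in its simplest form: each bag keeps every minority-class sample plus an equal-sized random subset of the majority class, an independent SVM is trained per bag (these fits are what would run in parallel), and predictions average the bags' decision values. It assumes scikit-learn's LinearSVC as the base learner purely for brevity; the structure, not the base learner, is the point, and names such as `balanced_bagging_svm` and `n_bags` are illustrative.

```python
import numpy as np
from sklearn.svm import LinearSVC

def balanced_bagging_svm(X, y, n_bags=10, seed=0):
    """Balanced bagging by under-sampling the majority class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    majority = classes[np.argmax(counts)]
    min_idx = np.where(y == minority)[0]
    maj_idx = np.where(y == majority)[0]
    models = []
    for _ in range(n_bags):
        sub = rng.choice(maj_idx, size=len(min_idx), replace=False)
        idx = np.concatenate([min_idx, sub])       # balanced training subset
        models.append(LinearSVC().fit(X[idx], y[idx]))
    return models

def predict(models, X):
    # Average the signed decision values of all bags, then threshold.
    scores = np.mean([m.decision_function(X) for m in models], axis=0)
    return np.where(scores >= 0, 1, -1)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (500, 5)), rng.normal(1, 1, (50, 5))])
y = np.hstack([-np.ones(500), np.ones(50)])        # imbalanced 10:1 problem
models = balanced_bagging_svm(X, y)
print("minority-class recall:", np.mean(predict(models, X[y == 1]) == 1))
```
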
Gradient system
Problem type: Non-linear and multi-class classification
Parallelism type: Distributed HPC architectures
Main idea: Uses a neural-network (gradient-system) formulation for training SVMs and resolves the high-frequency oscillations that appear near the discontinuity surface (the chattering problem) with a boundary-layer technique (illustrated after this entry).
Pros:
  • Good parallelizability
  • Calculates only the lower triangular part of the matrix addressed by the SVM
  • Outperforms SVMLight and Libsvm
  • Solves the chattering problem
Cons:
  • Benefits not fully clear for the scalability of large-scale problems
  • Benefits not fully clear for scalability in terms of the number of processors
  • Complex configuration
Reference: Ferreira et al. [2006]
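
The boundary-layer technique itself is easy to show in isolation. In the toy simulation below, a scalar state driven by a discontinuous sign() term chatters around the switching surface, while replacing sign() with a saturation that is linear inside a thin boundary layer removes the oscillation. This is a generic illustration of the technique named in the entry, not the authors' gradient system for SVMs; `k`, `phi`, and the step size are arbitrary choices.

```python
import numpy as np

def simulate(k=5.0, dt=0.01, steps=400, phi=None):
    """Euler simulation of ds/dt = -k * g(s) for a scalar state s.

    With g = sign, the discrete dynamics chatter around s = 0; with the
    boundary-layer version g(s) = clip(s / phi, -1, 1), the switching term
    is continuous inside |s| < phi and the oscillation disappears.
    """
    s = 1.0
    trace = []
    for _ in range(steps):
        g = np.sign(s) if phi is None else np.clip(s / phi, -1.0, 1.0)
        s = s - dt * k * g
        trace.append(s)
    return np.array(trace)

chatter = simulate()                 # discontinuous sign(): oscillates at 0
smooth = simulate(phi=0.1)           # boundary layer of width 0.1
print("residual oscillation (sign):          ", np.std(chatter[-100:]))
print("residual oscillation (boundary layer):", np.std(smooth[-100:]))
```
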
ASGD
Problem type: Non-linear and multi-class classification
Parallelism type: Distributed big-data architectures using Hadoop MapReduce
Main idea: Performs feature extraction in parallel using a large number of mappers, and runs averaged stochastic gradient descent in parallel for multi-class classification (the averaging step is sketched after this entry).
Pros:
  • Reduces the dimension of the problem
  • Fast extraction of fairly sophisticated features
  • Fast training on redundant data using an averaging scheme
  • Good parallelizability
  • Reduces the traffic for loading data
  • Good scalability in terms of the number of training samples
  • Fast convergence
Cons:
  • Benefits not fully clear for non-redundant data
  • May suffer from communication overheads
  • Concerns regarding the asymptotic convergence property
Reference: Lin et al. [2011]
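
Below is a single-machine sketch of averaged SGD on a linear hinge loss: ordinary single-sample SGD runs on w, but the model that is returned (and would be shared across workers) is the running average of the iterates, which is what makes highly redundant data cheap to train on. The parallel feature-extraction stage is out of scope here, and the names (`averaged_sgd`, `lr`) are illustrative.

```python
import numpy as np

def averaged_sgd(X, y, lam=1e-3, lr=0.05, epochs=5, seed=0):
    """Averaged SGD for a linear SVM-style hinge loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_bar = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            grad = lam * w
            if y[i] * (X[i] @ w) < 1.0:      # hinge-loss subgradient
                grad = grad - y[i] * X[i]
            w = w - lr * grad
            w_bar = w_bar + (w - w_bar) / t  # running average of the iterates
    return w_bar

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1, (200, 10)), rng.normal(1, 1, (200, 10))])
y = np.hstack([-np.ones(200), np.ones(200)])
w_bar = averaged_sgd(X, y)
print("train accuracy:", np.mean(np.sign(X @ w_bar) == y))
```
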
Map-Reduce SVM
Problem type: Unclear
Parallelism type: Distributed big-data architectures
Main idea: Reformulates the optimization problem so that it parallelizes better across map-reduce workers (a sketch of the general pattern follows this entry).
Pros:
  • Improves on ASGD proposed by Lin et al. [2011] in terms of communication overhead
  • Better parallelizability compared to similar approaches
  • Reduces the communication between processors using an averaging scheme
  • Linear speedup when increasing the number of processors
  • Outperforms the standard cascade SVMs in terms of speedup
  • Runs other machine learning algorithms in parallel on the proposed platform
Cons:
  • Concerns regarding scalability in terms of the number of cores
  • Requires the whole training set in each iteration
  • Concerns regarding scalability in terms of the number of training samples
  • Unclear type of classification problems
Reference: Chu et al. [2007]
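
A map-reduce-friendly reformulation typically rewrites each iteration as a sum over data partitions: every mapper emits only a d-dimensional partial subgradient for its split, and the reduce step aggregates them before one global update, so per-iteration communication stays small while the whole training set is still visited each iteration (matching the con above). The simulation below shows that pattern on one machine with a linear hinge loss; it is a sketch of the general scheme, not the authors' exact formulation, and `n_workers`, `mapreduce_svm`, etc. are illustrative.

```python
import numpy as np

def partial_gradient(w, X_part, y_part):
    # Map step: hinge-loss subgradient contribution of one data partition.
    mask = y_part * (X_part @ w) < 1.0
    return -(y_part[mask, None] * X_part[mask]).sum(axis=0)

def mapreduce_svm(X, y, n_workers=4, lam=1e-3, lr=0.01, iters=200):
    """Batch-gradient linear SVM written in map-reduce (summation) form."""
    n, d = X.shape
    splits = np.array_split(np.arange(n), n_workers)   # static data partitioning
    w = np.zeros(d)
    for _ in range(iters):
        parts = [partial_gradient(w, X[s], y[s]) for s in splits]   # map
        grad = lam * w + np.sum(parts, axis=0) / n                  # reduce
        w = w - lr * grad                                           # one global update
    return w

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-1, 1, (300, 8)), rng.normal(1, 1, (300, 8))])
y = np.hstack([-np.ones(300), np.ones(300)])
w = mapreduce_svm(X, y)
print("train accuracy:", np.mean(np.sign(X @ w) == y))
```
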
BMBPGD
Problem type: Non-linear and binary classification
Parallelism type: Distributed big-data architectures using Apache Spark
Main idea: Implements a budgeted mini-batch gradient descent in parallel on Apache Spark (the budgeted mini-batch update is sketched after this entry).
Pros:
  • Uses only a fraction of the training examples in each iteration
  • Good scalability in terms of the number of training samples
  • Keeps the number of SVs under control
  • Avoids the curse of dimensionality
  • Outperforms SGD-based algorithms in terms of accuracy and time
Cons:
  • Concerns regarding accuracy in some cases
  • A trade-off between accuracy and time
Reference: Tao et al. [2014]
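
The single-machine sketch below captures the two ingredients named in the entry: each update uses only a small random mini-batch, and the number of support vectors is kept under a fixed budget by dropping the SV with the smallest coefficient whenever the budget is exceeded (a simple budget-maintenance rule; merging or projection are common alternatives, and the paper's exact rule may differ). It assumes a Pegasos-style hinge-loss update with an RBF kernel; `budget`, `batch`, and the function name are illustrative, and the Spark-level parallelism is not shown.

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    # RBF kernel between two single points.
    return np.exp(-gamma * ((a - b) ** 2).sum(-1))

def budgeted_minibatch_sgd(X, y, budget=20, batch=8, lam=0.01, epochs=5,
                           gamma=0.5, seed=0):
    """Budgeted mini-batch kernel SGD with a Pegasos-style hinge-loss update."""
    rng = np.random.default_rng(seed)
    sv_x, sv_a = [], []                         # support vectors and coefficients
    t = 0
    for _ in range(epochs):
        for _ in range(len(y) // batch):
            t += 1
            idx = rng.choice(len(y), size=batch, replace=False)
            eta = 1.0 / (lam * t)
            sv_a = [a * (1.0 - eta * lam) for a in sv_a]   # shrink old coefficients
            for i in idx:
                f = sum(a * rbf(x, X[i], gamma) for x, a in zip(sv_x, sv_a))
                if y[i] * f < 1.0:              # margin violation in this mini-batch
                    sv_x.append(X[i])
                    sv_a.append(eta * y[i] / batch)
            while len(sv_x) > budget:           # enforce the SV budget by removal
                j = int(np.argmin(np.abs(sv_a)))
                sv_x.pop(j)
                sv_a.pop(j)
    return np.array(sv_x), np.array(sv_a)

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-1.5, 1, (100, 2)), rng.normal(1.5, 1, (100, 2))])
y = np.hstack([-np.ones(100), np.ones(100)])
sv_x, sv_a = budgeted_minibatch_sgd(X, y)
f = np.array([sum(a * rbf(sx, x) for sx, a in zip(sv_x, sv_a)) for x in X])
print("support vectors:", len(sv_x), "train accuracy:", np.mean(np.sign(f) == y))
```
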