This section compares the articles and methods that perform the kernel computations in parallel, highlighting the articles in the survey with interesting findings. Each entry reports the following fields:

  • "Algorithm" indicates the corresponding algorithm's name or approach.
  • "Problem Type" shows whether the algorithm solves binary, multi-class, linear or/and non-linear classifications.
  • "Parallelism Type" highlights whether the algorithm uses the shared memory parallelism, distributed big data architectures, distributed HPC architecture, or/and GPU parallelism.
  • "Main Idea" simply shows the main contribution of the author(s).
  • "Pros" explains the remarks and findings of the algorithm. Besides, it highlights the advantages of the algorithm compared to the similar approaches, if possible.
  • "Cons" describes the shortcomings or the place for further improvements of the corresponding algorithm. Besides, it shows the weak points of the algorithm compared to the similar approaches, if possible.

*FPGA denotes Field Programmable Gate Arrays.

Algorithm: CPA SVM
Problem Type: Non-linear and multi-class classification
Parallelism Type: Shared memory parallelism
Main Idea: Implements a modular approximate cutting plane algorithm that generates a series of successive approximations to the original problem and uses structural kernels (a simplified sketch follows this entry).
Pros:
  • Good parallelizability
  • Reduces computationally expensive kernel evaluations
  • Scales almost linearly with the number of CPUs
  • Efficiently solves class-imbalanced problems
Cons:
  • Limited scalability in terms of the number of training samples
  • Benefits are not fully clear for other kernels
Reference: Severyn and Moschitti [2011]
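
To make the cutting plane idea concrete, the following is a minimal sketch of a 1-slack cutting-plane loop for a plain linear, binary SVM; it is only an illustration of the general mechanism, not the approximate CPA with structural kernels of Severyn and Moschitti. The per-example subgradients that form each new plane are independent of one another, which is the step that parallelizes well on shared memory.

```python
import numpy as np
from scipy.optimize import minimize

def cutting_plane_svm(X, y, C=1.0, tol=1e-3, max_iter=100):
    """Minimal 1-slack cutting-plane training of a linear binary SVM.
    y must be in {-1, +1}.  A simplified stand-in for the approximate
    cutting plane algorithm, without structural kernels."""
    n, d = X.shape
    w = np.zeros(d)
    planes_a, planes_b = [], []          # constraints  w . a_i >= b_i - xi
    for _ in range(max_iter):
        # Most violated constraint for the current w: an average of
        # per-example hinge-loss subgradients (embarrassingly parallel).
        margins = y * (X @ w)
        viol = margins < 1.0
        a = (y[viol, None] * X[viol]).sum(axis=0) / n
        b = viol.mean()
        # Current slack over the existing planes.
        xi = max([0.0] + [bi - w @ ai for ai, bi in zip(planes_a, planes_b)])
        if b - w @ a <= xi + tol:        # new plane barely violated -> stop
            break
        planes_a.append(a)
        planes_b.append(b)
        # Reduced problem: min 0.5||w||^2 + C*xi  s.t.  w . a_i >= b_i - xi.
        # Its dual over the (few) planes is a tiny QP; solve it numerically.
        A = np.array(planes_a)
        bb = np.array(planes_b)
        H = A @ A.T

        def neg_dual(alpha):
            return 0.5 * alpha @ H @ alpha - alpha @ bb

        k = len(bb)
        res = minimize(neg_dual, np.full(k, C / k), method="SLSQP",
                       bounds=[(0.0, C)] * k,
                       constraints=[{"type": "ineq",
                                     "fun": lambda al: C - al.sum()}])
        w = res.x @ A                    # primal solution from dual weights
    return w
```
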
Algorithm: Fast SVM
Problem Type: Non-linear and multi-class classification
Parallelism Type: Shared memory parallelism
Main Idea: Quickly removes non-SV samples in the early steps of optimization and uses block diagonal matrices to approximate the kernel matrix (illustrated in the sketch below).
Pros:
  • Runtime complexity scales linearly with the number of samples and classes
  • Good scalability compared to Libsvm, SVMLight, and SVMTorch
  • Reduces the problem size
  • Efficient computation of the kernel matrix
  • Efficient memory usage
  • Feature reduction
  • Efficient parameter tuning
Cons:
  • Slightly degrades the accuracy in some cases
  • Benefits not fully clear for a large number of SVs
Reference: Dong et al. [2005]
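
The block-diagonal approximation is easy to sketch: if the samples are partitioned into groups (for example by class or by clustering), only the within-group kernel blocks are computed and the off-block entries are treated as zero, so each block fits in memory and can be evaluated independently, and hence in parallel. The grouping rule and the RBF kernel below are assumptions for illustration, not the exact construction of Dong et al.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Pairwise RBF kernel between the rows of A and B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def block_diagonal_kernel(X, groups, gamma=0.5):
    """Approximate the full kernel matrix by its diagonal blocks.

    `groups` assigns each sample to a block (e.g. a cluster or class id);
    off-block entries are treated as zero, so only the per-block kernels
    are ever computed -- and each block can be computed in parallel."""
    n = X.shape[0]
    K = np.zeros((n, n))
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        K[np.ix_(idx, idx)] = rbf_kernel(X[idx], X[idx], gamma)
    return K

# Tiny usage example with two assumed groups.
X = np.random.default_rng(0).normal(size=(6, 3))
groups = np.array([0, 0, 0, 1, 1, 1])
K_approx = block_diagonal_kernel(X, groups)
```
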
Algorithm: FPGA SVM
Problem Type: Non-linear and binary classification
Parallelism Type: FPGA*
Main Idea: Uses a hybrid working set algorithm in which the working set size increases to reduce the number of iterations (a software sketch of the working-set idea follows this entry).
Pros:
  • Speedup (25 times) compared to a single-threaded implementation
  • Speedups (15 and 23 times) compared to Libsvm and SVMLight
  • Converges quickly in a small number of iterations
Cons:
  • Dedicated hardware for kernel computations
  • May suffer from limited RAM blocks
Reference: Venkateshan et al. [2015]
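
Stripped of its FPGA specifics, the hybrid working-set idea can be sketched in software as a decomposition loop whose working set grows over time: each outer iteration picks the most KKT-violating samples, optimizes the dual over just that block, and enlarges the block so that fewer outer iterations are needed. The bias-free dual, the projected-gradient inner solver, and the doubling schedule below are assumptions made to keep the sketch short; they do not model the circuit of Venkateshan et al.

```python
import numpy as np

def train_growing_working_set(K, y, C=1.0, q0=8, q_max=256,
                              outer_iters=200, inner_iters=20, tol=1e-3):
    """Kernel SVM dual (bias-free variant, 0 <= alpha <= C) trained by a
    decomposition loop whose working set grows over time."""
    n = len(y)
    Q = (y[:, None] * y[None, :]) * K        # dual Hessian
    alpha = np.zeros(n)
    grad = np.ones(n) - Q @ alpha            # gradient of the dual objective
    q = q0
    for _ in range(outer_iters):
        # KKT violation scores: positive gradient at alpha < C, or
        # negative gradient at alpha > 0, both indicate room to improve.
        up = np.where(alpha < C, grad, -np.inf)
        down = np.where(alpha > 0, -grad, -np.inf)
        score = np.maximum(up, down)
        if score.max() < tol:
            break
        S = np.argsort(score)[-q:]           # q most violating samples
        Q_SS = Q[np.ix_(S, S)]
        step = 1.0 / (np.linalg.norm(Q_SS, 2) + 1e-12)
        for _ in range(inner_iters):         # projected gradient on the block
            alpha_S_new = np.clip(alpha[S] + step * grad[S], 0.0, C)
            delta = alpha_S_new - alpha[S]
            grad -= Q[:, S] @ delta          # keep the full gradient in sync
            alpha[S] = alpha_S_new
        q = min(2 * q, q_max, n)             # grow the working set
    return alpha
```
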
Algorithm: Massive parallel SVM
Problem Type: Non-linear and binary classification
Parallelism Type: FPGA
Main Idea: Uses fixed-point arithmetic in computationally expensive operations (see the sketch following this entry).
Pros:
  • Fixed-point arithmetic accelerates the computations
  • Builds compact circuits to reduce power dissipation
  • An order of magnitude more performance than similar FPGA implementations
Cons:
  • Faster computation comes at the cost of accuracy
  • Dedicated hardware for kernel computations
Reference: Graf et al. [2009]
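
The effect of fixed-point arithmetic is simple to illustrate: inputs are quantized to integers in an assumed Q-format, the multiply-accumulate runs entirely in integer arithmetic (as an FPGA's dedicated units would), and a single rescale restores the scale at the end, trading a small quantization error for much cheaper operations. The 12-bit fractional format below is an arbitrary choice for the sketch, not the format used by Graf et al.

```python
import numpy as np

FRAC_BITS = 12                        # assumed Q-format: 12 fractional bits
SCALE = 1 << FRAC_BITS

def to_fixed(x):
    """Quantize floats to fixed-point integers (round to nearest)."""
    return np.round(x * SCALE).astype(np.int64)

def fixed_dot(xq, yq):
    """Dot product of two fixed-point vectors.

    The wide integer accumulator mimics hardware multiply-accumulate
    units; one rescale at the end keeps the Q-format."""
    acc = np.sum(xq * yq)
    return acc >> FRAC_BITS           # back to Q(FRAC_BITS)

def from_fixed(q):
    return q / SCALE

# Compare against floating point on a small example.
rng = np.random.default_rng(0)
x, y = rng.normal(size=64), rng.normal(size=64)
approx = from_fixed(fixed_dot(to_fixed(x), to_fixed(y)))
print(abs(approx - float(x @ y)))     # small quantization error
```
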
Algorithm: PS-SVM
Problem Type: Non-linear and binary classification
Parallelism Type: Shared memory parallelism
Main Idea: Implements an iterative re-weighted least squares SVM instead of quadratic programming and parallelizes the matrix inversion (the core least-squares solve is sketched below).
Pros:
  • Twofold speedup when the number of cores is doubled
  • Efficient matrix inversion in parallel
  • An efficient approximation of a large matrix using a sparse greedy method
Cons:
  • Does not perform well for small-scale problems or problems with few SVs
  • Parallel efficiency deteriorates at 8 cores
  • Limited memory
  • Benefits not fully clear for a large number of cores
Reference: Morales et al. [2011]
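
The appeal of an iterative re-weighted least squares SVM is that each (re-)weighting step reduces to one regularized linear system rather than a QP, and that solve (a matrix inversion or factorization) is what gets parallelized. The sketch below shows a single weighted least-squares SVM solve in the dual; the re-weighting schedule, the sparse greedy approximation, and the parallel inversion of Morales et al. are not reproduced.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def weighted_ls_svm(X, y, C=1.0, weights=None, gamma=0.5):
    """One weighted least-squares SVM solve: a single dense linear system
    replaces the QP.  In an IRWLS scheme this solve is repeated with
    updated per-sample weights, and the solve itself is the part that
    benefits from a parallel inversion/factorization."""
    n = len(y)
    if weights is None:
        weights = np.ones(n)
    Omega = (y[:, None] * y[None, :]) * rbf_kernel(X, X, gamma)
    # KKT system of the (weighted) LS-SVM:
    #   [ 0   y^T                  ] [b]       [0]
    #   [ y   Omega + D^{-1} / C   ] [alpha] = [1]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.diag(1.0 / (C * weights))
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]            # alpha, b

def decision(Xtr, y, alpha, b, Xte, gamma=0.5):
    return rbf_kernel(Xte, Xtr, gamma) @ (alpha * y) + b
```
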
Algorithm: Comparison of GPUs and FPGA
Problem Type: Non-linear and binary classification
Parallelism Type: GPUs and FPGA
Main Idea: Implements the Gilbert algorithm for solving SVMs, which finds the point on a convex hull that is closest to the origin (see the sketch below).
Pros:
  • Outperforms GPU-based parallelism for data that fit in FPGA RAM blocks
  • Good parallelizability
  • Explainable results
Cons:
  • Does not perform well for data that do not fit in FPGA RAM blocks
  • Limited FPGA RAM blocks
  • Speedups remain constant as the training data size increases
  • Limited scalability
Reference: Papadonikolakis [2009]
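
Gilbert's algorithm finds the point of a convex hull closest to the origin by repeatedly picking the hull vertex that extends furthest against the current estimate and moving to the best point on the connecting segment. The sketch below is a minimal geometric version of that loop on an assumed small point set; the reduction of the SVM training set to such a hull and the FPGA/GPU mappings compared by Papadonikolakis are omitted.

```python
import numpy as np

def gilbert_nearest_point(P, tol=1e-6, max_iter=1000):
    """Gilbert's algorithm: approximate the point of conv(P) nearest to
    the origin.  P has one hull point per row."""
    w = P[np.argmin(np.linalg.norm(P, axis=1))].astype(float)  # start at the closest vertex
    for _ in range(max_iter):
        # Support vertex that extends furthest against the current estimate.
        p = P[np.argmin(P @ w)]
        gap = w @ w - w @ p              # duality-gap style stopping test
        if gap <= tol:
            break
        # Best point on the segment [w, p].
        d = w - p
        t = np.clip((w @ d) / (d @ d), 0.0, 1.0)
        w = (1.0 - t) * w + t * p
    return w

# Usage: the nearest point of a small 2-D hull to the origin.
P = np.array([[2.0, 1.0], [3.0, 2.0], [2.0, 3.0], [4.0, 1.0]])
print(gilbert_nearest_point(P))          # converges to [2.0, 1.0]
```
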