A comparison of articles and methods that perform the kernel computations in parallel. Here, we highlight the articles in the survey with interesting findings. Each entry lists the algorithm, the problem type, the parallelism type, the main idea, its pros and cons, and the corresponding reference.

*FPGA denotes Field Programmable Gate Arrays.

Fast SVM
Problem Type: Non-linear and multi-class classification
Parallelism Type: Shared-memory parallelism
Main Idea: Quickly removes non-SV samples in the early steps of optimization and approximates the kernel matrix with block-diagonal matrices (a minimal sketch follows this entry)
Pros:
  • Runtime complexity scales linearly with the number of samples and classes
  • Good scalability compared to Libsvm, SVMLight, and SVMTorch
  • Reduces the problem size
  • Efficient computation of the kernel matrix
  • Efficient memory usage
  • Feature reduction
  • Efficient parameter tuning
Cons:
  • Slightly degrades the accuracy in some cases
  • Benefits are not fully clear for a large number of SVs
Reference: Dong et al. [2005]
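
A minimal Python sketch of the block-diagonal idea, not Dong et al.'s implementation: the data are split into blocks, a small SVM is trained per block (so only within-block kernel entries are needed), and samples that do not become support vectors are dropped before a final pass. All function names and parameters are illustrative, and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.svm import SVC  # assumption: scikit-learn is available

def block_diagonal_filter(X, y, n_blocks=4, C=1.0, gamma=0.5):
    """Approximate the kernel matrix by its diagonal blocks: each block becomes an
    independent sub-problem, so only within-block kernel entries are computed,
    and samples that are not support vectors of their block are removed early."""
    idx = np.array_split(np.random.permutation(len(X)), n_blocks)
    keep = []
    for block in idx:
        sub = SVC(kernel="rbf", C=C, gamma=gamma).fit(X[block], y[block])
        keep.append(block[sub.support_])   # retain only the candidate SVs of this block
    keep = np.concatenate(keep)
    # The final SVM is trained on the much smaller set of retained samples.
    return SVC(kernel="rbf", C=C, gamma=gamma).fit(X[keep], y[keep])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 10))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    model = block_diagonal_filter(X, y)
    print("final number of SVs:", len(model.support_))
```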
PS-SVM
Problem Type: Non-linear and binary classification
Parallelism Type: Shared-memory parallelism
Main Idea: Implements an iteratively re-weighted least-squares SVM instead of quadratic programming and parallelizes the matrix inversion (a minimal sketch follows this entry)
Pros:
  • Roughly twofold speedup when the number of cores is doubled
  • Efficient matrix inversion in parallel
  • Efficient approximation of a large matrix using a sparse greedy method
Cons:
  • Does not perform well for small-scale problems or problems with a small number of SVs
  • Efficiency deteriorates when using 8 cores
  • Limited memory
  • Benefits are not fully clear for a large number of cores
Reference: Morales et al. [2011]
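
A minimal sketch of an IRLS-style SVM update, assuming a linear model and a squared-hinge loss for brevity; Morales et al. solve the kernelized problem and parallelize the matrix inversion explicitly, whereas here the linear solve is simply delegated to NumPy/LAPACK (which may be multithreaded). All names and parameters below are illustrative.

```python
import numpy as np

def irls_linear_svm(X, y, C=1.0, n_iter=20):
    """Iteratively re-weighted least-squares style training: in each iteration only
    the margin-violating samples contribute, and the expensive step is a single
    symmetric linear solve -- the part the paper performs in parallel."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        margins = y * (X @ w)
        A = margins < 1                      # active (margin-violating) samples
        XA, yA = X[A], y[A]
        H = np.eye(d) + 2 * C * XA.T @ XA    # Hessian of the squared-hinge objective
        g = 2 * C * XA.T @ yA
        w = np.linalg.solve(H, g)            # the parallelizable "matrix inversion" step
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 20))
    y = np.where(X[:, 0] - X[:, 1] > 0, 1.0, -1.0)
    w = irls_linear_svm(X, y)
    print("training accuracy:", np.mean(np.sign(X @ w) == y))
```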
CPA SVM
Problem Type: Non-linear and multi-class classification
Parallelism Type: Shared-memory parallelism
Main Idea: Implements a modular approximate cutting-plane algorithm that generates a series of successive approximations to the original problem and uses structural kernels (a minimal sketch follows this entry)
Pros:
  • Good parallelizability
  • Reduces computationally expensive kernel evaluations
  • Scales almost linearly with the number of CPUs
  • Efficiently solves class-imbalanced problems
Cons:
  • Limited scalability in terms of the number of training data
  • Benefits are not fully clear for other kernels
Reference: Severyn and Moschitti [2011]
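
The sketch below illustrates the cutting-plane (successive approximation) idea with a plain linear 1-slack formulation rather than the structural kernels of the paper; SciPy's generic optimizer stands in for a dedicated QP solver, and every name and parameter is illustrative.

```python
import numpy as np
from scipy.optimize import minimize  # assumption: SciPy is available

def cutting_plane_svm(X, y, C=1.0, eps=1e-3, max_iter=50):
    """1-slack cutting-plane training of a linear SVM: each iteration adds the most
    violated constraint (a cutting plane) and re-solves a small QP over the planes,
    yielding a series of successive approximations to the full problem."""
    n, d = X.shape
    w = np.zeros(d)
    G, B = [], []                                    # planes: gradients g_k, offsets b_k
    for _ in range(max_iter):
        c = (1.0 - y * (X @ w)) > 0                  # samples that violate the margin
        g = (y[c, None] * X[c]).sum(axis=0) / n
        b = float(c.sum()) / n
        xi = max(0.0, float(np.max(np.array(B) - np.array(G) @ w))) if G else 0.0
        if b - g @ w <= xi + eps:
            break                                    # the new plane is (almost) satisfied
        G.append(g); B.append(b)
        Gm, Bv, k = np.array(G), np.array(B), len(G)
        Q = Gm @ Gm.T
        # Dual of the approximate QP: max_a  Bv@a - 0.5*a@Q@a  s.t. a >= 0, sum(a) <= C.
        res = minimize(lambda a: 0.5 * a @ Q @ a - Bv @ a,
                       np.full(k, C / k),
                       bounds=[(0.0, C)] * k,
                       constraints=[{"type": "ineq", "fun": lambda a: C - a.sum()}])
        w = Gm.T @ res.x
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.normal(size=(500, 5))
    y = np.where(X[:, 0] + 0.2 * rng.normal(size=500) > 0, 1.0, -1.0)
    w = cutting_plane_svm(X, y)
    print("training accuracy:", np.mean(np.sign(X @ w) == y))
```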
FPGA SVM
Problem Type: Non-linear and binary classification
Parallelism Type: FPGA*
Main Idea: Uses a hybrid working-set algorithm in which the working-set size increases in order to reduce the number of iterations (a minimal sketch follows this entry)
Pros:
  • Speedup (25 times) compared to a single-threaded implementation
  • Speedups (15 and 23 times) compared to Libsvm and SVMLight
  • Converges quickly with a small number of iterations
Cons:
  • Dedicated hardware is required for kernel computations
  • May suffer from limited RAM blocks
Reference: Venkateshan et al. [2015]
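
The sketch below shows the general pattern of a working set that grows across passes, using a plain software decomposition solver (no bias term, full kernel matrix in memory) rather than the paper's hybrid heuristic or its FPGA datapath; every name and parameter is illustrative.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def growing_working_set_svm(X, y, C=1.0, outer=10, q0=32, inner=50):
    """Decomposition-style dual solver without a bias term: each outer pass
    optimizes only a working set of the most KKT-violating samples, and the
    working-set size doubles every pass (more work per pass, fewer passes)."""
    n = len(X)
    alpha = np.zeros(n)
    Q = rbf_kernel(X, X) * np.outer(y, y)   # full dual Hessian for clarity; real solvers cache/stream it
    q = q0
    for _ in range(outer):
        grad = Q @ alpha - 1.0
        viol = (np.where(alpha < C, np.maximum(0.0, -grad), 0.0)
                + np.where(alpha > 0, np.maximum(0.0, grad), 0.0))
        ws = np.argsort(-viol)[:q]          # most violating samples form the working set
        for _ in range(inner):              # projected coordinate descent on the working set
            for i in ws:
                g_i = Q[i] @ alpha - 1.0
                alpha[i] = np.clip(alpha[i] - g_i / Q[i, i], 0.0, C)
        q = min(n, 2 * q)                   # the working set grows each outer pass
    return alpha

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    X = rng.normal(size=(400, 4))
    y = np.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)
    alpha = growing_working_set_svm(X, y)
    print("support vectors:", int((alpha > 1e-6).sum()))
```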
Massive parallel SVM
Problem Type: Non-linear and binary classification
Parallelism Type: FPGA
Main Idea: Uses fixed-point arithmetic in the computationally expensive operations (a minimal sketch follows this entry)
Pros:
  • Fixed-point arithmetic accelerates the computations
  • Builds compact circuits that reduce power dissipation
  • An order of magnitude higher performance than similar FPGA implementations
Cons:
  • Faster computation trades off accuracy
  • Dedicated hardware is required for kernel computations
Reference: Graf et al. [2009]
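
A minimal illustration of the fixed-point idea: inputs are quantized to integers, the expensive inner products are accumulated in integer arithmetic (as an FPGA multiplier array would do), and the result is rescaled once at the end. The bit widths and function names are illustrative, not those of the paper.

```python
import numpy as np

def to_fixed(x, frac_bits=8):
    """Quantize floats to signed fixed-point integers with frac_bits fractional bits."""
    return np.round(x * (1 << frac_bits)).astype(np.int32)

def fixed_point_linear_kernel(A, B, frac_bits=8):
    """Dot products are accumulated entirely in integer arithmetic and rescaled
    once at the end; precision (accuracy) is traded for cheaper, faster circuits."""
    Aq, Bq = to_fixed(A, frac_bits), to_fixed(B, frac_bits)
    acc = Aq.astype(np.int64) @ Bq.astype(np.int64).T   # wide accumulator avoids overflow
    return acc / float(1 << (2 * frac_bits))            # undo the two quantization scale factors

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    A, B = rng.normal(size=(4, 8)), rng.normal(size=(5, 8))
    print("max abs error vs float:", np.max(np.abs(fixed_point_linear_kernel(A, B) - A @ B.T)))
```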
Comparison of GPUs and FPGAs
Problem Type: Non-linear and binary classification
Parallelism Type: GPUs and FPGA
Main Idea: Implements the Gilbert algorithm for solving SVMs, which finds the point on a convex hull that is closest to the origin (a minimal sketch follows this entry)
Pros:
  • Outperforms GPU-based parallelism for data that fit in FPGA RAM blocks
  • Good parallelizability
  • Explainable results
Cons:
  • Does not perform well for data that do not fit in FPGA RAM blocks
  • Limited FPGA RAM blocks
  • Speedups remain constant as the training data size increases
  • Limited scalability
Reference: Papadonikolakis [2009]
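
A minimal sketch of Gilbert's algorithm for the separable linear case: the SVM is cast as finding the point of the Minkowski-difference hull of the two classes that is closest to the origin, which for separable data gives the max-margin direction. The kernelized, hardware-mapped version in the paper follows the same iteration; names and parameters here are illustrative.

```python
import numpy as np

def gilbert_nearest_point(Xpos, Xneg, n_iter=200):
    """Gilbert's algorithm: find the point of the Minkowski-difference hull
    {a - b : a in hull(Xpos), b in hull(Xneg)} that is closest to the origin."""
    w = Xpos[0] - Xneg[0]                          # any point of the difference hull
    for _ in range(n_iter):
        # Support point of the difference hull in the direction -w.
        p = Xpos[np.argmin(Xpos @ w)] - Xneg[np.argmax(Xneg @ w)]
        v = w - p
        denom = v @ v
        if denom < 1e-12:
            break
        t = np.clip((w @ v) / denom, 0.0, 1.0)     # exact line search on the segment [w, p]
        w = (1 - t) * w + t * p
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    Xpos = rng.normal(size=(200, 2)) + np.array([3.0, 3.0])
    Xneg = rng.normal(size=(200, 2)) - np.array([3.0, 3.0])
    w = gilbert_nearest_point(Xpos, Xneg)
    print("separating direction:", w / np.linalg.norm(w))
```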