This section compares the articles and methods that perform the kernel computations in parallel, highlighting the articles in the survey with interesting findings. Each entry reports the following fields:

  • "Algorithm" indicates the corresponding algorithm's name or approach.
  • "Problem Type" shows whether the algorithm solves binary, multi-class, linear or/and non-linear classifications.
  • "Parallelism Type" highlights whether the algorithm uses the shared memory parallelism, distributed big data architectures, distributed HPC architecture, or/and GPU parallelism.
  • "Main Idea" simply shows the main contribution of the author(s).
  • "Pros" explains the remarks and findings of the algorithm. Besides, it highlights the advantages of the algorithm compared to the similar approaches, if possible.
  • "Cons" describes the shortcomings or the place for further improvements of the corresponding algorithm. Besides, it shows the weak points of the algorithm compared to the similar approaches, if possible.

*FPGA denotes Field Programmable Gate Arrays.

Algorithm: CPA SVM
Problem Type: Non-linear and multi-class classification
Parallelism Type: Shared memory parallelism
Main Idea: Implements a modular approximate cutting plane algorithm that generates a series of successive approximations to the original problem and uses structural kernels (a simplified sketch follows this entry).
Pros:
  • Good parallelizability
  • Reduces computationally expensive kernel evaluations
  • Scales almost linearly with the number of CPUs
  • Efficiently solves class-imbalanced problems
Cons:
  • Limited scalability in terms of the number of training samples
  • Benefits are not fully clear for other kernels
Reference: Severyn and Moschitti [2011]
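
To make the cutting plane idea concrete, the following is a minimal sketch of a 1-slack cutting-plane loop for a plain linear, binary SVM; it is only an illustration of the general mechanism, not the approximate CPA with structural kernels of Severyn and Moschitti. The per-example subgradients that form each new plane are independent of one another, which is the step that parallelizes well on shared memory.

```python
import numpy as np
from scipy.optimize import minimize

def cutting_plane_svm(X, y, C=1.0, tol=1e-3, max_iter=100):
    """Minimal 1-slack cutting-plane training of a linear binary SVM.
    y must be in {-1, +1}.  A simplified stand-in for the approximate
    cutting plane algorithm, without structural kernels."""
    n, d = X.shape
    w = np.zeros(d)
    planes_a, planes_b = [], []          # constraints  w . a_i >= b_i - xi
    for _ in range(max_iter):
        # Most violated constraint for the current w: an average of
        # per-example hinge-loss subgradients (embarrassingly parallel).
        margins = y * (X @ w)
        viol = margins < 1.0
        a = (y[viol, None] * X[viol]).sum(axis=0) / n
        b = viol.mean()
        # Current slack over the existing planes.
        xi = max([0.0] + [bi - w @ ai for ai, bi in zip(planes_a, planes_b)])
        if b - w @ a <= xi + tol:        # new plane barely violated -> stop
            break
        planes_a.append(a)
        planes_b.append(b)
        # Reduced problem: min 0.5||w||^2 + C*xi  s.t.  w . a_i >= b_i - xi.
        # Its dual over the (few) planes is a tiny QP; solve it numerically.
        A = np.array(planes_a)
        bb = np.array(planes_b)
        H = A @ A.T

        def neg_dual(alpha):
            return 0.5 * alpha @ H @ alpha - alpha @ bb

        k = len(bb)
        res = minimize(neg_dual, np.full(k, C / k), method="SLSQP",
                       bounds=[(0.0, C)] * k,
                       constraints=[{"type": "ineq",
                                     "fun": lambda al: C - al.sum()}])
        w = res.x @ A                    # primal solution from dual weights
    return w
```
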
Algorithm: Fast SVM
Problem Type: Non-linear and multi-class classification
Parallelism Type: Shared memory parallelism
Main Idea: Quickly removes non-SV samples in the early steps of optimization and uses block diagonal matrices to approximate the kernel matrix (illustrated in the sketch below).
Pros:
  • Runtime complexity scales linearly with the number of samples and classes
  • Good scalability compared to Libsvm, SVMLight, and SVMTorch
  • Reduces the problem size
  • Efficient computation of the kernel matrix
  • Efficient memory usage
  • Feature reduction
  • Efficient parameter tuning
Cons:
  • Slightly degrades the accuracy in some cases
  • Benefits not fully clear for a large number of SVs
Reference: Dong et al. [2005]
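
The block-diagonal approximation is easy to sketch: if the samples are partitioned into groups (for example by class or by clustering), only the within-group kernel blocks are computed and the off-block entries are treated as zero, so each block fits in memory and can be evaluated independently, and hence in parallel. The grouping rule and the RBF kernel below are assumptions for illustration, not the exact construction of Dong et al.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Pairwise RBF kernel between the rows of A and B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def block_diagonal_kernel(X, groups, gamma=0.5):
    """Approximate the full kernel matrix by its diagonal blocks.

    `groups` assigns each sample to a block (e.g. a cluster or class id);
    off-block entries are treated as zero, so only the per-block kernels
    are ever computed -- and each block can be computed in parallel."""
    n = X.shape[0]
    K = np.zeros((n, n))
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        K[np.ix_(idx, idx)] = rbf_kernel(X[idx], X[idx], gamma)
    return K

# Tiny usage example with two assumed groups.
X = np.random.default_rng(0).normal(size=(6, 3))
groups = np.array([0, 0, 0, 1, 1, 1])
K_approx = block_diagonal_kernel(X, groups)
```
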
Algorithm: FPGA SVM
Problem Type: Non-linear and binary classification
Parallelism Type: FPGA*
Main Idea: Uses a hybrid working set algorithm in which the working set size increases to reduce the number of iterations (a software sketch of the working-set idea follows this entry).
Pros:
  • Speedup (25 times) compared to a single-threaded implementation
  • Speedups (15 and 23 times) compared to Libsvm and SVMLight
  • Converges quickly in a small number of iterations
Cons:
  • Dedicated hardware for kernel computations
  • May suffer from limited RAM blocks
Reference: Venkateshan et al. [2015]
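
Stripped of its FPGA specifics, the hybrid working-set idea can be sketched in software as a decomposition loop whose working set grows over time: each outer iteration picks the most KKT-violating samples, optimizes the dual over just that block, and enlarges the block so that fewer outer iterations are needed. The bias-free dual, the projected-gradient inner solver, and the doubling schedule below are assumptions made to keep the sketch short; they do not model the circuit of Venkateshan et al.

```python
import numpy as np

def train_growing_working_set(K, y, C=1.0, q0=8, q_max=256,
                              outer_iters=200, inner_iters=20, tol=1e-3):
    """Kernel SVM dual (bias-free variant, 0 <= alpha <= C) trained by a
    decomposition loop whose working set grows over time."""
    n = len(y)
    Q = (y[:, None] * y[None, :]) * K        # dual Hessian
    alpha = np.zeros(n)
    grad = np.ones(n) - Q @ alpha            # gradient of the dual objective
    q = q0
    for _ in range(outer_iters):
        # KKT violation scores: positive gradient at alpha < C, or
        # negative gradient at alpha > 0, both indicate room to improve.
        up = np.where(alpha < C, grad, -np.inf)
        down = np.where(alpha > 0, -grad, -np.inf)
        score = np.maximum(up, down)
        if score.max() < tol:
            break
        S = np.argsort(score)[-q:]           # q most violating samples
        Q_SS = Q[np.ix_(S, S)]
        step = 1.0 / (np.linalg.norm(Q_SS, 2) + 1e-12)
        for _ in range(inner_iters):         # projected gradient on the block
            alpha_S_new = np.clip(alpha[S] + step * grad[S], 0.0, C)
            delta = alpha_S_new - alpha[S]
            grad -= Q[:, S] @ delta          # keep the full gradient in sync
            alpha[S] = alpha_S_new
        q = min(2 * q, q_max, n)             # grow the working set
    return alpha
```
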
Algorithm: Massive parallel SVM
Problem Type: Non-linear and binary classification
Parallelism Type: FPGA
Main Idea: Uses fixed-point arithmetic in computationally expensive operations (see the sketch following this entry).
Pros:
  • Fixed-point arithmetic accelerates the computations
  • Builds compact circuits to reduce power dissipation
  • An order of magnitude more performance than similar FPGA implementations
Cons:
  • Faster computation comes at the cost of accuracy
  • Dedicated hardware for kernel computations
Reference: Graf et al. [2009]
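
The effect of fixed-point arithmetic is simple to illustrate: inputs are quantized to integers in an assumed Q-format, the multiply-accumulate runs entirely in integer arithmetic (as an FPGA's dedicated units would), and a single rescale restores the scale at the end, trading a small quantization error for much cheaper operations. The 12-bit fractional format below is an arbitrary choice for the sketch, not the format used by Graf et al.

```python
import numpy as np

FRAC_BITS = 12                        # assumed Q-format: 12 fractional bits
SCALE = 1 << FRAC_BITS

def to_fixed(x):
    """Quantize floats to fixed-point integers (round to nearest)."""
    return np.round(x * SCALE).astype(np.int64)

def fixed_dot(xq, yq):
    """Dot product of two fixed-point vectors.

    The wide integer accumulator mimics hardware multiply-accumulate
    units; one rescale at the end keeps the Q-format."""
    acc = np.sum(xq * yq)
    return acc >> FRAC_BITS           # back to Q(FRAC_BITS)

def from_fixed(q):
    return q / SCALE

# Compare against floating point on a small example.
rng = np.random.default_rng(0)
x, y = rng.normal(size=64), rng.normal(size=64)
approx = from_fixed(fixed_dot(to_fixed(x), to_fixed(y)))
print(abs(approx - float(x @ y)))     # small quantization error
```
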
Algorithm: PS-SVM
Problem Type: Non-linear and binary classification
Parallelism Type: Shared memory parallelism
Main Idea: Implements an iterative re-weighted least squares SVM instead of quadratic programming and parallelizes the matrix inversion (the core least-squares solve is sketched below).
Pros:
  • Twofold speedup when the number of cores is doubled
  • Efficient matrix inversion in parallel
  • An efficient approximation of a large matrix using a sparse greedy method
Cons:
  • Does not perform well for small-scale problems or problems with few SVs
  • Parallel efficiency deteriorates at 8 cores
  • Limited memory
  • Benefits not fully clear for a large number of cores
Reference: Morales et al. [2011]
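
The appeal of an iterative re-weighted least squares SVM is that each (re-)weighting step reduces to one regularized linear system rather than a QP, and that solve (a matrix inversion or factorization) is what gets parallelized. The sketch below shows a single weighted least-squares SVM solve in the dual; the re-weighting schedule, the sparse greedy approximation, and the parallel inversion of Morales et al. are not reproduced.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def weighted_ls_svm(X, y, C=1.0, weights=None, gamma=0.5):
    """One weighted least-squares SVM solve: a single dense linear system
    replaces the QP.  In an IRWLS scheme this solve is repeated with
    updated per-sample weights, and the solve itself is the part that
    benefits from a parallel inversion/factorization."""
    n = len(y)
    if weights is None:
        weights = np.ones(n)
    Omega = (y[:, None] * y[None, :]) * rbf_kernel(X, X, gamma)
    # KKT system of the (weighted) LS-SVM:
    #   [ 0   y^T                  ] [b]       [0]
    #   [ y   Omega + D^{-1} / C   ] [alpha] = [1]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.diag(1.0 / (C * weights))
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]            # alpha, b

def decision(Xtr, y, alpha, b, Xte, gamma=0.5):
    return rbf_kernel(Xte, Xtr, gamma) @ (alpha * y) + b
```
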
Algorithm: Comparison of GPUs and FPGA
Problem Type: Non-linear and binary classification
Parallelism Type: GPUs and FPGA
Main Idea: Implements the Gilbert algorithm for solving SVMs, which finds the point on a convex hull that is closest to the origin (see the sketch below).
Pros:
  • Outperforms GPU-based parallelism for data that fit in FPGA RAM blocks
  • Good parallelizability
  • Explainable results
Cons:
  • Does not perform well for data that do not fit in FPGA RAM blocks
  • Limited FPGA RAM blocks
  • Speedups remain constant as the training data size increases
  • Limited scalability
Reference: Papadonikolakis [2009]
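
Gilbert's algorithm finds the point of a convex hull closest to the origin by repeatedly picking the hull vertex that extends furthest against the current estimate and moving to the best point on the connecting segment. The sketch below is a minimal geometric version of that loop on an assumed small point set; the reduction of the SVM training set to such a hull and the FPGA/GPU mappings compared by Papadonikolakis are omitted.

```python
import numpy as np

def gilbert_nearest_point(P, tol=1e-6, max_iter=1000):
    """Gilbert's algorithm: approximate the point of conv(P) nearest to
    the origin.  P has one hull point per row."""
    w = P[np.argmin(np.linalg.norm(P, axis=1))].astype(float)  # start at the closest vertex
    for _ in range(max_iter):
        # Support vertex that extends furthest against the current estimate.
        p = P[np.argmin(P @ w)]
        gap = w @ w - w @ p              # duality-gap style stopping test
        if gap <= tol:
            break
        # Best point on the segment [w, p].
        d = w - p
        t = np.clip((w @ d) / (d @ d), 0.0, 1.0)
        w = (1.0 - t) * w + t * p
    return w

# Usage: the nearest point of a small 2-D hull to the origin.
P = np.array([[2.0, 1.0], [3.0, 2.0], [2.0, 3.0], [4.0, 1.0]])
print(gilbert_nearest_point(P))          # converges to [2.0, 1.0]
```
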