This section compares articles and methods that apply the Interior Point Method (IPM) to solve SVM problems in parallel; we highlight the articles in the survey with notable findings. Each entry lists the algorithm, the problem type, the parallelism type, the main idea, its pros and cons, and the reference.


Incremental Approximate Matrix Factorization IPM
Problem type: Non-linear, binary classification
Parallelism type: Shared-memory parallelism
Main idea: Employs incremental approximate matrix factorization using ICF and Kronecker factorizations (a minimal ICF sketch follows this entry)
Pros:
  • Reduces memory requirements
  • Strikes a balance between accuracy and computation time
Cons:
  • Suffers from numerical instability
  • Involves a trade-off between speedup and accuracy
  • May be constrained by limited shared memory
Reference: Wu et al. [2006]
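
To make the ICF step above concrete, here is a minimal NumPy sketch of a pivoted incomplete Cholesky factorization that approximates a kernel matrix K by H @ H.T with a user-chosen low rank. The RBF kernel, the rank parameter, and the function names are illustrative assumptions; this is not the implementation of Wu et al. [2006].

```python
import numpy as np

def rbf_kernel(X, i, j, gamma=0.5):
    """Single RBF kernel value k(x_i, x_j); the kernel choice is an assumption."""
    d = X[i] - X[j]
    return np.exp(-gamma * np.dot(d, d))

def incomplete_cholesky(X, rank, gamma=0.5):
    """Greedy pivoted incomplete Cholesky: returns H (n x rank) with K ~= H @ H.T,
    computing only the kernel entries it actually needs."""
    n = X.shape[0]
    H = np.zeros((n, rank))
    diag = np.array([rbf_kernel(X, i, i, gamma) for i in range(n)])  # residual diagonal
    for k in range(rank):
        i = int(np.argmax(diag))          # pivot: largest remaining diagonal residual
        if diag[i] <= 1e-12:              # residual is negligible, stop early
            return H[:, :k]
        H[i, k] = np.sqrt(diag[i])
        for j in range(n):
            if j != i:
                kji = rbf_kernel(X, j, i, gamma)
                H[j, k] = (kji - H[j, :k] @ H[i, :k]) / H[i, k]
        diag -= H[:, k] ** 2              # update diagonal of the residual K - H H^T
        diag[i] = 0.0
    return H
```

An IPM solver can then work with the thin factor H (n x rank) instead of the full n x n kernel matrix, which is where the memory savings claimed above come from.
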
Hybrid MPI-OpenMP Linear IPM
Problem type: Linear, binary classification
Parallelism type: Hybrid shared/distributed-memory parallelism
Main idea: Distributes data among parallel nodes using a block structure and applies factorization for numerical stability (a minimal sketch of the block distribution follows this entry)
Pros:
  • Higher classification accuracy than LibLinear
  • Removes a dense matrix from the computation by reformulating the problem
  • Good parallelizability
  • Faster than a pure MPI implementation
  • Better numerical stability than similar approaches
  • Efficient implementation of the computationally expensive operations
Cons:
  • Supports only linear classification
  • Requires loading all samples into memory
Reference: Woodsend et al. [2009]
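
As a rough illustration of the block distribution, the mpi4py sketch below gives each process one block of training rows and reduces only a small n_features x n_features matrix across nodes; the intra-node (OpenMP) parallelism would come from a multithreaded BLAS underneath the local products. All variable names and the reduced system itself are assumptions made for illustration, not the exact formulation of Woodsend et al. [2009].

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

n_local, n_features = 1000, 20                 # rows held by this process, feature count
rng = np.random.default_rng(rank)
X_local = rng.standard_normal((n_local, n_features))  # this process's block of rows
scale_local = rng.uniform(0.1, 1.0, n_local)          # per-sample IPM scaling (assumed)

# Local contribution to the small reduced system; no n x n matrix is ever formed.
M_local = X_local.T @ (X_local * scale_local[:, None])
M = np.empty_like(M_local)
comm.Allreduce(M_local, M, op=MPI.SUM)          # combine the block contributions

if rank == 0:
    # Each IPM iteration then only needs a small dense solve on one node.
    step = np.linalg.solve(M + np.eye(n_features), np.ones(n_features))
```

Run with, e.g., `mpiexec -n 4 python sketch.py`; only the small reduced matrix crosses the network, never the raw sample blocks.
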
PSVM
Problem type: Non-linear, binary classification
Parallelism type: Distributed HPC architectures
Main idea: Reduces the problem size using matrix factorization and partitions the problem into row-based data blocks (a sketch of the reduced-system solve follows this entry)
Pros:
  • Reduces memory requirements
  • Higher speedup than similar approaches
  • Good scalability in the number of cores
  • Reduces the problem size
  • Efficient computation of the matrix inversion
  • Only a small fraction of each iteration is sequential
Cons:
  • May suffer from synchronization and communication overheads
  • Reducing the problem size hurts classification accuracy
  • The suggested reduced problem size is not optimal in many cases
  • Sub-linear speedup due to the sequential steps
Reference: Chang et al. [2009]
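
In this family of methods, the "efficient computation of matrix inversion" is usually obtained by pairing the low-rank factor with the Sherman-Morrison-Woodbury identity. The sketch below applies that identity to a system of the form (D + H H^T) x = b with diagonal D and a thin H; the function name and the test sizes are illustrative, and the exact per-iteration system in Chang et al. [2009] differs in its details.

```python
import numpy as np

def smw_solve(D_diag, H, b):
    """Solve (D + H H^T) x = b with diagonal D and a thin H (n x p, p << n) via the
    Sherman-Morrison-Woodbury identity, so only a p x p system is ever factorized."""
    Dinv_b = b / D_diag                        # D^{-1} b
    Dinv_H = H / D_diag[:, None]               # D^{-1} H
    small = np.eye(H.shape[1]) + H.T @ Dinv_H  # I + H^T D^{-1} H   (p x p)
    return Dinv_b - Dinv_H @ np.linalg.solve(small, H.T @ Dinv_b)

# Quick check on random data; p is chosen near sqrt(n), in the spirit of the reduced
# problem size discussed in the entry above.
n, p = 500, 22
rng = np.random.default_rng(0)
H = rng.standard_normal((n, p))
D_diag = rng.uniform(1.0, 2.0, n)
b = rng.standard_normal(n)
x = smw_solve(D_diag, H, b)
assert np.allclose((np.diag(D_diag) + H @ H.T) @ x, b)
```
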
SVM-PCG
Problem type: Linear, binary classification
Parallelism type: Distributed HPC architectures
Main idea: Uses a preconditioned linear solver to compute each iteration (a minimal PCG sketch follows this entry)
Pros:
  • Performance competitive with SVMLight
  • Higher speedup than other similar approaches
  • Reduces the problem size
  • Efficient matrix inversion and matrix operations
  • Open access
Cons:
  • No support for non-linear kernels
  • Contains sequential operations that could be parallelized
  • Benefits are unclear for large-scale problems with many features
Reference: Gertz et al. [2010]
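
The preconditioned conjugate gradient (PCG) skeleton below shows how each inner linear system can be solved with matrix-vector products only, so the normal matrix never has to be formed explicitly. The example system and the Jacobi (diagonal) preconditioner are placeholders chosen for illustration and are not taken from Gertz et al. [2010].

```python
import numpy as np

def pcg(matvec, b, precond, tol=1e-8, max_iter=200):
    """Preconditioned conjugate gradient for M x = b, where M is supplied only
    through the callable matvec and the preconditioner through precond."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Mp = matvec(p)
        alpha = rz / (p @ Mp)
        x += alpha * p
        r -= alpha * Mp
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Example: a linear-SVM-like normal matrix M = X^T D X + lam * I applied implicitly.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 50))
d = rng.uniform(0.5, 1.5, 2000)
lam = 1.0

def matvec(v):                                   # two thin products; M is never built
    return X.T @ (d * (X @ v)) + lam * v

diag_M = np.einsum('ij,i,ij->j', X, d, X) + lam  # diagonal of M -> Jacobi preconditioner
x = pcg(matvec, X.T @ np.ones(2000), lambda r: r / diag_M)
```
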
GPU-cluster SVM
Problem type: Non-linear, binary classification
Parallelism type: GPU parallelism
Main idea: Accelerates cross-validation by performing multiple tasks in parallel (a sketch of the task-level parallelism follows this entry)
Pros:
  • Higher speedup (90x) than LIBSVM on a single node
  • Higher speedup (3x) than CPU implementations
  • Scales well with the number of training samples
  • Efficient memory access
  • Efficient data transfer between host and device
  • Efficient parameter tuning via cross-validation
  • Avoids re-computation of computationally expensive tasks
Cons:
  • Does not take full advantage of GPU features such as pinned memory and unified virtual addressing
  • May suffer from limited memory
Reference: Li et al. [2013]
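
The cross-validation acceleration is essentially task-level parallelism over independent (C, gamma, fold) combinations. The sketch below uses a CPU process pool as a stand-in for the GPU cluster, and `train_and_score` is a hypothetical placeholder for the per-task SVM training; in Li et al. [2013] each task is mapped onto GPU hardware and computationally expensive results are reused rather than recomputed.

```python
from itertools import product
from multiprocessing import Pool

def train_and_score(C, gamma, fold):
    """Hypothetical placeholder: train an SVM with (C, gamma) on all folds except
    `fold` and return its validation accuracy (a dummy value here)."""
    return (C, gamma, fold, 0.0)

if __name__ == "__main__":
    Cs = [0.1, 1.0, 10.0]
    gammas = [0.01, 0.1]
    folds = range(5)
    tasks = list(product(Cs, gammas, folds))     # every task is independent
    with Pool() as pool:                         # stand-in for the GPU cluster nodes
        results = pool.starmap(train_and_score, tasks)
    best = max(results, key=lambda r: r[3])      # pick the best (C, gamma) setting
```
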
HPSVM
Problem type: Non-linear, binary classification
Parallelism type: GPU parallelism
Main idea: Implements a heterogeneous parallel SVM and accelerates the computationally expensive matrix operations with an approximate factorization (a sketch of the dual-buffer overlap follows this entry)
Pros:
  • Low memory requirements
  • Fast convergence
  • Good parallelizability
  • Takes full advantage of CPU-GPU parallelism via a dual-buffer, three-stage pipeline
  • Higher speedup (11x) than the CPU implementation
  • Improves on Li et al. [2013] in data transfer and memory access
  • Efficient hierarchical memory access
  • Good scalability in the number of training samples
  • High training accuracy
Cons:
  • Contains some sequential operations
  • May suffer from limited memory
Reference: Li et al. [2016]
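
The dual-buffer pipeline is, at its core, double buffering: while one buffer is being consumed by the compute stage, the other is being refilled, so data preparation and transfer overlap with computation instead of alternating with it. The threading sketch below collapses HPSVM's three stages into two for brevity and substitutes plain CPU work for the host-to-device transfers and GPU kernels; every name in it is illustrative and nothing is taken from the HPSVM code.

```python
import queue
import threading
import numpy as np

empty, filled = queue.Queue(), queue.Queue()
for _ in range(2):                       # exactly two buffers circulate (dual buffer)
    empty.put(np.zeros(1024))

def producer(n_chunks):
    """Stage-1 stand-in: prepare and 'transfer' the next chunk into a free buffer."""
    rng = np.random.default_rng(0)
    for i in range(n_chunks):
        buf = empty.get()                # block until the consumer frees a buffer
        buf[:] = rng.standard_normal(1024)
        filled.put((i, buf))
    filled.put(None)                     # end-of-stream sentinel

def consumer():
    """Stage-2 stand-in: 'GPU' compute on whichever buffer is ready."""
    total = 0.0
    while (item := filled.get()) is not None:
        _, buf = item
        total += float(buf @ buf)        # pretend work on the filled buffer
        empty.put(buf)                   # recycle the buffer for the producer
    return total

t = threading.Thread(target=producer, args=(8,))
t.start()
result = consumer()                      # overlaps with the producer thread
t.join()
```
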