This section compares articles and methods that perform incremental learning of SVMs in parallel, highlighting the surveyed works with notable findings on combining SVM training with incremental learning. Each entry below lists the algorithm, the problem type it addresses, the type of parallelism it uses, its main idea, its pros and cons, and the corresponding reference.


StreamSVM
Problem type: Linear and binary classification
Parallelism type: Shared-memory parallelism
Main idea: Implements the first linear SVM training algorithm that integrates caching with optimization and streams the data in parallel (a sketch follows this entry)
Pros:
  • Increases throughput
  • High accuracy
  • Reduces memory requirements by working on data blocks
  • Outperforms existing linear SVM solvers by orders of magnitude
  • Expands data on the fly by generating features on demand
  • Exploits the memory storage hierarchy
  • Scales well with the amount of input data
Cons:
  • Requires multiple passes through the entire data
  • High overhead
Reference: Matsushima et al. [2012]
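To make the streaming idea concrete, here is a minimal sketch of training a linear SVM over a stream of data blocks using Pegasos-style stochastic subgradient updates on the hinge loss. It is not the dual cached-loop algorithm of Matsushima et al. [2012]; the function names, the regularization constant, and the toy data generator are illustrative assumptions.

    import numpy as np

    def stream_train_linear_svm(chunks, dim, lam=1e-4):
        """Pegasos-style streaming training of a linear SVM.

        `chunks` is an iterator over (X, y) blocks with y in {-1, +1},
        so the full data set never has to reside in memory at once.
        """
        w = np.zeros(dim)
        t = 0
        for X, y in chunks:                       # stream the data block by block
            for xi, yi in zip(X, y):
                t += 1
                eta = 1.0 / (lam * t)             # decaying learning rate
                if yi * xi.dot(w) < 1.0:          # margin violation: hinge subgradient step
                    w = (1 - eta * lam) * w + eta * yi * xi
                else:                             # only the regularizer contributes
                    w = (1 - eta * lam) * w
        return w

    # Toy usage: two Gaussian blobs streamed in 10 blocks of 100 points each.
    rng = np.random.default_rng(0)
    def toy_stream(n_blocks=10, n=100, dim=5):
        for _ in range(n_blocks):
            y = rng.choice([-1.0, 1.0], size=n)
            X = rng.normal(size=(n, dim)) + y[:, None]
            yield X, y
    w = stream_train_linear_svm(toy_stream(), dim=5)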
Incremental LIBLinear
Problem type: Linear and multi-class classification
Parallelism type: Hybrid shared/distributed memory parallelism
Main idea: Implements a balanced bagging incremental algorithm with no loss of classification accuracy (a sketch follows this entry)
Pros:
  • Handles imbalanced problems
  • Scales well with the number of cores
  • Scales well with the number of training data
  • Reduces the training data efficiently
  • High speedup: 732x over the original implementation and 1193x over LIBLinear
  • Efficient memory usage
Cons:
  • Only supports linear classification
Reference: Doan et al. [2013]
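As a rough stand-in for the balanced bagging scheme, the sketch below trains a linear SVM incrementally with scikit-learn's SGDClassifier (hinge loss) via partial_fit, rebalancing each incoming chunk by downsampling the majority class. The chunking scheme, the downsampling helper, and the hyperparameters are assumptions, not the actual implementation of Doan et al. [2013].

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    def balanced_chunk(X, y, rng):
        """Downsample the majority classes so each chunk is roughly balanced."""
        classes, counts = np.unique(y, return_counts=True)
        n_min = counts.min()
        keep = np.concatenate([
            rng.choice(np.where(y == c)[0], size=n_min, replace=False)
            for c in classes
        ])
        return X[keep], y[keep]

    def train_incrementally(chunks, classes, seed=0):
        """Fit a linear SVM chunk by chunk; hinge loss makes SGDClassifier a linear SVM."""
        rng = np.random.default_rng(seed)
        clf = SGDClassifier(loss="hinge", alpha=1e-4, random_state=seed)
        for X, y in chunks:
            Xb, yb = balanced_chunk(X, y, rng)
            clf.partial_fit(Xb, yb, classes=classes)   # incremental update, old chunks not revisited
        return clf

    # Usage: clf = train_incrementally(chunk_iterator, classes=np.array([0, 1, 2]))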
LS-SVM
Problem type: Linear, non-linear, and binary classification
Parallelism type: Distributed HPC architecture
Main idea: Classifies large-scale problems on standard workstations using extended and boosted least squares SVM algorithms (a sketch follows this entry)
Pros:
  • Adaptive distribution of data
  • Highly scalable in the number of training data (20 billion data points)
  • Robust across different data types
  • Robust for high-dimensional input spaces
  • Complexity is linear in the number of training examples and the number of machines
  • Complexity is linear in the square of the input-space dimension
  • High speedup for large-scale problems
  • High-performance computation of matrix operations
Cons:
  • Requires a central computing unit to combine the resulting SVs
  • Communication overhead
  • Sequential computations
  • Benefits not fully clear for complex non-linear classification
Reference: Do and Poulet [2006]
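The following sketch illustrates why a least-squares-style linear SVM distributes so well: each machine reduces its row block to a small (d+1)x(d+1) matrix and a (d+1) vector, and a central unit sums these partial results and solves one small linear system, so the cost grows linearly with the number of examples and machines and quadratically with the input dimension. This is a proximal/least-squares formulation chosen for brevity, not the exact algorithm of Do and Poulet [2006]; the regularization constant and the toy data are illustrative.

    import numpy as np

    def local_sums(X, y):
        """Each worker reduces its row block to (d+1)x(d+1) and (d+1) partial sums."""
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # append a bias column
        return Xb.T @ Xb, Xb.T @ y

    def central_solve(partials, dim, C=10.0):
        """The central unit combines the partial sums and solves one small linear system."""
        G = np.eye(dim + 1) / C
        r = np.zeros(dim + 1)
        for Gk, rk in partials:
            G += Gk
            r += rk
        return np.linalg.solve(G, r)                     # beta = [w; b]

    # Toy usage: four "machines", each holding one row block of labeled data.
    rng = np.random.default_rng(0)
    blocks = []
    for _ in range(4):
        y = rng.choice([-1.0, 1.0], size=500)
        X = rng.normal(size=(500, 3)) + y[:, None]
        blocks.append((X, y))
    beta = central_solve([local_sums(X, y) for X, y in blocks], dim=3)
    w, b = beta[:-1], beta[-1]
    # Predict with sign(X @ w + b).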
Tree-based incremental SVM
Problem type: Linear classification
Parallelism type: Distributed HPC architecture
Main idea: Implements a parallel heap-based tree topology for incremental SVM training (a sketch follows this entry)
Pros:
  • High speedup for large increment sizes
  • Close to linear scalability for large increment sizes
  • Parallel version performs well for large increment sizes
  • Efficient data access for all nodes
Cons:
  • Becomes network-I/O bound for small increment sizes
  • Accuracy degradation in unbalanced cases
  • Only supports linear classification
  • Parallel version performs poorly for small increment sizes
  • Parallel version performs slightly worse for medium increment sizes
Reference: Tveit and Engum [2003]
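A simplified, sequential version of the tree-based combination is sketched below: leaves train on disjoint partitions, and each parent retrains on the union of its children's support vectors until a single model remains at the root. The heap-based scheduling and the parallel execution of Tveit and Engum [2003] are not reproduced; scikit-learn's SVC and the toy partitions are stand-ins.

    import numpy as np
    from sklearn.svm import SVC

    def fit_node(X, y):
        """Train an SVM on a node's data and return its support vectors with labels."""
        clf = SVC(kernel="linear", C=1.0).fit(X, y)
        return X[clf.support_], y[clf.support_]

    def tree_combine(partitions):
        """Cascade-style combination: merge children pairwise and retrain on their SVs."""
        level = [fit_node(X, y) for X, y in partitions]      # leaf level
        while len(level) > 1:
            nxt = []
            for i in range(0, len(level) - 1, 2):
                X = np.vstack([level[i][0], level[i + 1][0]])
                y = np.concatenate([level[i][1], level[i + 1][1]])
                nxt.append(fit_node(X, y))
            if len(level) % 2:                               # an odd node passes through unchanged
                nxt.append(level[-1])
            level = nxt
        Xr, yr = level[0]
        return SVC(kernel="linear", C=1.0).fit(Xr, yr)       # final model at the root

    # Toy usage: eight partitions, one per worker.
    rng = np.random.default_rng(0)
    def toy(n=200, d=4):
        y = rng.choice([-1, 1], size=n)
        return rng.normal(size=(n, d)) + y[:, None], y
    model = tree_combine([toy() for _ in range(8)])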
Centralized SVM
Problem type: Linear and binary classification
Parallelism type: Distributed HPC architecture
Main idea: Learns from distributed data sources, improving the approach of Syed et al. [1999] so that the global optimum is obtained on distributed systems (a sketch follows this entry)
Pros:
  • Learns from distributed data sources
  • Good potential for preserving data privacy
  • Fast convergence in a finite number of iterations
Cons:
  • May suffer from communication and synchronization overheads
  • Only supports linear classification
Reference: Caragea et al. [2005]
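A simplified reading of the centralized scheme is sketched below: in each round the sites train on their local data plus the current global SVs, the central node retrains on the union of the returned SVs, and the process repeats until the global SV set stabilizes. The stopping heuristic and the use of scikit-learn's SVC are assumptions; Caragea et al. [2005] give the precise exchange protocol and convergence conditions.

    import numpy as np
    from sklearn.svm import SVC

    def support_set(X, y):
        clf = SVC(kernel="linear", C=1.0).fit(X, y)
        return X[clf.support_], y[clf.support_]

    def centralized_sv_exchange(sites, max_rounds=10):
        """Iteratively merge support vectors from distributed sites at a central node."""
        gX = np.empty((0, sites[0][0].shape[1]))
        gy = np.empty(0)
        for _ in range(max_rounds):
            parts = []
            for Xk, yk in sites:
                # Each site trains on its local data plus the current global SVs.
                Xa = np.vstack([Xk, gX]) if len(gX) else Xk
                ya = np.concatenate([yk, gy]) if len(gy) else yk
                parts.append(support_set(Xa, ya))
            # The central node retrains on the union of all returned SVs.
            X_all = np.vstack([p[0] for p in parts])
            y_all = np.concatenate([p[1] for p in parts])
            new_gX, new_gy = support_set(X_all, y_all)
            if len(new_gX) == len(gX):   # crude stopping heuristic: global SV count stabilized
                gX, gy = new_gX, new_gy
                break
            gX, gy = new_gX, new_gy
        return SVC(kernel="linear", C=1.0).fit(gX, gy)

    # Usage: model = centralized_sv_exchange([(X1, y1), (X2, y2), (X3, y3)])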
Consensus ADMM
Problem type: Non-linear and binary classification
Parallelism type: Distributed HPC architecture
Main idea: Implements a consensus ADMM-based SVM (a sketch follows this entry)
Pros:
  • Robust to node failures
  • Reduces communication overhead by communicating only with neighboring nodes
  • Scales well with the number of computing nodes
  • Good parallelizability
  • Supports both online and offline learning
  • Good potential for preserving data privacy
Cons:
  • Poor parameter choices negatively affect convergence
  • May require many iterations to converge
  • Benefits not fully shown for large-scale problems
Reference: Forero et al. [2010]
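The skeleton below shows the consensus-ADMM structure for a linear SVM: each node minimizes its local hinge loss plus a quadratic coupling term, the nodes then agree on a ridge-regularized average, and the dual variables are updated. For brevity the local subproblem is solved with a generic SciPy optimizer and the averaging is centralized, whereas Forero et al. [2010] derive closed-form updates and communicate only between neighboring nodes; rho, lam, C, and the iteration count are illustrative.

    import numpy as np
    from scipy.optimize import minimize

    def local_update(X, y, z, u, rho, C):
        """w-update: local hinge loss plus the ADMM quadratic coupling term."""
        def obj(w):
            hinge = np.maximum(0.0, 1.0 - y * (X @ w)).sum()
            return C * hinge + 0.5 * rho * np.sum((w - z + u) ** 2)
        return minimize(obj, z - u, method="Powell").x   # derivative-free, tolerates the kink

    def consensus_admm_svm(sites, dim, rho=1.0, lam=1.0, C=1.0, iters=30):
        K = len(sites)
        W = [np.zeros(dim) for _ in range(K)]
        U = [np.zeros(dim) for _ in range(K)]
        z = np.zeros(dim)
        for _ in range(iters):
            W = [local_update(X, y, z, U[k], rho, C) for k, (X, y) in enumerate(sites)]
            # z-update: ridge-regularized average of the local models (centralized here
            # for brevity; Forero et al. exchange only with neighboring nodes).
            z = rho * sum(W[k] + U[k] for k in range(K)) / (lam + K * rho)
            U = [U[k] + W[k] - z for k in range(K)]      # dual update
        return z

    # Usage: w = consensus_admm_svm([(X1, y1), (X2, y2)], dim=X1.shape[1]) with labels in {-1, +1}.
    # (No bias term here; append a constant feature to X to include one.)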
DPSVM
Problem type: Non-linear and binary classification
Parallelism type: Distributed HPC architecture
Main idea: Exchanges SVs over a strongly connected network in which several servers process distributed data in parallel at low communication cost (a sketch follows this entry)
Pros:
  • Fast convergence
  • Computing time at each iteration is independent of the number of SVs
  • Scales well with the number of training data
  • Limited communication cost on moderately sized networks
  • Reaches the global optimum
  • Outperforms the standard cascade SVM
  • Close to linear scalability on moderately sized networks
  • Configurable network topology
  • Online implementation and synchronization
Cons:
  • High communication overhead on large networks
  • The size of the SV set affects the communication cost
Reference: Lu et al. [2008]
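A toy version of SV exchange over a connected network is sketched below, using a ring topology: each node retrains on its local data plus the SVs received from its predecessor and forwards its own SVs to its successor for a fixed number of rounds. DPSVM operates on general strongly connected topologies with a proven convergence to the global optimum, which this sketch does not attempt to reproduce; the RBF kernel and round count are assumptions.

    import numpy as np
    from sklearn.svm import SVC

    def sv_of(X, y):
        clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
        return X[clf.support_], y[clf.support_]

    def dpsvm_ring(sites, rounds=5):
        """Each node passes its current SVs to the next node in a ring."""
        K = len(sites)
        d = sites[0][0].shape[1]
        msgs = [(np.empty((0, d)), np.empty(0))] * K
        for _ in range(rounds):
            new_msgs = [None] * K
            for k, (Xk, yk) in enumerate(sites):
                inX, iny = msgs[k]                          # SVs received from the predecessor
                Xa = np.vstack([Xk, inX]) if len(inX) else Xk
                ya = np.concatenate([yk, iny]) if len(iny) else yk
                new_msgs[(k + 1) % K] = sv_of(Xa, ya)       # forward own SVs to the successor
            msgs = new_msgs
        # Final model at node 0: its local data plus the last incoming SVs.
        X0, y0 = sites[0]
        inX, iny = msgs[0]
        return SVC(kernel="rbf", C=1.0, gamma="scale").fit(
            np.vstack([X0, inX]), np.concatenate([y0, iny]))

    # Usage: model = dpsvm_ring([(X1, y1), (X2, y2), (X3, y3)])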
PESVM
Problem type: Non-linear and binary classification
Parallelism type: Distributed big-data architecture using MapReduce
Main idea: Implements an online learning algorithm in which the existing model is updated for newly arriving training data (a sketch follows this entry)
Pros:
  • Scales well with the number of training samples
  • Avoids re-training on old data when new data arrive
  • Explainable results
Cons:
  • Incurs overhead
  • Benefits not fully clear for a large number of cores
Reference: He et al. [2011]
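The avoidance of re-training on old data can be illustrated with a least-squares-style linear SVM whose solution depends on the data only through accumulated sufficient statistics: a new batch simply adds its contribution to those statistics, so old data are never re-read, and the per-batch sums are exactly the kind of quantity a MapReduce job can compute in parallel. This is a generic sketch of that idea, not the PESVM formulation of He et al. [2011]; the class name and constants are assumptions, and a non-linear variant would first pass the inputs through a fixed feature map.

    import numpy as np

    class ExtensibleLinearSVM:
        """Least-squares-style linear SVM kept up to date via accumulated statistics."""

        def __init__(self, dim, C=10.0):
            self.G = np.eye(dim + 1) / C        # regularized Gram accumulator
            self.r = np.zeros(dim + 1)
            self.beta = np.zeros(dim + 1)

        def update(self, X, y):
            """Fold a new batch into the model without touching previous batches."""
            Xb = np.hstack([X, np.ones((X.shape[0], 1))])
            self.G += Xb.T @ Xb                 # a MapReduce job can compute these sums per split
            self.r += Xb.T @ y
            self.beta = np.linalg.solve(self.G, self.r)
            return self

        def decision(self, X):
            # (A non-linear variant would map X through a fixed feature map first.)
            return X @ self.beta[:-1] + self.beta[-1]

    # Usage: model = ExtensibleLinearSVM(dim=5); model.update(X1, y1); model.update(X2, y2)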
Newton SVM
Problem type: Linear and binary classification
Parallelism type: GPU-based parallelism
Main idea: Implements a Newton-based SVM classifier that requires solving only a linear system instead of a quadratic program (a sketch follows this entry)
Pros:
  • Reduces memory requirements compared to similar approaches
  • Outperforms standard SVM software such as LibSVM and SVM-Perf, as well as similar approaches
  • High speedup: 45x over the CPU-based parallel version and over 100x over LibSVM, SVM-Perf, and similar approaches
  • Stores data in memory in two successive steps
  • Scales well with the number of training samples
  • Classifies large-scale problems on standard workstations when the input-space dimension is small enough
Cons:
  • Only supports linear classification
  • Performs worse than the GPU-based LS-SVM
  • Benefits not fully clear for training data with a high-dimensional input space
Reference: Do et al. [2008]
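The sketch below shows a Newton-type method for the primal L2-loss (squared hinge) linear SVM, where each iteration reduces to one (d+1)-dimensional linear solve over the currently margin-violating examples rather than a quadratic program. It is a plain NumPy illustration; the GPU implementation of Do et al. [2008] offloads the dominant matrix products to the GPU, and a line search can be added for robustness, neither of which is shown here.

    import numpy as np

    def newton_l2svm(X, y, C=1.0, iters=20, tol=1e-6):
        """Newton method for min 0.5*||w||^2 + C/2 * sum max(0, 1 - y*(x.w))^2."""
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # fold the bias into w (also regularized here)
        n, d = Xb.shape
        w = np.zeros(d)
        for _ in range(iters):
            margins = y * (Xb @ w)
            I = margins < 1.0                           # active (margin-violating) examples
            grad = w - C * Xb[I].T @ (y[I] * (1.0 - margins[I]))
            if np.linalg.norm(grad) < tol:
                break
            H = np.eye(d) + C * Xb[I].T @ Xb[I]         # generalized Hessian
            w -= np.linalg.solve(H, grad)               # one linear solve per iteration
        return w[:-1], w[-1]

    # Usage: w, b = newton_l2svm(X, y) with labels y in {-1, +1}; predict with sign(X @ w + b).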
GPU-based LS-SVM
Problem type: Linear and binary classification
Parallelism type: GPU parallelism
Main idea: Implements a least squares SVM classifier on the GPU (a sketch follows this entry)
Pros:
  • Fast convergence
  • Avoids loading the entire training data into memory
  • Outperforms the Newton SVM of Do et al. [2008]
  • Outperforms LibSVM and SVM-Perf with high speedup (1000x)
  • Highly scalable in the number of training samples
  • High speedup (70x) over its CPU implementation
Cons:
  • Only supports linear classification
  • Benefits not fully clear for kernelized SVMs
Reference: Do et al. [2008]
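For completeness, the same least-squares-style solve sketched earlier for Do and Poulet [2006] can be expressed with CuPy so that the block accumulation and the final linear solve run on the GPU; the data are copied block by block, which mirrors the "avoids loading the entire training data into memory" property. This requires a CUDA-capable GPU with CuPy installed and is only a stand-in for the actual GPU LS-SVM implementation; the block size and constants are assumptions.

    import cupy as cp   # requires a CUDA-capable GPU and the CuPy package

    def gpu_lssvm_fit(X_host, y_host, C=10.0, block=100_000):
        """Accumulate X^T X and X^T y in blocks on the GPU, then solve one small system."""
        d = X_host.shape[1]
        G = cp.eye(d + 1) / C
        r = cp.zeros(d + 1)
        for start in range(0, X_host.shape[0], block):
            Xb = cp.asarray(X_host[start:start + block])        # copy one block to the GPU
            yb = cp.asarray(y_host[start:start + block])
            Xb = cp.hstack([Xb, cp.ones((Xb.shape[0], 1))])     # append a bias column
            G += Xb.T @ Xb                                      # GPU matrix product
            r += Xb.T @ yb
        beta = cp.linalg.solve(G, r)                            # small (d+1)x(d+1) solve on the GPU
        return cp.asnumpy(beta[:-1]), float(cp.asnumpy(beta[-1]))   # w, b back on the host

    # Usage: w, b = gpu_lssvm_fit(X, y) with NumPy arrays X, y and labels in {-1, +1}.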