This section compares articles and methods that perform incremental learning of SVMs in parallel, highlighting the surveyed works with notable findings on combining SVM training with incremental learning. Each entry below lists the algorithm, the problem type it addresses, the type of parallelism it uses, its main idea, its pros and cons, and the corresponding reference.


StreamSVM
Problem type: Linear and binary classification
Parallelism type: Shared-memory parallelism
Main idea: Implements the first linear SVM training algorithm that integrates caching with optimization and streams the data in parallel (a sketch follows this entry)
Pros:
  • Increases throughput
  • High accuracy
  • Reduces memory requirements by working on data blocks
  • Outperforms existing linear SVM solvers by orders of magnitude
  • Expands data on the fly by generating features on demand
  • Exploits the memory storage hierarchy
  • Scales well with the amount of input data
Cons:
  • Requires multiple passes through the entire data
  • High overhead
Reference: Matsushima et al. [2012]
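To make the streaming idea concrete, here is a minimal sketch of training a linear SVM over a stream of data blocks using Pegasos-style stochastic subgradient updates on the hinge loss. It is not the dual cached-loop algorithm of Matsushima et al. [2012]; the function names, the regularization constant, and the toy data generator are illustrative assumptions.

    import numpy as np

    def stream_train_linear_svm(chunks, dim, lam=1e-4):
        """Pegasos-style streaming training of a linear SVM.

        `chunks` is an iterator over (X, y) blocks with y in {-1, +1},
        so the full data set never has to reside in memory at once.
        """
        w = np.zeros(dim)
        t = 0
        for X, y in chunks:                       # stream the data block by block
            for xi, yi in zip(X, y):
                t += 1
                eta = 1.0 / (lam * t)             # decaying learning rate
                if yi * xi.dot(w) < 1.0:          # margin violation: hinge subgradient step
                    w = (1 - eta * lam) * w + eta * yi * xi
                else:                             # only the regularizer contributes
                    w = (1 - eta * lam) * w
        return w

    # Toy usage: two Gaussian blobs streamed in 10 blocks of 100 points each.
    rng = np.random.default_rng(0)
    def toy_stream(n_blocks=10, n=100, dim=5):
        for _ in range(n_blocks):
            y = rng.choice([-1.0, 1.0], size=n)
            X = rng.normal(size=(n, dim)) + y[:, None]
            yield X, y
    w = stream_train_linear_svm(toy_stream(), dim=5)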
Incremental LIBLinear
Problem type: Linear and multi-class classification
Parallelism type: Hybrid shared/distributed memory parallelism
Main idea: Implements a balanced bagging incremental algorithm with no loss of classification accuracy (a sketch follows this entry)
Pros:
  • Handles imbalanced problems
  • Scales well with the number of cores
  • Scales well with the number of training data
  • Reduces the training data efficiently
  • High speedup: 732x over the original implementation and 1193x over LIBLinear
  • Efficient memory usage
Cons:
  • Only supports linear classification
Reference: Doan et al. [2013]
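As a rough stand-in for the balanced bagging scheme, the sketch below trains a linear SVM incrementally with scikit-learn's SGDClassifier (hinge loss) via partial_fit, rebalancing each incoming chunk by downsampling the majority class. The chunking scheme, the downsampling helper, and the hyperparameters are assumptions, not the actual implementation of Doan et al. [2013].

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    def balanced_chunk(X, y, rng):
        """Downsample the majority classes so each chunk is roughly balanced."""
        classes, counts = np.unique(y, return_counts=True)
        n_min = counts.min()
        keep = np.concatenate([
            rng.choice(np.where(y == c)[0], size=n_min, replace=False)
            for c in classes
        ])
        return X[keep], y[keep]

    def train_incrementally(chunks, classes, seed=0):
        """Fit a linear SVM chunk by chunk; hinge loss makes SGDClassifier a linear SVM."""
        rng = np.random.default_rng(seed)
        clf = SGDClassifier(loss="hinge", alpha=1e-4, random_state=seed)
        for X, y in chunks:
            Xb, yb = balanced_chunk(X, y, rng)
            clf.partial_fit(Xb, yb, classes=classes)   # incremental update, old chunks not revisited
        return clf

    # Usage: clf = train_incrementally(chunk_iterator, classes=np.array([0, 1, 2]))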
LS-SVM
Problem type: Linear, non-linear, and binary classification
Parallelism type: Distributed HPC architecture
Main idea: Classifies large-scale problems on standard workstations using extended and boosted least squares SVM algorithms (a sketch follows this entry)
Pros:
  • Adaptive distribution of data
  • Highly scalable in the number of training data (20 billion data points)
  • Robust across different data types
  • Robust for high-dimensional input spaces
  • Complexity is linear in the number of training examples and the number of machines
  • Complexity is linear in the square of the input-space dimension
  • High speedup for large-scale problems
  • High-performance computation of matrix operations
Cons:
  • Requires a central computing unit to combine the resulting SVs
  • Communication overhead
  • Sequential computations
  • Benefits not fully clear for complex non-linear classification
Reference: Do and Poulet [2006]
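The following sketch illustrates why a least-squares-style linear SVM distributes so well: each machine reduces its row block to a small (d+1)x(d+1) matrix and a (d+1) vector, and a central unit sums these partial results and solves one small linear system, so the cost grows linearly with the number of examples and machines and quadratically with the input dimension. This is a proximal/least-squares formulation chosen for brevity, not the exact algorithm of Do and Poulet [2006]; the regularization constant and the toy data are illustrative.

    import numpy as np

    def local_sums(X, y):
        """Each worker reduces its row block to (d+1)x(d+1) and (d+1) partial sums."""
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # append a bias column
        return Xb.T @ Xb, Xb.T @ y

    def central_solve(partials, dim, C=10.0):
        """The central unit combines the partial sums and solves one small linear system."""
        G = np.eye(dim + 1) / C
        r = np.zeros(dim + 1)
        for Gk, rk in partials:
            G += Gk
            r += rk
        return np.linalg.solve(G, r)                     # beta = [w; b]

    # Toy usage: four "machines", each holding one row block of labeled data.
    rng = np.random.default_rng(0)
    blocks = []
    for _ in range(4):
        y = rng.choice([-1.0, 1.0], size=500)
        X = rng.normal(size=(500, 3)) + y[:, None]
        blocks.append((X, y))
    beta = central_solve([local_sums(X, y) for X, y in blocks], dim=3)
    w, b = beta[:-1], beta[-1]
    # Predict with sign(X @ w + b).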
Tree-based incremental SVM
Problem type: Linear classification
Parallelism type: Distributed HPC architecture
Main idea: Implements a parallel heap-based tree topology for incremental SVM training (a sketch follows this entry)
Pros:
  • High speedup for large increment sizes
  • Close to linear scalability for large increment sizes
  • Parallel version performs well for large increment sizes
  • Efficient data access for all nodes
Cons:
  • Becomes network-I/O bound for small increment sizes
  • Accuracy degradation in unbalanced cases
  • Only supports linear classification
  • Parallel version performs poorly for small increment sizes
  • Parallel version performs slightly worse for medium increment sizes
Reference: Tveit and Engum [2003]
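A simplified, sequential version of the tree-based combination is sketched below: leaves train on disjoint partitions, and each parent retrains on the union of its children's support vectors until a single model remains at the root. The heap-based scheduling and the parallel execution of Tveit and Engum [2003] are not reproduced; scikit-learn's SVC and the toy partitions are stand-ins.

    import numpy as np
    from sklearn.svm import SVC

    def fit_node(X, y):
        """Train an SVM on a node's data and return its support vectors with labels."""
        clf = SVC(kernel="linear", C=1.0).fit(X, y)
        return X[clf.support_], y[clf.support_]

    def tree_combine(partitions):
        """Cascade-style combination: merge children pairwise and retrain on their SVs."""
        level = [fit_node(X, y) for X, y in partitions]      # leaf level
        while len(level) > 1:
            nxt = []
            for i in range(0, len(level) - 1, 2):
                X = np.vstack([level[i][0], level[i + 1][0]])
                y = np.concatenate([level[i][1], level[i + 1][1]])
                nxt.append(fit_node(X, y))
            if len(level) % 2:                               # an odd node passes through unchanged
                nxt.append(level[-1])
            level = nxt
        Xr, yr = level[0]
        return SVC(kernel="linear", C=1.0).fit(Xr, yr)       # final model at the root

    # Toy usage: eight partitions, one per worker.
    rng = np.random.default_rng(0)
    def toy(n=200, d=4):
        y = rng.choice([-1, 1], size=n)
        return rng.normal(size=(n, d)) + y[:, None], y
    model = tree_combine([toy() for _ in range(8)])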
Centralized SVM
Problem type: Linear and binary classification
Parallelism type: Distributed HPC architecture
Main idea: Learns from distributed data sources, improving the approach of Syed et al. [1999] so that the global optimum is obtained on distributed systems (a sketch follows this entry)
Pros:
  • Learns from distributed data sources
  • Good potential for preserving data privacy
  • Fast convergence in a finite number of iterations
Cons:
  • May suffer from communication and synchronization overheads
  • Only supports linear classification
Reference: Caragea et al. [2005]
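A simplified reading of the centralized scheme is sketched below: in each round the sites train on their local data plus the current global SVs, the central node retrains on the union of the returned SVs, and the process repeats until the global SV set stabilizes. The stopping heuristic and the use of scikit-learn's SVC are assumptions; Caragea et al. [2005] give the precise exchange protocol and convergence conditions.

    import numpy as np
    from sklearn.svm import SVC

    def support_set(X, y):
        clf = SVC(kernel="linear", C=1.0).fit(X, y)
        return X[clf.support_], y[clf.support_]

    def centralized_sv_exchange(sites, max_rounds=10):
        """Iteratively merge support vectors from distributed sites at a central node."""
        gX = np.empty((0, sites[0][0].shape[1]))
        gy = np.empty(0)
        for _ in range(max_rounds):
            parts = []
            for Xk, yk in sites:
                # Each site trains on its local data plus the current global SVs.
                Xa = np.vstack([Xk, gX]) if len(gX) else Xk
                ya = np.concatenate([yk, gy]) if len(gy) else yk
                parts.append(support_set(Xa, ya))
            # The central node retrains on the union of all returned SVs.
            X_all = np.vstack([p[0] for p in parts])
            y_all = np.concatenate([p[1] for p in parts])
            new_gX, new_gy = support_set(X_all, y_all)
            if len(new_gX) == len(gX):   # crude stopping heuristic: global SV count stabilized
                gX, gy = new_gX, new_gy
                break
            gX, gy = new_gX, new_gy
        return SVC(kernel="linear", C=1.0).fit(gX, gy)

    # Usage: model = centralized_sv_exchange([(X1, y1), (X2, y2), (X3, y3)])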
Consensus ADMM
Problem type: Non-linear and binary classification
Parallelism type: Distributed HPC architecture
Main idea: Implements a consensus ADMM-based SVM (a sketch follows this entry)
Pros:
  • Robust to node failures
  • Reduces communication overhead by communicating only with neighboring nodes
  • Scales well with the number of computing nodes
  • Good parallelizability
  • Supports both online and offline learning
  • Good potential for preserving data privacy
Cons:
  • Poor parameter choices negatively affect convergence
  • May require many iterations to converge
  • Benefits not fully shown for large-scale problems
Reference: Forero et al. [2010]
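The skeleton below shows the consensus-ADMM structure for a linear SVM: each node minimizes its local hinge loss plus a quadratic coupling term, the nodes then agree on a ridge-regularized average, and the dual variables are updated. For brevity the local subproblem is solved with a generic SciPy optimizer and the averaging is centralized, whereas Forero et al. [2010] derive closed-form updates and communicate only between neighboring nodes; rho, lam, C, and the iteration count are illustrative.

    import numpy as np
    from scipy.optimize import minimize

    def local_update(X, y, z, u, rho, C):
        """w-update: local hinge loss plus the ADMM quadratic coupling term."""
        def obj(w):
            hinge = np.maximum(0.0, 1.0 - y * (X @ w)).sum()
            return C * hinge + 0.5 * rho * np.sum((w - z + u) ** 2)
        return minimize(obj, z - u, method="Powell").x   # derivative-free, tolerates the kink

    def consensus_admm_svm(sites, dim, rho=1.0, lam=1.0, C=1.0, iters=30):
        K = len(sites)
        W = [np.zeros(dim) for _ in range(K)]
        U = [np.zeros(dim) for _ in range(K)]
        z = np.zeros(dim)
        for _ in range(iters):
            W = [local_update(X, y, z, U[k], rho, C) for k, (X, y) in enumerate(sites)]
            # z-update: ridge-regularized average of the local models (centralized here
            # for brevity; Forero et al. exchange only with neighboring nodes).
            z = rho * sum(W[k] + U[k] for k in range(K)) / (lam + K * rho)
            U = [U[k] + W[k] - z for k in range(K)]      # dual update
        return z

    # Usage: w = consensus_admm_svm([(X1, y1), (X2, y2)], dim=X1.shape[1]) with labels in {-1, +1}.
    # (No bias term here; append a constant feature to X to include one.)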
DPSVM
Problem type: Non-linear and binary classification
Parallelism type: Distributed HPC architecture
Main idea: Exchanges SVs over a strongly connected network in which several servers process distributed data in parallel at low communication cost (a sketch follows this entry)
Pros:
  • Fast convergence
  • Computing time at each iteration is independent of the number of SVs
  • Scales well with the number of training data
  • Limited communication cost on moderately sized networks
  • Reaches the global optimum
  • Outperforms the standard cascade SVM
  • Close to linear scalability on moderately sized networks
  • Configurable network topology
  • Online implementation and synchronization
Cons:
  • High communication overhead on large networks
  • The size of the SV set affects the communication cost
Reference: Lu et al. [2008]
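A toy version of SV exchange over a connected network is sketched below, using a ring topology: each node retrains on its local data plus the SVs received from its predecessor and forwards its own SVs to its successor for a fixed number of rounds. DPSVM operates on general strongly connected topologies with a proven convergence to the global optimum, which this sketch does not attempt to reproduce; the RBF kernel and round count are assumptions.

    import numpy as np
    from sklearn.svm import SVC

    def sv_of(X, y):
        clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
        return X[clf.support_], y[clf.support_]

    def dpsvm_ring(sites, rounds=5):
        """Each node passes its current SVs to the next node in a ring."""
        K = len(sites)
        d = sites[0][0].shape[1]
        msgs = [(np.empty((0, d)), np.empty(0))] * K
        for _ in range(rounds):
            new_msgs = [None] * K
            for k, (Xk, yk) in enumerate(sites):
                inX, iny = msgs[k]                          # SVs received from the predecessor
                Xa = np.vstack([Xk, inX]) if len(inX) else Xk
                ya = np.concatenate([yk, iny]) if len(iny) else yk
                new_msgs[(k + 1) % K] = sv_of(Xa, ya)       # forward own SVs to the successor
            msgs = new_msgs
        # Final model at node 0: its local data plus the last incoming SVs.
        X0, y0 = sites[0]
        inX, iny = msgs[0]
        return SVC(kernel="rbf", C=1.0, gamma="scale").fit(
            np.vstack([X0, inX]), np.concatenate([y0, iny]))

    # Usage: model = dpsvm_ring([(X1, y1), (X2, y2), (X3, y3)])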
PESVM
Problem type: Non-linear and binary classification
Parallelism type: Distributed big-data architecture using MapReduce
Main idea: Implements an online learning algorithm in which the existing model is updated for newly arriving training data (a sketch follows this entry)
Pros:
  • Scales well with the number of training samples
  • Avoids re-training on old data when new data arrive
  • Explainable results
Cons:
  • Incurs overhead
  • Benefits not fully clear for a large number of cores
Reference: He et al. [2011]
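The avoidance of re-training on old data can be illustrated with a least-squares-style linear SVM whose solution depends on the data only through accumulated sufficient statistics: a new batch simply adds its contribution to those statistics, so old data are never re-read, and the per-batch sums are exactly the kind of quantity a MapReduce job can compute in parallel. This is a generic sketch of that idea, not the PESVM formulation of He et al. [2011]; the class name and constants are assumptions, and a non-linear variant would first pass the inputs through a fixed feature map.

    import numpy as np

    class ExtensibleLinearSVM:
        """Least-squares-style linear SVM kept up to date via accumulated statistics."""

        def __init__(self, dim, C=10.0):
            self.G = np.eye(dim + 1) / C        # regularized Gram accumulator
            self.r = np.zeros(dim + 1)
            self.beta = np.zeros(dim + 1)

        def update(self, X, y):
            """Fold a new batch into the model without touching previous batches."""
            Xb = np.hstack([X, np.ones((X.shape[0], 1))])
            self.G += Xb.T @ Xb                 # a MapReduce job can compute these sums per split
            self.r += Xb.T @ y
            self.beta = np.linalg.solve(self.G, self.r)
            return self

        def decision(self, X):
            # (A non-linear variant would map X through a fixed feature map first.)
            return X @ self.beta[:-1] + self.beta[-1]

    # Usage: model = ExtensibleLinearSVM(dim=5); model.update(X1, y1); model.update(X2, y2)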
Newton SVM
Problem type: Linear and binary classification
Parallelism type: GPU-based parallelism
Main idea: Implements a Newton-based SVM classifier that requires solving only a linear system instead of a quadratic program (a sketch follows this entry)
Pros:
  • Reduces memory requirements compared to similar approaches
  • Outperforms standard SVM software such as LibSVM and SVM-Perf, as well as similar approaches
  • High speedup: 45x over the CPU-based parallel version and over 100x over LibSVM, SVM-Perf, and similar approaches
  • Stores data in memory in two successive steps
  • Scales well with the number of training samples
  • Classifies large-scale problems on standard workstations when the input-space dimension is small enough
Cons:
  • Only supports linear classification
  • Performs worse than the GPU-based LS-SVM
  • Benefits not fully clear for training data with a high-dimensional input space
Reference: Do et al. [2008]
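The sketch below shows a Newton-type method for the primal L2-loss (squared hinge) linear SVM, where each iteration reduces to one (d+1)-dimensional linear solve over the currently margin-violating examples rather than a quadratic program. It is a plain NumPy illustration; the GPU implementation of Do et al. [2008] offloads the dominant matrix products to the GPU, and a line search can be added for robustness, neither of which is shown here.

    import numpy as np

    def newton_l2svm(X, y, C=1.0, iters=20, tol=1e-6):
        """Newton method for min 0.5*||w||^2 + C/2 * sum max(0, 1 - y*(x.w))^2."""
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # fold the bias into w (also regularized here)
        n, d = Xb.shape
        w = np.zeros(d)
        for _ in range(iters):
            margins = y * (Xb @ w)
            I = margins < 1.0                           # active (margin-violating) examples
            grad = w - C * Xb[I].T @ (y[I] * (1.0 - margins[I]))
            if np.linalg.norm(grad) < tol:
                break
            H = np.eye(d) + C * Xb[I].T @ Xb[I]         # generalized Hessian
            w -= np.linalg.solve(H, grad)               # one linear solve per iteration
        return w[:-1], w[-1]

    # Usage: w, b = newton_l2svm(X, y) with labels y in {-1, +1}; predict with sign(X @ w + b).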
GPU-based LS-SVM
Problem type: Linear and binary classification
Parallelism type: GPU parallelism
Main idea: Implements a least squares SVM classifier on the GPU (a sketch follows this entry)
Pros:
  • Fast convergence
  • Avoids loading the entire training data into memory
  • Outperforms the Newton SVM of Do et al. [2008]
  • Outperforms LibSVM and SVM-Perf with high speedup (1000x)
  • Highly scalable in the number of training samples
  • High speedup (70x) over its CPU implementation
Cons:
  • Only supports linear classification
  • Benefits not fully clear for kernelized SVMs
Reference: Do et al. [2008]
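For completeness, the same least-squares-style solve sketched earlier for Do and Poulet [2006] can be expressed with CuPy so that the block accumulation and the final linear solve run on the GPU; the data are copied block by block, which mirrors the "avoids loading the entire training data into memory" property. This requires a CUDA-capable GPU with CuPy installed and is only a stand-in for the actual GPU LS-SVM implementation; the block size and constants are assumptions.

    import cupy as cp   # requires a CUDA-capable GPU and the CuPy package

    def gpu_lssvm_fit(X_host, y_host, C=10.0, block=100_000):
        """Accumulate X^T X and X^T y in blocks on the GPU, then solve one small system."""
        d = X_host.shape[1]
        G = cp.eye(d + 1) / C
        r = cp.zeros(d + 1)
        for start in range(0, X_host.shape[0], block):
            Xb = cp.asarray(X_host[start:start + block])        # copy one block to the GPU
            yb = cp.asarray(y_host[start:start + block])
            Xb = cp.hstack([Xb, cp.ones((Xb.shape[0], 1))])     # append a bias column
            G += Xb.T @ Xb                                      # GPU matrix product
            r += Xb.T @ yb
        beta = cp.linalg.solve(G, r)                            # small (d+1)x(d+1) solve on the GPU
        return cp.asnumpy(beta[:-1]), float(cp.asnumpy(beta[-1]))   # w, b back on the host

    # Usage: w, b = gpu_lssvm_fit(X, y) with NumPy arrays X, y and labels in {-1, +1}.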