Automatic parallel SVM | Non-linear and binary classifications | GPU-based parallelism |
Develops a directive-based technique that uses gradient ascent for the optimization problem, and performs a novel feature extraction in which pixel values are converted into a high-dimensional input vector (a minimal gradient-ascent sketch follows this entry).
Advantages:
- Higher speedup (3x) from a simple change of memory access pattern
- Higher speedup (10-fold) from changing the input data structure
- Higher memory throughput
- The fastest GPU-based SVM for datasets with a high-dimensional input space
- Reduces the dimensionality of the problem
- Maintains high accuracy
- Efficient parameter tuning
Disadvantages:
- Benefits not fully clear regarding scalability
Codreanu et al. [2014] |
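The dual gradient-ascent solver at the heart of this kind of GPU implementation can be illustrated in plain NumPy. This is a minimal CPU sketch, not Codreanu et al.'s code: it drops the bias/equality constraint of the full dual (a common simplification), and the RBF kernel, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    # Pairwise RBF kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def dual_gradient_ascent(X, y, C=1.0, lr=1e-3, iters=2000, gamma=0.5):
    """Projected gradient ascent on the (bias-free) SVM dual:
    maximize sum(alpha) - 0.5 * alpha^T Q alpha  s.t.  0 <= alpha <= C,
    with Q[i, j] = y_i * y_j * K(x_i, x_j) and labels y in {-1, +1}."""
    Q = (y[:, None] * y[None, :]) * rbf_kernel(X, gamma)
    alpha = np.zeros(len(y))
    for _ in range(iters):
        grad = 1.0 - Q @ alpha                       # gradient of the dual objective
        alpha = np.clip(alpha + lr * grad, 0.0, C)   # ascent step + box projection
    return alpha

def decision_function(alpha, X, y, X_new, gamma=0.5):
    # f(x) = sum_i alpha_i * y_i * K(x_i, x); no bias term in this simplified dual.
    sq_tr, sq_new = np.sum(X ** 2, axis=1), np.sum(X_new ** 2, axis=1)
    K_new = np.exp(-gamma * (sq_new[:, None] + sq_tr[None, :] - 2.0 * X_new @ X.T))
    return K_new @ (alpha * y)
```

On a GPU the same update is attractive because the dominant cost, the matrix-vector product `Q @ alpha`, parallelizes trivially across rows.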
D-ADMM | Linear and binary classifications | Distributed HPC architectures |
Uses a graph-coloring scheme that requires fewer communications between nodes than state-of-the-art algorithms (a coloring sketch follows this entry).
Advantages:
- Reduces communication between nodes
- Asynchronous computations
- Good parallelizability
Disadvantages:
- Unclear theoretical explanation of why the algorithm is more efficient than similar algorithms
- Supports only linear classifications
- Slow convergence
Mota et al. [2013] |
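The key ingredient is a proper coloring of the communication graph: nodes with the same color share no edge, so they can run their local updates simultaneously, and only as many communication rounds as there are colors are needed per iteration. A greedy coloring of a made-up six-node topology can be sketched as follows; the edge list and node numbering are purely illustrative.

```python
from collections import defaultdict

def greedy_coloring(edges):
    """Assign each node the smallest color not used by its neighbors.
    Nodes that share a color have no edge between them, so in a
    coloring-based ADMM scheme they can update in the same round."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    colors = {}
    for node in sorted(neighbors):          # deterministic order for the sketch
        used = {colors[n] for n in neighbors[node] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors

# Hypothetical 6-node cluster topology; the number of distinct colors
# equals the number of communication rounds per outer iteration.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (1, 4)]
colors = greedy_coloring(edges)
rounds = defaultdict(list)
for node, c in colors.items():
    rounds[c].append(node)
print(dict(rounds))   # nodes grouped by color -> update groups per round
```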
DSVM | Linear and binary classifications | Distributed big-data architectures using MapReduce |
Performs block minimizations on data partitions and obtains the final model using averaging and line-search strategies (a block-averaging sketch follows this entry).
Advantages:
- Scales well in terms of the number of training and testing samples
- Communication complexity is independent of the number of training samples
- Fast training of large-scale problems (over 80 million samples in less than 11 minutes)
Disadvantages:
- Supports only linear classifications
- No theoretical proof of global convergence
Pechyony et al. [2011] |
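The block-minimization-plus-averaging pattern can be imitated on a single machine with scikit-learn: each "map" task fits a linear SVM on its block and the "reduce" step averages the resulting weight vectors. This is only a sketch under assumed block counts and regularization; the line-search refinement used by Pechyony et al. is omitted.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_blocks(X, y, n_blocks=8, C=1.0, seed=0):
    """Map step: fit an independent linear SVM on each data block."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(len(y)), n_blocks)
    weights, intercepts = [], []
    for idx in blocks:
        clf = LinearSVC(C=C).fit(X[idx], y[idx])
        weights.append(clf.coef_.ravel())
        intercepts.append(clf.intercept_[0])
    return np.array(weights), np.array(intercepts)

def reduce_average(weights, intercepts):
    """Reduce step: average the block models into one linear model."""
    return weights.mean(axis=0), intercepts.mean()

def predict(w, b, X_new):
    # Assumes binary labels encoded as -1 / +1.
    return np.sign(X_new @ w + b)
```

Because only the weight vectors travel to the reducer, the communication volume depends on the feature dimension rather than on the number of training samples, which is exactly the property listed above.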
Ensemble SVMs | Non-linear and multi-class classifications | Shared-memory parallelism |
Implements ensemble learning in which base SVMs are trained on data subsets and simple aggregation schemes combine them into the global result (an ensemble sketch follows this entry).
Advantages:
- Significant speedup over LIBSVM with comparable accuracy
- Reduces training complexity
- Avoids duplicate storage
- High classification accuracy
- Good parallelizability
- An intuitive programming framework
Disadvantages:
- Needs more complex aggregation schemes for further performance improvement
- Accuracy deteriorates on low-dimensional problems
- Limited memory
Claesen et al. [2014] |
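The subset-and-aggregate scheme can be reproduced with scikit-learn as a bagging-style sketch. It is not the EnsembleSVM library itself; the number of base models, subset fraction, RBF kernel, and majority-vote aggregation are illustrative assumptions (the paper likewise relies on simple aggregation schemes).

```python
import numpy as np
from sklearn.svm import SVC

def train_ensemble(X, y, n_models=10, subset_frac=0.2, seed=0):
    """Train each base SVM on a random data subset; the fits are independent,
    so in a shared-memory setting they can run on separate threads.
    Assumes each random subset contains at least two classes."""
    rng = np.random.default_rng(seed)
    subset_size = max(2, int(subset_frac * len(y)))
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(y), size=subset_size, replace=False)
        models.append(SVC(kernel="rbf", C=1.0, gamma="scale").fit(X[idx], y[idx]))
    return models

def predict_majority(models, X_new):
    """Simple aggregation scheme: majority vote over the base models.
    Assumes class labels are non-negative integers 0 .. K-1."""
    votes = np.stack([m.predict(X_new) for m in models]).astype(int)
    counts = np.stack([(votes == c).sum(axis=0) for c in range(votes.max() + 1)])
    return counts.argmax(axis=0)
```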
euroCAT | Linear and binary classifications | Distributed HPC architectures |
Implements multi-centric distributed learning in which data privacy is the top priority, with applications in personalized medicine (a privacy-preserving training sketch follows this entry).
Advantages:
- Preserves data privacy
- Higher speedup while maintaining good accuracy
- Model quality is similar to that of centralized models
- Asynchronous communications
- Efficient parameter tuning
Disadvantages:
- Supports only linear classifications
- Slow convergence
Deist et al. [2017] |
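One minimal way to illustrate the data-stays-local principle (not the actual euroCAT implementation) is a coordinator that only ever receives model updates from each centre, never the patient records themselves. The squared-hinge loss, learning rate, and round count below are assumptions for the sketch, and `centres` stands in for remote sites.

```python
import numpy as np

def local_gradient(w, X, y, lam=1e-2):
    """Gradient of an L2-regularized squared-hinge loss on one centre's
    private data. Only this gradient (never X or y) leaves the centre."""
    margins = 1.0 - y * (X @ w)
    active = margins > 0
    grad = -2.0 * (X[active].T @ (y[active] * margins[active])) / len(y)
    return grad + lam * w

def federated_train(centres, dim, rounds=200, lr=0.1):
    """Coordinator loop: collect gradients from every centre, average them,
    and update the shared linear model. `centres` is a list of (X, y) pairs."""
    w = np.zeros(dim)
    for _ in range(rounds):
        grads = [local_gradient(w, X, y) for X, y in centres]  # remote calls in reality
        w -= lr * np.mean(grads, axis=0)
    return w
```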
GaBP SVM | Non-linear and binary classifications | Distributed HPC architectures |
Reformulates the standard SVM problem from the algebraic domain into a probabilistic one based on Gaussian distributions, so that training reduces to inference via Gaussian belief propagation (a GaBP solver sketch follows this entry).
Advantages:
- Reduces overhead by requiring only a small number of passed messages
- Good scalability in terms of the number of cores (up to 1,024 cores)
- Accuracy comparable to SVMLight
- The largest parallel implementation of Gaussian belief propagation reported at the time
- Linear speedup using 256 CPUs
- Robust to both sparse and dense matrix operations
Disadvantages:
- Lacks support for asynchronous communications in heterogeneous computing environments
- Slow convergence for large inputs and large networks
- Communication overhead
- Computes the full kernel matrix
Bickson et al. [2008] |
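At its core, the approach solves a linear system by inference on a Gaussian graphical model. A minimal synchronous GaBP solver for a symmetric, diagonally dominant system Ax = b is sketched below using the standard message-update rules; it is a single-machine illustration, not Bickson et al.'s distributed implementation, and the example matrix is made up.

```python
import numpy as np

def gabp_solve(A, b, iters=50):
    """Synchronous Gaussian belief propagation for A x = b, with A symmetric
    and diagonally dominant. Each unknown x_i is a node; a precision P and a
    mean mu message travel along every edge (nonzero off-diagonal A[i, j])."""
    n = len(b)
    P = np.zeros((n, n))    # P[i, j]: precision message from node i to node j
    mu = np.zeros((n, n))   # mu[i, j]: mean message from node i to node j
    nbrs = [np.flatnonzero((A[i] != 0) & (np.arange(n) != i)) for i in range(n)]
    for _ in range(iters):
        P_new, mu_new = np.zeros((n, n)), np.zeros((n, n))
        for i in range(n):
            for j in nbrs[i]:
                others = [k for k in nbrs[i] if k != j]
                P_excl = A[i, i] + sum(P[k, i] for k in others)
                mu_excl = (b[i] + sum(P[k, i] * mu[k, i] for k in others)) / P_excl
                P_new[i, j] = -A[i, j] ** 2 / P_excl       # outgoing precision
                mu_new[i, j] = P_excl * mu_excl / A[i, j]  # outgoing mean
        P, mu = P_new, mu_new
    # The marginal means are the solution estimates.
    x = np.empty(n)
    for i in range(n):
        P_i = A[i, i] + sum(P[k, i] for k in nbrs[i])
        x[i] = (b[i] + sum(P[k, i] * mu[k, i] for k in nbrs[i])) / P_i
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(gabp_solve(A, b))   # ~[0.2, 0.6], matching np.linalg.solve(A, b)
```

Because each message only involves a node and its neighbors, the update maps naturally onto distributed memory, which is what makes the scheme attractive on HPC clusters.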
GADMM | Linear and binary classifications | Distributed HPC architectures |
Implements a group-based ADMM in which the slave nodes are divided into groups; a weighted-average method gathers the results from all groups and updates the global variable (a grouped consensus-ADMM sketch follows this entry).
Advantages:
- Fast convergence
- Higher speedup (30%) compared to non-grouped ADMM
- Reduces the number of ADMM iterations
- Improves the global consensus
- Good scalability in terms of the number of training samples
Disadvantages:
- Supports only linear classifications
- Slightly degrades accuracy
Wang et al. [2017] |
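The grouping idea can be sketched on top of a standard global-consensus ADMM loop: workers solve local subproblems, group leaders average their members' results, and the master combines the group averages. The squared-hinge local loss, `rho`, and the iteration counts are assumptions, and the sketch only illustrates the communication pattern, not Wang et al.'s implementation.

```python
import numpy as np
from scipy.optimize import minimize

def local_update(X, y, z, u, rho):
    """ADMM x-step on one worker: squared-hinge loss on local data plus the
    quadratic coupling term (rho/2) * ||w - z + u||^2."""
    def obj(w):
        m = np.maximum(0.0, 1.0 - y * (X @ w))
        return np.sum(m ** 2) + 0.5 * rho * np.sum((w - z + u) ** 2)
    return minimize(obj, z, method="L-BFGS-B").x

def grouped_admm(workers, groups, dim, rho=1.0, iters=30):
    """Consensus ADMM with a two-level (grouped) reduction of worker results.
    `workers` is a list of (X, y) pairs; `groups` is a list of index lists."""
    z = np.zeros(dim)
    u = [np.zeros(dim) for _ in workers]
    for _ in range(iters):
        w = [local_update(X, y, z, u[i], rho) for i, (X, y) in enumerate(workers)]
        # Group leaders average their members first (less traffic to the master) ...
        group_avg = [np.mean([w[i] + u[i] for i in g], axis=0) for g in groups]
        sizes = np.array([len(g) for g in groups], dtype=float)
        # ... then the master takes a size-weighted average of the group results,
        # which equals the plain global average but follows the hierarchy.
        z = np.average(group_avg, axis=0, weights=sizes)
        u = [u[i] + w[i] - z for i in range(len(workers))]
    return z
```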
GPU-tailored kernelized SVM | Non-linear, binary, and multi-class classifications | GPU-based parallelism |
Handles sparse data using sparsity clustering, which groups training samples with similar sparsity patterns so that they map efficiently onto GPU memory (a sparsity-clustering sketch follows this entry).
Advantages:
- Handles sparse datasets that do not match the GPU's preferred memory access pattern
- Robust to both sparse and dense data formats
- Efficiently handles sparse data
- Publicly available implementation
- Overcomes some of the GPU's limited-memory issues
- Reduces runtime
- Efficient memory access pattern
- Easy configuration
- Orders of magnitude faster than existing CPU and GPU implementations
Disadvantages:
- Benefits not fully clear for other data structures
- May still suffer from limited memory
- The initial clustering of the data can be computationally expensive
Cotter et al. [2011] |
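The sparsity-clustering idea, grouping rows whose nonzero patterns are similar so that each group can be stored as a small dense block, can be sketched with k-means on the binary sparsity masks. The cluster count and the choice of k-means are assumptions for the sketch; the actual GPU data layout in Cotter et al. differs from this CPU illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def sparsity_clusters(X, n_clusters=4, seed=0):
    """Group rows whose nonzero patterns are similar: rows in one group touch
    roughly the same columns, so each group can be stored as a small dense
    block, which is the memory layout GPUs handle efficiently."""
    mask = (X != 0).astype(float)                      # binary sparsity pattern per row
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(mask)
    blocks = []
    for c in range(n_clusters):
        rows = np.flatnonzero(labels == c)
        cols = np.flatnonzero(mask[rows].any(axis=0))  # union of nonzero columns
        blocks.append((rows, cols, X[np.ix_(rows, cols)]))  # dense sub-block
    return blocks
```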
Initial clustering | Unclear | Shared-memory parallelism |
Uses k-means clustering to minimize the number of training examples that represent the original large-scale problem (a training-set reduction sketch follows this entry).
Advantages:
- Reduces the number of training examples through clustering
- Robust to unbalanced data
- Easy to implement
- Speedup (3x) compared to the sequential version
Disadvantages:
- Possible deterioration of accuracy
- Limited explainability of results
- Benefits not fully clear for non-linear and multi-class classifications
- Limited memory
Shrivastava et al. [2011] |
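The reduction step can be sketched with scikit-learn's k-means: each class's samples are replaced by their centroids before the (expensive) SVM solve. Clustering within each class is one reasonable way to keep the labels consistent and to cope with unbalanced data; the number of clusters per class is an assumed parameter.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def reduce_with_kmeans(X, y, clusters_per_class=50, seed=0):
    """Replace each class's samples with its k-means centroids, shrinking the
    training set handed to the SVM solver."""
    X_small, y_small = [], []
    for label in np.unique(y):
        Xc = X[y == label]
        k = min(clusters_per_class, len(Xc))
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Xc)
        X_small.append(km.cluster_centers_)
        y_small.append(np.full(k, label))
    return np.vstack(X_small), np.concatenate(y_small)

# Train on the reduced set instead of the full data:
# X_small, y_small = reduce_with_kmeans(X, y)
# model = SVC(kernel="rbf").fit(X_small, y_small)
```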
kSVM | Non-linear and multi-class classifications | Shared-memory parallelism |
Partitions the data into clusters, each of which trains a local SVM in parallel (a cluster-then-train sketch follows this entry).
Advantages:
- Scales well with the size of the training data
- Maintains classification accuracy
- Good scalability in terms of the number of training samples
Disadvantages:
- Trade-off between computation time and accuracy
- Finds local SVs instead of the global SVs
Do et al. [2015] |
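A cluster-then-train sketch with scikit-learn: k-means partitions the data, one non-linear SVM is trained per cluster (these fits are independent and parallelizable), and each test point is routed to the model of its nearest cluster. The cluster count and kernel are assumptions; the sketch mirrors the general idea rather than Do et al.'s exact algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def train_ksvm(X, y, n_clusters=8, seed=0):
    """Partition the data with k-means and fit one non-linear SVM per cluster;
    the per-cluster fits are independent and can run on separate threads."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    models = {}
    for c in range(n_clusters):
        idx = np.flatnonzero(km.labels_ == c)
        labels = np.unique(y[idx])
        if len(labels) == 1:
            models[c] = labels[0]          # pure cluster: remember its single label
        else:
            models[c] = SVC(kernel="rbf", gamma="scale").fit(X[idx], y[idx])
    return km, models

def predict_ksvm(km, models, X_new):
    """Route each test point to its nearest cluster and use that cluster's model."""
    assignments = km.predict(X_new)
    preds = np.empty(len(X_new), dtype=object)
    for c in np.unique(assignments):
        sel = assignments == c
        model = models[c]
        preds[sel] = model.predict(X_new[sel]) if hasattr(model, "predict") else model
    return preds
```

The trade-off listed above is visible in the sketch: each local model only sees its own cluster, so the support vectors it finds are local rather than global.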