PIRWLS and PSIRWLS |
Non-linear, binary, and multi-class classifications |
Shared memory parallelism |
Implements two parallel iterative re-weighted least squares (IRWLS) SVMs, one solving the full SVM and one a semi-parametric SVM model. Parallelizes a Cholesky decomposition to solve the linear system in each iteration (see the sketch after this entry) |
- Reduces computationally expensive kernel evaluations
- Avoids a computationally expensive matrix inversion
- Outperforms LIBIRWLS proposed by Morales and Vazquez [2017]
- Higher speedup compared to the standard SVM libraries
- Similar accuracy to the standard SVM libraries, but in less time
|
- A large number of SVs in some cases
- Slower than the standard libraries when using a single core
- Benefits not fully clear for a large number of cores
- May suffer from limited memory
|
Morales and Vazquez [2016] |
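A minimal serial sketch of the core IRWLS step is given below: a weighted, regularized least squares system solved via a Cholesky factorization. The toy data, fixed per-sample weights, and variable names are illustrative assumptions; only the Cholesky solve corresponds to the step that PIRWLS/PSIRWLS parallelize across cores.

```python
# Minimal serial sketch of one IRWLS-style step: solve a weighted,
# regularized least squares system via Cholesky factorization.
# Toy data and names are illustrative; PIRWLS/PSIRWLS parallelize
# the Cholesky step across cores.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                       # toy feature matrix
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))   # toy labels in {-1, +1}
C = 1.0                                             # regularization constant

# Per-sample weights; in IRWLS these are re-computed from the current
# errors at every iteration (kept fixed here to keep the sketch short).
a = np.full(X.shape[0], C)

# Weighted normal equations: (X^T A X + I) w = X^T A y, with A = diag(a).
H = X.T @ (a[:, None] * X) + np.eye(X.shape[1])
b = X.T @ (a * y)

cho = cho_factor(H)      # Cholesky factorization (the parallelized step)
w = cho_solve(cho, b)    # back-substitution for the updated weight vector
print("updated weights:", w)
```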
LIBIRWLS |
Non-linear and binary classifications |
Shared memory parallelism |
Implements a parallel library that trains SVMs with iterative re-weighted least squares (IRWLS) instead of quadratic programming. Approximates the kernel matrix using a sparse greedy approximation (see the sketch after this entry) |
- Efficiently controls the number of SVs
- Reduces training samples using random sampling
|
- Does not perform better than PSIRWLS proposed by Morales and Vazquez [2016] in terms of accuracy
- May suffer from limited memory
- Benefits not fully clear in terms of scalability
|
Morales and Vazquez [2017] |
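As a rough stand-in for the sparse greedy kernel approximation used by LIBIRWLS, the sketch below approximates a kernel matrix from a small subset of its columns (Nyström-style). The greedy basis selection of the actual library is replaced here by random sampling, and all names, data, and the RBF kernel are illustrative assumptions.

```python
# Toy sketch of approximating a kernel matrix from a small subset of
# columns (a Nyström-style stand-in; the real method selects the basis
# greedily rather than at random).
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian RBF kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))     # toy data
m = 20                            # basis size (controls the number of SVs)

basis = rng.choice(len(X), size=m, replace=False)
K_nm = rbf_kernel(X, X[basis])                  # n x m cross-kernel
K_mm = rbf_kernel(X[basis], X[basis])           # m x m kernel among basis points
K_mm += 1e-8 * np.eye(m)                        # jitter for numerical stability

# Low-rank reconstruction: K ~= K_nm K_mm^{-1} K_nm^T
K_approx = K_nm @ np.linalg.solve(K_mm, K_nm.T)
K_exact = rbf_kernel(X, X)
print("relative approximation error:",
      np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact))
```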
P-packSVM |
Non-linear and multi-class classifications |
Distributed HPC Architectures |
Stochastically approximates the gradient so that each gradient descent step uses only a single training sample instead of the whole training set. Packs many consecutive gradient descent steps into one communication pack (see the sketch after this entry) |
- Supports any arbitrary kernel function
- Good scalability in terms of the number of training samples and cores
- Outperforms similar state-of-the-art algorithms
- Reduces communication overhead
- Reduces number of iterations
- Outperforms IPM- and SMO-based algorithms
- Reduces the problem size
- Trains large-scale problems in much less time with comparable accuracy relative to similar approaches
|
- Lacks a convergence criterion
- Sub-linear speedup in some cases
|
Zhu et al. [2009] |
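The sketch below illustrates the packing idea in a purely serial, linear hinge-loss setting: several consecutive stochastic gradient steps are grouped into one pack, and the synchronization that P-packSVM performs as inter-worker communication happens only once per pack. The Pegasos-style step size, the linear (non-kernel) model, and all names are assumptions, not the paper's exact algorithm.

```python
# Serial sketch of SGD with "packing": consecutive stochastic gradient
# steps are grouped into one pack, and synchronization (communication
# between workers in P-packSVM) happens only once per pack.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d))     # toy, roughly separable labels

lam = 0.01                              # regularization constant
pack_size = 50                          # gradient steps per pack
w = np.zeros(d)
t = 0

for pack in range(n // pack_size):
    for _ in range(pack_size):          # local steps inside one pack
        t += 1
        i = rng.integers(n)
        eta = 1.0 / (lam * t)           # Pegasos-style step size
        margin = y[i] * (X[i] @ w)
        grad = lam * w - (y[i] * X[i] if margin < 1 else 0.0)
        w -= eta * grad
    # One synchronization point per pack (stand-in for communication).
    acc = np.mean(np.sign(X @ w) == y)
    print(f"pack {pack:2d}: training accuracy {acc:.3f}")
```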
PmSVM |
Non-linear and multi-class classifications |
Hybrid shared and distributed HPC Architectures |
Builds balanced bagging classifiers using an under-sampling strategy, so that each classifier is trained on only a fraction of the training data (see the sketch after this entry) |
- Fast convergence
- Scales well with the number of training samples
- Avoids training on full training samples
- Handles class-imbalance problems using an under-sampling strategy
- Higher speedup compared to the sequential version (286 times) and to LIBSVM and LIBLINEAR (1,000 times)
|
- Suffers from limited memory
- Requires loading the whole training data into memory
- Does not support heterogeneous computing environments
- Benefits not fully clear for a large number of cores
|
Nghi et al. [2015] |
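A toy sketch of the balanced bagging idea follows: each bag pairs the full minority class with an equally sized random subset of the majority class, so every base learner sees only a fraction of the data. The linear base SVM, the number of bags, and the voting rule are assumptions rather than the paper's exact setup.

```python
# Toy sketch of balanced bagging with under-sampling: every bag combines
# all minority-class samples with an equally sized random subset of the
# majority class, so each base SVM trains on only a fraction of the data.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_min = rng.normal(loc=+1.0, size=(100, 5))    # minority class (+1)
X_maj = rng.normal(loc=-1.0, size=(2000, 5))   # majority class (-1)
X = np.vstack([X_min, X_maj])
y = np.concatenate([np.ones(100), -np.ones(2000)])

n_bags = 11
models = []
for _ in range(n_bags):
    # Under-sample the majority class down to the minority-class size.
    maj_idx = rng.choice(len(X_maj), size=len(X_min), replace=False)
    Xb = np.vstack([X_min, X_maj[maj_idx]])
    yb = np.concatenate([np.ones(len(X_min)), -np.ones(len(X_min))])
    models.append(LinearSVC(C=1.0).fit(Xb, yb))

# Aggregate the bag classifiers by majority vote.
votes = np.mean([m.predict(X) for m in models], axis=0)
print("balanced-bag accuracy:", np.mean(np.sign(votes) == y))
```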
Gradient system |
Non-linear and multi-class classifications |
Distributed HPC Architectures |
Uses neural networks to train SVMs. Suppresses high-frequency oscillations in the neighborhood of the discontinuity surface (the chattering problem) using a boundary layer technique (see the sketch after this entry) |
- Good parallelizability
- Calculates only the lower triangular part of the matrix used by the SVM
- Outperforms SVMLight and LIBSVM
- Solves the chattering problem
|
- Benefits not fully clear for scalability of large-scale problems
- Benefits not fully clear for scalability in terms of the number of processors
- Complex configuration
|
Ferreira et al. [2006] |
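The sketch below illustrates only the generic boundary layer technique mentioned above, not the paper's gradient system: the discontinuous sign switching term is replaced by a saturation inside a thin layer around the discontinuity surface, which removes the chattering. The dynamics, constants, and names are illustrative assumptions.

```python
# Generic illustration of the boundary layer idea against chattering:
# the discontinuous sign(s) switching term is replaced by a saturation
# sat(s / eps) inside a thin layer |s| < eps, removing the high-frequency
# oscillation around the discontinuity surface s = 0. This is not
# Ferreira et al.'s exact gradient system, only the technique.
import numpy as np

def simulate(switch, s0=0.93, k=5.0, dt=0.01, steps=400):
    """Integrate ds/dt = -k * switch(s) with explicit Euler."""
    s, path = s0, []
    for _ in range(steps):
        s -= dt * k * switch(s)
        path.append(s)
    return np.array(path)

eps = 0.1
hard = simulate(np.sign)                                  # chatters around s = 0
soft = simulate(lambda s: np.clip(s / eps, -1.0, 1.0))    # boundary layer version

print("last 5 states, hard switching: ", np.round(hard[-5:], 3))
print("last 5 states, boundary layer: ", np.round(soft[-5:], 3))
```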
ASGD |
Non-linear and multi-class classifications |
Distributed big data architectures using Hadoop map-reduce |
Performs feature extraction in parallel using a large number of mappers. Runs averaged stochastic gradient descent (ASGD) in parallel for multi-class classification (see the sketch after this entry) |
- Reduces dimension of the problem
- Fast extraction of fairly sophisticated features
- Fast training of redundant data using an averaging scheme
- Good parallelizability
- Reduces data-loading traffic
- Good scalability in terms of the number of training samples
- Fast convergence
|
- Benefits not fully clear for non-redundant data
- May suffer from communication overheads
- Concerns regarding the asymptotic convergence properties
|
Lin et al. [2011] |
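A minimal serial sketch of averaged SGD is given below: alongside the ordinary SGD iterate, a running average of the weight vectors is maintained and used as the final model. The hinge-loss objective, step-size schedule, and toy data are assumptions; the surveyed system additionally distributes this over map-reduce partitions.

```python
# Minimal serial sketch of averaged SGD (ASGD): keep a running average of
# the SGD iterates and return it as the final model.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 20
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d) + 0.1 * rng.normal(size=n))

eta, lam = 0.1, 1e-4
w = np.zeros(d)
w_avg = np.zeros(d)

for t in range(1, n + 1):
    i = rng.integers(n)
    margin = y[i] * (X[i] @ w)
    grad = lam * w - (y[i] * X[i] if margin < 1 else 0.0)   # hinge-loss subgradient
    w -= eta / np.sqrt(t) * grad
    w_avg += (w - w_avg) / t                                # running average of iterates

for name, v in [("plain SGD   ", w), ("averaged SGD", w_avg)]:
    print(name, "accuracy:", np.mean(np.sign(X @ v) == y))
```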
Map-Reduce SVM |
Unclear |
Distributed big data architectures |
Reformulates the optimization problem so that it can be parallelized more effectively (see the sketch after this entry) |
- Improves on ASGD proposed by Lin et al. [2011] in terms of communication overhead
- Better parallelizability compared to similar approaches
- Reduces the communication between processors using an averaging scheme
- Linear speedup while increasing the number of processors
- Outperforms the standard cascade SVMs in terms of speedup
- Supports running other machine learning algorithms in parallel on the proposed platform
|
- Concerns regarding scalability in terms of the number of cores
- Requires using the whole training set in each iteration
- Concerns regarding scalability in terms of the number of training samples
- The type of classification problems targeted is unclear
|
Chu et al. [2007] |
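The sketch below shows the general map-reduce pattern such a reformulation targets: mappers compute partial gradients over disjoint data shards and a reducer sums them into one global update. The linear hinge-loss objective, shard count, and names are illustrative assumptions, not the paper's exact formulation.

```python
# Toy sketch of the map-reduce pattern: each "mapper" computes a partial
# subgradient over its data shard, and the "reducer" sums the partial
# results into one global update.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1200, 8))
y = np.sign(X @ rng.normal(size=8))
lam, eta = 1e-3, 0.5
w = np.zeros(8)

def partial_gradient(shard_X, shard_y, w):
    """Mapper: hinge-loss subgradient contribution of one data shard."""
    margins = shard_y * (shard_X @ w)
    active = margins < 1
    return -(shard_y[active, None] * shard_X[active]).sum(axis=0)

shards = np.array_split(np.arange(len(X)), 4)   # 4 "mappers"

for epoch in range(20):
    # Map step: independent partial gradients (parallel in a real cluster).
    parts = [partial_gradient(X[idx], y[idx], w) for idx in shards]
    # Reduce step: sum the partial gradients and apply one global update.
    grad = lam * w + sum(parts) / len(X)
    w -= eta * grad

print("training accuracy:", np.mean(np.sign(X @ w) == y))
```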
BMBPGD |
Non-linear and binary classifications |
Distributed big data architectures using Apache Spark |
Implements a budgeted mini-batch gradient descent in parallel on Apache Spark (see the sketch after this entry) |
- Uses only a fraction of training examples in each iteration
- Good scalability in terms of the number of training samples
- Keeps the number of SVs under control
- Avoids the curse of dimensionality
- Outperforms SGD-based algorithms in terms of accuracy and time
|
- Concerns regarding accuracy in some cases
- There is a trade-off between accuracy and training time
|
Tao et al. [2014] |
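A simplified serial sketch of a budgeted mini-batch kernel update is given below: each step uses only a small mini-batch, margin violators are added as support vectors, and the support set is capped at a fixed budget. The RBF kernel, the smallest-coefficient removal rule, and all names are assumptions rather than the exact BMBPGD procedure on Spark.

```python
# Simplified serial sketch of budgeted mini-batch kernel SGD: each update
# uses a small mini-batch, margin violators become new support vectors,
# and the support set is kept under a fixed budget by discarding the SV
# with the smallest |coefficient|.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = np.sign(X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=1000))  # non-linear toy labels

gamma, eta, budget, batch = 1.0, 0.5, 50, 20
sv_x, sv_a = [], []                       # support vectors and their coefficients

def predict(points):
    if not sv_x:
        return np.zeros(len(points))
    K = np.exp(-gamma * ((points[:, None, :] - np.array(sv_x)[None]) ** 2).sum(-1))
    return K @ np.array(sv_a)

for step in range(50):
    idx = rng.choice(len(X), size=batch, replace=False)   # one mini-batch
    f = predict(X[idx])
    for j in np.where(y[idx] * f < 1)[0]:                 # margin violators
        sv_x.append(X[idx][j])
        sv_a.append(eta * y[idx][j])
        if len(sv_x) > budget:                            # enforce the budget
            drop = int(np.argmin(np.abs(sv_a)))
            sv_x.pop(drop); sv_a.pop(drop)

print("support vectors:", len(sv_x),
      "  training accuracy:", np.mean(np.sign(predict(X)) == y))
```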