Incremental Approximate Matrix Factorization IPM
Classification: non-linear, binary
Parallelism: shared memory
Approach: employs incremental approximate matrix factorization using incomplete Cholesky factorization (ICF) and the Kronecker product (an ICF sketch follows this entry).
Advantages:
- Reduces memory requirements
- Balances accuracy against computation time
Disadvantages:
- Suffers from numerical instability
- Involves a trade-off between speedup and accuracy
- May be limited by the available shared memory
Reference: Wu et al. [2006]
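The factorization step can be illustrated with a pivoted incomplete Cholesky factorization, which replaces the full kernel matrix K with a thin factor G such that K ≈ GGᵀ. The sketch below is a minimal version that assumes a precomputed kernel matrix, a greedy largest-diagonal pivot rule, and a fixed target rank; the incremental updates and the Kronecker-product machinery of Wu et al. [2006] are not shown.

```python
import numpy as np

def incomplete_cholesky(K, rank, tol=1e-12):
    """Pivoted incomplete Cholesky: returns G (n x rank) with K ~= G @ G.T."""
    n = K.shape[0]
    G = np.zeros((n, rank))
    diag = np.diag(K).astype(float)             # residual diagonal of the Schur complement
    for j in range(rank):
        p = int(np.argmax(diag))                # greedy pivot: largest residual diagonal entry
        if diag[p] <= tol:                      # residual already negligible -> stop early
            return G[:, :j]
        G[:, j] = (K[:, p] - G[:, :j] @ G[p, :j]) / np.sqrt(diag[p])
        diag -= G[:, j] ** 2                    # subtract the rank-1 piece just removed
    return G

# Illustrative use on a small RBF kernel matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq)                           # hypothetical RBF kernel, gamma = 0.5
G = incomplete_cholesky(K, rank=30)
print("approximation error:", np.abs(K - G @ G.T).max())
```

Storing G (n × rank) instead of K (n × n) is what cuts the memory footprint; practical implementations generate kernel columns on demand rather than holding the full K used here for brevity, and a smaller rank trades accuracy for speed, which matches the trade-offs listed above.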
Hybrid MPI-OpenMP Linear IPM
Classification: linear, binary
Parallelism: hybrid shared/distributed memory
Approach: distributes the data among parallel nodes in a block structure and uses matrix factorization for numerical stability (a sketch of the block distribution follows this entry).
Advantages:
- Higher classification accuracy than LibLinear
- Removes a dense matrix from the computation by reformulating the problem
- Parallelizes well
- Faster than a pure MPI implementation
- Better numerical stability than similar approaches
- Efficient computation of the expensive linear-algebra operations
Disadvantages:
- Supports only linear classification
- Requires all samples to be loaded into memory
Reference: Woodsend et al. [2009]
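The block distribution can be sketched as follows: each MPI rank owns a row block of the samples and contributes a small d × d matrix to a global reduction, so the nodes exchange only d × d numbers rather than samples. This is a minimal sketch assuming mpi4py and NumPy; the assembled matrix, the scaling vector theta, and the ridge term are illustrative stand-ins for the quantities in an actual linear interior-point step, and the within-node OpenMP threading is not reproduced here.

```python
# Run with: mpirun -n 4 python block_ipm_sketch.py  (file name is illustrative)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

n_local, d = 1000, 50                            # per-node block size and feature count (illustrative)
rng = np.random.default_rng(rank)
X_loc = rng.standard_normal((n_local, d))        # this rank's row block of training samples
theta = rng.random(n_local) + 0.1                # stand-in for the IPM scaling of each sample

# Each rank contributes X_loc.T @ diag(theta) @ X_loc; only d*d numbers are reduced.
M_loc = X_loc.T @ (theta[:, None] * X_loc)
M = np.empty_like(M_loc)
comm.Allreduce(M_loc, M, op=MPI.SUM)

# Every node then factorizes the same small dense system locally; the tiny ridge
# term is an illustrative safeguard, mirroring the stabilizing role of factorization.
L = np.linalg.cholesky(M + 1e-8 * np.eye(d))
if rank == 0:
    print("assembled and factorized a", M.shape, "system from", comm.Get_size(), "blocks")
```

Only the small reduced matrix travels over the network, while each node keeps its sample block resident in memory, which is consistent with the in-memory requirement listed above.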
PSVM
Classification: non-linear, binary
Parallelism: distributed HPC architectures
Approach: reduces the problem size through matrix factorization and partitions the work into row-based data blocks (a sketch of the resulting low-rank linear algebra follows this entry).
Advantages:
- Reduces memory requirements
- Higher speedup than similar approaches
- Scales well with the number of cores
- Reduces the problem size
- Efficient matrix inversion
Disadvantages:
- A small fraction of each iteration is sequential
- May suffer from synchronization and communication overheads
- Reducing the problem size degrades classification accuracy
- The suggested reduced problem size is not optimal in many cases
- Sub-linear speedup due to the sequential steps
Reference: Chang et al. [2009]
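After the factorization replaces the kernel matrix with a rank-p factor H (p ≪ n), the linear systems arising in each interior-point iteration can be solved through the Sherman-Morrison-Woodbury identity, so only a p × p matrix is ever factorized. The sketch below shows that identity in isolation, with an illustrative diagonal term d; it is not PSVM's exact formulation or its distributed implementation.

```python
import numpy as np

def smw_solve(d, H, z):
    """Solve (diag(d) + H @ H.T) x = z without forming the n x n matrix.

    With a rank-p factor H (n x p, p << n), the Sherman-Morrison-Woodbury
    identity reduces the work to one p x p solve plus matrix-vector products,
    which is what keeps each iteration cheap after the problem-size reduction.
    """
    Dinv_z = z / d                                   # D^{-1} z, D diagonal
    Dinv_H = H / d[:, None]                          # D^{-1} H
    small = np.eye(H.shape[1]) + H.T @ Dinv_H        # p x p system
    return Dinv_z - Dinv_H @ np.linalg.solve(small, H.T @ Dinv_z)

# Illustrative check against the explicit dense solve.
rng = np.random.default_rng(0)
n, p = 500, 20
H = rng.standard_normal((n, p))
d = rng.random(n) + 1.0
z = rng.standard_normal(n)
x = smw_solve(d, H, z)
print(np.allclose((np.diag(d) + H @ H.T) @ x, z))    # True
```

Since the dominant cost drops from roughly O(n³) to O(np²), the choice of p directly controls the accuracy-versus-speed trade-off noted in the disadvantages.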
SVM-PCG
Classification: linear, binary
Parallelism: distributed HPC architectures
Approach: uses a preconditioned conjugate gradient (PCG) linear solver to compute each iteration (a sketch follows this entry).
Advantages:
- Performance competitive with SVMLight
- Higher speedup than other similar approaches
- Reduces the problem size
- Efficient matrix inversion and matrix operations
- Openly available
Disadvantages:
- No support for non-linear kernels
- Contains sequential operations that could be parallelized
- Benefits are not fully clear for large-scale problems with many features
Reference: Gertz et al. [2010]
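A preconditioned conjugate gradient solver is simple to state. The sketch below uses a Jacobi (diagonal) preconditioner and a dense matrix purely for illustration; the preconditioner actually used by Gertz et al. [2010] and the operator-based products a large-scale implementation would need are not shown.

```python
import numpy as np

def pcg(A, b, M_inv_diag, tol=1e-8, max_iter=500):
    """Preconditioned conjugate gradient for A x = b, A symmetric positive definite.

    M_inv_diag holds the inverse of a diagonal (Jacobi) preconditioner, used here
    only as a simple illustrative choice.
    """
    x = np.zeros_like(b)
    r = b - A @ x                    # residual
    z = M_inv_diag * r               # preconditioned residual
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p    # new conjugate search direction
        rz = rz_new
    return x

# Illustrative use on a small SPD system.
rng = np.random.default_rng(1)
B = rng.standard_normal((200, 200))
A = B @ B.T + 200 * np.eye(200)
b = rng.standard_normal(200)
x = pcg(A, b, 1.0 / np.diag(A))
print("residual norm:", np.linalg.norm(A @ x - b))
```

Because each iteration needs only matrix-vector products, the products can be distributed across nodes, which is where the parallelism of this approach comes from.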
GPU-cluster SVM
Classification: non-linear, binary
Parallelism: GPU
Approach: accelerates cross-validation by executing multiple training tasks in parallel (a sketch of the task decomposition follows this entry).
Advantages:
- Higher speedup (90x) than Libsvm on a single node
- Higher speedup (3x) than CPU implementations
- Scales well with the number of training samples
- Efficient memory access
- Efficient data transfer between the host and the device
- Efficient parameter tuning via cross-validation
- Avoids re-computation of computationally expensive tasks
Disadvantages:
- Does not fully exploit GPU features such as pinned memory and unified virtual addressing
- May suffer from limited memory
Reference: Li et al. [2013]
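The cross-validation parallelism amounts to treating every (fold, parameter) pair as an independent task and reusing a kernel matrix that is computed only once. The sketch below illustrates that decomposition with a process pool and scikit-learn's precomputed-kernel SVC standing in for the paper's GPU kernels and cluster scheduling; the data, the RBF bandwidth, and the C grid are arbitrary.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from itertools import product
from sklearn.model_selection import KFold
from sklearn.svm import SVC

def fold_task(args):
    """Train and score one (fold, C) combination on slices of the shared kernel."""
    K, y, train_idx, test_idx, C = args
    model = SVC(C=C, kernel="precomputed")
    model.fit(K[np.ix_(train_idx, train_idx)], y[train_idx])    # reuse precomputed entries
    return C, model.score(K[np.ix_(test_idx, train_idx)], y[test_idx])

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.standard_normal((300, 10))
    y = rng.integers(0, 2, 300)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.1 * sq)                                        # kernel computed once, reused by all tasks
    folds = list(KFold(n_splits=5, shuffle=True, random_state=0).split(X))
    tasks = [(K, y, tr, te, C) for (tr, te), C in product(folds, [0.1, 1.0, 10.0])]
    with ProcessPoolExecutor() as pool:
        scores = list(pool.map(fold_task, tasks))
    print(scores)
```

Every task slices the same precomputed K, which is the "avoids re-computation" point above; on a GPU cluster the same decomposition maps folds and parameter settings onto devices rather than processes.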
HPSVM
Classification: non-linear, binary
Parallelism: GPU
Approach: implements a heterogeneous CPU-GPU parallel SVM and accelerates the expensive matrix operations with an approximate factorization (a sketch of the pipeline idea follows this entry).
Advantages:
- Low memory requirements
- Fast convergence
- Parallelizes well
- Fully exploits CPU-GPU parallelism via a dual-buffer, three-stage pipeline
- Higher speedup (11x) than the CPU implementation
- Improves on Li et al. [2013] in data transfer and memory access
- Efficient hierarchical memory access
- Scales well with the number of training samples
- High training accuracy
Disadvantages:
- Contains some sequential operations
- May suffer from limited memory
Reference: Li et al. [2016]
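The dual-buffer, three-stage pipeline can be pictured as load, compute, and store stages running concurrently, with at most two chunks in flight so that a new transfer overlaps the current computation. The sketch below uses Python threads and bounded queues purely to illustrate that scheduling; the actual design overlaps CUDA transfers and kernels, and the stage bodies here are placeholders.

```python
import threading
import queue
import numpy as np

chunks = [np.random.rand(1 << 16) for _ in range(8)]   # work split into chunks (illustrative)
to_compute = queue.Queue(maxsize=2)                    # "dual buffer": at most two chunks in flight
to_store = queue.Queue(maxsize=2)
results = []

def load_stage():                                      # stage 1: stands in for host-to-device transfer
    for chunk in chunks:
        to_compute.put(chunk.copy())
    to_compute.put(None)                               # end-of-stream marker

def compute_stage():                                   # stage 2: stands in for the GPU computation
    while (chunk := to_compute.get()) is not None:
        to_store.put(np.sqrt(chunk).sum())             # placeholder numeric work
    to_store.put(None)

def store_stage():                                     # stage 3: stands in for device-to-host transfer
    while (out := to_store.get()) is not None:
        results.append(out)

stages = [threading.Thread(target=f) for f in (load_stage, compute_stage, store_stage)]
for t in stages:
    t.start()
for t in stages:
    t.join()
print(len(results), "chunks processed through the 3-stage pipeline")
```

The bounded queues are the point: as soon as the compute stage drains one buffer, the load stage refills it, so transfer time is hidden behind computation instead of being added to it.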