Nuit Blanche: Parallel Randomized and Matrix-Free Direct Solvers for Large Structured Dense Linear Systems

Thursday, February 25, 2016

Parallel Randomized and Matrix-Free Direct Solvers for Large Structured Dense Linear Systems - implementation -

Parallel Randomized and Matrix-Free Direct Solvers for Large Structured Dense Linear Systems by Xiao Liu, Jianlin Xia , and Maarten V. De Hoop

We design efficient and distributed-memory parallel randomized direct solvers for large structured dense linear systems, including a fully matrix-free version based on matrix-vector multiplications and a partially matrix-free one. The dense coefficient matrix A has an off-diagonal low-rank structure, as often encountered in practical applications such as Toeplitz systems and discretized integral and partial differential equations. A distributed-memory parallel framework for randomized structured solution is shown. Scalable adaptive randomized sampling and hierarchical compression algorithms are designed to approximate A by hierarchically semiseparable (HSS) matrices. Systematic process grid storage schemes are given for different HSS forms. Parallel hierarchical algorithms are proposed for the resulting HSS forms. As compared with existing work on parallel HSS methods, our algorithms have several remarkable advantages, including the matrix-free schemes that avoid directly using dense A, a synchronized adaptive numerical rank detection, the integration of additional structures into the HSS generators, and much more flexible choices of the number of processes. Comprehensive analysis is conducted and shows that the communication costs are significantly reduced by up to an order of magnitude. Furthermore, we improve the original matrix-free HSS construction algorithm by avoiding some instability issues and by better revealing the nested rank structures. Tests on large challenging dense discretized matrices related to 3D scattering fully demonstrate the superior efficiency and scalability of the direct solvers. For example, for a 106 × 106 dense discretized matrix, the partially matrix-free HSS construction takes about 4500 seconds with 512 processes, while the solution takes only 0.63 second. The storage saving is over 30 times. The fully matrix-free one takes slightly longer but is more flexible and accurate.

An implementation is here.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !