Improve the data communication pattern for PSelInv. The parallel scalability of PSelInv is much improved when more than 1000 processors are used. The variation of running time among different instances is also reduced.
For more details of the improvement see
M. Jacquelin, L. Lin, N. Wichmann and C. Yang, Enhancing the scalability and load balancing of the parallel selected inversion algorithm via tree-based asynchronous communication, submitted [arXiv]