PMatrix contains the main data structure and the computational routine for the parallel selected inversion. More...
#include <pselinv.hpp>
Public Member Functions | |
PMatrix (const GridType *g, const SuperNodeType *s, const PEXSI::SuperLUOptions *o) | |
Int | NumCol () const |
Int | NumSuper () const |
Int | NumLocalBlockCol () const |
NumLocalBlockCol returns the total number of block columns. | |
Int | NumLocalBlockRow () const |
NumLocalBlockRow returns the total number of block rows. | |
std::vector< Int > & | ColBlockIdx (Int jLocal) |
std::vector< Int > & | RowBlockIdx (Int iLocal) |
Int | NumBlockL (Int jLocal) const |
NumBlockL returns the number of nonzero L blocks for the local block column jLocal. | |
Int | NumBlockU (Int iLocal) const |
NumBlockU returns the number of nonzero U blocks for the local block row iLocal. | |
const GridType * | Grid () const |
Grid returns the GridType structure of the current PMatrix. | |
const SuperNodeType * | SuperNode () const |
SuperNode returns the supernodal partition of the current PMatrix. | |
std::vector< LBlock > & | L (Int jLocal) |
L returns the vector of nonzero L blocks for the local block column jLocal. | |
std::vector< UBlock > & | U (Int iLocal) |
U returns the vector of nonzero U blocks for the local block row iLocal. | |
std::vector< std::vector< int > > & | WorkingSet () |
WorkingSet returns the ordered list of supernodes which could be done in parallel. | |
Int | CountSendToRight (Int ksup) |
CountSendToRight returns the number of processors to the right of current processor with which it has to communicate. | |
Int | CountRecvFromBelow (Int ksup) |
CountRecvFromBelow returns the number of processors below the current processor from which it receives data. | |
Int | CountSendToCrossDiagonal (Int ksup) |
CountSendToCrossDiagonal returns the number of cross diagonal processors with which current processor has to communicate. | |
Int | CountRecvFromCrossDiagonal (Int ksup) |
CountRecvFromCrossDiagonal returns the number of cross diagonal processors with which current processor has to communicate. | |
void | GetEtree (std::vector< Int > &etree_supno) |
GetEtree computes the supernodal elimination tree to be used later in the pipelined selected inversion stage. | |
void | ConstructCommunicationPattern () |
ConstructCommunicationPattern constructs the communication pattern to be used later in the selected inversion stage. The supernodal elimination tree is used to add an additional level of parallelism between supernodes. ConstructCommunicationPattern_P2p is called by default. | |
void | ConstructCommunicationPattern_P2p () |
ConstructCommunicationPattern_P2p constructs the communication pattern to be used later in the selected inversion stage. The supernodal elimination tree is used to add an additional level of parallelism between supernodes. | |
void | ConstructCommunicationPattern_Collectives () |
ConstructCommunicationPattern_Collectives constructs the communication pattern to be used later in the selected inversion stage with the Collectives variant. The supernodal elimination tree is used to add an additional level of parallelism between supernodes. | |
void | ConstructCommunicationPattern_Hybrid () |
ConstructCommunicationPattern_Hybrid constructs the communication pattern to be used later in the selected inversion stage with the Hybrid variant. The supernodal elimination tree is used to schedule the pipelined supernodes. | |
void | PreSelInv () |
PreSelInv prepares the structure in L_ and U_ so that SelInv only involves matrix-matrix multiplication. More... | |
void | SelInv () |
SelInv is the main function for the selected inversion. More... | |
void | SelInv_Collectives () |
Collective communication version of the selected inversion. | |
void | SelInv_P2p () |
Point-to-point version of the selected inversion. | |
void | SelInv_Hybrid (Int threshold) |
Hybrid version of the selected inversion. This file is obsolete. | |
void | GetDiagonal (NumVec< Scalar > &diag) |
GetDiagonal extracts the diagonal elements of the PMatrix. More... | |
void | GetColumn (Int colIdx, NumVec< Scalar > &col) |
void | PMatrixToDistSparseMatrix (DistSparseMatrix< Scalar > &A) |
PMatrixToDistSparseMatrix converts the PMatrix into a distributed compressed sparse column matrix format. The DistSparseMatrix follows the natural order. More... | |
void | PMatrixToDistSparseMatrix (const DistSparseMatrix< Scalar > &A, DistSparseMatrix< Scalar > &B) |
PMatrixToDistSparseMatrix converts the PMatrix into a distributed compressed sparse column matrix format B, which has the same sparsity pattern as the DistSparseMatrix A. More... | |
void | PMatrixToDistSparseMatrix2 (const DistSparseMatrix< Scalar > &A, DistSparseMatrix< Scalar > &B) |
PMatrixToDistSparseMatrix2 is a more efficient version which performs the same job as PMatrixToDistSparseMatrix(A,B) especially when A contains much less nonzero elements than the current PMatrix. More... | |
Int | NnzLocal () |
NnzLocal computes the number of nonzero elements (L and U) saved locally. | |
LongInt | Nnz () |
Nnz computes the total number of nonzero elements in the PMatrix. | |
void | GetNegativeInertia (Real &inertia) |
GetNegativeInertia computes the negative inertia of a PMatrix. This can be used to estimate e.g. the number of eigenvalues of a matrix below a certain threshold. | |
PMatrix contains the main data structure and the computational routine for the parallel selected inversion.
After factorizing a SuperLUMatrix luMat (See SuperLUMatrix for information on how to perform factorization), perform the following steps for parallel selected inversion.
Conversion from SuperLU_DIST.
Symbolic information
SuperNodeType super; PMatrix PMloc; luMat.SymbolicToSuperNode( super );
Numerical information, both L and U.
luMat.LUstructToPMatrix( PMloc );
Preparation.
Construct the communication pattern for SelInv.
PMloc.ConstructCommunicationPattern(); or PMloc.ConstructCommunicationPattern_P2p(); or PMloc.ConstructCommunicationPattern_Collectives();
Numerical preparation so that SelInv only involves Gemm.
PMloc.PreSelInv();
PMloc.SelInv(); or PMloc.SelInv_P2p(); or PMloc.SelInv_Collectives();
Postprocessing.
Get the information in DistSparseMatrix format
DistSparseMatrix<Scalar> Ainv; PMloc.PMatrixToDistSparseMatrix( Ainv );
void PEXSI::PMatrix::GetDiagonal | ( | NumVec< Scalar > & | diag | ) |
GetDiagonal extracts the diagonal elements of the PMatrix.
1) diag is permuted back to the natural order
2) diag is shared by all processors in grid_->comm through a Allreduce procedure.
void PEXSI::PMatrix::PMatrixToDistSparseMatrix | ( | DistSparseMatrix< Scalar > & | A | ) |
PMatrixToDistSparseMatrix converts the PMatrix into a distributed compressed sparse column matrix format. The DistSparseMatrix follows the natural order.
[out] | A | Output sparse matrix. |
void PEXSI::PMatrix::PMatrixToDistSparseMatrix | ( | const DistSparseMatrix< Scalar > & | A, |
DistSparseMatrix< Scalar > & | B | ||
) |
PMatrixToDistSparseMatrix converts the PMatrix into a distributed compressed sparse column matrix format B, which has the same sparsity pattern as the DistSparseMatrix A.
The DistSparseMatrix follows the natural order.
[in] | A | Input sparse matrix to provide the sparsity pattern. |
[out] | B | Output sparse matrix. |
void PEXSI::PMatrix::PMatrixToDistSparseMatrix2 | ( | const DistSparseMatrix< Scalar > & | A, |
DistSparseMatrix< Scalar > & | B | ||
) |
PMatrixToDistSparseMatrix2 is a more efficient version which performs the same job as PMatrixToDistSparseMatrix(A,B) especially when A contains much less nonzero elements than the current PMatrix.
The DistSparseMatrix follows the natural order.
[in] | A | Input sparse matrix to provide the sparsity pattern. |
[out] | B | Output sparse matrix. |
void PEXSI::PMatrix::PreSelInv | ( | ) |
PreSelInv prepares the structure in L_ and U_ so that SelInv only involves matrix-matrix multiplication.
PreSelInv performs
Compute the inverse of the diagonal blocks
L_{kk} <- (L_{kk} U_{kk})^{-1}
Update the lower triangular L blocks
L_{ik} <- L_{ik} L_{kk}^{-1}
Update the upper triangular U blocks which saves redundant information as in L
U_{kj} <- L_{ik}
PreSelInv assumes that PEXSI::PMatrix::ConstructCommunicationPattern has been executed.
void PEXSI::PMatrix::SelInv | ( | ) |
SelInv is the main function for the selected inversion.
PSelInv is a right-looking based parallel selected inversion subroutine for sparse symmetric matrices. Static pivoting is used in this version.
Although the matrix is symmetric, the key idea of the current implementation of PSelInv is that the upper-triangular matrix is saved (in the form of UBlock). Such redundant information is effective for reducing the complexity for designing the communication pattern.
At each supernode ksup, the lower triangular part Ainv(isup, ksup) (isup > ksup) are first updated. The blocks in the processor row of ksup first sends the nonzero blocks in U(ksup, jsup) (which is L(isup, ksup)^T) to the Schur complements Ainv(isup, jsup). At the same time the blocks in the processor column of ksup sends the nonzero blocks (only nonzero row indices) to the Schur complement Ainv(isup, jsup). Then
sum_{jsup} Ainv(isup, jsup) U^{T}(ksup, jsup)
is performed. In this procedure, only processors with isRecvFromAbove[ksup] == true && isRecvFromLeft[ksup] == true participate in the computation.
The result is reduced to the processor column ksup within the same processor row. The diagonal block Ainv(ksup, ksup) is simply updated by a reduce procedure within the column processor group of ksup.
Then we update the Ainv(ksup, isup) blocks, simply via the update from the cross diagonal processors.
NOTE : The cross diagonal processor is only well here defined for square grids. For a P x P square grid, (ip, jp) is the cross diagonal processor of (jp, ip) if ip != jp. The current version of SelInv only works for square processor grids.
The communication is controlled by 3 sending varaibles and 3 receiving variables. The first dimension of all the sending and receiving variables are numSuper. The information contains redundancy since not all processors have access to all the supernodes. However, this increases the readability of the output significantly and only increases a small amount of memory cost for indexing. This set of sending / receiving mechanism avoids the double indexing of the supernodes and can scale to matrices of large size.
isSendToBelow:
Dimension: numSuper x numProcRow
Role : At supernode ksup, if isSendToBelow(ksup, ip) == true, send all local blocks {U(ksup, jsup) | jsup > ksup} to the processor row ip.
isRecvFromAbove:
Dimension: numSuper
Role :
isSendToRight:
Dimension: numSuper x numProcCol
Role : At supernode ksup, if isSendToRight(ksup, jp) == true, send all local blocks (mainly the nonzero row indicies, without the values to save the communication cost) {L(isup, ksup) | isup > ksup} to the processor column jp.
isRecvFromLeft:
Dimension: numSuper
Role :
isSendToCrossDiagonal:
Dimension: numSuper
Role : At supernode ksup, if isSendToCrossDiagonal(ksup) == true, send all local blocks {(isup, ksup) | isup > ksup} to the cross-diagonal processor. NOTE : This requires a square processor grid.
isRecvCrossDiagonal:
Dimension: numSuper
Role : At supernode ksup, if isRecvFromCrossDiagonal(ksup) == true, receive from the cross-diagonal processor. NOTE : This requires a square processor grid.