PMatrixUnsym contains the main data structure and the computational routine for the parallel selected inversion. More...
#include <pselinv_unsym.hpp>
Classes | |
struct | SuperNodeBufferTypeUnsym |
Public Member Functions | |
PMatrixUnsym (const GridType *g, const SuperNodeType *s, const PSelInvOptions *o, const SuperLUOptions *oLU) | |
void | Setup (const GridType *g, const SuperNodeType *s, const PSelInvOptions *o, const SuperLUOptions *oLU) |
std::vector< LBlock< T > > & | Lrow (Int iLocal) |
Lrow returns the vector of nonzero L blocks for the local block row iLocal. More... | |
std::vector< UBlock< T > > & | Ucol (Int jLocal) |
Ucol returns the vector of nonzero U blocks for the local block column jLocal. | |
virtual void | ConstructCommunicationPattern () |
ConstructCommunicationPattern constructs the communication pattern to be used later in the selected inversion stage. The supernodal elimination tree is used to add an additional level of parallelism between supernodes. ConstructCommunicationPattern_P2p is called by default. | |
void | ConstructCommunicationPattern_P2p () |
ConstructCommunicationPattern_P2p constructs the communication pattern to be used later in the selected inversion stage. The supernodal elimination tree is used to add an additional level of parallelism between supernodes. | |
virtual void | PreSelInv () |
PreSelInv prepares the structure in L_ and U_ so that SelInv only involves matrix-matrix multiplication. More... | |
virtual void | SelInv () |
SelInv is the main function for the selected inversion. More... | |
void | SelInv_P2p () |
Point-to-point version of the selected inversion. | |
Public Member Functions inherited from PEXSI::PMatrix< T > | |
PMatrix (const GridType *g, const SuperNodeType *s, const PEXSI::PSelInvOptions *o, const PEXSI::SuperLUOptions *oLU) | |
void | deallocate () |
PMatrix (const PMatrix &C) | |
PMatrix & | operator= (const PMatrix &C) |
void | Setup (const GridType *g, const SuperNodeType *s, const PEXSI::PSelInvOptions *o, const PEXSI::SuperLUOptions *oLU) |
Int | NumCol () const |
Int | NumSuper () const |
Int | NumLocalBlockCol () const |
NumLocalBlockCol returns the total number of block columns. | |
Int | NumLocalBlockRow () const |
NumLocalBlockRow returns the total number of block rows. | |
std::vector< std::vector< Int > > & | ColBlockIdx () |
std::vector< std::vector< Int > > & | RowBlockIdx () |
std::vector< Int > & | ColBlockIdx (Int jLocal) |
std::vector< Int > & | RowBlockIdx (Int iLocal) |
Int | NumBlockL (Int jLocal) const |
NumBlockL returns the number of nonzero L blocks for the local block column jLocal. | |
Int | NumBlockU (Int iLocal) const |
NumBlockU returns the number of nonzero U blocks for the local block row iLocal. | |
const GridType * | Grid () const |
Grid returns the GridType structure of the current PMatrix. | |
const SuperNodeType * | SuperNode () const |
SuperNode returns the supernodal partition of the current PMatrix. | |
std::vector< LBlock< T > > & | L (Int jLocal) |
L returns the vector of nonzero L blocks for the local block column jLocal. | |
std::vector< UBlock< T > > & | U (Int iLocal) |
U returns the vector of nonzero U blocks for the local block row iLocal. | |
std::vector< std::vector< int > > & | WorkingSet () |
WorkingSet returns the ordered list of supernodes which could be done in parallel. | |
Int | CountSendToRight (Int ksup) |
CountSendToRight returns the number of processors to the right of current processor with which it has to communicate. | |
Int | CountSendToBelow (Int ksup) |
CountSendToBelow returns the number of processors below current processor with which it has to communicate. | |
Int | CountRecvFromBelow (Int ksup) |
CountRecvFromBelow returns the number of processors below the current processor from which it receives data. | |
Int | CountSendToCrossDiagonal (Int ksup) |
CountSendToCrossDiagonal returns the number of cross diagonal processors with which current processor has to communicate. | |
Int | CountRecvFromCrossDiagonal (Int ksup) |
CountRecvFromCrossDiagonal returns the number of cross diagonal processors with which current processor has to communicate. | |
void | GetEtree (std::vector< Int > &etree_supno) |
GetEtree computes the supernodal elimination tree to be used later in the pipelined selected inversion stage. | |
void | ConstructCommunicationPattern_P2p () |
ConstructCommunicationPattern_P2p constructs the communication pattern to be used later in the selected inversion stage. The supernodal elimination tree is used to add an additional level of parallelism between supernodes. | |
void | SelInv_P2p () |
Point-to-point version of the selected inversion. | |
void | GetDiagonal (NumVec< T > &diag) |
GetDiagonal extracts the diagonal elements of the PMatrix. More... | |
void | GetColumn (Int colIdx, NumVec< T > &col) |
void | PMatrixToDistSparseMatrix (DistSparseMatrix< T > &A) |
PMatrixToDistSparseMatrix converts the PMatrix into a distributed compressed sparse column matrix format. The DistSparseMatrix follows the natural order. More... | |
void | PMatrixToDistSparseMatrix_OLD (const DistSparseMatrix< T > &A, DistSparseMatrix< T > &B) |
PMatrixToDistSparseMatrix_OLD converts the PMatrix into a distributed compressed sparse column matrix format B, which has the same sparsity pattern as the DistSparseMatrix A. More... | |
void | PMatrixToDistSparseMatrix (const DistSparseMatrix< T > &A, DistSparseMatrix< T > &B) |
PMatrixToDistSparseMatrix converts the PMatrix into a distributed compressed sparse column matrix format B, which has the same sparsity pattern as the DistSparseMatrix A. More... | |
Int | NnzLocal () |
NnzLocal computes the number of nonzero elements (L and U) saved locally. | |
LongInt | Nnz () |
Nnz computes the total number of nonzero elements in the PMatrix. | |
void | GetNegativeInertia (Real &inertia) |
GetNegativeInertia computes the negative inertia of a PMatrix. This can be used to estimate e.g. the number of eigenvalues of a matrix below a certain threshold. | |
void | DumpLU () |
void | CopyLU (const PMatrix &C) |
template<> | |
void | GetNegativeInertia (Real &inertia) |
template<> | |
void | GetNegativeInertia (Real &inertia) |
Protected Member Functions | |
void | SelInvIntra_P2p (Int lidx) |
SelInvIntra_P2p. | |
void | SelInv_lookup_indexes (SuperNodeBufferTypeUnsym &snode, std::vector< LBlock< T > > &LcolRecv, std::vector< LBlock< T > > &LrowRecv, std::vector< UBlock< T > > &UcolRecv, std::vector< UBlock< T > > &UrowRecv, NumMat< T > &AinvBuf, NumMat< T > &LBuf, NumMat< T > &UBuf) |
SelInv_lookup_indexes. | |
void | UnpackData (SuperNodeBufferTypeUnsym &snode, std::vector< LBlock< T > > &LcolRecv, std::vector< LBlock< T > > &LrowRecv, std::vector< UBlock< T > > &UcolRecv, std::vector< UBlock< T > > &UrowRecv) |
UnpackData. | |
void | ComputeDiagUpdate (SuperNodeBufferTypeUnsym &snode) |
ComputeDiagUpdate. | |
void | SendRecvCD (std::vector< SuperNodeBufferTypeUnsym > &arrSuperNodes, Int stepSuper) |
SendRecvCD. | |
void | SendRecvSizesCD (std::vector< Int > &arrSuperNodes, Int stepSuper, CDBuffers &buffers) |
void | IRecvContentCD (std::vector< Int > &arrSuperNodes, Int stepSuper, CDBuffers &buffers) |
void | WaitContentLCD (std::vector< Int > &arrSuperNodes, Int stepSuper, CDBuffers &buffers) |
void | WaitContentUCD (std::vector< Int > &arrSuperNodes, Int stepSuper, CDBuffers &buffers) |
Protected Member Functions inherited from PEXSI::PMatrix< T > | |
void | SelInvIntra_P2p (Int lidx, Int &rank) |
SelInvIntra_P2p. | |
void | SelInv_lookup_indexes (SuperNodeBufferType &snode, std::vector< LBlock< T > > &LcolRecv, std::vector< UBlock< T > > &UrowRecv, NumMat< T > &AinvBuf, NumMat< T > &UBuf) |
SelInv_lookup_indexes. | |
void | GetWorkSet (std::vector< Int > &snodeEtree, std::vector< std::vector< Int > > &WSet) |
GetWorkSet. | |
void | UnpackData (SuperNodeBufferType &snode, std::vector< LBlock< T > > &LcolRecv, std::vector< UBlock< T > > &UrowRecv) |
UnpackData. | |
void | ComputeDiagUpdate (SuperNodeBufferType &snode) |
ComputeDiagUpdate. | |
void | SendRecvCD_UpdateU (std::vector< SuperNodeBufferType > &arrSuperNodes, Int stepSuper) |
SendRecvCD_UpdateU. | |
Protected Attributes | |
std::vector< std::vector < UBlock< T > > > | Ucol_ |
std::vector< std::vector < LBlock< T > > > | Lrow_ |
std::vector< Int > | UcolSize_ |
std::vector< Int > | LrowSize_ |
Protected Attributes inherited from PEXSI::PMatrix< T > | |
const GridType * | grid_ |
const SuperNodeType * | super_ |
const PSelInvOptions * | options_ |
const SuperLUOptions * | optionsLU_ |
Int | limIndex_ |
Int | maxTag_ |
std::vector< std::vector< Int > > | ColBlockIdx_ |
std::vector< std::vector< Int > > | RowBlockIdx_ |
std::vector< std::vector < LBlock< T > > > | L_ |
std::vector< std::vector < UBlock< T > > > | U_ |
std::vector< std::vector< Int > > | workingSet_ |
BolNumMat | isSendToBelow_ |
BolNumMat | isSendToRight_ |
BolNumVec | isSendToDiagonal_ |
BolNumMat | isSendToCrossDiagonal_ |
BolNumMat | isRecvFromBelow_ |
BolNumVec | isRecvFromAbove_ |
BolNumVec | isRecvFromLeft_ |
BolNumMat | isRecvFromCrossDiagonal_ |
std::vector< TreeBcast * > | fwdToBelowTree_ |
std::vector< TreeBcast * > | fwdToRightTree_ |
std::vector< TreeReduce< T > * > | redToLeftTree_ |
std::vector< TreeReduce< T > * > | redToAboveTree_ |
Additional Inherited Members | |
Protected Types inherited from PEXSI::PMatrix< T > | |
enum | { SELINV_TAG_U_SIZE, SELINV_TAG_U_CONTENT, SELINV_TAG_L_SIZE, SELINV_TAG_L_CONTENT, SELINV_TAG_L_REDUCE, SELINV_TAG_D_SIZE, SELINV_TAG_D_CONTENT, SELINV_TAG_D_REDUCE, SELINV_TAG_L_SIZE_CD, SELINV_TAG_L_CONTENT_CD, SELINV_TAG_COUNT } |
Static Public Member Functions inherited from PEXSI::PMatrix< T > | |
static PMatrix< T > * | Create (const GridType *pGridType, const SuperNodeType *pSuper, const PSelInvOptions *pSelInvOpt, const SuperLUOptions *pLuOpt) |
Create is a factory method which returns a pointer either to a new PMatrix or to a new PMatrixUnsym depending on the pLuOpt parameter. | |
static PMatrix< T > * | Create (const SuperLUOptions *pLuOpt) |
PMatrixUnsym contains the main data structure and the computational routine for the parallel selected inversion.
NOTE: The following description is somewhat outdated.

After factorizing a SuperLUMatrix luMat (see SuperLUMatrix for how to perform the factorization), parallel selected inversion proceeds in the following steps.

Conversion from SuperLU_DIST.
Symbolic information:
SuperNodeType super; PMatrix PMloc; luMat.SymbolicToSuperNode( super );
Numerical information, both L and U:
luMat.LUstructToPMatrix( PMloc );

Preparation.
Construct the communication pattern for SelInv:
PMloc.ConstructCommunicationPattern(); or PMloc.ConstructCommunicationPattern_P2p(); or PMloc.ConstructCommunicationPattern_Collectives();
Numerical preparation so that SelInv only involves Gemm:
PMloc.PreSelInv();

Selected inversion.
PMloc.SelInv(); or PMloc.SelInv_P2p(); or PMloc.SelInv_Collectives();

Postprocessing.
Get the result in DistSparseMatrix format:
DistSparseMatrix<Scalar> Ainv; PMloc.PMatrixToDistSparseMatrix( Ainv );
Lrow() [inline]

Lrow returns the vector of nonzero L blocks for the local block row iLocal.
PreSelInv() [virtual]
PreSelInv prepares the structure in L_ and U_ so that SelInv only involves matrix-matrix multiplication.
PreSelInv performs
Compute the inverse of the diagonal blocks
L_{kk} <- (L_{kk} U_{kk})^{-1}
Update the lower triangular L blocks
L_{ik} <- L_{ik} L_{kk}^{-1}
Update the upper triangular U blocks, which save redundant information as in L
U_{kj} <- L_{jk}
PreSelInv assumes that PEXSI::PMatrix::ConstructCommunicationPattern has been executed.
Reimplemented from PEXSI::PMatrix< T >.
SelInv() [virtual]
SelInv is the main function for the selected inversion.
PSelInv is a right-looking parallel selected inversion subroutine for sparse matrices. Static pivoting is used in this version.

At each supernode ksup, the lower triangular part Ainv(isup, ksup) (isup > ksup) is updated first. The processors in the processor row of ksup first send the nonzero blocks in U(ksup, jsup) (which is L(jsup, ksup)^T) to the Schur complements Ainv(isup, jsup). At the same time, the processors in the processor column of ksup send the nonzero blocks of L(isup, ksup) (only the nonzero row indices) to the Schur complements Ainv(isup, jsup). Then
sum_{jsup} Ainv(isup, jsup) U^{T}(ksup, jsup)
is performed. In this procedure, only processors with isRecvFromAbove[ksup] == true && isRecvFromLeft[ksup] == true participate in the computation.
The result is reduced to the processor column ksup within the same processor row. The diagonal block Ainv(ksup, ksup) is simply updated by a reduce procedure within the column processor group of ksup.
Then we update the Ainv(ksup, isup) blocks, simply via the update from the cross diagonal processors.
NOTE : The cross diagonal processor is only well defined here for square grids. For a P x P square grid, (ip, jp) is the cross diagonal processor of (jp, ip) if ip != jp. The current version of SelInv only works for square processor grids.
The communication is controlled by 3 sending variables and 3 receiving variables. The first dimension of all the sending and receiving variables is numSuper. The information contains redundancy since not all processors have access to all the supernodes. However, this increases readability significantly and only adds a small memory cost for indexing. This set of sending / receiving mechanisms avoids double indexing of the supernodes and can scale to matrices of large size.
isSendToBelow:
Dimension: numSuper x numProcRow
Role : At supernode ksup, if isSendToBelow(ksup, ip) == true, send all local blocks {U(ksup, jsup) | jsup > ksup} to the processor row ip.
isRecvFromAbove:
Dimension: numSuper
Role : At supernode ksup, if isRecvFromAbove(ksup) == true, receive the nonzero blocks {U(ksup, jsup) | jsup > ksup} from the processor row that owns supernode ksup.
isSendToRight:
Dimension: numSuper x numProcCol
Role : At supernode ksup, if isSendToRight(ksup, jp) == true, send all local blocks (mainly the nonzero row indices, without the values, to save communication cost) {L(isup, ksup) | isup > ksup} to the processor column jp.
isRecvFromLeft:
Dimension: numSuper
Role : At supernode ksup, if isRecvFromLeft(ksup) == true, receive the nonzero blocks {L(isup, ksup) | isup > ksup} (mainly the nonzero row indices) from the processor column that owns supernode ksup.
isSendToCrossDiagonal:
Dimension: numSuper
Role : At supernode ksup, if isSendToCrossDiagonal(ksup) == true, send all local blocks {L(isup, ksup) | isup > ksup} to the cross-diagonal processor. NOTE : This requires a square processor grid.
isRecvFromCrossDiagonal:
Dimension: numSuper
Role : At supernode ksup, if isRecvFromCrossDiagonal(ksup) == true, receive from the cross-diagonal processor. NOTE : This requires a square processor grid.
Reimplemented from PEXSI::PMatrix< T >.