Gets matrix permutation. More...

#include <limits.h>
#include <math.h>
#include "superlu_ddefs.h"

Macros
#define	PRNTlevel 0

Functions
static float	a_plus_at_CompRow_loc (int, int_t , int, int_t , int_t, int_t , int_t , int, int_t , int_t , int_t , int_t , gridinfo_t *)

float	get_perm_c_parmetis (SuperMatrix A, int_t perm_r, int_t perm_c, int nprocs_i, int noDomains, int_t sizes, int_t fstVtxSep, gridinfo_t grid, MPI_Comm *metis_comm)

Detailed Description

Gets matrix permutation.

– Distributed symbolic factorization auxialiary routine  (version 2.1) –
Lawrence Berkeley National Lab, Univ. of California Berkeley - July 2003
INRIA France - January 2004
Laura Grigori

November 1, 2007

Function Documentation

static float a_plus_at_CompRow_loc	(	int	iam,
		int_t *	perm_r,
		int	nprocs_i,
		int_t *	vtxdist_i,
		int_t	n,
		int_t *	rowptr,
		int_t *	colind,
		int	nprocs_o,
		int_t *	vtxdist_o,
		int_t *	p_bnz,
		int_t **	p_b_rowptr,
		int_t **	p_b_colind,
		gridinfo_t *	grid
	)

static

Purpose

Form the structure of Pr*A +A'Pr'. A is an n-by-n matrix in
NRformat_loc format, represented by (rowptr, colind). The output
B=Pr*A +A'Pr' is in NRformat_loc format (symmetrically, also row
oriented), represented by (b_rowptr, b_colind).

The input matrix A is distributed in block row format on nprocs_i
processors.  The output matrix B is distributed in block row format
on nprocs_o processors, where nprocs_o <= nprocs_i.  On output, the
matrix B has its rows permuted according to perm_r.

Sketch of the algorithm

Let iam by my process number.  Let fst_row, lst_row = m_loc +
fst_row be the first/last row stored on iam.

Compute Pr' - the inverse row permutation, stored in iperm_r.

Compute the transpose  of the block row of Pr*A that iam owns:
   T[:,Pr(fst_row:lst_row)] = Pr' * A[:,fst_row:lst_row] * Pr'

All to all communication such that every processor iam receives all
the blocks of the transpose matrix that it needs, that is
          T[fst_row:lst_row, :]

Compute B = A[fst_row:lst_row, :] + T[fst_row:lst_row, :]

If Pr != I or nprocs_i != nprocs_o then permute the rows of B (that
is compute Pr*B) and redistribute from nprocs_i to nprocs_o
according to the block row distribution in vtxdist_i, vtxdist_o.

float get_perm_c_parmetis	(	SuperMatrix *	A,
		int_t *	perm_r,
		int_t *	perm_c,
		int	nprocs_i,
		int	noDomains,
		int_t **	sizes,
		int_t **	fstVtxSep,
		gridinfo_t *	grid,
		MPI_Comm *	metis_comm
	)

Purpose

GET_PERM_C_PARMETIS obtains a permutation matrix Pc, by applying a
graph partitioning algorithm to the symmetrized graph A+A'.  The
multilevel graph partitioning algorithm used is the
ParMETIS_V3_NodeND routine available in the parallel graph
partitioning package parMETIS.

The number of independent sub-domains noDomains computed by this
algorithm has to be a power of 2.  Hence noDomains is the larger
number power of 2 that is smaller than nprocs_i, where nprocs_i = nprow
* npcol is the number of processors used in SuperLU_DIST.

Arguments

A       (input) SuperMatrix*
        Matrix A in A*X=B, of dimension (A->nrow, A->ncol). The number
        of the linear equations is A->nrow.  Matrix A is distributed
        in NRformat_loc format.

perm_r  (input) int_t*
        Row permutation vector of size A->nrow, which defines the 
        permutation matrix Pr; perm_r[i] = j means row i of A is in 
        position j in Pr*A.

perm_c  (output) int_t*
        Column permutation vector of size A->ncol, which defines the 
        permutation matrix Pc; perm_c[i] = j means column i of A is 
        in position j in A*Pc.

nprocs_i (input) int*
        Number of processors the input matrix is distributed on in a block
        row format.  It corresponds to number of processors used in
        SuperLU_DIST.

noDomains (input) int*, must be power of 2
        Number of independent domains to be computed by the graph
        partitioning algorithm.  ( noDomains <= nprocs_i )

sizes   (output) int_t**, of size 2 * noDomains
        Returns pointer to an array containing the number of nodes
        for each sub-domain and each separator.  Separators are stored 
        from left to right.
        Memory for the array is allocated in this routine.

fstVtxSep (output) int_t**, of size 2 * noDomains
        Returns pointer to an array containing first node for each
        sub-domain and each separator.
        Memory for the array is allocated in this routine.

Return value

  < 0, number of bytes allocated on return from the symbolic factorization.


0, number of bytes allocated when out of memory.

Macros

Functions

Detailed Description

Function Documentation

Purpose

Sketch of the algorithm

Purpose

Arguments

Return value