An improved method of parallel computation of the electromagnetic-scattering characteristics of complexly shaped objects has been devised. This method belongs to a class of methods that involve the finite-element solution of Maxwell's equations on unstructured grids. ("Unstructured grids" in this context does not signify grids that lack structure; instead, it is a specialized term for grids with arbitrarily specified, complex, and/or irregular structures.) As explained below, the present method effects a simplification (relative to the older methods in the same class) in the use of parallel computers, and involves an algorithm that is scalable in the sense that it is readily useable on large, massively parallel computers. A finite-element mathematical model is needed to represent a typical electromagnetic-scattering structure that includes components made of various electromagnetically penetrable (e.g., dielectric) and/or impenetrable (electrically conductive) materials. An unstructured grid is needed to represent the complexity of the geometry of such a structure and its components. In the present method as in the other methods of the same class, the computational grid or mesh for a given problem must be truncated at a surface that surrounds the scattering structure at a suitable distance. The surface must be chosen consistently with the need to both maintain accuracy of the computed electromagnetic field and limit the meshed volume of free space. Maxwell's equations for the electromagnetic field are put in three-dimensional Helmholtz wave-equation form and solved on the mesh by a coupled finite-element/ integral-equation technique.

The specific integral-equation formulation is of a boundary-element type. This formulation results in efficient and accurate truncation of the computational domain. A system of equations in partitioned-matrix·partitioned-vector form results from the combination of (1) finite-element discretization of the volume in and around the scattering structure and (2) integral-equation discretization of the surface. The system of equations is solved by a combination of (1) an iterative sparse-matrix-equation-solving subalgorithm and (2) a dense-matrix-factorization subalgorithm. The assembly and solution of the matrix equation and the computation of observable quantities are all accomplished in parallel, using various numbers of processors at various stages of the calculation.

A common feature of the older methods is the need for mesh-partitioning algorithms to distribute the unstructured mesh and the sparse matrix entries among the available processors. This need arises from the distributed-memory architecture of typical parallel computers and the consequent lack of direct access, by every processor, to all the mesh and matrix data. In the present method, one does not explicitly partition the mesh; instead, one emphasizes decomposition of the sparse matrix entries among the processors, and such mesh partitioning as happens to occur becomes merely incidental to this decomposition. The choice of a specific decomposition is guided by recognition that what one needs to compute efficiently at each step of the iterative subalgorithm is the inner product of a sparse matrix and a dense vector.

The chosen decomposition is of a row-slab type. The figure illustrates aspects of the row-slab decomposition for an example problem. The top left part of the figure shows the original structure of the sparse matrix. The top right part of the figure shows the structure of the matrix as reordered for minimum bandwidth in preparation for the row-slab decomposition and the resulting matrix·vector multiplication Kx = y. Each processor handles one slab. Because of the minimum-bandwidth reordering, the minimum and maximum column indices of each slab are known. If the row indices of that piece of the dense vector x that is local to this processor contain the range from the minimum to the maximum column index for this slab, then the multiplication can be done purely locally, and the corresponding part of the product vector y will be purely local. In general, the column indices range beyond the local row indices; this gives rise to a need for a communication step to obtain the adjacent portions of x that are not local to this processor. This communication step involves data from a few processors to the left and right. The number of processors communicating data depends on the row bandwidth of the slab and the number of processors in use. Overall, the row-slab decomposition strikes a balance among (1) nearly perfect balance of data and computational load among processors, (2) minimal albeit suboptimal communication of data in the matrix·vector multiplication, and (3) scalability to larger problems on greater numbers of processors.

*This work was done by Tom Cwik, Cinzia Zuffada, and Vahraz Jamnejad of Caltech for *NASA's Jet Propulsion Laboratory.* For further information, access the Technical Support Package (TSP) **free on-line at www.techbriefs.com**under the Information Sciences category. NPO-20171*