Add caching for array-rebalance in multigrid

The multigrid solver regularly "rebalances" arrays: it takes an input array and moves complete rows (i fixed, j variable) so that the same global array is divided differently among processors. This is used on the initial call (to adjust the input problem specification), on coarsening, and finally to collapse "down" processors near the coarsest level.

However, this code was written naïvely: it built (and divided) the global array split at each invocation, using a number of MPI_Allgather calls on each processor. Profiling (courtesy Kris Rowe) has found that this can be agonizingly slow; in at least one test case on one particular system it consumed over 50% of wall-clock time.

This patch fixes the problem by allowing rebalance_array (now moved into the multigrid solver class) to cache these array divisions. The caller must specify one of a few categories, and the division is computed (with Allgathers) only on the first invocation per multigrid level and category. The categories currently are FFine, UFine, FCoarse, and UCoarse, where 'F' and 'U' refer to the error and solution terms respectively. Confusingly, coefficients of the Laplacian itself (such as the uxx term) belong to the 'error' space in terms of call structure.

This caching eliminates all but the setup calls to Allgather.
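
For illustration only, the caching idea can be sketched as below. This is not the code in the patch: SplitCache, SplitType, compute_row_offsets, and the even row split are hypothetical stand-ins, and the MPI_Allgather-based computation is replaced by a local function so the sketch needs no MPI. It also assumes the row count and processor count for a given (level, category) pair never change after the first call.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Category names taken from the commit text; the enum itself is hypothetical.
enum class SplitType { FFine, UFine, FCoarse, UCoarse };

// Stand-in for the expensive Allgather-based computation (assumption: an
// even split). Returns nprocs+1 offsets; rank r owns rows
// [offsets[r], offsets[r+1]).
std::vector<int> compute_row_offsets(int nrows, int nprocs) {
    assert(nrows >= 0 && nprocs > 0);
    std::vector<int> offsets(nprocs + 1, 0);
    for (int r = 0; r < nprocs; ++r) {
        // The first (nrows % nprocs) ranks take one extra row.
        int rows_here = nrows / nprocs + (r < nrows % nprocs ? 1 : 0);
        offsets[r + 1] = offsets[r] + rows_here;
    }
    return offsets;
}

// Hypothetical cache keyed by (multigrid level, category).
class SplitCache {
public:
    // Return the division for (level, type), computing it only on first use.
    const std::vector<int>& get(int level, SplitType type,
                                int nrows, int nprocs) {
        auto key = std::make_pair(level, type);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++expensive_calls_;  // counts the simulated Allgather rounds
            it = cache_.emplace(key, compute_row_offsets(nrows, nprocs)).first;
        }
        return it->second;
    }

    int expensive_calls() const { return expensive_calls_; }

private:
    std::map<std::pair<int, SplitType>, std::vector<int>> cache_;
    int expensive_calls_ = 0;
};
```

In this sketch, repeated calls to get() with the same level and category reuse the stored offsets, so expensive_calls() grows only during setup, which mirrors the behaviour described above.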