Simplified tridiagonal solve algorithm; timing extensions
First and most trivially, this change adds a few more timing push/pop invocations around multigrid, especially around the MPI synchronization steps that might cause processors to wait on each other. More substantively, this change also adds a new tridiagonal solver based on the bog-standard forward-elimination/back-substitution Thomas algorithm (see Wikipedia). Skipping pivoting should be acceptable because the 1D problems being solved themselves come from a 2D problem, so we don't expect ill-conditioning. Calling out to the LAPACK banded solver was, surprisingly, a computational bottleneck, perhaps because of pivoting. This change seems to decrease the line-solve time by about 80%, which in turn decreases the overall runtime (tank_rho test case) by about 40%.
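For reference, the Thomas algorithm mentioned above can be sketched as follows. This is a minimal standalone illustration in Python, not the code added by this change; the function name `thomas_solve` and its argument layout are assumptions made for the example. It does no pivoting, so it is only appropriate when the pivots stay well away from zero (for example, diagonally dominant systems).

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system A x = rhs without pivoting.

    Hypothetical layout (an assumption for this sketch), all lists of length n:
      lower[i] multiplies x[i-1] in row i (lower[0] is ignored)
      diag[i]  multiplies x[i]   in row i
      upper[i] multiplies x[i+1] in row i (upper[n-1] is ignored)
    """
    n = len(diag)
    if not (len(lower) == len(upper) == len(rhs) == n):
        raise ValueError("all inputs must have the same length")
    if n == 0:
        return []

    c_prime = [0.0] * n  # scaled super-diagonal after elimination
    d_prime = [0.0] * n  # scaled right-hand side after elimination

    # Forward elimination: remove the sub-diagonal one row at a time.
    pivot = diag[0]
    if pivot == 0.0:
        raise ZeroDivisionError("zero pivot; this sketch does not pivot")
    c_prime[0] = upper[0] / pivot
    d_prime[0] = rhs[0] / pivot
    for i in range(1, n):
        pivot = diag[i] - lower[i] * c_prime[i - 1]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        c_prime[i] = upper[i] / pivot
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / pivot

    # Back substitution: solve from the last unknown upwards.
    x = [0.0] * n
    x[n - 1] = d_prime[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


# Usage: a 4x4 system with 2 on the diagonal and -1 on the off-diagonals.
solution = thomas_solve(
    lower=[0.0, -1.0, -1.0, -1.0],
    diag=[2.0, 2.0, 2.0, 2.0],
    upper=[-1.0, -1.0, -1.0, 0.0],
    rhs=[0.0, 0.0, 0.0, 5.0],
)
```

The work is O(n) with two passes over the data and no row exchanges, which is why it can beat a general banded solver when pivoting is not needed.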
Showing with 107 additions and 3 deletions