
I have a 3D Laplace problem on quite a complicated geometry, which I am discretizing with a discontinuous Galerkin method. My mesh is composed of hexahedra, hence I am employing the classical tensor-product basis functions $\mathcal{Q}^p$.

The mesh is not very fine: it consists of $1600$ elements. For every polynomial degree I am solving the same equation $-\Delta u = f$ with the symmetric interior penalty method, in order to perform a convergence test with respect to the polynomial degree.
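
For reference, by "symmetric interior penalty" I mean the standard SIP bilinear form,

$$
a_h(u,v) \;=\; \sum_{K}\int_K \nabla u\cdot\nabla v\,dx \;-\;\sum_{F}\int_F\Big(\{\!\{\nabla u\}\!\}\cdot[\![v]\!]+\{\!\{\nabla v\}\!\}\cdot[\![u]\!]\Big)\,ds \;+\;\sum_{F}\int_F \frac{\sigma\,p^2}{h_F}\,[\![u]\!]\cdot[\![v]\!]\,ds,
$$

where $[\![\cdot]\!]$ denotes the (vector-valued) jump and $\{\!\{\cdot\}\!\}$ the average across a face $F$, with the usual penalty scaling $\sigma\,p^2/h_F$ (the exact constant is not important here). This penalty scaling is one reason the conditioning of the system deteriorates as $p$ grows.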

Here are the numbers of DoFs per polynomial degree:

  • $\mathcal{Q}^1$: 12,800
  • $\mathcal{Q}^2$: 43,200
  • $\mathcal{Q}^3$: 102,400
  • $\mathcal{Q}^4$: 200,000

As a solver I am using conjugate gradient, preconditioned by the AMG implementation from Trilinos ML (with the high-order-elements option set to true). The absolute tolerance set inside the CG solver is $10^{-11}$.
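
For concreteness, here is a minimal sketch of this kind of setup, written with deal.II's TrilinosWrappers interface to ML purely for illustration (`system_matrix`, `solution`, and `system_rhs` are placeholders assumed to be assembled and initialized elsewhere):

```cpp
// Minimal sketch: CG preconditioned by Trilinos ML AMG, via deal.II's
// TrilinosWrappers. The matrix and vectors are assumed to exist already.
#include <deal.II/lac/solver_cg.h>
#include <deal.II/lac/solver_control.h>
#include <deal.II/lac/trilinos_precondition.h>
#include <deal.II/lac/trilinos_sparse_matrix.h>
#include <deal.II/lac/trilinos_vector.h>

void solve(const dealii::TrilinosWrappers::SparseMatrix &system_matrix,
           dealii::TrilinosWrappers::MPI::Vector        &solution,
           const dealii::TrilinosWrappers::MPI::Vector  &system_rhs)
{
  using namespace dealii;

  // ML AMG preconditioner; the flags below are the ones relevant to a
  // symmetric elliptic operator discretized with higher-order elements.
  TrilinosWrappers::PreconditionAMG                 amg;
  TrilinosWrappers::PreconditionAMG::AdditionalData amg_data;
  amg_data.elliptic              = true;
  amg_data.higher_order_elements = true; // the "high order elements" option
  amg_data.smoother_sweeps       = 2;    // default value, shown for clarity
  amg.initialize(system_matrix, amg_data);

  // CG with an absolute tolerance of 1e-11, as described above.
  SolverControl                           control(20000, 1e-11);
  SolverCG<TrilinosWrappers::MPI::Vector> cg(control);
  cg.solve(system_matrix, solution, system_rhs, amg);
}
```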

Everything goes smoothly in terms of convergence rates up to degree $3$ (albeit with a lot of CG iterations, see the list below), but with $\mathcal{Q}^4$ elements the solver simply takes forever, even when running in parallel.

Iteration counts:

  • $\mathcal{Q}^1$: 123
  • $\mathcal{Q}^2$: 534
  • $\mathcal{Q}^3$: 10,869

I am aware that this is a typical problem with high-order DG, but in my opinion the size of the problem is not too large, so I wanted to ask whether the community is aware of a good alternative matrix-based solver (matrix-free is currently not an option) for such a medium-sized problem.

FEGirl
  • Does your system matrix change during simulation? If not, are you able to store the sparse representation? – ConvexHull Mar 15 '24 at 16:27
  • I am not sure I follow you, @ConvexHull. What do you mean by "during simulation"? It's a Laplace problem which I solve with different polynomial degrees. There's no time dependency. – FEGirl Mar 15 '24 at 16:40
  • My intention was to ask whether something prevents you from precomputing and storing the LU decomposition. – ConvexHull Mar 15 '24 at 17:31
  • I guess nothing stops me from doing that, but let me make sure: are you referring to ILU preconditioning? @ConvexHull – FEGirl Mar 15 '24 at 19:48
  • No, I mean a direct solver. That's the reason for my first question. Do you have to solve the Laplacian once or for many different RHSs? – ConvexHull Mar 15 '24 at 20:21
  • For every polynomial degree $p$, I am solving $- \Delta u = f$ with symmetric interior penalty with polynomial degree $p$ and a given manufactured $f$. They are all independent problems. I updated the question. I don't think a sparse direct solver could work for larger degrees such as $p=4$. @ConvexHull – FEGirl Mar 15 '24 at 20:44
  • A sparse direct solver would be fine for 200k-by-200k matrices, given you have enough RAM. Otherwise I would suggest algebraic multigrid. PETSc, Trilinos, and hypre have good implementations. – Abdullah Ali Sivas Mar 16 '24 at 18:25
  • @AbdullahAliSivas I have a couple of questions about AMG (I am using the Trilinos ML implementation of AMG)... Are you using it to precondition CG? Isn't AMG bad for high-order elements? – FEGirl Mar 16 '24 at 20:16
  • @AbdullahAliSivas Sure it works, and it is most often the right way to go, especially for such a simple PDE. The challenge here is neither the high (and varying) polynomial degree, nor the DG method, nor the complicated geometry. OP seems to be quite resistant to advice. Let's see if something better pops up. – ConvexHull Mar 17 '24 at 08:38
  • @ConvexHull I didn't mean to be resistant to advice, I'm sorry. To be honest, CG + AMG was my first choice, and with $p=4$ it was taking forever (with 10 MPI ranks); that's why I asked this question. Could you please elaborate a bit more on your approach? How would you use the LU decomposition of $A$? – FEGirl Mar 17 '24 at 14:09
  • @FEGirl Maybe this will help: https://scicomp.stackexchange.com/a/42712/36512 – ConvexHull Mar 17 '24 at 14:23
  • Ah, I see now. That's also what the sparse direct solver from Trilinos that I am using right now (Amesos_Klu) is doing. Indeed, it works for $p=4$. – FEGirl Mar 17 '24 at 18:10
  • However, when the size of $A$ is about 345,000 (corresponding to $p=5$), it appears to be quite slow... would you move to CG+AMG in this case? @ConvexHull – FEGirl Mar 17 '24 at 18:11
  • KLU(2) is, in my opinion, a single-threaded reference implementation; it is a decent linear solver but not a great one. Maybe Basker is better, but I haven't tried it myself. I would try using MKL Pardiso, SuperLU, or MUMPS (see the sketch after this comment thread). If they are also slow (which I don't think they will be), you can try CG+AMG. – Abdullah Ali Sivas Mar 17 '24 at 22:04
  • @FEGirl I would recommend the approach of Abdullah – ConvexHull Mar 18 '24 at 17:19
  • @ConvexHull Direct solvers in 3d for high-order polynomials are going to run out of memory real quick. They are not a good choice. – Wolfgang Bangerth Mar 19 '24 at 23:09
  • @WolfgangBangerth I appreciate your reply. I think it is common knowledge that there are limits for direct solvers in 3D, and I would not recommend their use for bigger problems. When a mesh hierarchy is available, geometric multigrid is a natural choice, especially for higher polynomial degrees. However, multigrid for small and medium problems such as $\mathcal{Q}_{1,2,3,4}$ seems quite off to me. – ConvexHull Mar 20 '24 at 04:22
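
Following up on the direct-solver suggestion in the comments above, here is a minimal sketch of switching the Amesos backend away from the default KLU. deal.II's TrilinosWrappers::SolverDirect is used for illustration; the chosen backend ("Amesos_Mumps" below) must have been enabled when Trilinos was built.

```cpp
// Sketch: sparse direct solve with a non-default Amesos backend via deal.II.
#include <deal.II/lac/solver_control.h>
#include <deal.II/lac/trilinos_solver.h>
#include <deal.II/lac/trilinos_sparse_matrix.h>
#include <deal.II/lac/trilinos_vector.h>

void solve_direct(const dealii::TrilinosWrappers::SparseMatrix &system_matrix,
                  dealii::TrilinosWrappers::MPI::Vector        &solution,
                  const dealii::TrilinosWrappers::MPI::Vector  &system_rhs)
{
  using namespace dealii;

  // Required by the interface; a direct solver does not iterate.
  SolverControl control;

  // The second constructor argument selects the Amesos backend; other valid
  // strings include "Amesos_Klu" (the default) and "Amesos_Superludist".
  TrilinosWrappers::SolverDirect::AdditionalData data(/*output_details=*/false,
                                                      "Amesos_Mumps");
  TrilinosWrappers::SolverDirect solver(control, data);

  // Factorize and solve; for several right-hand sides one would call
  // initialize() once and then solve() repeatedly.
  solver.solve(system_matrix, solution, system_rhs);
}
```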

1 Answer


The canonical way to solve high-order problems efficiently is to use a multigrid method in which the first coarsening steps reduce the polynomial degree by one in each step, followed by (geometric) coarsening of the mesh. In many cases, if you don't have a mesh hierarchy, you can stop the coarsening when you are at a $Q^1$ element on your mesh and simply apply CG+AMG at that level, for which you have shown that it converges efficiently. Alternatively, you can insert one more level into the hierarchy where you go from the discontinuous $Q^1$ element to using continuous $Q^1$.
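
To make the structure concrete, here is a minimal sketch of the two-level variant of this idea: damped Jacobi smoothing on the fine (DG $Q^p$) level plus an exact solve on a coarse ($Q^1$) level. It is generic Eigen code, not tied to any particular finite element library; the prolongation matrix `P` from the coarse space into the fine space has to be supplied by the discretization, and the sparse LU factorization of the coarse operator stands in for the CG+AMG solve mentioned above.

```cpp
// Two-level "p-coarsening" preconditioner sketch: damped Jacobi on the fine
// level, Galerkin coarse operator A_c = P^T A P solved exactly.
#include <Eigen/Dense>
#include <Eigen/Sparse>

using SpMat = Eigen::SparseMatrix<double>;
using Vec   = Eigen::VectorXd;

class TwoLevelPreconditioner
{
public:
  // A: fine-level (DG Q^p) operator, P: prolongation from the coarse (Q^1)
  // space into the fine space, omega: Jacobi damping factor.
  TwoLevelPreconditioner(const SpMat &A, const SpMat &P, const double omega = 0.7)
    : A_(A), P_(P), omega_(omega)
  {
    // Inverse diagonal of A for the Jacobi smoother.
    inv_diag_ = Vec::Zero(A_.rows());
    for (int k = 0; k < A_.outerSize(); ++k)
      for (SpMat::InnerIterator it(A_, k); it; ++it)
        if (it.row() == it.col())
          inv_diag_(it.row()) = 1.0 / it.value();

    // Galerkin coarse operator, factorized once.
    const SpMat Pt  = P_.transpose();
    const SpMat PtA = Pt * A_;
    const SpMat A_c = PtA * P_;
    coarse_solver_.compute(A_c);
  }

  // One multiplicative V(1,1) cycle: pre-smooth, coarse-grid correction,
  // post-smooth. Returns an approximation to A^{-1} r (zero initial guess).
  Vec vcycle(const Vec &r) const
  {
    Vec       x        = omega_ * inv_diag_.cwiseProduct(r);  // pre-smooth
    Vec       residual = r - A_ * x;                          // fine residual
    const Vec r_c      = P_.transpose() * residual;           // restrict
    const Vec e_c      = coarse_solver_.solve(r_c);           // coarse solve
    x += P_ * e_c;                                            // prolongate
    residual = r - A_ * x;
    x += omega_ * inv_diag_.cwiseProduct(residual);           // post-smooth
    return x;
  }

private:
  SpMat                  A_, P_;
  Vec                    inv_diag_;
  double                 omega_;
  Eigen::SparseLU<SpMat> coarse_solver_;
};
```

Used as the preconditioner for CG on the fine level (hence the symmetric V(1,1) structure), this reproduces the scheme described above with a single degree-reduction step; in practice one would insert further levels between $Q^p$ and $Q^1$ and replace the coarse factorization by CG+AMG.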

I will say that I think 123 CG iterations for a $Q^1$ element on just 1,600 cells seems too much to me. I would have expected something in the range of perhaps 20 instead, at least if you used continuous elements. Perhaps the discontinuous ones are more complicated to solve, but I don't have that much experience with DG discretizations.

Wolfgang Bangerth