
I recently started coding a small library of 2D PDE solvers (time dependent and time independent), and my first attempt is a solver for a 2D Poisson-type equation of the form: $$\nabla\cdot(\epsilon\nabla\varphi)=\nabla\epsilon\cdot\nabla\varphi+\epsilon\nabla^2\varphi=\rho.$$ The general idea was to write a 2D Poisson solver capable of solving for $\varphi$ in arbitrary 2D geometry and with any combination of Dirichlet, Neumann, and periodic boundary conditions.

However, upon implementation and initial testing, I noticed that when I simulate a rectangular box surrounded by 4 Neumann boundaries at the box edges, the convergence stalls at some high value and never reaches the desired level. The convergence criterion is that the square root of the sum of the squares of the pointwise residuals falls below some small number (e.g. 1e-4) (see the code example further down). During the stall, the residual gets stuck at a value larger than 1 (the exact value depends on the number of points and on the profile of the right-hand side [RHS]).

I have come across several Stack Exchange posts discussing pure Neumann boundaries and their impact on convergence, but none of the proposed solutions worked for me. I have tried subtracting the mean value of the RHS of the equation from the RHS during the residual calculation, which did not work. I have also tried removing the mean value of the main variable, which did not result in any improvement either.

The first version of the implementation was written in C++ and based on finite differences, with the derivatives discretized using central differences: $$ \frac{\partial\varphi}{\partial x}\to\frac{\varphi_{i+1,j}-\varphi_{i-1,j}}{2\Delta x} $$ and $$\frac{\partial^2\varphi}{\partial x^2}\to\frac{\varphi_{i+1,j}-2\varphi_{i,j}+\varphi_{i-1,j}}{\Delta x^2}.$$ The solver that I implemented is the iterative successive over-relaxation (SOR) method.

For ease of use, I introduced the following notation during the discretization:

  1. Value $\varphi_{i,j}$ is the center, denoted as $\varphi_{c}$
  2. Values $\varphi_{i\pm 1,j}$ are right and left, denoted as $\varphi_{r,l}$, respectively
  3. Values $\varphi_{i,j\pm 1}$ are top and bottom, denoted as $\varphi_{t,b}$, respectively
  4. Coefficients multiplying $\varphi_{c,l,r,b,t}$ are denoted $C, L, R, B, T$, respectively, where, inside the simulation region where the calculations are performed: $$C=2\epsilon_c\left(\frac{1}{\Delta x^2}+\frac{1}{\Delta y^2}\right),$$ $$R,L=\frac{1}{\Delta x^2}[\epsilon_c\pm 0.25(\epsilon_r-\epsilon_l)],$$ $$T,B=\frac{1}{\Delta y^2}[\epsilon_c\pm 0.25(\epsilon_t-\epsilon_b)].$$
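For completeness, these coefficients can be verified by inserting the central differences from above into $\nabla\epsilon\cdot\nabla\varphi+\epsilon\nabla^2\varphi$; keeping only the $x$ terms, $$\frac{\epsilon_r-\epsilon_l}{2\Delta x}\,\frac{\varphi_r-\varphi_l}{2\Delta x}+\epsilon_c\frac{\varphi_r-2\varphi_c+\varphi_l}{\Delta x^2}=\frac{1}{\Delta x^2}\left[\epsilon_c+0.25(\epsilon_r-\epsilon_l)\right]\varphi_r+\frac{1}{\Delta x^2}\left[\epsilon_c-0.25(\epsilon_r-\epsilon_l)\right]\varphi_l-\frac{2\epsilon_c}{\Delta x^2}\varphi_c,$$ which reproduces $R$, $L$, and the $x$ part of $C$; the $y$ direction is analogous.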

I have also merged the RHS of the equation with possible Neumann boundary conditions (if present): $mRHS_c=\rho_c+Nbc_c.$ In addition, to save some unnecessary multiplications, I also store the "normalized" RHS as $nRHS_c=mRHS_c/C.$

Regarding the boundary conditions, at points where Dirichlet and Neumann BCs are prescribed, the solver skips those points: for Dirichlet BCs those points are already solved, and for Neumann BCs I assume a sort of "electrode" or "out of simulation space" region is present. However, both Dirichlet and Neumann BCs still enter through $\varphi_{r,l,t,b}$. For Dirichlet this is straightforward, but for Neumann I had to introduce "ghost" points at the boundary of the simulation region. The Neumann BCs are given as: $$\frac{\partial\varphi(x=X_0)}{\partial x}=G_0\to\frac{\varphi_{1,j}-\varphi_{-1,j}}{2\Delta x}=G_0; \quad \frac{\partial\varphi(x=X_1)}{\partial x}=G_1\to\frac{\varphi_{N+1,j}-\varphi_{N-1,j}}{2\Delta x}=G_1$$ $$\frac{\partial\varphi(y=Y_0)}{\partial y}=H_0\to\frac{\varphi_{i,1}-\varphi_{i,-1}}{2\Delta y}=H_0; \quad \frac{\partial\varphi(y=Y_1)}{\partial y}=H_1\to\frac{\varphi_{i,M+1}-\varphi_{i,M-1}}{2\Delta y}=H_1$$ So, if there is a Neumann boundary $G_0$ at the starting point $x=X_0$, I express $\varphi_l$ as $\varphi_r-2\Delta x\,G_0$; at the final point $x=X_1$, $\varphi_r$ becomes $\varphi_l+2\Delta x\,G_1$ (and similarly for the $y$ direction). The terms $\pm2\Delta (x,y)(G_{0,1},H_{0,1})$ are stored in the $NBC$ variables, which are then folded into $mRHS$, as implemented in the code snippet further down: $$mRHS_c=\rho_c+2\Delta x(LG_{0c}-RG_{1c})+2\Delta y(BH_{0c}-TH_{1c})$$ The matrices $G_{0c}, G_{1c}, H_{0c}, H_{1c}$ are non-zero at locations of Neumann BCs, and zero everywhere else (a worked example of this elimination is given after the discretized equation below).

Generally, the bookkeeping of the simulation region is done with another integer matrix, lattice, which is equal to:

  1. 0 (denoted as EXT) at places external to the simulation region
  2. 1 (denoted as SIM) at places where simulation occurs
  3. 2 (denoted as PBC) at periodic boundaries
  4. -1 (denoted as DBC) at Dirichlet boundaries
  5. -2 (denoted as NBC) at Neumann boundaries.

For a dummy example of 4x4 points (plus ghost points with Neumann BCs at the edges; the height times width of the simulation box is 0.4x0.4 [a.u.]), the lattice looks like:

-2 -2 -2 -2 -2 -2
-2  1  1  1  1 -2
-2  1  1  1  1 -2
-2  1  1  1  1 -2
-2  1  1  1  1 -2
-2 -2 -2 -2 -2 -2 

Finally, the discretized equation looks like: $$R\varphi_r+L\varphi_l+T\varphi_t+B\varphi_b-C\varphi_c=mRHS_c$$ and in the SOR scheme, for iteration $k$: $$\varphi_c^{(k)}=\varphi_c^{(k-1)}+\omega_{SOR}(-\varphi_c^{(k-1)}-nRHS_c+[R\varphi_r^{(k-1)}+L\varphi_l^{(k-1)}+T\varphi_t^{(k-1)}+B\varphi_b^{(k-1)}]/C)=\varphi_c^{(k-1)}+\omega_{SOR}\frac{Residual_c^{(k-1)}}{C}$$ The parameter $\omega_{SOR}$ is the over-relaxation coefficient in the range $(1,2)$. The initial guess at iteration $k=0$ is $\varphi_c=0$ (for all $i,j$ in the valid simulation region).
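To make the ghost-point elimination concrete, consider an interior point whose left neighbor is a Neumann boundary with prescribed gradient $G_0$: substituting $\varphi_l=\varphi_r-2\Delta x\,G_0$ into the discretized equation above gives $$(R+L)\varphi_r+T\varphi_t+B\varphi_b-C\varphi_c=\rho_c+2\Delta x\,LG_0,$$ i.e. the mirrored neighbor stays in the stencil and the gradient term moves to the right-hand side, which is exactly what the NBC flag triggers in the code below.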

Typically, the convergence stall arises when I set $G_0=H_0=-G_1=-H_1=1$. Also, for the sake of simplicity, let $\rho_{i,j}=0$ for all $i,j$. Then the matrix $mRHS_{i,j}$ is

0  0  0  0  0 0 
0 30 15 15 30 0 
0 15  0  0 15 0
0 15  0  0 15 0
0 30 15 15 30 0
0  0  0  0  0 0 
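For reference, the non-zero entries follow directly from the $mRHS$ formula above: with $\epsilon=1$ and $\Delta x=\Delta y=0.4/3$ (four points spanning 0.4), an interior point adjacent to a single Neumann boundary receives $$2\Delta x\,L\,G_0=\frac{2G_0}{\Delta x}=\frac{2}{0.4/3}=15,$$ while the interior corner points receive both an $x$ and a $y$ contribution, i.e. $15+15=30$.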

Typically, in a realistic test, the number of points and the box dimensions are larger, e.g. Nx=101, Ny=201 and Lx=10, Ly=20, respectively, with dx=Lx/(Nx-1) and dy=Ly/(Ny-1). Here is an example of the code (for the sake of the argument, let $\epsilon_{i,j}=1$ for all $i,j$):

void mergeNnorm_rhs(double* nRHS, double* mRHS, int Nx, int Ny, double dx, double dy, int* lattice, double* rho, double* epsilon, double* NBCl, double* NBCr, double* NBCb, double* NBCt, int ghost_region = 0)
{
    // Define local variables:
    int i, j, k;
    double h2x = 1.0 / dx / dx,
           h2y = 1.0 / dy / dy,
           hxy = 2.0 * (dx * dx + dy * dy) / dx / dx / dy / dy;
    double L, R, B, T, C;
    double eps_x, eps_y;

#pragma omp parallel for schedule(static) shared(lattice, nRHS, mRHS, rho, epsilon, NBCl, NBCr, NBCb, NBCt) private(i, j, k, L, R, B, T, C, eps_x, eps_y) firstprivate(Nx, Ny, dx, dy, h2x, h2y, hxy, ghost_region) default(none)
    for (i = ghost_region; i < Nx + ghost_region; i++)
        for (j = ghost_region; j < Ny + ghost_region; j++) {
            k = i * (Ny + 2 * ghost_region) + j;
            if (lattice[k] > EXT) {
                // Calculate matrix coefficients:
                eps_x = 0.25 * (epsilon[k + Ny] - epsilon[k - Ny]);
                eps_y = 0.25 * (epsilon[k + 1] - epsilon[k - 1]);
                // Calculate coefficients of the outer 2D stencil components:
                R = h2x * (epsilon[k] + eps_x);
                L = h2x * (epsilon[k] - eps_x);
                T = h2y * (epsilon[k] + eps_y);
                B = h2y * (epsilon[k] - eps_y);
                C = epsilon[k] * hxy;
                // Populate RHS matrices:
                mRHS[k] = rho[k] + 2.0 * dx * (L * NBCl[k] - R * NBCr[k]) + 2.0 * dy * (B * NBCb[k] - T * NBCt[k]);
                nRHS[k] = mRHS[k] / C;
            }
        }
}

double residual_2D_PDE_cfd_direct(int Nx, int Ny, double dx, double dy, int* lattice, double* phi, double* mRHS, double* epsilon)
{
    // Define local variables:
    int i, j, k;
    double diff;
    double phi_l, phi_r, phi_b, phi_t;
    double L, R, B, T, C;
    double eps_x, eps_y;
    double res = 0.0;
    double h2x = 1.0 / dx / dx,
           h2y = 1.0 / dy / dy,
           hxy = 2.0 * (dx * dx + dy * dy) / dx / dx / dy / dy;

#pragma omp parallel for schedule(static) shared(lattice, phi, mRHS, epsilon) private(i, j, k, diff, phi_l, phi_r, phi_b, phi_t, L, R, B, T, C, eps_x, eps_y) firstprivate(Nx, Ny, h2x, h2y, hxy) reduction(+:res) default(none)
    for (i = 1; i < Nx - 1; i++)
        for (j = 1; j < Ny - 1; j++) {
            k = i * Ny + j;
            if (lattice[k] > EXT) {
                // Calculate outer 2D stencil components:
                phi_r = lattice[k + Ny] == PBC ? phi[Ny + j] : (lattice[k + Ny] == NBC ? phi[k - Ny] : phi[k + Ny]);
                phi_l = lattice[k - Ny] == PBC ? phi[Ny * (Nx - 2) + j] : (lattice[k - Ny] == NBC ? phi[k + Ny] : phi[k - Ny]);
                phi_t = lattice[k + 1] == PBC ? phi[Ny * i + 1] : (lattice[k + 1] == NBC ? phi[k - 1] : phi[k + 1]);
                phi_b = lattice[k - 1] == PBC ? phi[Ny * i + Ny - 2] : (lattice[k - 1] == NBC ? phi[k + 1] : phi[k - 1]);
                // Calculate matrix coefficients:
                eps_x = 0.25 * (epsilon[k + Ny] - epsilon[k - Ny]);
                eps_y = 0.25 * (epsilon[k + 1] - epsilon[k - 1]);
                // Calculate coefficients of the outer 2D stencil components:
                R = h2x * (epsilon[k] + eps_x);
                L = h2x * (epsilon[k] - eps_x);
                T = h2y * (epsilon[k] + eps_y);
                B = h2y * (epsilon[k] - eps_y);
                C = epsilon[k] * hxy;
                // Calculate current deviation between left- and right-hand sides and update residual calculations:
                diff = R * phi_r + L * phi_l + T * phi_t + B * phi_b - C * phi[k] - mRHS[k];
                res += diff * diff;
            }
        }

    // Represent residual as square root of the sum of the squares of the LHS and RHS difference:
    return sqrt(res);
}

void OddEven_2D_PDE_cfd_direct1D(void (*solver)(int, int, double, double, int, double, int*, double*, double*, double*), int Nx, int Ny, double dx, double dy, int* lattice, double* phi, double* nRHS, double* mRHS, double* epsilon, double Rconv, int SORtype = MT0, int verbose = 0)
{
    // Define local variables:
    int iter = 0;
    int Nmax = 100 * Nx * Ny;
    int Ncheck = 1;
    int Nstage = 2;
    double eps = 1e-4;
    double residual = 0.0;
    double w = 1.0;

    do {
        // Calculate residual every Ncheck iterations:
        residual = residual_2D_PDE_cfd_direct(Nx, Ny, dx, dy, lattice, phi, mRHS, epsilon);

        // Do Ncheck iterations of solving the 2D elliptic PDE with designated solver:
        for (int loc_it = 0; loc_it < Ncheck; loc_it++) {
            iter++;
            for (int stage = 0; stage < Nstage; stage++) {
                switch (SORtype) {
                case MT0:
                    w = Rconv;
                    break;
                case NR1:
                    w = (iter == 1 && stage == 0) ? 1.0 : ((iter == 1) ? 1.0 / (1.0 - 0.5 * Rconv * Rconv) : 1.0 / (1.0 - 0.25 * Rconv * Rconv * w));
                    break;
                case NR2:
                    w = (iter == 1 && stage == 0) ? 1.0 : ((iter == 1) ? 1.0 / (1.0 - 0.5 * Rconv * Rconv) : 1.0 / (1.0 - 0.25 * Rconv * Rconv * w));
                    break;
                }
                (*solver)(Nx, Ny, dx, dy, stage, w, lattice, phi, nRHS, epsilon);
            }
        }
        //cout << scientific << setw(16) << setprecision(12) << setfill(' ') << residual << "\n";
    } while (residual > eps && iter <= Nmax);

    // Calculate residual of the final iteration:
    residual = residual_2D_PDE_cfd_direct(Nx, Ny, dx, dy, lattice, phi, mRHS, epsilon);

    // Report to user the status of the solution:
    if (verbose) {
        if (iter >= Nmax) printf("Warning: Maximum number of iterations reached! (%d)\nFinal convergence achieved:\t%.10e\n", iter, residual);
        else {
            printf("Solution converged!\n\tTotal number of iterations:\t%d\n\tConvergence achieved:\t%.10e\n", iter, residual);
        }
    }
}

void SORoe_solver_cfd_direct1D(int Nx, int Ny, double dx, double dy, int stage, double w, int* lattice, double* phi, double* nRHS, double* epsilon)
{
    // Define local variables:
    int i, j, k;
    double gx = dx * dx * 0.5 / (dx * dx + dy * dy),
           gy = dy * dy * 0.5 / (dx * dx + dy * dy);
    double r, l, t, b;
    double phi_l, phi_r, phi_b, phi_t;
    double eps_x, eps_y;

#pragma omp parallel for schedule(static) shared(lattice, phi, nRHS, epsilon) private(i, j, k, phi_l, phi_r, phi_b, phi_t, l, r, b, t, eps_x, eps_y) firstprivate(Nx, Ny, gx, gy, stage, w) default(none)
    for (i = 1; i < Nx - 1; i++)
        for (j = 1 + ((i + stage) % 2); j < Ny - 1; j += 2) {
            k = i * Ny + j;
            if (lattice[k] > EXT) {
                // Calculate outer 2D stencil components:
                phi_r = lattice[k + Ny] == PBC ? phi[Ny + j] : (lattice[k + Ny] == NBC ? phi[k - Ny] : phi[k + Ny]);
                phi_l = lattice[k - Ny] == PBC ? phi[Ny * (Nx - 2) + j] : (lattice[k - Ny] == NBC ? phi[k + Ny] : phi[k - Ny]);
                phi_t = lattice[k + 1] == PBC ? phi[Ny * i + 1] : (lattice[k + 1] == NBC ? phi[k - 1] : phi[k + 1]);
                phi_b = lattice[k - 1] == PBC ? phi[Ny * i + Ny - 2] : (lattice[k - 1] == NBC ? phi[k + 1] : phi[k - 1]);
                // Calculate epsilon corrections:
                eps_x = 0.25 * (epsilon[k + Ny] - epsilon[k - Ny]) / epsilon[k];
                eps_y = 0.25 * (epsilon[k + 1] - epsilon[k - 1]) / epsilon[k];
                // Calculate normalized coefficients of the outer 2D stencil components:
                r = gy * (1.0 + eps_x);
                l = gy * (1.0 - eps_x);
                t = gx * (1.0 + eps_y);
                b = gx * (1.0 - eps_y);
                // Update SOR iteration:
                phi[k] += w * (r * phi_r + l * phi_l + t * phi_t + b * phi_b - phi[k] - nRHS[k]);
            }
        }
}
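For completeness, here is a minimal driver sketch showing how the above functions are tied together for the realistic test case. The enum values for the lattice flags and the SOR-type tags below are only illustrative placeholders (the real definitions live elsewhere in the library), the NBC matrices are populated on the first/last interior rows and columns where mergeNnorm_rhs reads them, and $\omega_{SOR}=1.9$ is just an example value:

#include <vector>

// Illustrative flag values, matching the lattice legend and the SOR-type tags used above
// (the real definitions live elsewhere in the library):
enum { EXT = 0, SIM = 1, PBC = 2, DBC = -1, NBC = -2 };
enum { MT0 = 0, NR1 = 1, NR2 = 2 };

int main()
{
    const int Nx = 101, Ny = 201;                        // grid size, including the boundary ring
    const double Lx = 10.0, Ly = 20.0;
    const double dx = Lx / (Nx - 1), dy = Ly / (Ny - 1);

    std::vector<int>    lattice(Nx * Ny, SIM);
    std::vector<double> phi(Nx * Ny, 0.0), rho(Nx * Ny, 0.0), eps(Nx * Ny, 1.0);
    std::vector<double> mRHS(Nx * Ny, 0.0), nRHS(Nx * Ny, 0.0);
    std::vector<double> G0(Nx * Ny, 0.0), G1(Nx * Ny, 0.0), H0(Nx * Ny, 0.0), H1(Nx * Ny, 0.0);

    // Outermost ring is the Neumann boundary; the gradients G0 = H0 = -G1 = -H1 = 1 are
    // stored on the adjacent interior rows/columns, where mergeNnorm_rhs reads them:
    for (int i = 0; i < Nx; i++)
        for (int j = 0; j < Ny; j++) {
            const int k = i * Ny + j;
            if (i == 0 || i == Nx - 1 || j == 0 || j == Ny - 1) lattice[k] = NBC;
            if (i == 1)      G0[k] =  1.0;
            if (i == Nx - 2) G1[k] = -1.0;
            if (j == 1)      H0[k] =  1.0;
            if (j == Ny - 2) H1[k] = -1.0;
        }

    // Merge rho and the Neumann terms into mRHS/nRHS, then run the odd-even SOR driver
    // with a fixed over-relaxation factor (SORtype = MT0, so Rconv acts as omega_SOR):
    mergeNnorm_rhs(nRHS.data(), mRHS.data(), Nx, Ny, dx, dy, lattice.data(),
                   rho.data(), eps.data(), G0.data(), G1.data(), H0.data(), H1.data());
    OddEven_2D_PDE_cfd_direct1D(SORoe_solver_cfd_direct1D, Nx, Ny, dx, dy, lattice.data(),
                                phi.data(), nRHS.data(), mRHS.data(), eps.data(),
                                1.9, MT0, 1);
    return 0;
}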

If you can provide any insight or advice on what I should do to resolve this convergence issue, please let me know. My guess is that SOR should be able to tackle this kind of pure-Neumann problem, but I have been stuck on this for some time and do not know how to resolve it. Many thanks in advance~

Akhaim

1 Answer


For this equation, with the Neumann boundary conditions the problem becomes mathematically ill-posed; if there is a solution $\phi(x,y)$ then $\phi(x,y)+const$ is also a solution. This ill-posedness manifests itself in your solver's inability to converge.
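For completeness, the same statement can be phrased via the compatibility condition obtained by integrating the equation over the domain $\Omega$ and applying the divergence theorem, $$\int_\Omega \rho\,dA=\oint_{\partial\Omega}\epsilon\,\frac{\partial\phi}{\partial n}\,ds;$$ if the prescribed Neumann data violate this identity, no solution exists at all, and an iterative solver's residual will stall at a non-zero value.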

Maxim Umansky
  • While this is correct generally speaking, from a practical point of view a different set of Neumann boundary conditions, e.g. $G_{0,1}=H_{0,1}=1$, converges. So it seems that the way the Neumann boundaries are defined (or possibly their mean value) impacts the solution. After all, there are plenty of commercial solvers which can handle pure Neumann boundaries, and I would like to understand whether there is a generic way of dealing with them. – Akhaim Sep 23 '22 at 06:53
  • The general way to deal with the non-convergence is to add some extra constraint. For example, you could fix a single node in a corner to a Dirichlet value. Another option is to require the integral of $\phi$ in your domain to be a certain value (0 is a popular choice). – helloworld922 Sep 23 '22 at 11:07
  • @helloworld922, the second option you proposed is the one I already mentioned in my question as not yielding any results. The first option, though not mentioned explicitly in my question, is in the linked Stack Exchange post; that option was tested as well and did not work, as I also stated in my original post. It is for this reason that I went to great lengths to provide so many implementation details from my code and my discretization scheme. So please, if you can provide a more insightful comment, I would appreciate it greatly. – Akhaim Sep 23 '22 at 12:20
  • For the second solution, did you remove one of the BC equations in your system and replace it with the integration condition? I don't think you can just perform an iteration of SOR on the original system and then remove the mean value. – helloworld922 Sep 23 '22 at 12:27
  • A similar thing applies to fixing a single node to a Dirichlet value; you need to modify the A matrix by replacing one of the Neumann BCs with a Dirichlet BC. – helloworld922 Sep 23 '22 at 12:30
  • @helloworld922 do you mean, for instance, replacing the boundary at e.g. the entire left side with a value equal to the mean of the remaining boundaries (possibly with inverted sign)? If so, won't that mess with the expected boundaries? My original implementation was to modify the RHS prior to input to the solver as RHS-mean(RHS), but that did mess up the boundaries. – Akhaim Sep 23 '22 at 12:33
  • I mean you need to pick one equation (say the equation for the BC for the ghost node -1,0) and replace it with a row of all 1's, with the RHS set to 0. This is equivalent to applying the BC $\int \phi = 0$ – helloworld922 Sep 23 '22 at 12:41
  • @helloworld922 for the Dirichlet BC (DBC), what I did was to select a single corner point and force the DBC to be zero there, while preserving all Neumann boundaries as they were. The resulting solution was then dominated by the DBC, and its profile was overall incorrect. If necessary, I could document these findings in a separate answer to this question. – Akhaim Sep 23 '22 at 12:44
  • See my related answer: https://mathematica.stackexchange.com/a/271682/87543 – ConvexHull Sep 23 '22 at 18:41
  • @ConvexHull, the proposed ideas from your answer, are they general for any kind of solver that deals with problems of the type Ax=b, or do only specific solver families benefit from these approaches? My impression is that adding a small positive constant to the diagonal resolves the issue when one needs to find the inverse of a matrix. In the SOR method I use (which is mentioned in my question, if you read it) there is no matrix inversion, at least not explicitly. In fact, after trying this modification, I obtain all NaNs as a solution, which means it is not in fact a generic fix. – Akhaim Sep 24 '22 at 13:45
  • Yes, the proposed methods are quite general and applicable to other kinds of PDEs. Don't get me wrong, but I think you have a wrong understanding of what SOR does. The SOR algorithm is nothing more than an iterative matrix inversion. To be more precise, SOR solves your system iteratively in order to get $x$. – ConvexHull Sep 24 '22 at 13:58
  • @ConvexHull, you are correct about SOR being an iterative matrix-inversion method, and I may have expressed myself poorly. My thought was that, perhaps due to the way SOR is implemented, your first proposed solution (from your original answer) needs to be added in some kind of "SOR-friendly" way. Unfortunately, I still do not quite understand the proper way of implementing such a solution (Tikhonov regularization) in my code. I tried it directly, just adding some small epsilon (1e-6) to the term C (in my notation), but the solution just diverged to NaN – Akhaim Sep 24 '22 at 14:31
  • Generally this should work for pure Neumann BCs. It is simply a workaround to make your matrix invertible. Try to vary the size. If this does not work you may have another issue with your code. – ConvexHull Sep 24 '22 at 15:09
  • @ConvexHull, OK, I managed to fix NaN issue, as I didn't set the C parameter everywhere in the same way when I added epsilon. However, with this correction convergence stall was back on despite epsilon presence. Is there a closed-form way to estimate epsilon order of magnitude for given discretization parameters (unit cell sizes dx and dy), or do we just guess it as a small positive value? Currently, C=2(1/dx/dx+1/dy/dy+epsilon). If epsilon is 1e-6 (I went as high as 1e-3), and dx=dy=0.1, they completely overshadow epsilon, as C becomes 400.001 (epsilon = 1e-3) or 400.000001 (epsilon 1e-6) – Akhaim Sep 24 '22 at 15:22