First off, there are basically no scenarios where one would ever actually compute and store $A^{-1}$ in memory, even for small problems. An LU factorization offers both superior efficiency and stability: once you have the factors, each solve is just a pair of cheap triangular solves, and you avoid the extra error that comes from explicitly forming the inverse. That said, there are a few important reasons why iterative methods can beat factorization methods on certain problems.
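For concreteness, here is a minimal SciPy sketch of the recommended pattern: factor once, then reuse the factors for each solve, rather than ever forming $A^{-1}$. The matrix here is an arbitrary well-conditioned stand-in, not your actual operator.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 1000
A = rng.standard_normal((n, n)) + n * np.eye(n)  # arbitrary well-conditioned matrix
b = rng.standard_normal(n)

# Preferred: factor once (O(n^3)), then each solve is two cheap
# triangular solves (O(n^2) per right-hand side).
lu, piv = lu_factor(A)
x = lu_solve((lu, piv), b)

# Avoid: forming the inverse explicitly. It costs more and is less
# numerically stable than solving with the factors.
x_inv = np.linalg.inv(A) @ b
print(np.linalg.norm(x - x_inv))  # small here, but the inverse route
                                  # degrades as A becomes ill-conditioned
```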
One reason, which you have already mentioned, is that for problems involving discretizations of PDEs, $A$ is typically very sparse. This means that the memory requirements for storing the LU factors of $A$ can be prohibitive compared to storing $A$ itself, since the factors suffer from fill-in and are typically far denser. However, even if you have enough memory, it is often still more efficient to use iterative methods. The complexity of computing an LU factorization is $\mathcal{O}(n^3)$, which becomes intractable very quickly. Your problem has $n\approx 10^9$, so I feel confident in saying that computing and storing an LU factorization would be prohibitive. To my knowledge, there are no direct methods that outperform iterative methods on problems of this size. Optimized dense LU factorization routines will outperform iterative methods until about $n\approx 10^4$, and sparse direct methods can push that a couple more orders of magnitude higher, but your problem is at the point where iterative methods are really the only feasible choice in my opinion.
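To see the fill-in effect concretely, here is a small sketch using SciPy's SuperLU wrapper on a standard 2-D Laplacian model problem (my choice of test matrix, not anything from your setup); the factors hold far more nonzeros than $A$ itself.

```python
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Model problem: 2-D Laplacian on an m-by-m grid, a typical sparse
# PDE discretization with ~5 nonzeros per row.
m = 200
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
A = sp.kronsum(T, T).tocsc()  # n = m^2 = 40,000 unknowns

lu = splu(A)  # sparse direct (SuperLU) factorization
print("nnz(A)          =", A.nnz)
print("nnz(L) + nnz(U) =", lu.L.nnz + lu.U.nnz)  # far larger: fill-in
```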
On the other hand, the most performant iterative solvers used in practice (Krylov subspace methods) can compute $A^{-1}b$ to the desired precision in $\mathcal{O}(n)$ total work, given proper algorithm selection and preconditioning. This is best-case behavior, where $A$ is sparse enough that a matvec costs $\mathcal{O}(n)$ operations (or structured enough, e.g., Toeplitz or circulant, that FFT-based matvecs come close to that) and the preconditioner is good enough that the number of iterations required for convergence is roughly independent of $n$. This can be achieved for some elliptic problems with proper multigrid preconditioners. However, we are not always so lucky in practice. With poor preconditioning, the number of Krylov iterations can scale like $\mathcal{O}(n)$ ($n$ iterations is the worst case, but convergence in exact arithmetic is then guaranteed), which makes the entire algorithm $\mathcal{O}(n^2)$. Where you end up between $\mathcal{O}(n)$ and $\mathcal{O}(n^2)$ is almost entirely down to preconditioning, regardless of $b$ (ignoring trivial cases where $b$ happens to be extremely close to an eigenvector of $A$).
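As an illustration of the preconditioned-Krylov pattern, here is a sketch using conjugate gradients on the same 2-D Laplacian model problem, with an algebraic multigrid preconditioner from the third-party PyAMG package (assumed installed; classical Ruge-Stüben AMG is just one reasonable choice here).

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg
import pyamg  # third-party: pip install pyamg (assumed available)

# Model problem: 2-D Laplacian on an m-by-m grid (elliptic, SPD).
m = 200
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
A = sp.kronsum(T, T).tocsr()
b = np.ones(A.shape[0])

# Build an algebraic multigrid hierarchy and use it as a preconditioner.
ml = pyamg.ruge_stuben_solver(A)
M = ml.aspreconditioner(cycle='V')

its = []
x, info = cg(A, b, M=M, callback=lambda xk: its.append(1))
print(f"converged={info == 0} in {len(its)} iterations")
# With a good multigrid preconditioner the iteration count stays small
# and roughly flat as m (and hence n = m^2) grows; without M it grows
# like O(m) for this problem.
```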
Even though iterative methods may be the best choice for computing $A^{-1}b$ once, you are correct that the cost of the Krylov iterations may become a burden if you have multiple $b$'s. Here is a recent post discussing a similar issue of solving $Ax=b$ for large systems and many different $b$'s. There are many helpful suggestions in the comments there for ways to speed up subsequent iterative solves using the solutions of previous solves (Krylov subspace recycling, deflation spaces, starting iterations at the previous solution; see the sketch below for the last of these). There are also methods known as block Krylov methods that are designed for multiple right-hand sides and are very efficient (see the papers linked here), but I think the best piece of advice for a problem of this size is to invest the time in finding the best preconditioner you can.
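The cheapest of those tricks to try is warm-starting: pass the previous solution as the initial guess. A minimal sketch, again on a model Laplacian with a hypothetical sequence of slowly varying right-hand sides:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

# Model 2-D Laplacian again (a stand-in for the real operator).
m = 200
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
A = sp.kronsum(T, T).tocsr()
n = A.shape[0]

rng = np.random.default_rng(0)
b = rng.standard_normal(n)
x = np.zeros(n)  # cold start for the first solve
for k in range(5):
    b = b + 0.01 * rng.standard_normal(n)  # hypothetical slowly varying RHS
    its = []
    # x0=x warm-starts the iteration at the previous solution, which
    # typically cuts the iteration count when consecutive b's are close.
    x, info = cg(A, b, x0=x, callback=lambda xk: its.append(1))
    print(f"solve {k}: converged={info == 0}, iterations={len(its)}")
```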