In the context of Surrogate Modelling and Bayesian Optimization, Acquisition Functions (https://tune.tidymodels.org/articles/acquisition_functions.html) are often used as a "compass" that suggests which point to evaluate the objective function at next.
In short, the objective function you are trying to optimize is modelled with some (simpler) surrogate model (e.g. the surface of the objective function can be modelled with a Gaussian Process), and a separate Acquisition Function "suggests" how to navigate the surface of the Gaussian Process at each iteration. This "feedback loop" is repeated until some stopping condition is met (e.g. convergence, or the evaluation budget is exhausted). Note: in practice, the Acquisition Function itself must be optimized at each iteration, typically with a gradient-based algorithm (e.g. BFGS), but this inner sub-optimization problem is said to be much easier than the main one, since the Acquisition Function is cheap to evaluate (unlike the true objective).
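To make the loop above concrete, here is a minimal sketch in Python (my own illustration, not from the linked tidymodels article). It assumes a toy 1-D objective, a Gaussian Process surrogate from scikit-learn, Expected Improvement as the Acquisition Function, and L-BFGS-B with random restarts for the inner sub-optimization; all of these specific choices are placeholders.

```python
# A minimal sketch of the loop above, assuming: a toy 1-D objective, a GP
# surrogate from scikit-learn, Expected Improvement (EI) as the acquisition
# function, and L-BFGS-B with random restarts for the inner sub-optimization.
# All specific choices (objective, kernel, budget) are illustrative only.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # The "expensive" black-box function we pretend we can only sample pointwise.
    return np.sin(3 * x) + 0.1 * x ** 2

bounds = np.array([[-3.0, 3.0]])
rng = np.random.default_rng(0)

# Initial design: a handful of random evaluations of the true objective.
X = rng.uniform(bounds[0, 0], bounds[0, 1], size=(4, 1))
y = objective(X).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

def expected_improvement(x, gp, y_best):
    # EI for minimization: expected improvement over the incumbent y_best at x.
    mu, sigma = gp.predict(x.reshape(1, -1), return_std=True)
    sigma = np.maximum(sigma, 1e-12)
    z = (y_best - mu) / sigma
    ei = (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    return float(ei[0])

for _ in range(10):                       # budget: 10 more objective evaluations
    gp.fit(X, y)                          # refit the surrogate to all data so far
    y_best = y.min()

    # Inner sub-optimization: maximize EI with a gradient-based optimizer,
    # restarting from several random points to avoid local optima of EI.
    best_x, best_ei = None, -np.inf
    for x0 in rng.uniform(bounds[:, 0], bounds[:, 1], size=(10, 1)):
        res = minimize(lambda x: -expected_improvement(x, gp, y_best),
                       x0, bounds=bounds, method="L-BFGS-B")
        if -res.fun > best_ei:
            best_x, best_ei = res.x, -res.fun

    # Evaluate the expensive objective only at the suggested point, then repeat.
    X = np.vstack([X, best_x])
    y = np.append(y, objective(best_x)[0])

print("Best point found:", X[y.argmin()], "with value", y.min())
```

Note that the true objective is evaluated only once per iteration; all the other function evaluations are absorbed by the GP and EI, which are cheap to compute.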
I am trying to understand why an Acquisition Function is necessary in the above procedure. I have heard the following argument for why it is required: in many applications where we want to use Bayesian Optimization, we only have realizations of some partially observable objective function. This means that the Gaussian Process is not very "informative", so trying to optimize the Gaussian Process directly will not be very informative either. This is why an Acquisition Function should be used, and somehow its use circumvents this "un-informativeness" problem.
However, I do not fully understand this argument. If the Gaussian Process itself is uninformative, how can the use of an Acquisition Function remedy this problem?
My Question: Why is an Acquisition Function required in Bayesian Optimization? After all, why can't we directly optimize the Gaussian Process, without taking the advice of the Acquisition Function into consideration?
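To make the contrast in my question concrete (the notation below is mine, not from the linked article): by "directly optimizing the Gaussian Process" I mean picking the next point from the posterior mean alone,

$$x_{n+1} = \arg\min_{x} \, \mu_n(x),$$

whereas an Acquisition Function such as Expected Improvement also uses the posterior uncertainty $\sigma_n(x)$:

$$x_{n+1} = \arg\max_{x} \; \mathbb{E}\big[\max\big(f_n^{*} - f(x),\, 0\big)\big], \qquad f_n^{*} = \min_{i \le n} y_i,$$

where the expectation is taken under the Gaussian Process posterior after $n$ evaluations.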
Thanks!