
I had the following question relating to "Random Search Methods" in Optimization and Machine Learning. In short: are there theoretical results that formalize the intuitive idea that "Random Search Methods" are slow and inefficient, and explain why they are not favored for estimating the parameters of statistical models compared to algorithms like Gradient Descent?

Suppose you have a function "f": f(x, y, z), and you are trying to optimize "f".

To illustrate this question, consider the following. For argument's sake, let's say that we want to use "random search" to randomly query "f" at different points.

A) My (naive) understanding of "random search" is as follows: we randomly query "f" at f(x = a, y = b, z = c) and record the value of "f". We repeat this process 1000 times and keep the combination of (x, y, z) that results in the smallest value of "f". (A code sketch of this procedure is given after (B) below.)

B) However, it seems that there is an alternate way to do this: https://en.wikipedia.org/wiki/N-sphere#Generating_random_points . If I understand this correctly, Marsaglia showed an alternate way to generate random points. If the function "f" is in 3 dimensions, generate 3 random numbers between 0 and 1, take the square root of the sum of squares of these 3 numbers (call this "r"), and multiply the vector of these 3 random numbers by "r". This vector is the first point at which you evaluate "f"; now repeat this 1000 times and choose the combination that results in the smallest value of "f". (A second sketch below follows the Wikipedia description.) Note: apparently this method scales poorly when "f" is in many dimensions.
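
To make (A) concrete, here is a minimal sketch of pure random search in Python. The objective "f", the sampling box [-5, 5]^3, and the 1000-query budget are assumptions for illustration, since the question does not say how the points (a, b, c) are chosen.

```python
import random

def f(x, y, z):
    # Illustrative objective only; the question leaves "f" abstract (assumption).
    return (x - 1) ** 2 + (y + 2) ** 2 + z ** 2

def random_search(f, bounds, n_iter=1000, seed=0):
    """Pure random search as in (A): sample points uniformly inside a box
    and keep the point with the smallest value of f."""
    rng = random.Random(seed)
    best_point, best_value = None, float("inf")
    for _ in range(n_iter):
        point = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        value = f(*point)
        if value < best_value:
            best_point, best_value = point, value
    return best_point, best_value

# The box [-5, 5]^3 and the 1000-query budget are assumptions; the question
# does not specify how a, b, c are drawn.
print(random_search(f, bounds=[(-5.0, 5.0)] * 3, n_iter=1000))
```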
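
And here is a hedged sketch of (B). Note that, per the linked Wikipedia article, Marsaglia-style sphere point picking draws standard normal components and divides them by their norm "r", which produces points uniformly distributed on the surface of a sphere; only points on that sphere are ever evaluated. The objective, sphere radius, and iteration budget below are assumptions for illustration.

```python
import math
import random

def random_point_on_sphere(dim, rng):
    """Sphere point picking per the linked Wikipedia article: draw standard
    normal components and divide by their Euclidean norm, giving a point
    uniformly distributed on the unit sphere in `dim` dimensions."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        r = math.sqrt(sum(x * x for x in v))
        if r > 0:  # guard against an all-zero draw
            return [x / r for x in v]

def sphere_random_search(f, dim=3, n_iter=1000, radius=1.0, seed=0):
    """Random search restricted to points on a sphere of the given radius
    (the radius and iteration budget are assumptions for illustration)."""
    rng = random.Random(seed)
    best_point, best_value = None, float("inf")
    for _ in range(n_iter):
        point = [radius * x for x in random_point_on_sphere(dim, rng)]
        value = f(*point)
        if value < best_value:
            best_point, best_value = point, value
    return best_point, best_value

def f(x, y, z):
    # Same illustrative objective as in the sketch of (A) (assumption).
    return (x - 1) ** 2 + (y + 2) ** 2 + z ** 2

print(sphere_random_search(f, dim=3, n_iter=1000, radius=5.0))
```

Writing it out this way also makes clear that (B) only ever searches the surface of a sphere, not a volume, so it is not a drop-in replacement for (A) unless the optimum is known to lie on that sphere.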

Question 1: I am a bit confused - when we talk about "random search", are we referring to (A) or to (B)?

Question 2: I have always been unsure whether there exist any theoretical results about the convergence of random search methods. Are there mathematical theorems that state the obvious about "random search" - that if your dataset has too many rows and too many columns, statistically speaking, random search will become highly ineffective at optimizing a loss function?

Are there mathematical results that statistically link the number of iterations required, for a given number of rows/columns, to a certain error bound on a function (e.g. a loss function) of a certain complexity? Are there results that suggest "random search" will converge after a given number of iterations?

Thus, how can we answer the obvious question: why do modern statistical models not use "Random Search Methods" for optimizing their loss functions and estimating their parameters?

Thanks!


stats_noob
  • (i) You didn't specify how a, b and c were chosen in (A), so there's no basis to compare with (B). (ii) You did not correctly reproduce how (B) works; please spend a little more time and care with your reading/thinking before spamming a large number of questions. By doing so you'll save asking trivial questions you could easily resolve on your own (you know enough to resolve some of these, with only a little thought/search/research; the latter two are usually required, per the help). (iii) Why would you regard generating points on an N-sphere as being a general algorithm for optimization ... – Glen_b Jan 13 '22 at 05:00
  • ... unless you wanted to optimize the function only on that sphere? ... 2. Why do you assume that random search is not used in statistics? (Typically it isn't, because statistical problems generally have exploitable structure, but "rarely used" is not identical to "not used".) ... 3. Please quote relevant parts of your references with enough context so we understand where your premises are coming from. – Glen_b Jan 13 '22 at 05:00