The formula you show estimates how large a sample you need for a probability of $(1-\alpha)$ that the sample mean falls within a specified "error" of the true mean. That depends, as you say, on knowing the actual standard deviation $\sigma$ in the population (or the variance $\sigma^2$).
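If your formula is the usual normal-approximation version, $n = (z_{\alpha/2}\,\sigma/E)^2$ with $E$ the allowed error, it takes one line of R to evaluate; the values of sigma and error below are just placeholders, not recommendations.

alpha <- 0.05                  # desired coverage is 1 - alpha
sigma <- 1                     # assumed population SD (placeholder)
error <- 0.25                  # maximum allowed distance from the true mean (placeholder)
z <- qnorm(1 - alpha/2)        # normal quantile for two-sided coverage
ceiling((z * sigma / error)^2) # required sample size, rounded up
# [1] 62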
You can use the sample standard deviation as an estimate of the population standard deviation similarly to how you use the sample mean to estimate the population mean. There are two potential problems, however.
First, the sample standard deviation is a biased (low) estimate of the population standard deviation, even though it is the square root of an unbiased estimate of the population variance.
Second, the sample standard deviation itself has a wide distribution. For a normal distribution, the variance of the sample variance is:
$$ \frac{2\sigma^4}{n-1},$$
where $\sigma^2$ is the population variance and $n$ is the sample size. There isn't any break at a particular value of $n$; the variance of the sample variance just decreases inversely with $(n-1)$.
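Both points are easy to check by simulation, drawing directly from a standard normal so that $\sigma = \sigma^2 = 1$; this is a quick sketch, not part of the example below.

set.seed(42)
n <- 10
vars <- replicate(10000, var(rnorm(n)))  # 10000 sample variances at n = 10
var(vars)        # should be near 2 * sigma^4 / (n - 1) = 2/9, about 0.22
mean(sqrt(vars)) # average sample SD; below 1 at this n (about 0.97 in theory)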
To see the relative advantages of the two proposed methods for estimating $\sigma$, you can try a simulation. Here's an example in R. I made a population of 1000 normally distributed values with a mean of 2 and a standard deviation of 1.
set.seed(1234)
normData <- rnorm(1000,mean=2,sd=1)
sd(normData)
# [1] 0.9973377
I then took 1000 samples of size $n=10$ from that population and collected the distributions of the SD estimates from the two methods: the sample SD, and range/4.
sdDist10 <- NULL
rangeDist10 <- NULL
for (i in 1:1000) {
    sam <- sample(normData, 10, replace = TRUE)               # resample from the population
    sdDist10 <- c(sdDist10, sd(sam))                          # SD-based estimate
    rangeDist10 <- c(rangeDist10, (max(sam) - min(sam)) / 4)  # range/4 estimate
}
I plotted the distributions of the SD estimates obtained in those two ways:
plot(density(rangeDist10), col = "red", xlab = "SD estimate", main = "For n = 10", bty = "n")
lines(density(sdDist10))
legend("topleft",legend="red, from range\nblack, from SD",bty="n")

As expected with a small sample, the estimates based on the sample SD had a wide distribution but were fairly symmetric about the true value. The estimates based on the range (in red) tended to be low, with a similarly wide distribution. I don't see that the second option, based on the range, is an improvement at all.
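A quick numeric summary of those two sets of estimates makes the same point; the exact values depend on the seed.

mean(sdDist10)     # centered fairly close to the true SD of about 1
mean(rangeDist10)  # centered noticeably below 1
sd(sdDist10)       # spread of the SD-based estimates
sd(rangeDist10)    # spread of the range-based estimates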
Play with this approach for different values of the sample size $n$. If you repeat the above for 1000 samples of size $n = 100$, the sample-SD values have a much narrower distribution around a peak near SD = 1, while the range-based estimates tend to over-estimate the true SD.
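For concreteness, here is the same loop and plot with samples of size 100; only the sample size changes.

sdDist100 <- NULL
rangeDist100 <- NULL
for (i in 1:1000) {
    sam <- sample(normData, 100, replace = TRUE)
    sdDist100 <- c(sdDist100, sd(sam))
    rangeDist100 <- c(rangeDist100, (max(sam) - min(sam)) / 4)
}
plot(density(rangeDist100), col = "red", xlab = "SD estimate", main = "For n = 100", bty = "n")
lines(density(sdDist100))
legend("topleft", legend = "red, from range\nblack, from SD", bty = "n")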
A take-home point is that you might need a surprisingly large sample just to get a reliable estimate of the variance, an estimate you need in order to perform power calculations for study design. You can also use this approach to see how frequently your sample mean values at different sample sizes are within a specified "error" of the actual mean.
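For instance, to see how often the mean of a sample of size 10 from this population falls within 0.25 of the population mean (the sample size and the "error" of 0.25 are arbitrary choices for illustration):

popMean <- mean(normData)
error <- 0.25
hits <- replicate(1000, abs(mean(sample(normData, 10, replace = TRUE)) - popMean) <= error)
mean(hits)  # proportion of samples whose mean lands within the specified error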
It's good to learn how to do quick simulations like this. In complex designs, simulations are often the best way to estimate the sample size needed for a study.