To estimate sample sizes you need not only the expected difference in means but also the variance of the individual measurements. Say that you have transformed the time axis so that the average time per "control device" test is 1 unit. Then you want to document that the average time per "test device" test is no greater than 0.8 units on that same scale.
As this isn't a paired design, the sample sizes needed will depend on the variability of test times within each device type. If the variances of test times are similar between the two devices, what matters is the ratio of that 0.2-unit difference to the standard deviation $\sigma$ of test times (in the same transformed-time units) within a device type. That ratio, $0.2/\sigma$, is the hypothesized standardized "effect size" that you want to be able to detect.
That variability thus might be determined by the characteristics of whatever it is you're testing even more than the variability introduced by the test systems themselves. That's the variability due to the "other factors" that "blur" the results. You presumably have some information on that variability of test times among the test objects, based on your understanding of the subject matter.
This PubMed page has a link to a free copy of "Practical guide to sample size calculations: superiority trials" by L. Flight and S. A. Julious, Pharmaceutical Statistics 15(1): 75-79.* You choose the risk trade-offs you are willing to take, in terms of the balance between false-positive (Type I) and false-negative (Type II) errors from the study. That combination of variance, effect size, and risk trade-offs determines the sample size.
If the underlying transformed-time values are normally distributed with the same variance $\sigma^2$ in both groups, Equation 3 of that paper provides a simple formula for the size of each group under 1:1 allocation, based on a Type I error of 0.05 and a Type II error of 0.1 (i.e., 90% power). Multiplying by 2 to get the total sample size gives:
$$ N = \frac{42 \sigma^2}{d^2}, $$
where $d$ is the difference in mean test times that you wish to detect and $\sigma^2$ is as above. The paper provides much more detail for other scenarios, but this is a good guide for an initial design. For your desired difference of $d = 0.2$ transformed-time units, that's equivalent to:
$$ N = 1050 \sigma^2, $$
with $\sigma$ in transformed-time units that bring the mean "control device" test time to a value of 1.
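As a quick numeric sanity check of that formula, here is a sketch using the normal-approximation z-values directly (scipy assumed available; $\sigma = 0.5$ is purely an illustrative guess, not an estimate from your data):

```python
from scipy.stats import norm

def total_n(delta, sigma, alpha=0.05, power=0.90):
    """Total sample size (both groups combined, 1:1 allocation) for a
    two-sided two-sample comparison of means, normal approximation."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * 2 * z**2 * sigma**2 / delta**2  # 2 groups x n per group

# With delta = 0.2 transformed-time units and an assumed sigma of 0.5:
print(total_n(0.2, 0.5))  # ~262.7; the rounded formula 42*sigma^2/d^2 gives 262.5
```

Round up to the next even integer in practice so the two groups are equal.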
The problem, as pointed out by Michael Chernick, is:
In practice you never know the true variance. What you do is guess at it. This can be done by looking at how the answer changes as the population variance varies over a range of plausible values. You might find the plausible values through literature on similar studies or through your own pilot study, or you can do a two-stage adaptive design where the initial stage is used primarily to determine if you need additional data and, if so, how much.
If you have the resources to do a pilot study just on the "test device," or if you already have enough data on it to estimate the variance of its measurements, you could use that as the basis for the guess. As a start you could assume the same variance for the "control device"; perhaps the literature on the established "control device" reports its variability.
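One way to act on that advice is a simple sensitivity sweep: tabulate the implied total sample size over a range of plausible $\sigma$ values. The grid of $\sigma$ values below is illustrative only, not derived from your data:

```python
import math

# Total N = 42*sigma^2/d^2 for a fixed difference d = 0.2 transformed-time
# units, swept over a (made-up) range of plausible standard deviations.
def n_total(sigma, d=0.2):
    return math.ceil(42 * sigma**2 / d**2)

for sigma in (0.25, 0.5, 0.75, 1.0):
    print(f"sigma = {sigma:.2f} -> total N = {n_total(sigma)}")
```

The quadratic dependence on $\sigma$ is the point: doubling the guessed standard deviation quadruples the required sample size, which tells you how costly an optimistic variance guess can be.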
As Chernick said, this type of problem is sometimes handled by an adaptive design. This answer provides links to related resources. In particular, if you use the first stage of the adaptive design only to estimate variances better, without evaluating treatment differences, there is little risk of inflating the false-positive rate by adjusting the total sample size for the second stage.
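As a toy sketch of that internal-pilot idea (the stage-1 data below are made up for illustration): estimate the variance from stage-1 data alone, without any treatment comparison, then recompute the total sample size with the same formula:

```python
import math

# Hypothetical stage-1 test times in transformed-time units (made-up numbers)
pilot = [0.9, 1.3, 0.7, 1.1, 1.4, 0.8, 1.0, 1.2, 0.65, 1.0]

mean = sum(pilot) / len(pilot)
var_hat = sum((x - mean) ** 2 for x in pilot) / (len(pilot) - 1)  # sample variance

# Recompute total N with the re-estimated variance (d = 0.2 as before);
# no treatment-difference test is performed at this stage.
n_total = math.ceil(42 * var_hat / 0.2**2)
print(f"estimated sigma^2 = {var_hat:.3f}, recomputed total N = {n_total}")
```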
Finally, consider whether you really want to do a superiority test. If your "test device" is simpler or cheaper than the "control device," a non-inferiority test could support its preferred use with a much smaller sample size. This web page illustrates how equivalence, superiority, and non-inferiority tests require different sample sizes.
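To see the kind of difference that can make, here is a rough normal-approximation comparison. The values $\sigma = 0.5$ and a non-inferiority margin of 0.3 are purely hypothetical; a real margin needs subject-matter justification:

```python
from math import ceil
from scipy.stats import norm

sigma, power = 0.5, 0.90  # assumed SD of test times (transformed units); 90% power

# Superiority: two-sided alpha = 0.05, detect a true difference of 0.2
z_sup = norm.ppf(0.975) + norm.ppf(power)
n_sup = ceil(4 * z_sup**2 * sigma**2 / 0.2**2)   # total N, 1:1 allocation

# Non-inferiority: one-sided alpha = 0.05, margin 0.3, true difference assumed 0
z_ni = norm.ppf(0.95) + norm.ppf(power)
n_ni = ceil(4 * z_ni**2 * sigma**2 / 0.3**2)

print(f"superiority total N ~ {n_sup}, non-inferiority total N ~ {n_ni}")
```

The saving comes both from the one-sided test and, mainly, from the margin being wider than the superiority difference; with a margin equal to the difference the two sample sizes would be much closer.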
*The page also contains links to similar articles on equivalence and non-inferiority tests.
However, I have no idea what the variance would be - any estimate of it would be a complete guess. Therefore, I worry that a sample size calculation would be based on so much guessing that it might be just as plausible to simply say "we run this study for X months".
Good point - but we KNOW that the test will be quicker (simple logic, since it does fewer steps) - the question is only how much quicker.
– user12541161 Jun 05 '23 at 08:54

Regarding your second comment: you are completely right. We don't know if the test device is as good as the control device. And this is of course also a part of the study. However, our main objective is the time difference, since this is the biggest concern at the moment.
– user12541161 Jun 05 '23 at 10:32