I am fitting a function to data. I want to tell whether the fit is good or not. Consider this example (which is actually my data):
Despite the definition of 'fit is good' being totally ambiguous, most humans will agree in that the fit in the plot is reasonable. On the other hand, the 'bad fit example' shows a case in which most humans will agree in that this fit is not good. As a human, I am capable of performing such 'statistical eye test' to tell whether the fit is good looking at the plot.
Now I want to automate this process, because I have tons of data sets and fits and simply cannot look at each of them individually. I am using a chi squared test, but it seems to be very much sensitive and is always rejecting all the fits, no matter what significance I choose, even though the fits are 'not that bad'. For example a chi square test with a significance of 1e-10 rejected the fit from the plot above, which is not what I want as it looks 'reasonably good' to me.
So my specific question is: What kind of test or procedure is usually done to filter between 'decent fits' and 'bad fits'?
This question is a follow up of this other question.
