I am interested in studying the effect of increasing the number of data samples on the train error and test error of a regression model. For this I computed 95% confidence intervals of the error at different sample sizes. I found something that I couldn't understand and couldn't find an explanation for by searching the internet: the lower bound of the confidence interval of the test error stays constant as the number of samples increases.

The x-axis is the number of samples and the y-axis is the error

The blue line is the lower bound of the confidence interval and the green one is the higher bound
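
The kind of experiment described could be sketched as follows (toy data and a simple one-feature least-squares fit, both my own assumptions rather than the OP's actual model): refit at each sample size, measure test MSE over repeated draws, and form a normal-approximation 95% CI of the mean test error.

```python
import numpy as np

rng = np.random.default_rng(1)

def experiment(n, n_repeats=200, n_test=500):
    """95% CI of the mean test MSE of a linear fit on n training samples."""
    test_mses = []
    for _ in range(n_repeats):
        # Hypothetical data: y = 2x + Gaussian noise.
        x = rng.uniform(-1, 1, size=n)
        y = 2 * x + rng.normal(0, 0.5, size=n)
        w, b = np.polyfit(x, y, deg=1)  # least-squares slope and intercept
        x_t = rng.uniform(-1, 1, size=n_test)
        y_t = 2 * x_t + rng.normal(0, 0.5, size=n_test)
        test_mses.append(np.mean((w * x_t + b - y_t) ** 2))
    m = np.mean(test_mses)
    half = 1.96 * np.std(test_mses, ddof=1) / np.sqrt(n_repeats)
    return m - half, m + half

for n in [10, 40, 160]:
    lo, hi = experiment(n)
    print(f"n={n}: [{lo:.3f}, {hi:.3f}]")
```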


Galen

  • Your chart has two lines, both saying "Test" – Henry Nov 26 '21 at 00:32
  • Define what you mean by "error". What values do the y-axis tick marks represent? Is the blue line at error = zero? – user20637 Nov 27 '21 at 19:00
  • @Henry: OP says "The blue line is the lower bound of the confidence interval and the green one is the higher bound". I deduce that they both correspond to "test error". – user20637 Nov 27 '21 at 19:01
  • @user20637: that phrase was edited in after my comment. It answers my question, but raises new ones. – Henry Nov 27 '21 at 19:06

1 Answer


I'm squinting at the plot a little bit, but it appears that the lower bound is relatively constant compared to the upper bound; it may not be exactly constant, though. Here are a couple of options for checking this further:

  1. Try using a log scale on the vertical axis.
  2. Try plotting the two bounds in separate panels that share an x-axis.
import matplotlib.pyplot as plt

# x: the sample sizes; upper_bound_y / lower_bound_y: the CI bounds
fig, axes = plt.subplots(2, 1, sharex=True)
axes[0].plot(x, upper_bound_y)
axes[1].plot(x, lower_bound_y)

The above are leads for figuring out whether the lower bound is truly constant, or just relatively constant compared to the changes happening in the upper bound.
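
A quick numerical check, assuming the bounds are available as arrays (the values below are made up purely for illustration), is to compare how much each bound actually varies:

```python
import numpy as np

# Hypothetical CI bounds of the test error at increasing sample sizes.
lower_bound_y = np.array([0.50, 0.49, 0.50, 0.50, 0.49])
upper_bound_y = np.array([3.0, 2.1, 1.5, 1.1, 0.9])

# Spread (max - min) of each bound: a truly constant bound has spread 0.
lower_spread = lower_bound_y.max() - lower_bound_y.min()
upper_spread = upper_bound_y.max() - upper_bound_y.min()

print(round(lower_spread, 3), round(upper_spread, 3))  # prints: 0.01 2.1
```

Here the lower bound is not exactly constant, just nearly so relative to the upper bound.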

I imagine that you might like to know why the lower bound is at least relatively constant. Unfortunately I do not know precisely. I suspect it has to do with

  • the model already having been optimized to minimize error, which leads to this asymmetry,
  • combined with some bound on the predictive accuracy achievable with the given model and data.
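
As a toy illustration of that suspicion (my own sketch, not derived from the OP's data): squared errors are non-negative and right-skewed, so a percentile-bootstrap CI of the mean test error need not be symmetric, and its two bounds need not move at the same rate as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(errors, n_boot=2000):
    """Percentile-bootstrap 95% CI of the mean of `errors`."""
    n = len(errors)
    means = np.array([rng.choice(errors, size=n).mean() for _ in range(n_boot)])
    return np.percentile(means, [2.5, 97.5])

# Hypothetical per-sample squared errors: non-negative and right-skewed.
for n in [50, 200, 800]:
    errors = rng.normal(0.0, 1.0, size=n) ** 2
    lo, hi = bootstrap_ci(errors)
    print(f"n={n}: CI = [{lo:.2f}, {hi:.2f}]")
```

In this sketch the interval narrows mostly from above as n grows, which is at least qualitatively similar to what the plot shows.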
Galen