A highly accurate model doesn't guarantee a high level of confidence. Take, for example, the predicted class probabilities from the softmax function for the $i$th object in a 3-class problem, such as $\hat{\pi}_i=\{0.3,0.6,0.1\}$. Predictions like this go with high misclassification rates, i.e., low confidence. We know the predicted class is class 2; however, look at how imprecise the prediction is. Now consider the target prediction $\hat{\pi}_i=\{0.01,0.98,0.01\}$, which goes with low misclassification rates, i.e., high confidence. Class prediction accuracy is the same in both cases, since the final class prediction is based on which probability is largest, not on the value of that probability. The paper you cited, however, looks at the actual log-probabilities of the predictions, so the link to accuracy can fall apart with certain classifiers. Some classifiers don't use probabilities at all and instead rely on distances or kernel similarities (to centers, training points, or a decision boundary), as in RBF networks, kernel regression, or support vector machines. A random forest assigns each test object the majority class label of the terminal (daughter) node it ends up in, and then takes a majority vote over the trees, which is very different from both distance and probability.
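To make the two cases above concrete, here is a minimal sketch in plain Python. It assumes the true class is class 2 (index 1), which the example doesn't actually state, and the helper names `predicted_class` and `log_loss` are just mine for illustration. It shows that the argmax, and hence accuracy, is the same for both vectors, while the negative log-probability of the true class is very different.

```python
import math

# The two softmax outputs from the example above.
p_vague = [0.3, 0.6, 0.1]
p_sharp = [0.01, 0.98, 0.01]

# Assumption: the true label is class 2, i.e. index 1 (0-based).
true_class = 1


def predicted_class(probs):
    """Index of the largest probability (what accuracy is based on)."""
    return max(range(len(probs)), key=lambda k: probs[k])


def log_loss(probs, true_idx):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[true_idx])


for name, probs in [("vague", p_vague), ("sharp", p_sharp)]:
    print(
        name,
        "predicted index:", predicted_class(probs),
        "log loss:", round(log_loss(probs, true_class), 4),
    )
```

Both vectors give the same predicted class, so a metric based only on the class label can't tell them apart; the log loss can.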
Regarding confidence limits, a 90% confidence interval means that, under repeated sampling, about 90% of the intervals built this way would contain the true accuracy. As you ramp up the limits, e.g., to 95% confidence limits, the misclassification rates have to decrease to the point where you're preferably dealing with target predictions like $\hat{\pi}_i=\{0,1,0\}$. So for increasing confidence, I would suspect you would find fewer predictions that meet the bar, making the classifier less "confident." (Apologies for not mentioning "overconfidence"; to me, how can a classifier be overconfident? I use ROC-AUC to compare classifiers, which bundles in sensitivity and specificity.)
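As a rough illustration of how the limits widen as the confidence level goes up, here is a sketch of a normal-approximation (Wald) interval for accuracy. The counts (170 correct out of 200 test objects) are made up, `accuracy_interval` is a hypothetical helper, and the Wald interval is only one of several possible choices (it behaves poorly for small samples or accuracies near 0 or 1).

```python
import math
from statistics import NormalDist


def accuracy_interval(n_correct, n_total, level):
    """Wald (normal-approximation) confidence interval for accuracy."""
    acc = n_correct / n_total
    # Two-sided critical value, e.g. about 1.645 for 90% and 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(acc * (1 - acc) / n_total)
    # Clip to [0, 1], since accuracy is a proportion.
    return max(0.0, acc - half_width), min(1.0, acc + half_width)


# Hypothetical test-set result: 170 of 200 objects classified correctly.
n_correct, n_total = 170, 200

for level in (0.90, 0.95):
    lower, upper = accuracy_interval(n_correct, n_total, level)
    print(f"{level:.0%} limits: [{lower:.3f}, {upper:.3f}]")
```

For a fixed test set, the 95% limits are always wider than the 90% limits around the same accuracy estimate.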
Lastly, I don't think you'd be able to apply this approach to a wide array of classifiers, since many don't deal with prediction probabilities. When I first looked at the first paper you cited, it looked a little like a boosting approach, which is a technique for making a "weak classifier" better. However, the approach directly exploits misclassification rates and imprecision, which causes confidence to drop independently of accuracy.