I think I have managed to rewrite the 0.632+ bootstrap from R to Java using the Weka Java API. The original R function is the bootpred method in the bootstrap package (link).
As you can see from the source code, I have used the corrected $\hat{R}'$ and $\hat{Err}^{(1)'}$ together with the final (corrected) equation (32) from the original article.
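For reference, these are the corrected quantities and the final estimator exactly as I implemented them below:

$$\hat{Err}^{(1)'} = \min\left(\hat{Err}^{(1)}, \hat{\gamma}\right), \qquad \hat{R}' = \begin{cases} \dfrac{\hat{Err}^{(1)} - \overline{err}}{\hat{\gamma} - \overline{err}} & \text{if } \hat{Err}^{(1)} > \overline{err} \text{ and } \hat{\gamma} > \overline{err} \\ 0 & \text{otherwise} \end{cases}$$

$$\hat{Err}^{(.632+)} = \hat{Err}^{(.632)} + \left(\hat{Err}^{(1)'} - \overline{err}\right) \frac{.368 \cdot .632 \cdot \hat{R}'}{1 - .368 \cdot \hat{R}'}$$

where $\overline{err}$ is the resubstitution error, $\hat{Err}^{(1)}$ the leave-one-out bootstrap error, $\hat{\gamma}$ the no-information error rate, and $\hat{Err}^{(.632)} = .368 \cdot \overline{err} + .632 \cdot \hat{Err}^{(1)}$.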
However, despite correcting for abnormal values, I sometimes still get a negative error rate, which is of course impossible and therefore invalid. I have also noticed that the difference between the 0.632 and the 0.632+ estimates is minimal, if any.
If someone finds any errors in my source code, I would be really grateful if you could point them out.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicIntegerArray;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.core.Instance;
import weka.core.Instances;
// (imports of my own AbstractPerformance, MachineLearningAlgorithm and ClassifierFactory omitted)
public class Bootstrap632plus extends AbstractPerformance {
private final int repeats;
private double Err632;
private double resub;
public Bootstrap632plus(Instances instances, int repeats) {
super(instances);
this.repeats = repeats;
}
public double getErr632() {
return Err632;
}
public double getResub() {
return resub;
}
@Override
public double getErrorRate(final MachineLearningAlgorithm machineLearningAlgorithm, final Random seed) throws Exception {
// First component
double err = predictionError(machineLearningAlgorithm);
this.resub = err;
// Error rates
List<Double> errorRates = Collections.synchronizedList(new ArrayList<>());
// Counters for gamma (the no-information error rate)
final int numClasses = instances.numClasses();
AtomicIntegerArray p_l = new AtomicIntegerArray(numClasses);
AtomicIntegerArray q_l = new AtomicIntegerArray(numClasses);
// Bootstrap iterations
seed.ints(repeats).parallel().forEach(randomSeed -> {
// Get error rate
Evaluation evaluation = bootstrapIteration(machineLearningAlgorithm, randomSeed);
errorRates.add(evaluation.errorRate());
/*
GAMMA (no-information error rate)
Confusion matrix:
- first dimension (rows): actual class distribution
- second dimension (columns): predicted class distribution
p_l = observed proportion of responses where y_i equals l
- sum of the l-th row (first dimension)
q_l = observed proportion of predicted responses where y_i equals l
- sum of the l-th column (second dimension)
gamma = SUM_over_l(p_l * (1 - q_l))
*/
double[][] confusionMatrix = evaluation.confusionMatrix();
for(int l = 0; l < numClasses; l++) {
int p_tmp = 0, q_tmp = 0;
for(int n = 0; n < numClasses; n++) {
// Sum for l-th class
p_tmp += confusionMatrix[l][n];
q_tmp += confusionMatrix[n][l];
}
// Add data for l-th class
p_l.addAndGet(l, p_tmp);
q_l.addAndGet(l, q_tmp);
}
});
// Second component
double Err1 = errorRates.stream().mapToDouble(i -> i).average().orElse(0);
// Plain 0.632 bootstrap
Err632 = .368*err + .632*Err1;
// Gamma (no-information error rate)
final double observations = instances.size() * repeats;
double gamma = 0;
for(int l = 0; l < numClasses; l++) {
// Normalize the counts: divide by the total number of observations (repeats * dataset size)
gamma += ((double)p_l.get(l) / observations) * (1 - ((double)q_l.get(l) / observations));
}
// Relative overfitting rate (R)
double R = (Err1 - err) / (gamma - err);
// Modified variables (according to the original journal article)
double Err1_ = Double.min(Err1, gamma);
double R_ = R;
// Per the article: R' = 0 unless both Err1 and gamma exceed the resubstitution error
// (otherwise R can fall outside [0, 1])
if(!(Err1 > err && gamma > err)) {
R_ = 0;
}
// The 0.632+ bootstrap (as used in original article)
double Err632plus = Err632 + (Err1_ - err) * (.368 * .632 * R_) / (1 - .368 * R_);
return Err632plus;
}
/**
* Prediction error: first component of the 0.632+ bootstrap.
* Train the classifier on the whole dataset and then also test it on the whole dataset.
*
* @param machineLearningAlgorithm Specified machine learning algorithm
* @return prediction error [0, 1]
* @throws Exception if training or evaluation fails
*/
private double predictionError(final MachineLearningAlgorithm machineLearningAlgorithm) throws Exception {
// Train
Classifier classifier = ClassifierFactory.instantiate(machineLearningAlgorithm);
classifier.buildClassifier(instances);
// Test
Evaluation evaluation = new Evaluation(instances);
evaluation.evaluateModel(classifier, instances);
// Return error rate
return evaluation.errorRate();
}
/**
* One iteration of the leave-one-out bootstrap: train on a bootstrap sample
* (dataset size, drawn with replacement) and evaluate on the out-of-bag instances.
*
* @param machineLearningAlgorithm Specified machine learning algorithm
* @param randomSeed seed for the bootstrap sampling
* @return the evaluation on the out-of-bag test set (for further processing)
*/
private Evaluation bootstrapIteration(final MachineLearningAlgorithm machineLearningAlgorithm, final int randomSeed) {
try {
final int SIZE = instances.size();
final Random r = new Random(randomSeed);
// Custom sampling (100%, with replacement)
List<Instance> TRAIN = new ArrayList<>(SIZE); // Empty list (add one-by-one)
List<Instance> TEST = new ArrayList<>(instances); // Full (remove one-by-one)
for(int i = 0; i < SIZE; i++) {
// Random select instance
Instance instance = instances.get(r.nextInt(SIZE));
// Add to TRAIN, remove from TEST
TRAIN.add(instance);
TEST.remove(instance);
}
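// NOTE: with very small datasets the sampling above can leave TEST empty;
// Evaluation.errorRate() would then be NaN (0/0) and propagate into the averaged Err1.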
// Train
Instances trainSet = new Instances(instances, TRAIN.size());
trainSet.addAll(TRAIN);
Classifier classifier = ClassifierFactory.instantiate(machineLearningAlgorithm);
classifier.buildClassifier(trainSet);
// Test set
Instances testSet = new Instances(instances, TEST.size());
testSet.addAll(TEST);
// Test
Evaluation evaluation = new Evaluation(instances);
evaluation.evaluateModel(classifier, testSet);
// Return the evaluation (for further processing)
return evaluation;
} catch(Exception e) {
throw new RuntimeException(e);
}
}
}
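For reference, my reading of the article is that $\hat{p}_l$ and $\hat{q}_l$ are plain proportions over the $n$ observations, which would make $\hat{\gamma}$ computable from a single resubstitution confusion matrix; I am not certain my accumulation over the out-of-bag matrices above is equivalent, since together they cover only about 36.8% of the $n \cdot repeats$ predictions I normalize by. A sketch of the direct computation (the helper name gammaHat is mine, not part of the class above):

// Sketch: the no-information error rate computed from a single resubstitution
// Evaluation (model trained and evaluated on the full dataset).
// gammaHat is a hypothetical helper, not part of the class above.
private static double gammaHat(Evaluation resubEvaluation, int datasetSize) {
double[][] cm = resubEvaluation.confusionMatrix();
double gamma = 0;
for(int l = 0; l < cm.length; l++) {
double p = 0, q = 0;
for(int m = 0; m < cm.length; m++) {
p += cm[l][m]; // row sum: actual class == l
q += cm[m][l]; // column sum: predicted class == l
}
gamma += (p / datasetSize) * (1 - q / datasetSize);
}
return gamma;
}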
1. The bootstrapIteration method evaluates the model on the samples not in the training set (i.e. out of the bootstrap sample). If you look at the R code in bootpred, you can clearly see that they evaluate each model on the whole dataset (this is the line with yhat2). In fact, you can reconstruct the variable yhat1 (which is the error on the bootstrap sample) from this yhat2 (for instance with the variables $N_i^b$ defined in the article). 2. You compute $\hat{\gamma}$ with the for […] – Mathieu Dubois Oct 20 '15 at 08:30
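Update: to make the first point concrete for myself, here is a minimal sketch of the per-observation leave-one-out bootstrap error $\hat{Err}^{(1)}$, where every model is evaluated on the whole dataset but only the predictions for instances with $N_i^b = 0$ (out of the bootstrap sample) are counted. The class and method names and the use of AbstractClassifier.makeCopy are my own choices, not taken from the code above:

import java.util.Random;
import weka.classifiers.AbstractClassifier;
import weka.classifiers.Classifier;
import weka.core.Instance;
import weka.core.Instances;
public class LooBootstrap {
// Sketch: Err1 as the per-observation leave-one-out bootstrap error.
// Every model is evaluated on the whole dataset; only out-of-bag predictions count.
public static double err1(Instances data, Classifier prototype, int repeats, long seed) throws Exception {
final int n = data.size();
final Random r = new Random(seed);
double[] errSum = new double[n]; // summed 0/1 losses per instance
int[] oobCount = new int[n]; // number of models that left instance i out
for(int b = 0; b < repeats; b++) {
boolean[] inBag = new boolean[n]; // inBag[i] <=> N_i^b > 0
Instances train = new Instances(data, n);
for(int k = 0; k < n; k++) {
int idx = r.nextInt(n);
inBag[idx] = true;
train.add(data.get(idx));
}
Classifier model = AbstractClassifier.makeCopy(prototype);
model.buildClassifier(train);
for(int i = 0; i < n; i++) {
if(inBag[i]) continue; // skip in-bag instances (N_i^b > 0)
Instance x = data.get(i);
if(model.classifyInstance(x) != x.classValue()) errSum[i]++;
oobCount[i]++;
}
}
// Average the per-instance error rates over instances that were out of bag at least once
double total = 0;
int counted = 0;
for(int i = 0; i < n; i++) {
if(oobCount[i] > 0) {
total += errSum[i] / oobCount[i];
counted++;
}
}
return counted > 0 ? total / counted : 0;
}
}

As far as I understand, this matches the per-observation average in the article, rather than the per-bootstrap-sample average that my bootstrapIteration currently produces.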