This supplementary material includes detailed experiment settings for regularization and replay methods in continual learning, with a focus on ResNet-18 and the Adam optimizer. We present additional results for different replay strategies, including a modified approach and comparisons to our method, revealing key insights into performance and hyperparameter effects.
(2) Çagatay Yıldız, University of Tübingen;
(3) Gido M. van de Ven, KU Leuven;
(4) Tomasz Trzcinski, IDEAS NCBR, Warsaw University of Technology, Tooploox;
(5) Tinne Tuytelaars, KU Leuven;
(6) Matthias Bethge, University of Tübingen.
All models employ the same ResNet-18 [9] backbone and are trained using the Adam optimizer [14] with default PyTorch [27] parameter values (learning rate λ = 0.001, β1 = 0.9, β2 = 0.999). For our method, we always report the average test accuracy over 5 runs, each trained for 5 epochs per task. For other methods, due to computational constraints, we only report the accuracy of a single run.
1.1. Regularization methods
We ran a grid search to set the loss balance weight λo in LwF [18] and the strength parameter c in SI [39], but found that the choice of hyperparameter values did not influence the result. For the run shown in Figure 4, we used λo = 0.1 and c = 0.1.
1.2. Replay methods
For the experiments presented in the main text, we use a modified version of experience replay, inspired by the approach of GDumb [28]. At the beginning of each task t, we add all the training data from the current task to the buffer. If the buffer is full, we employ reservoir sampling so that the memory buffer contains an equal number of examples of every class seen so far, including the classes in task t. We then train the model on the memory buffer until convergence and report test accuracy.
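The buffer update described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `update_buffer`, the `(example, label)` pair representation, and the toy data are hypothetical. For simplicity, the sketch replaces streaming reservoir sampling with plain random subsampling per class, which reaches the same end state of an equal number of examples for every class seen so far.

```python
import random
from collections import defaultdict

def update_buffer(buffer, task_data, capacity, rng):
    """Return a class-balanced buffer after adding all of task_data.

    buffer, task_data: lists of (example, label) pairs (assumed format).
    capacity: maximum number of pairs the buffer may hold.
    """
    merged = buffer + task_data
    if len(merged) <= capacity:
        # Everything fits, so no subsampling is needed.
        return merged

    # Group examples by class label.
    by_class = defaultdict(list)
    for example, label in merged:
        by_class[label].append(example)

    # Equal share of the capacity for every class seen so far.
    quota = capacity // len(by_class)
    new_buffer = []
    for label in sorted(by_class):
        examples = by_class[label]
        kept = rng.sample(examples, min(quota, len(examples)))
        new_buffer.extend((x, label) for x in kept)
    return new_buffer

rng = random.Random(0)
buffer = []
# Toy stream: two tasks with two classes each, 50 examples per class.
for task_classes in [(0, 1), (2, 3)]:
    task_data = [((c, i), c) for c in task_classes for i in range(50)]
    buffer = update_buffer(buffer, task_data, capacity=100, rng=rng)
```

In this toy run the first task fills the buffer exactly, and the second task forces a rebalance to 25 examples for each of the four classes. Training on the buffer until convergence would follow each update.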
Here we present results for a different rehearsal approach. For each task t, we extend every mini-batch of the training set with an equal number of samples chosen randomly (with replacement) from the buffer. We train the model for five epochs per task. Figure 1 shows a comparison of this training protocol to our method. Surprisingly, this way of doing replay yields better test set accuracy than the replay baseline we used in the main text, despite putting a disproportionate weight on the current task.
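The mini-batch extension used in this rehearsal protocol can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code. The generator name `mixed_batches`, the tuple-based data format, and the batch size are hypothetical. The sketch only shows how each batch is assembled and omits the model update.

```python
import random

def mixed_batches(task_data, buffer, batch_size, rng):
    """Yield current-task mini-batches extended with replayed samples."""
    order = list(task_data)
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        current = order[start:start + batch_size]
        # Draw as many buffer samples as there are current-task samples,
        # with replacement; an empty buffer contributes nothing.
        replayed = rng.choices(buffer, k=len(current)) if buffer else []
        yield current + replayed

rng = random.Random(0)
task_data = [("new", i) for i in range(10)]  # toy current-task samples
buffer = [("old", i) for i in range(3)]      # toy replay buffer
batches = list(mixed_batches(task_data, buffer, batch_size=4, rng=rng))
```

Every batch is half current-task data and half replayed data regardless of how many earlier tasks the buffer covers, which is the disproportionate weight on the current task noted above.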
Figure 2 shows a comparison of our method to experience replay with no limit on the buffer size. Here we also mix the samples from the current task with randomly chosen buffer samples and train for five epochs per task.
1.3. Contrastive baseline
This paper is under CC 4.0 license.