To take advantage of transfer learning, we can use fine-tuning to adapt the model to our new problem. In many cases, we start by replacing the last layer of the model. With the AlexNet example, this might mean the last layer was previously used to classify cars, but our new problem is classifying animals.
In the pretrained_model_tuner.py file, you'll find the code that defines both the AlexNet and SqueezeNet models. We start by initializing these models to get the number of model features and the input size we need for fine-tuning.
We currently have a train stage in the dvc.yaml file. If you take a look at it, you'll see something like:
stages:
  train:
    cmd: python pretrained_model_tuner.py
    deps:
      - data/hymenoptera_data
      - pretrained_model_tuner.py
    params:
      - lr
      - momentum
      - model_name
      - num_classes
      - batch_size
      - num_epochs
    outs:
      - model.pt:
          checkpoint: true
    live:
      results:
        summary: true
        html: true
We need this dvc.yaml file so that DVC knows what to pay attention to in our workflow. It tells DVC which data to manage, which metrics to track, and what output to expect from each step.
You'll typically add stages to dvc.yaml using the dvc stage add command, which is one way to add new stages or update existing ones. With the train stage defined, let's look at where the metrics actually come from in the code.
If you open pretrained_model_tuner.py, you'll see the lines where we record the accuracy and loss for each training epoch in the results.json file. We also save the model at every epoch and log the metrics for each epoch using dvclive.
if phase == 'train':
    torch.save(model.state_dict(), "model.pt")
    dvclive.log('acc', epoch_acc.item())
    dvclive.log('loss', epoch_loss)
    dvclive.log('training_time', epoch_time_elapsed)
if phase == 'val':
    dvclive.log('val_acc', epoch_acc.item())
    dvclive.log('val_loss', epoch_loss)
    val_acc_history.append(epoch_acc)
    dvclive.next_step()
This code is what gives DVC access to the metrics in the project: DVC reads them from the results.json file that dvclive writes.
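To make the log-then-advance pattern concrete, here is a minimal stand-in written with only the standard library. It is not how dvclive is implemented; EpochLogger and its file layout are hypothetical and only illustrate the idea of collecting named metrics for one epoch and then writing a JSON summary when the step advances.

```python
import json
from pathlib import Path


class EpochLogger:
    """Illustrative stand-in for a per-epoch metrics logger (not dvclive itself)."""

    def __init__(self, summary_path):
        self.summary_path = Path(summary_path)
        self.step = 0
        self.current = {}

    def log(self, name, value):
        # Collect a metric for the epoch currently in progress.
        self.current[name] = float(value)

    def next_step(self):
        # Write the latest metrics as a summary, then move to the next epoch.
        summary = {"step": self.step, **self.current}
        self.summary_path.write_text(json.dumps(summary))
        self.step += 1
        self.current = {}
```

Calling log for each metric and next_step once per epoch mirrors the order used in the training loop, which is why the summary always holds the values for the most recent complete epoch.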
Since we have several hyperparameters set in params.yaml, we need to use those values when we run the training stage. The following code makes the hyperparameter values accessible in the train function.
from ruamel.yaml import YAML

# Load the hyperparameters defined in params.yaml
with open("params.yaml") as f:
    yaml = YAML(typ='safe')
    params = yaml.load(f)
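Once params is loaded it behaves like a plain dictionary. The sketch below shows one way to check that every hyperparameter listed in the train stage is present before training starts. The check_params helper is hypothetical and not part of the original script; the key names are the ones listed under params in dvc.yaml.

```python
# Hyperparameters the train stage declares in dvc.yaml.
REQUIRED_KEYS = (
    "lr", "momentum", "model_name", "num_classes", "batch_size", "num_epochs",
)


def check_params(params):
    """Return only the expected hyperparameters, failing early if any is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in params]
    if missing:
        raise KeyError("params.yaml is missing: " + ", ".join(missing))
    return {key: params[key] for key in REQUIRED_KEYS}


# Example values matching the first experiment in this article.
example = check_params({
    "lr": 0.001, "momentum": 0.09, "model_name": "alexnet",
    "num_classes": 2, "batch_size": 8, "num_epochs": 5,
})
```

Failing early like this avoids discovering a misspelled key several epochs into a run.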
You can find the code that initializes the AlexNet model in the initialize_model function in pretrained_model_tuner.py. Since we have DVC set up, we can jump straight into fine-tuning this model to see which hyperparameters give us the best accuracy.
$ dvc exp run
This will execute the pretrained_model_tuner.py script and run for 5 epochs since that's what we defined in params.yaml. When this finishes, you can check out the metrics from this run with the current hyperparameter values.
$ dvc exp show
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Experiment ┃ Created ┃ step ┃ acc ┃ loss ┃ training_time ┃ val_acc ┃ val_loss ┃ lr ┃ momentum ┃ model_name ┃ num_classes ┃ batch_size ┃ num_epochs ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ workspace │ - │ 4 │ 0.92623 │ 0.19567 │ 229.18 │ 0.9085 │ 0.25145 │ 0.001 │ 0.09 │ alexnet │ 2 │ 8 │ 5 │
│ main │ 01:58 PM │ - │ - │ - │ - │ - │ - │ 0.001 │ 0.09 │ alexnet │ 2 │ 8 │ 5 │
│ │ ╓ bf81637 [exp-a1f53] │ 02:05 PM │ 4 │ 0.92623 │ 0.19567 │ 229.18 │ 0.9085 │ 0.25145 │ 0.001 │ 0.09 │ alexnet │ 2 │ 8 │ 5 │
│ │ ╟ 9ca3fb8 │ 02:04 PM │ 3 │ 0.89344 │ 0.27423 │ 178.34 │ 0.90196 │ 0.26965 │ 0.001 │ 0.09 │ alexnet │ 2 │ 8 │ 5 │
│ │ ╟ a34ead1 │ 02:03 PM │ 2 │ 0.87295 │ 0.29018 │ 127.36 │ 0.9085 │ 0.2796 │ 0.001 │ 0.09 │ alexnet │ 2 │ 8 │ 5 │
│ │ ╟ ae382c7 │ 02:02 PM │ 1 │ 0.89754 │ 0.26993 │ 76.419 │ 0.89542 │ 0.31113 │ 0.001 │ 0.09 │ alexnet │ 2 │ 8 │ 5 │
│ ├─╨ a95260d │ 02:01 PM │ 0 │ 0.73361 │ 0.5271 │ 25.71 │ 0.86928 │ 0.36408 │ 0.001 │ 0.09 │ alexnet │ 2 │ 8 │ 5 │
└─────────────────────────┴──────────┴──────┴─────────┴─────────┴───────────────┴─────────┴──────────┴───────┴──────────┴────────────┴─────────────┴────────────┴────────────┘
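There are a few ways to change the hyperparameter values for an experiment: edit params.yaml by hand, pass the --set-param (or the shorthand -S) option to dvc exp run, or pass the --queue option to dvc exp run.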
We'll do an example of each of these throughout the rest of this article. Let's start by updating the hyperparameter values in params.yaml. You should have these values in your file.
lr: 0.009
momentum: 0.017
Now run another experiment with dvc exp run. To make the table more readable, we're going to specify the parameters we want to show and take a look at the metrics with:
$ dvc exp show --no-timestamp --include-params lr,momentum,model_name
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Experiment ┃ step ┃ acc ┃ loss ┃ training_time ┃ val_acc ┃ val_loss ┃ lr ┃ momentum ┃ model_name ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━┩
│ workspace │ 9 │ 0.91803 │ 0.27989 │ 228.59 │ 0.82353 │ 0.69077 │ 0.009 │ 0.017 │ alexnet │
│ main │ - │ - │ - │ - │ - │ - │ 0.001 │ 0.09 │ alexnet │
│ │ ╓ 2361cff [exp-c0b11] │ 9 │ 0.91803 │ 0.27989 │ 228.59 │ 0.82353 │ 0.69077 │ 0.009 │ 0.017 │ alexnet │
│ │ ╟ 7686d2f │ 8 │ 0.90984 │ 0.23496 │ 177.65 │ 0.87582 │ 0.50887 │ 0.009 │ 0.017 │ alexnet │
│ │ ╟ 671f8cd │ 7 │ 0.88934 │ 0.39237 │ 126.7 │ 0.86928 │ 0.47856 │ 0.009 │ 0.017 │ alexnet │
│ │ ╟ ea1bf61 │ 6 │ 0.84836 │ 0.4195 │ 75.834 │ 0.91503 │ 0.30885 │ 0.009 │ 0.017 │ alexnet │
│ │ ╟ a9f8dab (bf81637) │ 5 │ 0.79508 │ 0.72891 │ 25.219 │ 0.66667 │ 1.0311 │ 0.009 │ 0.017 │ alexnet │
│ │ ╓ bf81637 [exp-a1f53] │ 4 │ 0.92623 │ 0.19567 │ 229.18 │ 0.9085 │ 0.25145 │ 0.001 │ 0.09 │ alexnet │
│ │ ╟ 9ca3fb8 │ 3 │ 0.89344 │ 0.27423 │ 178.34 │ 0.90196 │ 0.26965 │ 0.001 │ 0.09 │ alexnet │
│ │ ╟ a34ead1 │ 2 │ 0.87295 │ 0.29018 │ 127.36 │ 0.9085 │ 0.2796 │ 0.001 │ 0.09 │ alexnet │
│ │ ╟ ae382c7 │ 1 │ 0.89754 │ 0.26993 │ 76.419 │ 0.89542 │ 0.31113 │ 0.001 │ 0.09 │ alexnet │
│ ├─╨ a95260d │ 0 │ 0.73361 │ 0.5271 │ 25.71 │ 0.86928 │ 0.36408 │ 0.001 │ 0.09 │ alexnet │
└─────────────────────────┴──────┴─────────┴─────────┴───────────────┴─────────┴──────────┴───────┴──────────┴────────────┘
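Reading the best checkpoint off a table like this by eye gets harder as it grows. As a small illustration, the sketch below copies the step and val_acc columns for the two experiments above into plain Python and picks the best checkpoint of each. The best_checkpoint helper is hypothetical; nothing here calls DVC.

```python
# (step, val_acc) pairs copied from the table above, in step order.
RUNS = {
    "exp-a1f53": [(0, 0.86928), (1, 0.89542), (2, 0.9085), (3, 0.90196), (4, 0.9085)],
    "exp-c0b11": [(5, 0.66667), (6, 0.91503), (7, 0.86928), (8, 0.87582), (9, 0.82353)],
}


def best_checkpoint(checkpoints):
    """Return the (step, val_acc) pair with the highest validation accuracy.

    On ties, the earliest step wins because max() keeps the first maximum.
    """
    return max(checkpoints, key=lambda pair: pair[1])


best = {name: best_checkpoint(points) for name, points in RUNS.items()}
```

For the second run the best validation accuracy comes at step 6 rather than at the final checkpoint, which is a reminder that the last epoch is not always the one to keep.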
Finding good values for hyperparameters can take a few iterations, even when you're working with a pre-trained model. So we'll run one more experiment to fine-tune this AlexNet model. This time we'll do it using the -S option.
$ dvc exp run -S lr=0.025 -S momentum=0.5 -S num_epochs=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Experiment ┃ step ┃ acc ┃ loss ┃ training_time ┃ val_acc ┃ val_loss ┃ lr ┃ momentum ┃ model_name ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━┩
│ workspace │ 11 │ 0.88525 │ 1.1355 │ 76.799 │ 0.9085 │ 1.7642 │ 0.025 │ 0.5 │ alexnet │
│ main │ - │ - │ - │ - │ - │ - │ 0.001 │ 0.09 │ alexnet │
│ │ ╓ 54e87bc [exp-52406] │ 11 │ 0.88525 │ 1.1355 │ 76.799 │ 0.9085 │ 1.7642 │ 0.025 │ 0.5 │ alexnet │
│ │ ╟ b2b9ad0 (2361cff) │ 10 │ 0.79098 │ 2.9427 │ 25.715 │ 0.8366 │ 1.4148 │ 0.025 │ 0.5 │ alexnet │
│ │ ╓ 2361cff [exp-c0b11] │ 9 │ 0.91803 │ 0.27989 │ 228.59 │ 0.82353 │ 0.69077 │ 0.009 │ 0.017 │ alexnet │
│ │ ╟ 7686d2f │ 8 │ 0.90984 │ 0.23496 │ 177.65 │ 0.87582 │ 0.50887 │ 0.009 │ 0.017 │ alexnet │
│ │ ╟ 671f8cd │ 7 │ 0.88934 │ 0.39237 │ 126.7 │ 0.86928 │ 0.47856 │ 0.009 │ 0.017 │ alexnet │
│ │ ╟ ea1bf61 │ 6 │ 0.84836 │ 0.4195 │ 75.834 │ 0.91503 │ 0.30885 │ 0.009 │ 0.017 │ alexnet │
│ │ ╟ a9f8dab (bf81637) │ 5 │ 0.79508 │ 0.72891 │ 25.219 │ 0.66667 │ 1.0311 │ 0.009 │ 0.017 │ alexnet │
│ │ ╓ bf81637 [exp-a1f53] │ 4 │ 0.92623 │ 0.19567 │ 229.18 │ 0.9085 │ 0.25145 │ 0.001 │ 0.09 │ alexnet │
We'll switch over to fine-tuning SqueezeNet now that you've seen how the process works in DVC. You'll need to update the model_name hyperparameter in params.yaml to squeezenet if you're following along. The other hyperparameter values can stay the same for now.
Let's run one experiment with dvc exp run --reset to show the difference in the metrics between the two models. Remember, since we're using checkpoints, a new run continues training on top of the previous experiment. That's why we're using the --reset option here to start a fresh experiment for the new model. You should see results similar to this in your table.
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Experiment ┃ step ┃ acc ┃ loss ┃ training_time ┃ val_acc ┃ val_loss ┃ lr ┃ momentum ┃ model_name ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━┩
│ workspace │ 1 │ 0.85656 │ 0.35667 │ 83.414 │ 0.87582 │ 0.34273 │ 0.025 │ 0.5 │ squeezenet │
│ main │ - │ - │ - │ - │ - │ - │ 0.001 │ 0.09 │ squeezenet │
│ │ ╓ 87ccd2e [exp-95f0f] │ 1 │ 0.85656 │ 0.35667 │ 83.414 │ 0.87582 │ 0.34273 │ 0.025 │ 0.5 │ squeezenet │
│ ├─╨ 7d2fafc │ 0 │ 0.80328 │ 0.50723 │ 29.165 │ 0.89542 │ 0.3987 │ 0.025 │ 0.5 │ squeezenet │
│ │ ╓ 54e87bc [exp-52406] │ 11 │ 0.88525 │ 1.1355 │ 76.799 │ 0.9085 │ 1.7642 │ 0.025 │ 0.5 │ alexnet │
│ │ ╟ b2b9ad0 (2361cff) │ 10 │ 0.79098 │ 2.9427 │ 25.715 │ 0.8366 │ 1.4148 │ 0.025 │ 0.5 │ alexnet │
│ │ ╓ 2361cff [exp-c0b11] │ 9 │ 0.91803 │ 0.27989 │ 228.59 │ 0.82353 │ 0.69077 │ 0.009 │ 0.017 │ alexnet │
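The step column in these tables follows a simple pattern: each run picks up at the step after the last checkpoint, unless the run is reset. The sketch below models only that numbering as it appears in the tables. It is not DVC's implementation, and next_steps is a hypothetical helper.

```python
def next_steps(last_step, num_epochs, reset=False):
    """Return the step numbers a run would record.

    last_step is the final step of the previous run, or None if there is none.
    """
    start = 0 if reset or last_step is None else last_step + 1
    return list(range(start, start + num_epochs))


first_run = next_steps(None, 5)              # initial AlexNet run
second_run = next_steps(first_run[-1], 5)    # continues from the last checkpoint
third_run = next_steps(second_run[-1], 2)    # run with num_epochs=2
reset_run = next_steps(third_run[-1], 2, reset=True)  # SqueezeNet with --reset
```

This is why the SqueezeNet rows start again at step 0 while the AlexNet runs count up to step 11.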
$ dvc exp run --queue -S lr=0.0001 -S momentum=0.9 -S num_epochs=2
$ dvc exp run --queue -S lr=0.001 -S momentum=0.09 -S num_epochs=2
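Typing one queue command per combination gets tedious once there are more than a couple of values to try. As a sketch, the snippet below builds the same kind of dvc exp run --queue command lines for every combination in a small grid. The queue_commands helper is hypothetical and only returns argument lists; it does not run anything.

```python
from itertools import product


def queue_commands(grid, fixed=None):
    """Build one 'dvc exp run --queue' argument list per grid combination.

    grid maps a parameter name to a list of string values; fixed holds
    parameters that stay the same in every queued experiment.
    """
    fixed = fixed or {}
    names = list(grid)
    commands = []
    for values in product(*(grid[name] for name in names)):
        settings = {**dict(zip(names, values)), **fixed}
        cmd = ["dvc", "exp", "run", "--queue"]
        for name, value in settings.items():
            cmd += ["-S", f"{name}={value}"]
        commands.append(cmd)
    return commands


commands = queue_commands(
    {"lr": ["0.0001", "0.001"], "momentum": ["0.9", "0.09"]},
    fixed={"num_epochs": "2"},
)
```

Note that a full grid over these values gives four combinations, whereas the two commands above queue only two of them. Each argument list could be passed to subprocess.run to queue the experiment for real.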
You can check the details of the experiments you have queued by looking at the experiments table with dvc exp show. You'll see something like this.
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Experiment ┃ step ┃ acc ┃ loss ┃ training_time ┃ val_acc ┃ val_loss ┃ lr ┃ momentum ┃ model_name ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━┩
│ workspace │ 1 │ 0.85656 │ 0.35667 │ 83.414 │ 0.87582 │ 0.34273 │ 0.025 │ 0.5 │ squeezenet │
│ main │ - │ - │ - │ - │ - │ - │ 0.001 │ 0.09 │ squeezenet │
│ │ ╓ 87ccd2e [exp-95f0f] │ 1 │ 0.85656 │ 0.35667 │ 83.414 │ 0.87582 │ 0.34273 │ 0.025 │ 0.5 │ squeezenet │
│ ├─╨ 7d2fafc │ 0 │ 0.80328 │ 0.50723 │ 29.165 │ 0.89542 │ 0.3987 │ 0.025 │ 0.5 │ squeezenet │
│ │ ╓ 54e87bc [exp-52406] │ 11 │ 0.88525 │ 1.1355 │ 76.799 │ 0.9085 │ 1.7642 │ 0.025 │ 0.5 │ alexnet │
│ │ ╟ b2b9ad0 (2361cff) │ 10 │ 0.79098 │ 2.9427 │ 25.715 │ 0.8366 │ 1.4148 │ 0.025 │ 0.5 │ alexnet │
│ │ ╓ 2361cff [exp-c0b11] │ 9 │ 0.91803 │ 0.27989 │ 228.59 │ 0.82353 │ 0.69077 │ 0.009 │ 0.017 │ alexnet │
│ │ ╟ 7686d2f │ 8 │ 0.90984 │ 0.23496 │ 177.65 │ 0.87582 │ 0.50887 │ 0.009 │ 0.017 │ alexnet │
│ │ ╟ 671f8cd │ 7 │ 0.88934 │ 0.39237 │ 126.7 │ 0.86928 │ 0.47856 │ 0.009 │ 0.017 │ alexnet │
│ │ ╟ ea1bf61 │ 6 │ 0.84836 │ 0.4195 │ 75.834 │ 0.91503 │ 0.30885 │ 0.009 │ 0.017 │ alexnet │
...
│ ├── *2df7fa5 │ - │ - │ - │ - │ - │ - │ 0.0001│ 0.9 │ squeezenet │
│ ├── *699dcae │ - │ - │ - │ - │ - │ - │ 0.001 │ 0.09 │ squeezenet │
└─────────────────────────┴──────┴─────────┴──────────┴─────────┴─────────┴───────────────┴───────┴──────────┴────────────┘
$ dvc exp run --run-all
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Experiment ┃ step ┃ acc ┃ loss ┃ training_time ┃ val_acc ┃ val_loss ┃ lr ┃ momentum ┃ model_name ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━┩
│ workspace │ 5 │ 0.76639 │ 0.49865 │ 85.705 │ 0.81699 │ 0.4518 │ 0.001 │ 0.09 │ squeezenet │
│ main │ - │ - │ - │ - │ - │ - │ 0.001 │ 0.09 │ squeezenet │
│ │ ╓ 699dcae [exp-8322f] │ 5 │ 0.76639 │ 0.49865 │ 85.705 │ 0.81699 │ 0.4518 │ 0.001 │ 0.09 │ squeezenet │
│ │ ╟ d26c25b (2df7fa5) │ 4 │ 0.60246 │ 0.68464 │ 29.243 │ 0.69935 │ 0.55156 │ 0.001 │ 0.09 │ squeezenet │
│ │ ╓ 2df7fa5 [exp-d1c65] │ 3 │ 0.78689 │ 0.488 │ 83.929 │ 0.83007 │ 0.41527 │ 0.0001 │ 0.9 │ squeezenet │
│ │ ╟ 05e1b41 (87ccd2e) │ 2 │ 0.59016 │ 0.76999 │ 28.455 │ 0.75163 │ 0.49807 │ 0.0001 │ 0.9 │ squeezenet │
│ │ ╓ 87ccd2e [exp-95f0f] │ 1 │ 0.85656 │ 0.35667 │ 83.414 │ 0.87582 │ 0.34273 │ 0.025 │ 0.5 │ squeezenet │
│ ├─╨ 7d2fafc │ 0 │ 0.80328 │ 0.50723 │ 29.165 │ 0.89542 │ 0.3987 │ 0.025 │ 0.5 │ squeezenet │
│ │ ╓ 54e87bc [exp-52406] │ 11 │ 0.88525 │ 1.1355 │ 76.799 │ 0.9085 │ 1.7642 │ 0.025 │ 0.5 │ alexnet │
│ │ ╟ b2b9ad0 (2361cff) │ 10 │ 0.79098 │ 2.9427 │ 25.715 │ 0.8366 │ 1.4148 │ 0.025 │ 0.5 │ alexnet │
│ │ ╓ 2361cff [exp-c0b11] │ 9 │ 0.91803 │ 0.27989 │ 228.59 │ 0.82353 │ 0.69077 │ 0.009 │ 0.017 │ alexnet │
│ │ ╟ 7686d2f │ 8 │ 0.90984 │ 0.23496 │ 177.65 │ 0.87582 │ 0.50887 │ 0.009 │ 0.017 │ alexnet │