Fix load_state_dict for all timm models #1084

nilsleh · 2023-02-03T10:42:46Z

This PR closes #1049 by implementing Isaac's solution.

tests/trainers/test_classification.py

torchgeo/trainers/utils.py

torchgeo/trainers/byol.py

nilsleh · 2023-02-08T21:37:16Z

I looked into the seco weights again. Since they were originally saved as part of a pytorch-lightning module, the keys have different names than the default timm keys. Looking at the seco code, there are a "q" and a "k" network, and the "q" network is used as a pretrained backbone for downstream tasks. The encoder_q.0.weight is the only weight that has the same shape as the one in the default timm model, and there is also no bias at this stage, so I think encoder_q.0.weight is the conv1.weight we are looking for. I can rename it and reupload to huggingface, and then I think we should be good. Or do you have a suggestion?

import timm
import torch

timm_model = timm.create_model("resnet18")
checkpoint = torch.load("path/to/seco_resnet18_1m.ckpt", map_location="cpu")
state_dict = checkpoint["state_dict"]

# The first conv layer of the "q" encoder should match timm's conv1 in shape.
assert timm_model.conv1.weight.shape == state_dict["encoder_q.0.weight"].shape

# Print any other key whose tensor has that same shape (none expected).
for key, val in state_dict.items():
    if key not in ["encoder_q.0.weight", "encoder_k.0.weight"]:
        if state_dict["encoder_q.0.weight"].shape == val.shape:
            print(key)

# Rename the key to the name timm expects.
state_dict["conv1.weight"] = state_dict.pop("encoder_q.0.weight")
# Save this state dict with just the encoder_q weights and the renamed 0.weight, then upload to huggingface.
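The renaming step above can also be written as a small reusable helper. This is only a sketch: `rename_keys` is a hypothetical name, not part of torchgeo or timm, and the toy dictionary uses placeholder strings where a real checkpoint would hold tensors. It should work the same way on a real state dict, since only the keys are touched.

```python
from collections import OrderedDict
from typing import Any, Mapping


def rename_keys(
    state_dict: Mapping[str, Any], renames: Mapping[str, str]
) -> "OrderedDict[str, Any]":
    """Return a copy of state_dict with selected keys renamed, order preserved."""
    # Fail early if a key we want to rename is not in the checkpoint at all.
    missing = [old for old in renames if old not in state_dict]
    if missing:
        raise KeyError(f"keys not found in state dict: {missing}")
    return OrderedDict((renames.get(k, k), v) for k, v in state_dict.items())


# Placeholder values stand in for tensors; only the keys matter here.
toy = OrderedDict([("encoder_q.0.weight", "w0"), ("encoder_q.1.weight", "w1")])
renamed = rename_keys(toy, {"encoder_q.0.weight": "conv1.weight"})
print(list(renamed))
```

Returning a copy rather than mutating in place keeps the original checkpoint untouched, which is convenient when comparing the two before uploading.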

adamjstewart · 2023-02-08T21:49:24Z

Your guess is as good as mine. Do the authors have any code for loading the pretrained model like your group does? If not, then I think your analysis makes sense. If you really want to be sure, you can try to train a model with those pretrained weights and make sure it converges quickly.

calebrob6 · 2023-02-08T21:59:57Z

You could use something like this -- https://gist.github.com/calebrob6/44f2e42017e2d192e837f0a1cd526c50 -- to make sure that linear-probing on a downstream dataset with the model achieves good performance. This is the notebook that I used to verify one of the SSL4EO weights I think.

nilsleh · 2023-02-08T22:08:56Z

Yes, I got the encoder_q name from this code, where they load a backbone from their pretrained model for a downstream task.

nilsleh · 2023-02-09T20:31:07Z

I did the extraction of the weights described above and tried @calebrob6's script with the extracted seco resnet18 weights. As they are pretrained on RGB, I only use the EuroSAT RGB bands. Here are the scores I get:

Edit:
Correction, I forgot to call model.eval()...

This is with the preprocessing step of just dividing image values by 1000:

seco resnet18 weights 0.7625
timm resnet imagenet 0.79
timm resnet random init 0.46

This is with the preprocessing step of using the provided normalization stats for the bands:

seco resnet18 weights 0.75
timm resnet imagenet 0.89
timm resnet random init 0.65

nilsleh · 2023-02-09T21:20:32Z

I will try to investigate again tomorrow. This is what I am using as a script (takes about 3 minutes to run on cpu locally). And for some reason one needs pytorch-lightning==1.1.8 in order to load the original checkpoint file.

nilsleh · 2023-02-10T18:45:46Z

I downloaded the BigEarthNet dataset to try the linear probing script on it, since that is the dataset they also report on in their paper. For BigEarthNet with 10,000 samples (not the full dataset), I get the following scores:

seco resnet18 weights: 0.4983
timm imagenet: 0.4512
timm random: 0.442

adamjstewart · 2023-02-10T18:51:21Z

Interesting evidence that the weights may not be very transferable...

adamjstewart · 2023-02-10T18:52:11Z

But this should be sufficient proof that your approach to extracting the first layer of weights is correct and we can move forward with this PR.

nilsleh · 2023-02-10T19:00:25Z

In the paper, they report quite a significant improvement when using Seco in linear probing, so I must be doing something wrong. I can also contact the authors to sort it out.

adamjstewart · 2023-02-10T19:14:38Z

That's prob for MSI, not RGB

nilsleh · 2023-02-10T19:16:43Z

I think everything is only RGB, at least that is how I interpret it when they state "Although the collected dataset contains up to 12 spectral bands, in this work we focus on the RGB channels since it is a more general modality."

nilsleh · 2023-02-10T21:16:48Z

For the moment I updated the seco weights on huggingface, and the loading works for all weights now.

adamjstewart · 2023-02-11T20:20:21Z

I'm still seeing the same issue:

$ pytest -m slow tests/trainers/
...
FAILED tests/trainers/test_byol.py::TestBYOLTask::test_weight_enum_download[ResNet18_Weights.SENTINEL2_RGB_SECO] - KeyError: 'conv1.weight'
FAILED tests/trainers/test_byol.py::TestBYOLTask::test_weight_enum_download[ResNet50_Weights.SENTINEL2_RGB_SECO] - KeyError: 'conv1.weight'
FAILED tests/trainers/test_byol.py::TestBYOLTask::test_weight_str_download[ResNet18_Weights.SENTINEL2_RGB_SECO] - KeyError: 'conv1.weight'
FAILED tests/trainers/test_byol.py::TestBYOLTask::test_weight_str_download[ResNet50_Weights.SENTINEL2_RGB_SECO] - KeyError: 'conv1.weight'
FAILED tests/trainers/test_classification.py::TestClassificationTask::test_weight_enum_download[ResNet18_Weights.SENTINEL2_RGB_SECO] - KeyError: 'conv1.weight'
FAILED tests/trainers/test_classification.py::TestClassificationTask::test_weight_enum_download[ResNet50_Weights.SENTINEL2_RGB_SECO] - KeyError: 'conv1.weight'
FAILED tests/trainers/test_classification.py::TestClassificationTask::test_weight_str_download[ResNet18_Weights.SENTINEL2_RGB_SECO] - KeyError: 'conv1.weight'
FAILED tests/trainers/test_classification.py::TestClassificationTask::test_weight_str_download[ResNet50_Weights.SENTINEL2_RGB_SECO] - KeyError: 'conv1.weight'
FAILED tests/trainers/test_regression.py::TestRegressionTask::test_weight_enum_download[ResNet18_Weights.SENTINEL2_RGB_SECO] - KeyError: 'conv1.weight'
FAILED tests/trainers/test_regression.py::TestRegressionTask::test_weight_enum_download[ResNet50_Weights.SENTINEL2_RGB_SECO] - KeyError: 'conv1.weight'
FAILED tests/trainers/test_regression.py::TestRegressionTask::test_weight_str_download[ResNet18_Weights.SENTINEL2_RGB_SECO] - KeyError: 'conv1.weight'
FAILED tests/trainers/test_regression.py::TestRegressionTask::test_weight_str_download[ResNet50_Weights.SENTINEL2_RGB_SECO] - KeyError: 'conv1.weight'
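A `KeyError: 'conv1.weight'` like the ones above suggests the loaded checkpoint has no key with that name. One way to check this before loading is to compare key sets. The sketch below uses a hypothetical helper, `diff_keys`, and toy key lists; with real objects one would presumably pass `model.state_dict().keys()` as `expected` and the checkpoint's keys as `actual`.

```python
from typing import Iterable, List, Tuple


def diff_keys(
    expected: Iterable[str], actual: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) key names, each sorted alphabetically."""
    expected_set, actual_set = set(expected), set(actual)
    missing = sorted(expected_set - actual_set)  # the model wants these
    unexpected = sorted(actual_set - expected_set)  # the checkpoint has these extra
    return missing, unexpected


# Toy key lists: the checkpoint still uses the old name for the first conv layer.
missing, unexpected = diff_keys(
    expected=["conv1.weight", "bn1.weight"],
    actual=["encoder_q.0.weight", "bn1.weight"],
)
print("missing:", missing)
print("unexpected:", unexpected)
```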

nilsleh · 2023-02-11T20:46:17Z

Mhm, I am not getting those errors. Maybe the old weights are still cached in your torch/hub? I had to delete those so that it would reload the new ones after I uploaded them to huggingface.

adamjstewart · 2023-02-11T21:04:53Z

Oh mine are prob cached, let me delete

adamjstewart · 2023-02-11T21:06:35Z

Yep, works now. Thanks!
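For anyone hitting the same stale-cache problem, here is a minimal sketch of how the cached files could be found and removed. The helper names and the `*seco*` filename pattern are assumptions for illustration; check the actual filenames before deleting anything. The default directory below mirrors how PyTorch resolves its hub cache (`TORCH_HOME`, then `XDG_CACHE_HOME`, then `~/.cache`); if torch is installed, `torch.hub.get_dir()` is the authoritative way to get the hub directory.

```python
import os
from pathlib import Path
from typing import List


def default_checkpoint_dir() -> Path:
    """Best-effort guess at the torch hub checkpoint directory, without importing torch."""
    torch_home = os.getenv("TORCH_HOME") or os.path.join(
        os.getenv("XDG_CACHE_HOME", "~/.cache"), "torch"
    )
    return Path(torch_home).expanduser() / "hub" / "checkpoints"


def remove_cached(pattern: str, directory: Path, dry_run: bool = True) -> List[Path]:
    """List files in directory matching pattern; delete them only if dry_run is False."""
    if not directory.is_dir():
        return []
    matches = sorted(p for p in directory.glob(pattern) if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


# Dry run: only lists candidates, nothing is deleted.
for path in remove_cached("*seco*", default_checkpoint_dir()):
    print(path)
```

Defaulting to a dry run makes it harder to delete unrelated checkpoints by accident; pass `dry_run=False` once the listed files look right.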

* implement isaacs solution
* simple test for function
* private function but failing tests
* Fix in_channels
* Fix model
* Test real weights
* Real weights have no final layer
* Style fixes
* expand test coverage of other trainers
* revert byol image_size

---------

Co-authored-by: Adam J. Stewart <[email protected]>

nilsleh added 2 commits January 28, 2023 14:05

implement isaacs solution

406814d

simple test for function

8cccb69

nilsleh marked this pull request as draft February 3, 2023 10:43

github-actions bot added the testing (Continuous integration testing) and trainers (PyTorch Lightning trainers) labels Feb 3, 2023

adamjstewart reviewed Feb 3, 2023

tests/trainers/test_classification.py (resolved)

torchgeo/trainers/utils.py (outdated, resolved)

adamjstewart added this to the 0.4.1 milestone Feb 3, 2023

nilsleh and others added 7 commits February 6, 2023 15:56

private function but failing tests

f4bc4fe

Fix in_channels

7cfce01

Fix model

8857890

Test real weights

503f8b8

Real weights have no final layer

a5541a2

Style fixes

912b4b3

expand test coverage of other trainers

a8cffdb

adamjstewart reviewed Feb 7, 2023

torchgeo/trainers/byol.py (outdated, resolved)

revert byol image_size

20601f6

nilsleh marked this pull request as ready for review February 10, 2023 21:14

adamjstewart approved these changes Feb 11, 2023

adamjstewart merged commit a461d58 into microsoft:main Feb 11, 2023
