Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TorchFix] Add weights_only to torch.load #8105

Merged
merged 9 commits into from
Dec 14, 2023
Merged

[TorchFix] Add weights_only to torch.load #8105

merged 9 commits into from
Dec 14, 2023

Conversation

kit1980
Copy link
Member

@kit1980 kit1980 commented Nov 9, 2023

torch.load without weights_only is a potential security issue (see pytorch/test-infra#4671)

Adding weights_only=True is potentially unsafe correctness-wise if full pickling functionality is needed, but it should be rare and the tests should catch it (and I changed couple of places to weights_only=False after the tests failures).

The "unittests-macos (3.8, macos-m1-12)" failure is preexisting.

Copy link

pytorch-bot bot commented Nov 9, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8105

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit cd17926 with merge base 01dca0e (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@kit1980 kit1980 marked this pull request as ready for review November 9, 2023 04:34
kit1980 added a commit to pytorch/test-infra that referenced this pull request Nov 9, 2023
#4671 added linter-only
`TorchUnsafeLoadVisitor`, but it turned out that the issue is so
widespread that manual fixes would be tedious.

The codemod is somewhat unsafe correctness-wise because full pickling
functionality may still be needed even without `pickle_module`, but I
think it's OK because it fixes a security-related issue and the codemods
need to be verified anyway.

Maybe later we should add something like Ruff's recently added
`--unsafe-fixes`: https://docs.astral.sh/ruff/linter/#fix-safety

I used this for pytorch/vision#8105
Copy link
Collaborator

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the other reviewers: the parameter weights_only is a misnomer. Setting weights_only=True means that on unpickling only certain types are deserialized. However this is not restricted to "weights", but rather

Indicates whether unpickler should be restricted to loading only tensors, primitive types and dictionaries

The use case here is to avoid security issues by potentially unpickling and executing arbitrary code.

However, in all instances in this PR that is not an issue. We have generated the pickled file either dynamically at runtime or statically as part of the repository. Thus, there is no security issue since we are not loading third-party stuff.

I guess we could go for this to lead by example, but there is no other benefit to it. @NicolasHug thoughts?

test/test_functional_tensor.py Outdated Show resolved Hide resolved
test/test_transforms_v2.py Outdated Show resolved Hide resolved
@@ -127,7 +127,7 @@ def load_data(traindir, valdir, args):
if args.cache_dataset and os.path.exists(cache_path):
# Attention, as the transforms are also cached!
print(f"Loading dataset_train from {cache_path}")
dataset, _ = torch.load(cache_path)
dataset, _ = torch.load(cache_path, weights_only=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check if ref script is still working with this update?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, certainly. Caching the dataset is not that common I guess and there were some discussion regarding removing it (#6727 (comment)). However, I agree that this is likely something that could break with weights_only.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can change to weights_only=False if it's required, just be explicit about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dataset, _ = torch.load(cache_path, weights_only=True)
# TODO: this could probably be weights_only=True
dataset, _ = torch.load(cache_path, weights_only=False)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't have much bandwidth to check that at the time but to unblock the PR, let's set it to False as suggested, with a TODO.

@kit1980
Copy link
Member Author

kit1980 commented Nov 9, 2023

I guess we could go for this to lead by example, but there is no other benefit to it

@pmeier The thing is that very likely soon weights_only=True will become default in PyTorch.
So specifying appropriate value for weights_only now will also prevent surprises.

kit1980 added a commit to pytorch-labs/torchfix that referenced this pull request Nov 23, 2023
pytorch/test-infra#4671 added linter-only
`TorchUnsafeLoadVisitor`, but it turned out that the issue is so
widespread that manual fixes would be tedious.

The codemod is somewhat unsafe correctness-wise because full pickling
functionality may still be needed even without `pickle_module`, but I
think it's OK because it fixes a security-related issue and the codemods
need to be verified anyway.

Maybe later we should add something like Ruff's recently added
`--unsafe-fixes`: https://docs.astral.sh/ruff/linter/#fix-safety

I used this for pytorch/vision#8105
@malfet
Copy link
Contributor

malfet commented Dec 6, 2023

However, in all instances in this PR that is not an issue. We have generated the pickled file either dynamically at runtime or statically as part of the repository. Thus, there is no security issue since we are not loading third-party stuff.

I agree that weigths_only should not be a concern for training checkpoints (as they are not persistent), but I would argue that any potentially executable binary code checked into the repository/torch.hub is a security treat and for that codepath I would insist on using the weighs_only

@kit1980 do you mind updating the checkpoint loading path to weigths_only=False as @NicolasHug suggested or test that it's fine to use True (and mention it in the test plan)

@kit1980
Copy link
Member Author

kit1980 commented Dec 14, 2023

Committed the suggestions by @NicolasHug and rebased the PR.

@kit1980 kit1980 merged commit c35d385 into main Dec 14, 2023
64 checks passed
Copy link

Hey @kit1980!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request Jan 16, 2024
Reviewed By: vmoens

Differential Revision: D52538999

fbshipit-source-id: 656cea3784918905cdb8db302ae3f081ed3a8b28

Co-authored-by: Philip Meier <[email protected]>
Co-authored-by: Nicolas Hug <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants