[TorchFix] Add weights_only to torch.load #8105
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8105
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit cd17926 with merge base 01dca0e.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
pytorch/test-infra#4671 added a linter-only `TorchUnsafeLoadVisitor`, but it turned out that the issue is so widespread that manual fixes would be tedious. The codemod is somewhat unsafe correctness-wise because full pickling functionality may still be needed even without `pickle_module`, but I think it's OK because it fixes a security-related issue and the codemods need to be verified anyway. Maybe later we should add something like Ruff's recently added `--unsafe-fixes`: https://docs.astral.sh/ruff/linter/#fix-safety I used this for pytorch/vision#8105
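For illustration, the codemod's effect on a call site looks roughly like this (the file name and variable below are made up, not taken from the PR):

```python
import torch

# Before: torch.load falls back to full pickle deserialization, which can
# execute arbitrary code if the loaded file is malicious.
checkpoint = torch.load("model.pth")

# After the codemod: restrict unpickling to tensors, primitive types and
# dictionaries.
checkpoint = torch.load("model.pth", weights_only=True)
```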
For the other reviewers: the parameter `weights_only` is a misnomer. Setting `weights_only=True` means that on unpickling only certain types are deserialized. However, this is not restricted to "weights"; per the docs it "[i]ndicates whether unpickler should be restricted to loading only tensors, primitive types and dictionaries".

The use case here is to avoid the security issue of potentially unpickling and executing arbitrary code. However, in all instances in this PR that is not a concern: we have generated the pickled file either dynamically at runtime or statically as part of the repository, so there is no security issue since we are not loading third-party files.

I guess we could go for this to lead by example, but there is no other benefit to it. @NicolasHug thoughts?
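A minimal sketch of that distinction (file names below are placeholders; the exact exception type may vary across PyTorch versions):

```python
import torch

# Tensors, primitive types and dicts load fine under the restricted unpickler.
torch.save({"weight": torch.zeros(2, 2), "step": 3}, "state.pt")
state = torch.load("state.pt", weights_only=True)

class Foo:
    pass

# An arbitrary Python object is rejected under weights_only=True ...
torch.save(Foo(), "foo.pt")
try:
    torch.load("foo.pt", weights_only=True)
except Exception as e:
    print("restricted load failed:", e)

# ... but still loads with the full (unrestricted) pickler.
obj = torch.load("foo.pt", weights_only=False)
```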
references/classification/train.py
Outdated
```diff
@@ -127,7 +127,7 @@ def load_data(traindir, valdir, args):
     if args.cache_dataset and os.path.exists(cache_path):
         # Attention, as the transforms are also cached!
         print(f"Loading dataset_train from {cache_path}")
-        dataset, _ = torch.load(cache_path)
+        dataset, _ = torch.load(cache_path, weights_only=True)
```
Should we check if ref script is still working with this update?
Yes, certainly. Caching the dataset is not that common I guess, and there was some discussion regarding removing it (#6727 (comment)). However, I agree that this is likely something that could break with `weights_only`.
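A rough sketch of why the cached-dataset path can trip over this, using `TensorDataset` as a stand-in for the real cached dataset (the file name is made up):

```python
import torch
from torch.utils.data import TensorDataset

# The cache stores a Dataset instance, i.e. an arbitrary Python object,
# not just tensors, primitive types and dictionaries.
dataset = TensorDataset(torch.zeros(4, 3), torch.zeros(4))
torch.save((dataset, "train"), "cache.pt")

# Full pickling loads it back as before.
dataset, _ = torch.load("cache.pt", weights_only=False)

# The restricted unpickler rejects the non-allowlisted class.
try:
    torch.load("cache.pt", weights_only=True)
except Exception as e:
    print("restricted load failed:", e)
```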
We can change to `weights_only=False` if it's required, just be explicit about it.
```diff
-        dataset, _ = torch.load(cache_path, weights_only=True)
+        # TODO: this could probably be weights_only=True
+        dataset, _ = torch.load(cache_path, weights_only=False)
```
Sorry, I don't have much bandwidth to check that at the moment, but to unblock the PR, let's set it to False as suggested, with a TODO.
@pmeier The thing is that very likely soon …
I agree that … @kit1980 do you mind updating the checkpoint loading path to …?
Force-pushed from 97ac576 to 00623a4.
Committed the suggestions by @NicolasHug and rebased the PR.
Co-authored-by: Philip Meier <[email protected]>
Co-authored-by: Philip Meier <[email protected]>
Force-pushed from 00623a4 to cd17926.
Hey @kit1980! You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py
Reviewed By: vmoens
Differential Revision: D52538999
fbshipit-source-id: 656cea3784918905cdb8db302ae3f081ed3a8b28
Co-authored-by: Philip Meier <[email protected]>
Co-authored-by: Nicolas Hug <[email protected]>
`torch.load` without `weights_only` is a potential security issue (see pytorch/test-infra#4671). Adding `weights_only=True` is potentially unsafe correctness-wise if full pickling functionality is needed, but that should be rare and the tests should catch it (I changed a couple of places to `weights_only=False` after test failures). The "unittests-macos (3.8, macos-m1-12)" failure is preexisting.
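For a plain state_dict checkpoint, which is the common case here, `weights_only=True` is a drop-in replacement; a small sketch (model choice and file name are arbitrary):

```python
import torch
import torchvision

model = torchvision.models.resnet18()
torch.save(model.state_dict(), "resnet18.pth")

# A plain state_dict is an OrderedDict of str -> Tensor, which the
# restricted unpickler accepts, so weights_only=False is not needed here.
state_dict = torch.load("resnet18.pth", weights_only=True)
model.load_state_dict(state_dict)
```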