Skip tied weights disk offload test #2782
tests/test_big_modeling.py
# This test fails because sometimes data_ptr() of compute2.weight is the same of compute1.weight.
# I check that the values are not the same but it gives the same address. This does not happen on my local machine.
If we have enough coverage, that seems okay. However, we could also add a @flakey decorator that retries the test a few times first (maybe 3?)
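For reference, a retry decorator along these lines could look like the sketch below. It is only an illustration: `flaky_retry` and `max_attempts` are hypothetical names, not an existing helper in this repository, and it assumes that a flaky failure surfaces as an `AssertionError`.

```python
import functools


def flaky_retry(max_attempts=3):
    """Hypothetical helper: re-run a test up to max_attempts times.

    Only AssertionError is retried; any other exception propagates at once.
    """
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return test_fn(*args, **kwargs)
                except AssertionError:
                    # Re-raise once the last allowed attempt has failed.
                    if attempt == max_attempts:
                        raise
        return wrapper
    return decorator


# Usage: a stand-in test that fails twice, then passes on the third attempt.
calls = {"count": 0}


@flaky_retry(max_attempts=3)
def sometimes_fails():
    calls["count"] += 1
    assert calls["count"] >= 3, "simulated intermittent failure"
    return "passed"


result = sometimes_fails()
```

A retry only helps when the failure is intermittent on the same machine. If the failure is tied to the hardware and reproduces on every run, retrying changes nothing.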
I thought about that, but I guess it depends more on the hardware. I ran the test many times while debugging and it failed every time.
Okay, we can just skip then :)
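As a rough sketch of what skipping looks like with the standard library, `unittest.skip` marks a test as skipped and records the reason in the test report. The class and test names below are hypothetical placeholders, not the actual tests in `tests/test_big_modeling.py`.

```python
import unittest


class BigModelingSketch(unittest.TestCase):
    # Hypothetical test name; the reason string shows up in the test report.
    @unittest.skip("Flaky on CI: data_ptr() of two distinct weights sometimes matches")
    def test_tied_weights_disk_offload(self):
        self.fail("never executed while the skip decorator is in place")

    # Hypothetical neighbouring test that still runs and provides coverage.
    def test_tied_weights_cpu_offload(self):
        self.assertTrue(True)


# Run the suite programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(BigModelingSketch)
result = unittest.TestResult()
suite.run(result)
```

After the run, `result.skipped` holds one `(test, reason)` pair for the skipped test, and the remaining test still counts towards success.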
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks!
* skip
* fix
* quality
* fix comment
What does this do?
This PR skips a flaky test. Not sure why it happens, but the `data_ptr()` of `compute2.weight` is sometimes the same as that of `compute1.weight`. Hence it uses the same value as `compute1` through `self.tied_params_map`, and we get an assertion error. I checked that the values are not the same, but it gives the same address. This does not happen on my local machine, and it sometimes passes on the CI. Maybe an issue with the hardware. We should have enough coverage with `test_dispatch_model_tied_weights_memory_with_nested_offload_cpu`, so I think it's fine if we skip it.

cc @fxmarty
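To make the failure mode concrete, here is a toy model of a lookup keyed by memory address. It is not the library's implementation: `FakeTensor` and `lookup_tied` are hypothetical stand-ins, and the addresses are hard-coded to simulate two unrelated weights reporting the same pointer.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor: a reported address plus its values."""
    address: int
    values: tuple


def lookup_tied(tied_map, tensor):
    """Return the entry cached for this address, registering the tensor if new."""
    return tied_map.setdefault(tensor.address, tensor)


tied_map = {}
compute1_weight = FakeTensor(address=0x1000, values=(1.0, 2.0))
# Simulated collision: different values, but the same reported address.
compute2_weight = FakeTensor(address=0x1000, values=(3.0, 4.0))

first = lookup_tied(tied_map, compute1_weight)
stale = lookup_tied(tied_map, compute2_weight)

# The second lookup hands back compute1's entry, so the values no longer match.
mismatch = stale.values != compute2_weight.values
```

In this toy model the address alone cannot distinguish a genuinely tied weight from an unrelated one that happens to report the same pointer, which matches the symptom described above.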