Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GAIL with SAC learner on GPU #660

Merged
merged 6 commits into from
Feb 6, 2023
Merged

Conversation

ernestum
Copy link
Collaborator

@ernestum ernestum commented Jan 11, 2023

Description

Right now GAIL with a SAC learner is broken on the GPU. Read details in #655

Testing

This PR includes a test to reproduce the issue.
Note: the test in the pipeline does not fail since it is executed without GPU.

@codecov
Copy link

codecov bot commented Jan 11, 2023

Codecov Report

Merging #660 (6c9baa6) into master (2acc4b6) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #660   +/-   ##
=======================================
  Coverage   97.58%   97.58%           
=======================================
  Files          87       87           
  Lines        8566     8568    +2     
=======================================
+ Hits         8359     8361    +2     
  Misses        207      207           
Impacted Files Coverage Δ
src/imitation/algorithms/adversarial/common.py 96.83% <100.00%> (+0.01%) ⬆️
tests/algorithms/test_adversarial.py 100.00% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@ernestum ernestum marked this pull request as ready for review January 19, 2023 11:51
@ernestum
Copy link
Collaborator Author

This now fixes the issue and makes the test pass again (also on a GPU device).
However the regression test is pointless since the CircleCI runners don't have a GPU.

I tried to use the meta device in pytorch as suggested here but then we run into issues because the tensors on the meta device contain no data.

Another option would be to switch to a cuda-enabled runner but then the supported images seem to be quite restricted and it would probably be costly/wasteful to run all of our tests in such a runner.

I would propose to introduce a GPU flag for our tests and then run only those on a dedicated GPU runner. What do you think @AdamGleave ?

@ernestum ernestum force-pushed the 655_gail_with_sac_on_gpu branch from d51d77e to 67e25ad Compare January 20, 2023 11:45
@ernestum ernestum requested a review from AdamGleave January 23, 2023 13:05
@ernestum ernestum mentioned this pull request Jan 30, 2023
@AdamGleave
Copy link
Member

Sorry for only just getting around to reviewing this. Overall the changes looks good, I had a few minor comments on the tests.

I would propose to introduce a GPU flag for our tests and then run only those on a dedicated GPU runner. What do you think @AdamGleave ?

At a high-level makes sense. We don't actually have access to the CircleCI GPU executors (they require a minimum spend of $6k/year to enable it). We have self-hosted GPU test runners in the past (and that lets us use any Docker image), but it is a bit of a pain to maintain. So I might lean towards adding GPU tests with the flag but putting enabling it on CI on the back-burner, and we can always try to manually run the GPU tests before a release to sanity check?

@ernestum ernestum force-pushed the 655_gail_with_sac_on_gpu branch from d0ecdab to ff1037a Compare February 1, 2023 14:34
@ernestum
Copy link
Collaborator Author

ernestum commented Feb 1, 2023

Thanks for the review! I implemented all of your suggestions.

Now the test only runs when a GPU is available (otherwise it is pointless) and I opened #671 so we keep in mind to automate GPU tests someday.

@ernestum ernestum requested a review from AdamGleave February 1, 2023 14:52
@ernestum
Copy link
Collaborator Author

ernestum commented Feb 2, 2023

Note: the reduction in coverage is intentional. We lost some lines of coverage since I disabled the regression test for CPU-only instances.

@ernestum ernestum force-pushed the 655_gail_with_sac_on_gpu branch from 55b93af to 2fd96d5 Compare February 3, 2023 09:31
Copy link
Member

@AdamGleave AdamGleave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AdamGleave
Copy link
Member

@ernestum can you add # pragma: no cover to the test that's being skipped? Otherwise codecov check will keep complaining about the test code not being executed.

@AdamGleave AdamGleave merged commit d74b1a8 into master Feb 6, 2023
@AdamGleave AdamGleave deleted the 655_gail_with_sac_on_gpu branch February 6, 2023 19:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants