disabling parallel evaluation on darwin #438
Thanks for the update. I cannot test on Mac, but the code looks okay to me. If I remember correctly, though, parallel evaluation was not working with qibojit on Linux either, which is why the related tests are disabled. What exactly is the new issue? Does installing qibojit on Mac block parallel evaluation for all backends? For qibotf it used to work on Mac, right?
Regarding revising the implementation, perhaps we should merge this and open an issue for now. Is there an alternative approach based on Python?
ok, I believe this PR already disables if
https://github.com/qiboteam/qibo/runs/2996557250?check_suite_focus=true#step:8:56
I think so, in particular if we import qibojit before loading qibotf.
I agree, I don't think the parallel implementation we have is robust. We can keep it as experimental for the time being, and then propose some better approach.
Currently the tests are skipped. We could change to catching the error, however I am not sure if this is very useful.
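The two options mentioned here (skipping the test versus asserting on the error) could look roughly like the sketch below. It uses the standard `unittest` module; `parallel_evaluate` is a hypothetical stand-in written only for this illustration, not a qibo function.

```python
import sys
import unittest


def parallel_evaluate(states, platform=sys.platform):
    """Hypothetical stand-in for a parallel routine that is unsupported on macOS."""
    if platform == "darwin":
        raise NotImplementedError("parallel evaluation is disabled on darwin")
    return list(states)


class TestParallel(unittest.TestCase):
    @unittest.skipIf(sys.platform == "darwin", "parallel evaluation disabled on macOS")
    def test_runs_elsewhere(self):
        # Option 1: skip the test entirely when running on macOS
        self.assertEqual(parallel_evaluate([1, 2]), [1, 2])

    def test_error_on_darwin(self):
        # Option 2: keep the test and check that the expected error is raised
        with self.assertRaises(NotImplementedError):
            parallel_evaluate([1, 2], platform="darwin")
```

The second form keeps the macOS code path covered, at the cost of a test that only checks the error.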
Oh, I see now. I am not sure if there is a way to avoid this while keeping qibojit as the default backend. So we either follow this PR and disable parallel completely for Mac, or we make a different backend the default.
Yeah, let's keep it disabled and revisit the implementation in the future, ref. #440.
@stavros11 seems like examples are taking much longer to run, so we hit the max time of 6h.
Usually example tests take long but do not hit the max time. I am running them locally now and I actually get some failures so I am guessing examples don't work well with the qibojit backend which is the new default.
The failing tests were related to numpy conversion using Tensorflow's |
Codecov Report
@@ Coverage Diff @@
## testqibojit #438 +/- ##
===============================================
Coverage ? 100.00%
===============================================
Files ? 83
Lines ? 11865
Branches ? 0
===============================================
Hits ? 11865
Misses ? 0
Partials ? 0
I re-ran the example tests using this branch (with the two problematic tests skipped) and they pass within a few minutes both on my laptop and on the qibo machine. @scarrazza could you please confirm that this works for you too? I am not sure why the CI times out.
Thanks for checking, I am doing that right now.
@stavros11, yes, I get:
Let me check the memory and try reducing the threads.
Ok, the problem comes from these limitations: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources Try to run the examples with:
After a few minutes the shor test fails with:
Given that qibotf was not crashing, maybe there are examples importing tensorflow...
Let's see what happens. In fact it is quite odd that qibotf was taking more than 3h to complete something that qibojit can do in less than 1h. So this sounds like memory limitations...
With the latest version the tests pass on both my laptop and the qibo machine. When using the |
For me it fails with:
Do you understand why there is this log message about falling back to CPU because the GPU is out-of-memory? Sounds like shor raises an OOM on CPU, maybe during shots, and then this function catches the error.
That's strange, I never got something similar. Are you sure the tests are running on CPU and not on GPU? By the way, when I run on GPU I get
On CPU for sure, I double checked nvidia-smi. I cannot run on GPU with ulimit and 2 threads, the code crashes silently.
This sounds like limited GPU memory, try to reboot or use an empty GPU. Going back to our issue here: I believe that the GitHub action was swapping memory when using qibotf (hence the 3h), while now with qibojit it crashes directly. If the amount of memory required by shor is really correct, then the only solution is to reduce the number of qubits for the tests.
Thanks for the update, actually the qibo tests were passing for me on GPU even before the last push. The example tests still fail on GPU due to some cupy -> numpy conversions though.
If that's the case perhaps the simplest to try would be to remove the |
Indeed, I am playing with different configurations here: https://github.com/qiboteam/qibo/actions?query=branch%3Atryfixci If these tests hang, then we should try to isolate which test is breaking the CI.
With these last changes, I get a 5-7 minute runtime for all tests (no skips) on 2 different machines, using 2 threads and ulimit.
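For anyone wanting to reproduce a constrained environment from inside Python rather than with the shell's `ulimit`, a rough sketch follows. It assumes a Unix system, assumes the relevant limit is the virtual address space (the equivalent of `ulimit -v`, which this thread does not specify), and assumes the numerical libraries honour the `OMP_NUM_THREADS` convention. The helper name `limit_like_ci` is hypothetical.

```python
import os
import resource


def limit_like_ci(max_bytes, threads=2):
    """Cap address space and thread count for the current process (Unix only).

    Returns the soft address-space limit that is in effect afterwards.
    """
    # Thread-count variables must be set before numerical libraries are imported
    os.environ["OMP_NUM_THREADS"] = str(threads)
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # The soft limit can never exceed an existing finite hard limit
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))
    return resource.getrlimit(resource.RLIMIT_AS)[0]
```

Calling, for example, `limit_like_ci(7 * 1024**3)` at the very top of a test script would cap the process at 7 GiB of address space; that figure is only illustrative and should be replaced with the runner's actual memory.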
@stavros11 just to let you know that I am debugging one by one here: https://github.com/qiboteam/qibo/actions?query=branch%3Apruning
Looks like scipy's minimize is dramatically slowing down all tests on the CI machines; if I set a tiny maxiter, the tests pass.
What is the current status of this? Can I help in any way? I see that some tests are passing fast in the
Some of them pass if we reduce the minimization to a few iterations; however, it still takes much longer than on my laptop. If that doesn't work then I assume we can disable these tests from CI for the time being.
@stavros11 believe it or not, CI seems to work well only with a single thread. Let me polish this PR, then we can merge.
With qibojit installed, the multiprocessing start method cannot be changed to "fork" mode on macOS. This PR disables the parallel feature for macOS. I believe we should schedule a revision of this module in the future, in particular considering better parallel algorithms rather than forcing circuit copies and async evaluation. @stavros11 please let me know your opinion.
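A minimal sketch of a guard along these lines, using only the standard library. The helper name `select_fork_start_method` is hypothetical and not part of qibo. The `RuntimeError` branch shows one general way a start method can become unchangeable (it was already fixed earlier in the process); whether that is the exact mechanism behind the qibojit behaviour is an assumption.

```python
import multiprocessing
import sys


def select_fork_start_method(platform=None):
    """Return (ok, detail) for an attempt to use the "fork" start method.

    Hypothetical helper. `platform` defaults to `sys.platform` and is a
    parameter so the macOS branch can be exercised on any machine.
    """
    platform = sys.platform if platform is None else platform
    if platform == "darwin":
        # Mirror the decision in this PR: no parallel evaluation on macOS
        return False, "parallel evaluation is disabled on macOS"
    try:
        multiprocessing.set_start_method("fork")
    except (RuntimeError, ValueError) as exc:
        # RuntimeError: the start method was already fixed earlier in the process
        # ValueError: "fork" is not available on this platform
        return False, str(exc)
    return True, "fork"
```

Returning a status instead of raising lets the caller decide whether to fall back to serial evaluation or to report an error.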