
disabling parallel evaluation on darwin #438

Merged 22 commits into testqibojit on Jul 8, 2021

Conversation
Conversation

@scarrazza (Member):

With qibojit installed, the multiprocessing start method cannot be changed to "fork" mode on macOS. This PR disables the parallel feature on macOS. I believe we should schedule a revision of this module in the future, in particular considering better parallel algorithms rather than forcing circuit copies and async evaluation. @stavros11 please let me know your opinion.
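The platform gate described above can be sketched roughly as follows. This is a minimal illustration, not Qibo's actual code; the helper name `parallel_supported` is hypothetical:

```python
import multiprocessing as mp
import sys

def parallel_supported():
    # Hypothetical helper: the parallel module relies on the "fork"
    # start method, which is not usable on macOS once other packages
    # (e.g. numba pulled in by qibojit) have touched multiprocessing,
    # and is unavailable on Windows altogether.
    return sys.platform == "linux" and "fork" in mp.get_all_start_methods()

if not parallel_supported():
    # Fall back to sequential circuit evaluation.
    pass
```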

@scarrazza scarrazza requested a review from stavros11 July 6, 2021 09:53
@scarrazza scarrazza mentioned this pull request Jul 6, 2021
@stavros11 (Member) left a comment:

Thanks for the update. I cannot test on mac, but the code looks okay to me. If I remember correctly, though, parallel evaluation was not working with qibojit on linux either; that's why the related tests are disabled. What exactly is the new issue? Does installing qibojit on mac block parallel evaluation for all backends? For qibotf it used to work on mac, right?

Regarding revising the implementation, perhaps we should merge this and open an issue for now. Is there an alternative approach based on Python?

@scarrazza (Member, Author):

> If I remember correctly though parallel evaluation was not working with qibojit also on linux, that's why the related tests are disabled.

Ok, I believe this PR already disables it if get_backend() == "qibojit", so we can skip the tests or catch the runtime error.
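The two options (skipping up front versus catching the error at runtime) can be sketched like this; the backend check and the error type are assumptions for illustration:

```python
def run_parallel_or_skip(get_backend, run):
    # Option 1: skip up front when the active backend cannot fork.
    if get_backend() == "qibojit":
        return "skipped"
    # Option 2: attempt the run and catch the runtime error raised
    # when parallel evaluation turns out to be unavailable.
    try:
        return run()
    except RuntimeError:
        return "parallel unavailable"

print(run_parallel_or_skip(lambda: "qibojit", lambda: "ok"))  # skipped
print(run_parallel_or_skip(lambda: "numpy", lambda: "ok"))    # ok
```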

> What exactly is the new issue?

https://github.com/qiboteam/qibo/runs/2996557250?check_suite_focus=true#step:8:56
Presumably numba (or some other package imported by qibojit) already loads multiprocessing, so we cannot change the start method from "spawn" to "fork" (our previous default, used only on macOS).
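This is standard multiprocessing behavior: once the start method has been set (explicitly, or implicitly by whichever package touches multiprocessing first), a later attempt to change it raises a RuntimeError unless force=True is passed. A minimal reproduction:

```python
import multiprocessing as mp

# Simulate another package having already fixed the start method.
mp.set_start_method("spawn", force=True)

try:
    # A later attempt to switch to "fork" now fails.
    mp.set_start_method("fork")
except RuntimeError as err:
    print(f"RuntimeError: {err}")
```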

> Installing qibojit on mac blocks the parallel evaluation for all backends? For qibotf it used to work on mac, right?

I think so, in particular if we import qibojit before loading qibotf.

> Regarding revising the implementation, perhaps we should merge this and open an issue for now. Is there an alternative approach based on Python?

I agree. I don't think the parallel implementation we have is robust; we can keep it as experimental for the time being and then propose a better approach.

@stavros11 (Member):

> ok, I believe this PR already disables if get_backend() == "qibojit", so we can skip the tests or catch the runtime error.

Currently the tests are skipped. We could change to catching the error; however, I am not sure that would be very useful.

> I think so, in particular if we import qibojit, before loading qibotf.

Oh, I see now. I am not sure if there is a way to avoid this while keeping qibojit as the default backend. So we either follow this PR and disable parallel completely on mac, or we make a different backend the default.

@scarrazza (Member, Author):

Yeah, let's keep it disabled and revisit the implementation in the future, ref. #440.

@scarrazza (Member, Author):

@stavros11 it seems like the examples are taking much longer to run, so we hit the max time of 6h.
Maybe this is related to the number of threads? Did you observe this before?

@stavros11 (Member):

> @stavros11 seems like examples are taking much longer to run, so we hit the max time of 6h.
> Maybe this is something related to the number of threads? Did you observe this before?

Usually the example tests take long but do not hit the max time. I am running them locally now and I actually get some failures, so I am guessing the examples don't work well with the qibojit backend, which is the new default.

@stavros11 (Member):

The failing tests were related to numpy conversion using Tensorflow's .numpy(), which I fixed by changing to our backend-generic K.to_numpy. There is still an issue with some tests that hang, which may be the reason the CI timed out. I skipped these tests temporarily, so let's see if everything works now. The tests that hang are test_adiabatic3sat and test_ef_qae.
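A backend-generic conversion in the spirit of K.to_numpy can be sketched as follows. This is illustrative only; the duck-typed checks stand in for the real backend dispatch:

```python
import numpy as np

def to_numpy(x):
    # cupy arrays require an explicit device-to-host copy via .get();
    # tensorflow tensors expose .numpy(); plain arrays and array-likes
    # pass straight through np.asarray.
    if hasattr(x, "get") and type(x).__module__.startswith("cupy"):
        return np.asarray(x.get())
    if hasattr(x, "numpy") and not isinstance(x, np.ndarray):
        return np.asarray(x.numpy())
    return np.asarray(x)
```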

@codecov bot commented Jul 6, 2021:

Codecov Report

❗ No coverage uploaded for pull request base (testqibojit@5712247).
The diff coverage is n/a.


@@               Coverage Diff               @@
##             testqibojit      #438   +/-   ##
===============================================
  Coverage               ?   100.00%           
===============================================
  Files                  ?        83           
  Lines                  ?     11865           
  Branches               ?         0           
===============================================
  Hits                   ?     11865           
  Misses                 ?         0           
  Partials               ?         0           
Flag: unittests, coverage 100.00%



Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5712247...f73b07b.

@stavros11 (Member):

I re-ran the example tests using this branch (with the two problematic tests skipped) and they pass within a few minutes both on my laptop and on the qibo machine. @scarrazza could you please confirm that this works for you too? I am not sure why the CI times out.

@scarrazza (Member, Author):

Thanks for checking, I am doing that right now.

@scarrazza (Member, Author):

@stavros11, yes, I get:

94 passed, 12 skipped, 28 warnings in 703.09s (0:11:43)

Let me check the memory and try reducing the threads.

@scarrazza (Member, Author):

Ok, the problem comes from these limitations: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources

Try running the examples with:

ulimit -v 7000000
OMP_NUM_THREADS=2 pytest -s examples/

After a few minutes the shor test fails with:

RuntimeError: State does not fit in /CPU:0 memory. Please switch the execution device to a different one using ``qibo.set_device``.
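For reference, a dense simulator stores 2^n complex128 amplitudes, so the base memory per state vector can be estimated as below; temporaries and copies made during execution come on top of this:

```python
def statevector_mib(nqubits, bytes_per_amplitude=16):
    # complex128 = 16 bytes per amplitude; a dense state vector
    # holds 2**nqubits amplitudes.
    return (2 ** nqubits) * bytes_per_amplitude / 2 ** 20

print(statevector_mib(22))  # 64.0 MiB for the 22-qubit Shor circuit
```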

@scarrazza (Member, Author):

Given that qibotf was not crashing, maybe there are examples importing tensorflow...

@scarrazza (Member, Author):

Let's see what happens. In fact it is quite odd that qibotf was taking more than 3h to complete something that qibojit can do in less than 1h, so this sounds like memory limitations...

@stavros11 (Member):

With the latest version the tests pass on both my laptop and the qibo machine. When using the ulimit and OMP_NUM_THREADS commands the tests still pass on my laptop but give various errors instantly on the qibo machine. So I'm not sure if the CI will pass, but it likely won't.

@scarrazza (Member, Author):

For me it fails with:

------------------------------------------------------------

Using Shor's algorithm to factorize N = 21 with a = 16.

  - Performing algorithm using a fully quantum iQFT.

  - Total number of qubits used: 22.

[Qibo|WARNING|2021-07-07 15:12:54]: Falling back to CPU because the GPU is out-of-memory.

INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "/home/carrazza/anaconda3/lib/python3.8/site-packages/_pytest/main.py", line 269, in wrap_session
INTERNALERROR>     session.exitstatus = doit(config, session) or 0
INTERNALERROR>   File "/home/carrazza/anaconda3/lib/python3.8/site-packages/_pytest/main.py", line 323, in _main
INTERNALERROR>     config.hook.pytest_runtestloop(session=session)
...

Do you understand why there is this log message about falling back to CPU because the GPU is out of memory? It sounds like shor raises an OOM error on CPU, maybe during shots, and then this function catches it.

@stavros11 (Member) commented Jul 7, 2021:

> Do you understand why there is this log message about falling back to CPU because the GPU is out-of-memory? Sounds like shor raises an OOM in CPU, maybe during shots, and then this function catches the error.

That's strange, I never got anything similar. Are you sure the tests are running on the CPU and not on the GPU?

By the way, when I run on GPU I get

cupy_backends.cuda.libs.cusolver.CUSOLVERError: CUSOLVER_STATUS_INTERNAL_ERROR

@scarrazza (Member, Author):

> Are you sure the tests are running on CPU and not on GPU?

On CPU for sure, I double-checked with nvidia-smi. I cannot run on GPU with ulimit and 2 threads; the code crashes silently.
If I run on GPU without ulimit, everything passes except this one:

_______________________________________________ test_entropy_large_circuit[qibojit-None] ________________________________________________

backend = None, accelerators = None

    def test_entropy_large_circuit(backend, accelerators):
        """Check that entropy calculation works for variational like circuit."""
        thetas = np.pi * np.random.random((3, 8))
        target_entropy = callbacks.EntanglementEntropy([0, 2, 4, 5])
        c1 = Circuit(8)
        c1.add((gates.RY(i, thetas[0, i]) for i in range(8)))
        c1.add((gates.CZ(i, i + 1) for i in range(0, 7, 2)))
        state1 = c1()
        e1 = target_entropy(state1)
    
        c2 = Circuit(8)
        c2.add((gates.RY(i, thetas[1, i]) for i in range(8)))
        c2.add((gates.CZ(i, i + 1) for i in range(1, 7, 2)))
        c2.add(gates.CZ(0, 7))
        state2 = (c1 + c2)()
        e2 = target_entropy(state2)
    
        c3 = Circuit(8)
        c3.add((gates.RY(i, thetas[2, i]) for i in range(8)))
        c3.add((gates.CZ(i, i + 1) for i in range(0, 7, 2)))
        state3 = (c1 + c2 + c3)()
        e3 = target_entropy(state3)
    
        entropy = callbacks.EntanglementEntropy([0, 2, 4, 5])
        c = Circuit(8, accelerators)
        c.add(gates.CallbackGate(entropy))
        c.add((gates.RY(i, thetas[0, i]) for i in range(8)))
        c.add((gates.CZ(i, i + 1) for i in range(0, 7, 2)))
        c.add(gates.CallbackGate(entropy))
        c.add((gates.RY(i, thetas[1, i]) for i in range(8)))
        c.add((gates.CZ(i, i + 1) for i in range(1, 7, 2)))
        c.add(gates.CZ(0, 7))
        c.add(gates.CallbackGate(entropy))
        c.add((gates.RY(i, thetas[2, i]) for i in range(8)))
        c.add((gates.CZ(i, i + 1) for i in range(0, 7, 2)))
        c.add(gates.CallbackGate(entropy))
        state = c()
    
        K.assert_allclose(state3, state)
>       K.assert_allclose(entropy[:], K.cast([0, e1, e2, e3]))

src/qibo/tests/test_core_callbacks.py:238: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
src/qibo/backends/numpy.py:460: in cast
    return self.op.cast(x, dtype=dtype)
../qibojit/src/qibojit/custom_operators/__init__.py:47: in cast
    return backend.cast(x, dtype)
../qibojit/src/qibojit/custom_operators/backends.py:170: in cast
    return self.cp.asarray(x, dtype=dtype)
../../../anaconda3/lib/python3.8/site-packages/cupy/_creation/from_data.py:66: in asarray
    return _core.array(a, dtype, False, order)
cupy/_core/core.pyx:2120: in cupy._core.core.array
    ???
cupy/_core/core.pyx:2199: in cupy._core.core.array
    ???
cupy/_core/core.pyx:2267: in cupy._core.core._send_object_to_gpu
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.

cupy_backends.cuda.libs.cusolver.CUSOLVERError: CUSOLVER_STATUS_INTERNAL_ERROR

This sounds like limited GPU memory; try rebooting or using an empty GPU.

Going back to our issue here: I believe the github action was swapping memory when using qibotf (that's why it took 3h), while now with qibojit it crashes directly. If the amount of memory required by shor is really correct, then the only solution is to reduce the number of qubits in the tests.

@stavros11 (Member):

> On CPU for sure, I double checked nvidia-smi. I cannot run on GPU with ulimit and 2 threads, the code crashes silently.
> If I run on GPU without ulimit, everything passes except this one:

Thanks for the update. Actually the qibo tests were passing for me on GPU even before the last push. The example tests still fail on GPU due to some cupy -> numpy conversions, though.

> Going back to our issue here. I believe that the github action was going to swap memory when using qibotf (that's why 3h), while now with qibojit it crashes directly. If the amount of memory required by shor is really correct, then the only solution is to reduce the number of qubits for the tests.

If that's the case, perhaps the simplest thing to try would be to remove the N=21 value from the Shor test and leave N=15. I am not sure if it would work, but it would definitely make the test lighter.

@scarrazza (Member, Author) commented Jul 7, 2021:

> If that's the case perhaps the simplest to try would be to remove the N=21 value from the Shor test and leave N=15. I am not sure if it would work but it would definitely make the test lighter.

Indeed, I am playing with different configurations here: https://github.com/qiboteam/qibo/actions?query=branch%3Atryfixci
I have fixed some extra problems, including the adiabatic3sat one; however, as you can see, the jobs still take a long time, while on my laptop, if we skip ef_qae, these should take at most 6 minutes with ulimit and 2 cores.

If these tests hang, then we should try to isolate which test is breaking the CI.

@scarrazza (Member, Author):

With these last changes, I get a 5-7 minute runtime for all tests (no skips) on 2 different machines, using 2 threads and ulimit.

@scarrazza (Member, Author):

@stavros11 just to let you know that I am debugging one by one here: https://github.com/qiboteam/qibo/actions?query=branch%3Apruning

@scarrazza (Member, Author):

It looks like the scipy minimize is slowing down all tests dramatically on the CI machines; if I set a tiny maxiter, the tests pass.
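In scipy, the iteration cap is passed through the options dict of scipy.optimize.minimize. A toy example of the kind of cap that bounded the CI runtime (the objective and values here are illustrative, not the actual test workload):

```python
import numpy as np
from scipy.optimize import minimize

# Cap the optimizer at a handful of iterations so runtime stays bounded,
# at the cost of a possibly unconverged result.
result = minimize(lambda x: np.sum((x - 1.0) ** 2), x0=np.zeros(4),
                  method="Powell", options={"maxiter": 3})
print(result.nit, result.fun)
```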

@stavros11 (Member) commented Jul 8, 2021:

What is the current status of this? Can I help in any way? I see that some tests are passing fast in the pruning branch. Also, the example tests are still failing on GPU for me with the current version of this PR.

@scarrazza (Member, Author) commented Jul 8, 2021:

Some of them pass if we reduce the minimization to a few iterations; however, it still takes much longer than on my laptop.
I will try to replace the exec call with something else (e.g. this); I suspect that pytest is accumulating data.

If that doesn't work, then I assume we can disable these tests in the CI for the time being.

@scarrazza
Copy link
Member Author

@stavros11 believe it or not, the CI seems to work well only with a single thread. Let me polish this PR, then we can merge.

@scarrazza scarrazza merged commit 1d72d91 into testqibojit Jul 8, 2021
@scarrazza scarrazza deleted the fixparalleljit branch July 10, 2021 17:01