Replies: 3 comments
-
Hi Miguel, are you observing this 1e-9 discrepancy with double precision data or with single precision? Cheers,
-
Hi @mreineck, I had the larger arrays in single precision to save memory. I put everything in double precision and I still get the same problem: about 0.03% of the pixels show a difference larger than 1e-9 between the output and input maps. Furthermore, this is the output of np.testing.assert_almost_equal(input_maps, output_maps, decimal=9, verbose=True):
So, in (at least) one of the pixels, this is as bad as 1e-4.
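For reference, a minimal sketch of the kind of check being run, assuming `input_maps` and `output_maps` are the same-shaped NumPy float arrays discussed above:

```python
import numpy as np

# input_maps and output_maps are assumed to be same-shaped float64 arrays.
diff = np.abs(output_maps - input_maps)
print(f"worst pixel difference: {diff.max():.3e}")
print(f"fraction above 1e-9:    {np.mean(diff > 1e-9):.5%}")

# Equivalent assertion: fails if any pixel differs by more than ~1.5e-9.
np.testing.assert_almost_equal(input_maps, output_maps, decimal=9, verbose=True)
```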
-
Hi @mj-gomes, good catch! Yes, instruction reordering might be the culprit here. Could you try using …? Otherwise, we could investigate the machine code produced by Numba using …
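As an illustration of the kind of inspection meant here (with a toy stand-in kernel rather than the real hwp_sys one), Numba dispatchers expose `parallel_diagnostics()` and `inspect_asm()`:

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def accumulate(values):
    # Toy stand-in for the hwp_sys kernels: a parallel reduction.
    total = 0.0
    for i in prange(values.size):
        total += values[i]
    return total

x = np.random.default_rng(0).standard_normal(1_000_000)
accumulate(x)  # trigger compilation for float64[:]

# Report how Numba transformed the loop (reductions, hoisting, fusion, ...).
accumulate.parallel_diagnostics(level=4)

# Dump the generated assembly for each compiled signature.
for sig, asm in accumulate.inspect_asm().items():
    print(sig)
    print(asm[:2000])  # the full listing is very long
```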
-
Hello everyone.
I came across an interesting problem with Numba parallelization and its influence on the precision of the computations, at least for the TOD and mapmaking operations in hwp_sys.py.
Context: I was running some tests for the hwp_sys.py code and the tests kept failing, even though I was only testing the ideal case, where I knew (from looking at the maps) that it should work: the output maps should be equal to the input maps. The problem was that a fraction of the pixels of (output maps - input maps) fell outside the 1e-9 tolerance I had set to allow for small numerical approximations. This fraction became smaller when increasing the nside.
Origin: I found out that this comes from the @njit(parallel=True) decorator, specifically the "parallel=True" part. Because addition and multiplication of floating-point numbers are not associative, and Numba in parallel mode apparently changes the order of the operations, there are roundings which accumulate and end up being fairly important. At least, this is the best explanation I could find. And indeed, with parallel=False I get 100% equality within the 1e-9 tolerance.
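A minimal sketch of this effect (not the hwp_sys code itself): the same reduction, jitted with and without parallel=True, combines the partial sums in a different order and therefore gives slightly different results.

```python
import numpy as np
from numba import njit, prange

@njit
def sum_serial(values):
    total = 0.0
    for i in range(values.size):
        total += values[i]
    return total

@njit(parallel=True)
def sum_parallel(values):
    # prange turns this into a reduction over per-thread partial sums,
    # so the additions happen in a different order than in the serial loop.
    total = 0.0
    for i in prange(values.size):
        total += values[i]
    return total

x = np.random.default_rng(42).standard_normal(10_000_000)
print(sum_serial(x) - sum_parallel(x))  # usually small but non-zero
```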
For nside 256, for example, using 2 detectors in a one-year simulation, about 0.03% of the pixels were above the 1e-9 tolerance. This is a very small percentage of the pixels, but it should increase with the number of detectors. For now I wrote a test method that ignores this, but I was talking with @hivon about it and it would be good to know if anyone has already run into this problem and has a good solution. Setting parallel=False is, in my opinion, not an option, because this parallelization speeds things up significantly (I can run some tests to obtain specific values for this speedup if needed).
I don't know much about Numba, and from what I have seen there is no straightforward way to force it to keep the order of operations. Should we restructure the methods to force the order of operations (see the sketch right after this paragraph)? Or should we consider other options, like parallelizing these methods with MPI instead of Numba? I suppose, however, that Numba is much faster than MPI for these NumPy operations in loops, since I understand it is optimized for that.
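One possible restructuring, only a sketch and not tested against the actual hwp_sys kernels: compute per-chunk partial sums in parallel, then combine them sequentially in a fixed order. The result is then reproducible and independent of the number of threads, although it will not be bit-identical to the fully serial loop.

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def sum_fixed_order(values, n_chunks=64):
    # Each chunk's partial sum is computed independently (in parallel),
    # but the partials are then combined sequentially in chunk order,
    # so the result does not depend on thread scheduling.
    partials = np.zeros(n_chunks)
    chunk = (values.size + n_chunks - 1) // n_chunks
    for c in prange(n_chunks):
        start = c * chunk
        stop = min(start + chunk, values.size)
        s = 0.0
        for i in range(start, stop):
            s += values[i]
        partials[c] = s
    total = 0.0
    for c in range(n_chunks):
        total += partials[c]
    return total
```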
Thank you in advance for the discussion,
Miguel