Memory usage blowing up in large DMO runs #118
Comments
(As a side note, reading in the data (4.1 TB) takes 4.6 hrs, which is less than 1% of the theoretical read speed of the system, and makes things quite hard to debug.)
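For reference, a rough back-of-the-envelope conversion of those numbers (assuming decimal TB):

$$ \frac{4.1\,\mathrm{TB}}{4.6\,\mathrm{h}} \approx \frac{4.1\times10^{12}\,\mathrm{B}}{1.66\times10^{4}\,\mathrm{s}} \approx 0.25\,\mathrm{GB/s}, $$

so a filesystem rated at a few tens of GB/s would indeed be sitting at around the 1% mark.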
Any advice on things to tweak in the config or setup is welcome. I have already tried more nodes, fewer ranks per node and other similar things, but the memory seems to always blow up.
@stuartmcalpine and @bwvdnbro will be interested as well.
This has been mentioned a couple of times, so I decided to study a bit more what it could be. It seems the configuration value
@MatthieuSchaller I tried opening the core files but I don't have enough permissions to do so; e.g.:
Permissions fixed. The read is not done using parallel HDF5 so I don't think the limit applies. I'll try raising that variable by 10x and see whether it helps.
Unfortunately the core files are truncated, so I couldn't even get a stacktrace out of them. This is somewhat expected: the expected core sizes are on the order of ~200 GB, but SLURM would have SIGKILL'd the processes after waiting for a bit while they were writing their core files. Based on the log messages, this memory blowup seems to be happening roughly in the same place where our latest fix for #53 is located -- that is, when densities are being computed for structures that span across MPI ranks. To be clear: I don't think the error happens because of the fix -- and if the logs are to be trusted, the memory spike comes even before that, while particles are being exchanged between ranks in order to perform the calculation. Some of these particle exchanges are based on the … While other avenues of investigation make sense too, it could be worth a shot to try this one out and see if, by using a different data structure, we actually solve the problem (or not).
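To make the memory argument concrete, here is a minimal, hypothetical C++/MPI sketch (this is not VELOCIraptor's code, and all names are made up) of a chunked pairwise particle exchange: the transient buffers stay bounded at a fixed chunk size, whereas packing everything destined for a remote rank into one buffer scales with the exchange volume.

```cpp
// Hypothetical sketch (not VELOCIraptor code): exchange particles with a
// partner rank in fixed-size chunks so the transient buffers stay bounded,
// instead of packing the whole exchange into one send/receive buffer.
#include <mpi.h>
#include <algorithm>
#include <cstddef>
#include <vector>

struct Particle { double pos[3], vel[3]; long long id; };

std::vector<Particle> exchange_chunked(const std::vector<Particle>& to_send,
                                       int partner, MPI_Comm comm,
                                       std::size_t chunk = 1 << 20) {
    // Agree on how many particles travel in each direction.
    unsigned long long nsend = to_send.size(), nrecv = 0;
    MPI_Sendrecv(&nsend, 1, MPI_UNSIGNED_LONG_LONG, partner, 0,
                 &nrecv, 1, MPI_UNSIGNED_LONG_LONG, partner, 0,
                 comm, MPI_STATUS_IGNORE);

    std::vector<Particle> received;
    received.reserve(nrecv);
    std::vector<Particle> recv_buf(chunk);  // bounded scratch space

    std::size_t sent = 0, got = 0;
    while (sent < nsend || got < nrecv) {
        const std::size_t s = std::min<std::size_t>(chunk, nsend - sent);
        const std::size_t r = std::min<std::size_t>(chunk, nrecv - got);
        // Ship one chunk each way; counts are in bytes so no derived MPI
        // datatype is needed for this illustration.
        MPI_Sendrecv(to_send.data() + sent, int(s * sizeof(Particle)), MPI_BYTE,
                     partner, 1,
                     recv_buf.data(), int(r * sizeof(Particle)), MPI_BYTE,
                     partner, 1, comm, MPI_STATUS_IGNORE);
        received.insert(received.end(), recv_buf.begin(),
                        recv_buf.begin() + static_cast<std::ptrdiff_t>(r));
        sent += s;
        got += r;
    }
    return received;
}
```

The trade-off is more MPI calls, but the peak extra allocation per partner stays at roughly two chunks instead of the full exchange volume.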
Thanks for looking into it. Based on this analysis, is there anything I should try? Note that I am already trying not to use the SOlists for the same reason of exploding memory footprint (#112), even though we crash earlier here.
Unfortunately I'm not sure there's much you can do without code modifications, if the problem is indeed where I think it is. When I fixed the code for #73 I thought that it would be a good idea to modify … I'll see if I can come up with something you can test more or less quickly and will post any updates here.
Thanks. FYI, I tried running an even more trimmed-down version where I request no over-density calculation, no aperture calculation, and no radial profiles, and it also ran out of memory. I'll see whether changing … Would it make sense to add a maximal distance …? Also, would it be possible to try the config you guys used on Gadi for your big run? If I am not mistaken it was not far from our run in terms of number of particles. Might be a baseline for us to try here.
@doctorcbpower could you point @MatthieuSchaller to the config he's asking about above? Thanks!
Hi @rtobar and @MatthieuSchaller, sorry, only just saw this. I have been using this for similar particle numbers in smaller boxes, so it should work.
Thanks! I'll give this a go. Do you remember how much memory was needed for VR to succeed and how many MPI ranks were used?
Good news: that last configuration worked out of the box on my run. Time to fire up a …
One quick note related to the I/O comment above: this config took 6hr30 to read in the data, then 1hr30 for the rest.
It doesn't have a value defined for … Having said that, I've never done an exhaustive profiling of the reading code. Even with those reading sizes it still sounds like things could be better. If the inputs are compressed there will also be some overhead associated with that, I guess, but I don't know if that's the case.
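On the reading side, here is a generic illustration (not the actual VR reader; the file name, dataset path and slice size below are placeholders) of reading a 2D dataset in large hyperslab slices with the serial HDF5 C API. With filtered/compressed inputs, every read additionally pays for decompressing each HDF5 chunk it touches.

```cpp
// Generic sketch (not the VELOCIraptor reader): read a 2D dataset in large
// hyperslab slices with the serial HDF5 C API. File name, dataset path and
// slice size are placeholders.
#include <hdf5.h>
#include <algorithm>
#include <vector>

int main() {
    hid_t file = H5Fopen("snapshot.hdf5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen2(file, "/PartType1/Coordinates", H5P_DEFAULT);
    hid_t fspace = H5Dget_space(dset);

    hsize_t dims[2];
    H5Sget_simple_extent_dims(fspace, dims, nullptr);

    const hsize_t slice = 1 << 24;  // rows per read; tune this
    std::vector<double> buf(slice * dims[1]);

    for (hsize_t start = 0; start < dims[0]; start += slice) {
        hsize_t offset[2] = {start, 0};
        hsize_t count[2] = {std::min(slice, dims[0] - start), dims[1]};
        // Select one slice in the file and a matching shape in memory.
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, offset, nullptr, count, nullptr);
        hid_t mspace = H5Screate_simple(2, count, nullptr);
        H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf.data());
        H5Sclose(mspace);
        // ... hand the slice over to the particle arrays here ...
    }

    H5Sclose(fspace);
    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}
```

Timing a loop like this with different slice sizes, with and without the filters applied to the inputs, would give a quick idea of how much of the several hours is raw I/O versus decompression and in-memory shuffling.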
Hi @MatthieuSchaller, sorry for the delay. That's the config file I use when running VR inline. To be comfortable, I find you basically need to double the memory for the particular runs we do; it may not be as severe given the box sizes you are running. Also no issue with reading in particle data, so defining …
Here are bits of the configs that are different and plausibly related to what we see as a problem:
(Mine is also used for baryon runs.) I am not quite sure what all of these do. Looking at this list, I am getting suspicious about … Do any of the other parameter values look suspicious to you? @rtobar the data is compressed and uses some HDF5 filters, so that will play a role. And indeed, the lower default …
This might help us understand what exactly is causing #118 (if logs are correctly flushed and don't get too scrambled). Signed-off-by: Rodrigo Tobar <[email protected]>
@MatthieuSchaller not a solution, but the latest master now contains some more logging to find out what's going on and how much is expected to travel through MPI, which seems to be the problem. Changing …
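By way of illustration, the kind of pre-exchange accounting such logging can provide looks roughly like the hypothetical sketch below (this is not the code that was added to master; the function and type names are made up):

```cpp
// Hypothetical sketch (not the logging actually added to master): before an
// exchange, report the total and the worst-case per-rank volume about to
// travel through MPI, which is usually enough to spot a blow-up early.
#include <mpi.h>
#include <cstdio>
#include <vector>

struct Particle { double pos[3], vel[3]; long long id; };

void report_exchange_volume(const std::vector<std::vector<Particle>>& send_buffers,
                            MPI_Comm comm) {
    int rank = 0;
    MPI_Comm_rank(comm, &rank);

    // Bytes this rank intends to send, summed over all destination buffers.
    unsigned long long local = 0;
    for (const auto& buf : send_buffers) local += buf.size() * sizeof(Particle);

    unsigned long long total = 0, worst = 0;
    MPI_Reduce(&local, &total, 1, MPI_UNSIGNED_LONG_LONG, MPI_SUM, 0, comm);
    MPI_Reduce(&local, &worst, 1, MPI_UNSIGNED_LONG_LONG, MPI_MAX, 0, comm);

    if (rank == 0)
        std::printf("[exchange] total %.2f GB, worst rank %.2f GB\n",
                    total / 1e9, worst / 1e9);
}
```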
Thanks, I'll update my copy of the code to use this extra output. My job checking Chris' config but with the approximate velocity density calculation is still in the queue. If the code performs better with a precise calculation, then it's a double win for us actually. :)
Eventually got my next test to run. Taking Chris' config from above and applying the following changes:
then the code crashed in the old way (note: without the new memory-related outputs). I would think it's the approximate calculation of the local velocity density which is the problem here. To be extra sure, I am now trying it again but with everything set back to Chris' values apart from that local velocity dispersion parameter.
And now changing just …
Describe the bug
Memory footprint blows up in (or just after) the calculation of the local fields
This is running on a 5670^3 particle DMO setup in a 3.2Gpc box at z=4.
The z=5 output works fine.
I am running using 480 MPI ranks on 120 nodes. Each node offers 1TB of RAM.
The code advertises at the start needing 16.68TB.
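For scale, some rough arithmetic on those numbers (decimal units, and assuming the 480 ranks are spread evenly, i.e. 4 ranks per node):

$$ 5670^3 \approx 1.8\times10^{11}\ \text{particles}, \qquad \frac{16.68\ \mathrm{TB}}{480\ \text{ranks}} \approx 35\ \mathrm{GB/rank}, \qquad \frac{1\ \mathrm{TB/node}}{4\ \text{ranks/node}} = 250\ \mathrm{GB/rank}, $$

so the advertised requirement alone leaves roughly a factor of 7 of headroom per rank, and the OOM presumably comes from transient or strongly imbalanced allocations on top of it.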
Crash:
(Difficult to know where that comes from as exceptions are not caught anywhere.)
Compilation:
g++ 7.3 with Intel MPI 2018
Command line:
Config:
vrconfig_3dfof_subhalos_SO_hydro.txt
If it helps, run location on cosma: /cosma8/data/dp004/jlvc76/FLAMINGO/ScienceRuns/DMO/L3200N5760/VR/catalogue_0008
There are core dumps as well.