
Running VELOCIraptor on large RAMSES cosmo hydro snapshots #119

Open
sorcej opened this issue May 15, 2024 · 26 comments

sorcej commented May 15, 2024

  • Compilation ok (but might need to adjust a few options, like longint, for all the particle types - both stars and DM particles)
  • Config file ok (but might need to adjust a few parameters; the code units in particular need to be set properly because of RAMSES's peculiar units)
  • Starts running ok
  • Segmentation fault after "Read header" - tried several configurations (up to 660 nodes, different numbers of tasks per node and of threads, and different -s and -Z options on the command line)
pelahi (Owner) commented May 15, 2024

Hi @sorcej, I think it is likely that the header is not being read correctly. I have tried to make this work, but I have encountered several different RAMSES formats with different header structures, and since the binary is not self-describing I did not know how to handle them all. Could you provide a description of the binary data?
Also, are you using the development branch? It is actually better than main (I need to make it the default).

sorcej (Author) commented May 15, 2024

Thanks @pelahi! Ok, so I switched to the development branch; I have the same issue.

I do not have a reader in C++, unfortunately, but common readers in Python or Fortran typically work on this simulation (for instance, https://github.com/florentrenaud/rdramses/blob/master/rdramses.f90 works after some modifications to use longint). So there should not be anything peculiar about it, apart from the fact that I had to use longint for both DM and star particles.

sorcej (Author) commented May 15, 2024

Here is what might be one of the problems in the reader:
// Total number of Stars over all processors
Framses.read((char*)&dummy, sizeof(dummy));
Framses.read((char*)&ramses_header_info.nstarTotal, sizeof(int));
Framses.read((char*)&dummy, sizeof(dummy));

-> since I am using longint for nstarTotal (because there are more stars than 2^31). It is probably not the only such read, though... I am not sure whether it would be easy to add an option to use longint for both nstartot and nparttot.

sorcej (Author) commented May 20, 2024

(@pelahi): Ok, I have fixed and modified a few things; now I am stuck a bit further on, in MPINumInDomain.
Also, I did not understand this line: dmp_mass = 1.0 / (opt.Neff*opt.Neff*opt.Neff) * (OmegaM - OmegaB) / OmegaM; as it gives something negative, so I bypassed it and used particle IDs instead.

pelahi (Owner) commented May 21, 2024

Hi @sorcej, apologies, I'll be a bit slow replying for the next two days as I have to finish marking assignments for a high performance computing course. Do you mind creating a draft PR with your proposed changes so I can have a look?

sorcej (Author) commented May 24, 2024

Thanks @pelahi, and sorry for the delay. I cannot make a PR from the supercomputer, unfortunately. Anyway, I tried to understand where it crashes next and finally managed to pinpoint it: it is in MPIInitialDomainDecompositionWithMesh(opt). At first I was not sure whether I was using too many or too few cores. -> Ok, found it: I was using too many cores, so n3 did not fit in an unsigned int. With fewer cores I am now stuck a bit further on, when broadcasting... I am continuing to explore.

sorcej (Author) commented May 27, 2024

After several days, I decided to simply disable the option opt.impiusemesh to get further. I will try to re-enable it later...

sorcej (Author) commented May 29, 2024

@pelahi ok, so now I am again stuck a bit further on. I again had to bypass dmp_mass = 1.0 / (opt.Neff*opt.Neff*opt.Neff) * (OmegaM - OmegaB) / OmegaM; as it gives something negative, so I used particle IDs instead, this time in mpiramsescio.cxx.
It starts running and counts the particles properly (note that I did not understand why this has to be done twice, as it was already done, but ok). Now I get a
malloc(): corrupted top size
[mpiexec@i06r03c04s12] control_cb (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1324): assert (!closed) failed
[mpiexec@i06r03c04s12] HYDI_dmx_poll_wait_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:80): callback returned error status
[mpiexec@i06r03c04s12] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:2045): error waiting for event

...
I am trying to fix that one too.

pelahi (Owner) commented May 30, 2024

Hi @sorcej, so the dmp_mass calculation was based on reading some RAMSES data where it was easier to quickly calculate the dark matter particle mass from the matter density minus the baryon density and the effective resolution of the simulation. Regarding your error, it could be that ints are being used somewhere values would exceed ~2e9 and overflow. Could this be the case?

sorcej (Author) commented May 30, 2024

@pelahi ok, thanks, now I get the dmp_mass. I actually had to change Neff manually, though; I will try to add it as an option (unless there is one I missed). Regarding the malloc: I am not sure yet, but I think something goes wrong when reading the particle positions, so they cannot be properly assigned to the different tasks. I am trying to fix it.

sorcej (Author) commented Jun 3, 2024

@pelahi Ok, I have fixed the particle positions. I still need to understand something about the IDs, but it looks better now. I think I now have yet another problem to solve, when writing: [mpiexec@i05r01c01s04] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
Any idea where precisely something could be going wrong here? I am trying to pinpoint it, but it is not obvious.

I still of course have load balancing issues but for now I leave it as is.

I might also have to fix some units but I will see later.

Thanks again :)

pelahi (Owner) commented Jun 3, 2024

Hi @sorcej , not certain about the write error. When does this happen? Could you provide the associated velociraptor output?

sorcej (Author) commented Jun 4, 2024

@pelahi The code writes the .configuration, .siminfo and .units files and then stops while still in the middle of SearchFullSet; I confirm that it is stuck in pfof=SearchFullSet(opt,Nlocal,Part,ngroup) again... I fixed the long IDs and the long trees, but I still have a seg fault, and the debug mode is not helpful.

sorcej (Author) commented Jun 12, 2024

@pelahi ok, I finally managed to narrow down where it crashes: it is when trying to build a new tree:
tree = new KDTree(Part.data(),nbodies,opt.Bsize,tree->TPHYS,tree->KEPAN,1000,0,0,0,period);

I have yet to find out why it crashes only for some of the tasks. Sometimes it gives me a free(): invalid pointer and sometimes a double free or corruption (out), but I do not understand why, nor why only for some of the tasks.

pelahi (Owner) commented Jun 14, 2024

Hi @sorcej , would you have a small ramses input example I could try? It would help me debug the issue.

sorcej (Author) commented Jun 28, 2024

Hi @pelahi, sorry for not getting back to you earlier; I was attending conferences (and will actually be leaving again on Sunday). Anyway, it works with a small example. That said, I have now managed to get outputs for the large simulation when dealing only with DM particles. I still need to check these outputs and whether they make sense; then I will try to deal with the stars, etc. Thanks

sorcej (Author) commented Oct 1, 2024

Hi @pelahi, any chance you still have the config file used for this paper, https://arxiv.org/pdf/1806.11417? I am interested in finding galaxies and their properties. I tried using and adapting the sample config file sample_galaxycatalog_run.cfg in the examples folder, but with no success yet. Thanks

pelahi (Owner) commented Oct 7, 2024

Hi @sorcej, apologies for the late reply. The best config is https://github.com/pelahi/VELOCIraptor-STF/blob/development/examples/sample_galaxycatalog_run.cfg, as you noted, but you might want to keep the 3DFOF envelope if you are trying to replicate Rodrigo's paper. Now, can you provide some information about the issues you are encountering? Is it a processing issue, or are you not getting the intracluster light you were expecting?

sorcej (Author) commented Nov 7, 2024

Thanks a lot @pelahi for your answer, and sorry for the delay in getting back to you; I had to put this analysis aside for a while. By keeping the 3DFOF envelope, you mean Keep_FOF=1, correct? About the issues: I had several. At first it was not even running, but that turned out to be a problem with the machine, which now seems to be fixed. Currently, I am a bit perplexed by the catalogs I get. When only star particles are used, I do not really understand how to read the "properties" output file, which seems to contain information similar to what I would get with dark matter particles. But perhaps I did not activate the proper output? Thanks for your help.

pelahi (Owner) commented Nov 11, 2024

Hi @sorcej, so the autocalculated properties are very much focused on the typical halo + galaxy properties people often require. If you have an idea of the properties you would like to calculate, I could quickly make a branch that calculates the output you need. Otherwise, you can use the Python tools to load the properties and particle files in Python (it is simple HDF5 output), then read the input data, get the particles per group, and calculate the desired properties. There is an example of this in the tools directory.

sorcej (Author) commented Nov 12, 2024

@pelahi, thanks a lot for your answer. In the outputs, though, I am missing the galaxy properties you mentioned; I must have done something wrong there. I only have typical halo properties, but computed on star particles... this is why it is a bit obscure to me. In the meantime, I will have a look at the tools you mentioned. Thanks

sorcej (Author) commented Nov 25, 2024

Hi @pelahi, sorry I am still super confused:

velociraptor_00218.catalog_particles.unbound.0
Total_num_of_particles_in_all_groups: 239447203

velociraptor_00218.catalog_particles.0
Total_num_of_particles_in_all_groups: 2716658054

Why is the sum of the two not equal to the total number of DM particles in the simulation?
Another question: why is Sum(Group_Size) not equal to Total_num_of_particles_in_all_groups (i.e. 2716658054)?

Thanks for any insights. I am clearly struggling to understand the outputs.

pelahi (Owner) commented Nov 29, 2024

Hi @sorcej, so the *.catalog_particles and *.catalog_particles.unbound files arose from the format used by subfind, which lists "bound" and "unbound" particles in separate files. This distinction really only makes physical sense when extracting data from the entire simulation (all particle types). In your case with just stars, I would simply combine the two lists and not treat the distinction as anything physical.

sorcej (Author) commented Nov 29, 2024

Hi @pelahi, sorry, I was unclear. I was first looking at the results using dark matter particles, trying to understand the outputs; I do understand your point about star particles, though. Could you please point me to some documentation explaining these outputs, so that I can read the particles (either DM or stars) belonging to a given object (halo or galaxy) and derive properties that are not in the velociraptor_00218.properties.0 files? (For example, for star particles I do not see the age and metallicity of the particles belonging to a given galaxy.) Thank you so much

pelahi (Owner) commented Dec 3, 2024

Sure @sorcej, it's here: https://velociraptor-stf.readthedocs.io/en/latest/output.html. Note that to have extra star properties read and used, you will likely need to adjust the RAMSES I/O to ensure they are stored, and you also need to compile the code with -DVR_USE_HYDRO=ON, which makes particles store an extra set of properties, such as metallicity or age, and calculate properties based on them.

You can even store extra custom properties with specific names (see for example the config https://github.com/pelahi/VELOCIraptor-STF/blob/development/examples/sample_swifthydro_3dfof_subhalo_extra_properties.cfg), specifically entries like
Star_internal_property_names=BirthDensity,InitialMass,

However, both will likely need an update to the ramses interface and substructure_properties files to calculate what you desire.

Can you be precise about exactly what you want to calculate, and then we can see whether updates are actually needed?

sorcej (Author) commented Dec 19, 2024

Thanks @pelahi. Weirdly enough, I think what I need would come with -DVR_USE_STAR=ON, but when I compile with it, STARON is still not defined... At a minimum, I would need not only the IDs of the particles that belong to the galaxies, but also their positions, velocities, masses, ages and metallicities, so that I can derive galaxy properties.
