-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PIO needs a way to change the ncid of a file... #1457
Comments
@WardF I have this working but am still testing. I will put up a PR tomorrow. I hope this can make it to the 4.7.1 release. It is required for PIO integration. |
@DennisHeimbigner I'm going through all of GitHub PR's and such now, any input/thoughts on this? |
I think there a lot of consequences to this change. I still do not understand |
I am trying to figure out the consequences for this. |
Would it be possible for pio to use its own ncid equivalents and |
I believe this is something that can be in place for PIO, with minimum code impact and no affect on non-PIO users. Why is it needed? Imagine I have 10K processors. I assign 1K to I/O, 1K to an ocean model, and 8K to an atmospheric model. Recall that with PIO, the I/O processors do all I/O for both the ocean model and the atmospheric model. The ocean model creates/opens a file. The computation processors assign an ncid (file ID = 1), and so do the I/O processors (file ID = 1); the IDs are the same since this is the first file. Then the atmospheric model opens a file. It assigns an ncid (file ID = 1). It has no knowledge of the ocean model and what it has done. The I/O processors assign ID = 2 (since ID 1 is already in use). The I/O processors know that the ocean model has opened a file. Hilarious confusion ensues. Now here's how it works after ncids can be reassigned. The ocean model opens a file. The I/O processors open it (file ID = 1) and they assign an ncid to the ocean model code (still 1). Now the atmospheric model opens a file. The I/O processors open it (file ID = 2) and they assign an ncid to the atmospheric model (ID = 2). So this is why PIO needs to assign ncids. The ncids have to be generated in the I/O processors, and assigned in the computational processors. Make sense? As I say, this will have no effect on normal netcdf operation. Just two extra functions which move the NC. |
I might add that these two functions are all that remain of #555 The rest of #555 has been accomplished using the user-defined format feature. But PIO needs the ability to reassign ncid. This makes a lot more sense to put in the netcdf library than the PIO library, because it would require that PIO be aware of lots of netCDF internal structs. |
In your example, it appears that you are assuming that each processor |
Each processor has its own global state. For processors in the same computational unit, that state will be identical. So if all 1000 processors are in the same computational unit, and call nc_create, they will all start with the same starting ncid, and increment it by one, so they can all assign ncids and they are all the same. But consider the case of multi-level parallelism. Now 300 processors are running an ocean model, 600 are running an atmospheric model, and 100 are doing I/O. The ocean model creates a file and gets the first ncid. It tells the I/O processors to create the file, and the I/O processors also use the first ID. So far, so good. Now the atmospheric model creates a file. It uses the first ncid, but the I/O processors are already using that. The first ncid is in use, so the second ncid is used. (Don't forget that the I/O processors are the only ones really opening a file when nc_create() is called with PIO.) So now the I/O cores have to reassign the ncid to the atmospheric model. Once they do, everything works fine. There are no serious consequences to adding this function. If it is not called, it does not do anything. As with any of the functions in nc4internal.c, if called incorrectly, damage may result. The same would be true if the user deleted a var from a var list by calling nc4_var_list_del() inappropriately. I am not proposing that this become part of the public API. It is an internal function that PIO uses. So it is hard to see the objection here. Passing the NC struct would of course also work. The point of this function is to hide knowledge of the netcdf internals from PIO. If you decline to merge this code, then I will simply have to copy a bunch of netcdf internals into the PIO library, causing even greater exposure, which is not what any of us want. With this change, the PIO library does not need to know about the NC struct, and netcdf-c does not need to know about any PIO internals. As I mentioned during various meetings about PIO, some changes to the netcdf-c library are required to support them. This is that change. The point is to bring these great HPC features to the users. Adding this one function is a remarkably small change to accomplish so much. These features will be very exciting to HPC users. Being able to scale to thousands of cores is a transformatioal improvement to netCDF. Being able to re-use existing netCDF code is a huge win for the science community, and netCDF. |
To make things a little more theoretical, the issue here is really that ncid is assigned before the dispatch layer is called. So the dispatch layer does not control the assignment of ncid. Another way to approach this would be to have the dispatch layer code assign ncids, but then we have the problem of coordinating between different dispatch layers. And the code to assign ncid would be repeated everywhere, but the need to control ncid is probably restricted to the odd but important case of multi-level parallelism. So in the case of multi-level parallelism, I determined that being able to move the ncid had the minimal impact on netcdf-c code. |
Short answer, yes. There are multiple copies of the netcdf-c library One question that comes to mind is that within a processor, it appears |
@DennisHeimbigner not sure what you are answering "yes" to. There are multiple copies of the netcdf global state. They are not shared. Multi-processing means a bunch of processors executing the same code with independent memory. So if 300 processors are involved in a computational unit, each of them have their own state for netCDF. But since they are all executing the same code, that state changes in lockstep. Do not confuse this with threads. There is no connection. If netcdf-c is made threadsafe, it will be threadsafe on each of the 300 processors. Changing the ncid would obviously be one of the changes in global state that have to be protected to achieve thread safety, just as any operation on the nc4internal lists would be if they change the state of the list. So in nc4internal.c there would be some code around all changes to the lists that is protected with a semaphore. And the ncid changing code too, would be protected with that semaphore. It is not "an" I/O processor, it is 100 I/O processors, all running the same code. So there is no possibility of race conditions. They do not share the same internal state, but they execute exactly the same calls and MPI makes each processor to wait for the others to catch up. When a file is created it is created on all 100, and none of them continue processing until all have created the file and agree to start using it (PIO enforces that). I know this is confusing. Perhaps I should drop by Unidata and meet in person to explain it. |
From a concurrency point of view, I think the correct solution However, I suspect that this proper approach is not doable |
In non-async mode, all processors do the same exact things to their global state, so it is always in lockstep. In async situations (i.e. multi-level parallelism) then the I/O processors are in charge of the file list, and they own the nicds. They are the only ones that really open files on the file system. But the computational units need to know the ncid's that have been assigned by the I/O units. Since they are not in lockstep, they don't know. And that is the point of the function I have introduced. It allows the I/O processors to tell the computational processors what the "real" ncid is. Since they are operating independently of the computational processors, there is no shared memory between them. If we want the computational processors to know the "real" ncid, we must have a function to call to inform them of it. Also, it should be clear that this happens during the create call. The user only ever sees one value for the ncid, and it is the same everywhere. The ncid is never attempted to be changed other than during the create call. I believe this PR is the lowest impact way to solve multi-level parallelism with netcdf. It will have no impact on other users. And if you do make the library thread-safe, this code must be part of that, so it would be best if it were in the netcdf-c library, not the PIO library. |
I think I understand your point The fact that some subset of processors |
I agree. What I did not realize is that PIO is in effect introducing |
No problem. I'm happy that others are safe-guarding my baby too. ;-) |
This had been done. I will close this issue. |
@DennisHeimbigner I have a little itty-bitty problem...
PIO/netCDF integration is working great, but it turns out that PIO needs to reassign ncids after the NC struct has been created and added to the list.
Recall that with PIO, different computational units (each N processors) can create/open netCDF files, all through a single set of dedicated I/O processors. That means that the I/O processors must be in charge of ncids, so that there is no conflicts.
The way it works now is from dfile.c NC_open() function:
So the NC is created and added to the NC list before the PIO open is called.
In the PIO open, I need to determine the correct ncid, and reassign the file to that new ncid.
To support this I am adding two functions to netcdf-c. One of them changes the ncid info in the libsrc4 metadata:
I am having some trouble putting together the other function I need, to move an item in the NC list.
Also I wonder what effect this will have on the selection of the next available ncid. I think it will work fine actually.
What I would like to do as a first step is to document and write tests for the code in nclistmgr.c relating to the ncid stuff. That way I will have a better understanding of what is going on.
The text was updated successfully, but these errors were encountered: