Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync reopen_MOM_file existence inquiry #328

Merged
merged 1 commit into from
Feb 28, 2023

Conversation

marshallward
Copy link
Member

reopen_MOM_file includes an inquire test to determine whether we are attempting to reopen an existing file. If missing, it will attempt to instead create the file.

The switch from FMS to netCDF I/O exposed a race condition here, where one rank may create the file, and a another rank may incorrectly identify the file as existing. This resulted in a segmentation fault.

Unsure why this was not detected before, it could be that FMS was more resilient to the possibility of missing files.

Regardless, it makes sense to use a barrier after the inquire call to ensure that each rank is reporting the correct state. This patch does this with a sync_PEs call.

@codecov
Copy link

codecov bot commented Feb 24, 2023

Codecov Report

Merging #328 (398892b) into dev/gfdl (37b030a) will decrease coverage by 0.01%.
The diff coverage is 0.00%.

❗ Current head 398892b differs from pull request most recent head e375aa7. Consider uploading reports for the commit e375aa7 to get more accurate results

@@             Coverage Diff              @@
##           dev/gfdl     #328      +/-   ##
============================================
- Coverage     37.16%   37.16%   -0.01%     
============================================
  Files           265      265              
  Lines         74505    74508       +3     
  Branches      13837    13839       +2     
============================================
+ Hits          27690    27691       +1     
- Misses        41720    41723       +3     
+ Partials       5095     5094       -1     
Impacted Files Coverage Δ
src/framework/MOM_io.F90 30.54% <0.00%> (-0.09%) ⬇️
src/framework/MOM_document.F90 73.36% <0.00%> (+0.21%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Member

@Hallberg-NOAA Hallberg-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marshallward, I see how this new sync_PEs() call is helpful or necessary if all of the PEs are calling reopen_MOM_file, but wouldn't this create an unbalanced barrier (and cause the model to hang) if only a single PE were doing the writing and hence calling reopen_MOM_file? I would anticipate that only having one PE call this routine would be perfectly sensible, if for example, only globally integrated data were being output. Perhaps this sync_PEs() call is only necessary if thread /= SINGLE_FILE?

Perhaps we should change the logic in the routines that call reopen_MOM_file so that only the root PE (or whichever PE actually does the writing to a single file) is making the calls to open files, etc. This would avoid the need for an extra barrier, and it could be more efficient by reducing resource contention.

Alternately, would it make sense to add an argument to reopen_MOM_file() and create_MOM_file() to indicate whether this is a single-PE call, and skip the sync if it is?

@marshallward
Copy link
Member Author

marshallward commented Feb 25, 2023

AFAIK, these functions (reopen_MOM_file, create_MOM_file) are currently always called by all ranks. That seems to be the expected usage, which means that the inquire() calls are inevitably going to produce the incorrect result if one rank reaches open() before another reaches inquire().

When there is only a single file written by a single PE, such as ocean.stats.nc, rank management is handled by the underlying IO layer. There shouldn't be any redundant IO calls, although some ranks may be spinning their wheels while others are doing IO.

We could make these functions more robust so that they could reside within blocks called by a single PE or an arbitrary subset of PEs (which is what I believe you are suggesting), but in that case I think we'd have to abandon the whole "open if present, create if absent" feature of this function, which yields some of the problems that you're raising.

One alternative is to take the FMS approach and make the underlying netCDF layer more fault-tolerant to those scenarios. But that won't change the fact that the value of exists will be wrong, and we'd be juggling concurrent and contradictory open() calls.

We could, as you suggest, use a flag to control this, but I'd think it just complicates and masks over the underlying problem.

Regarding your question:

wouldn't this create an unbalanced barrier (and cause the model to hang) if only a single PE were doing the writing and hence calling reopen_MOM_file?

I think if this were the case, then it would have been detected by the regression suite, right? (Perhaps also worth noting that there is literally only one instance of this function.)

@Hallberg-NOAA
Copy link
Member

I have tested whether the code would work if all of the (single file) netcdf IO calls from MOM_sum_output.F90 (the only place where reopen_file() is called) only occurred from the root_PE. It would work, and it would avoid the issue that this PR is intended to address.

Usually we seek the most parsimonious solution to a problem. In this case I think that which is most parsimonious is open for debate. Do we go for the fewest added or modified lines of code (the proposed change in this PR wins the code-golf prize), or do we try to minimize the number of stooges we are trying to send through a narrow door?

@marshallward
Copy link
Member Author

Are you suggesting that it be used like this?

if (is_root_pe()) then
    call reopen_MOM_file(...)
endif

I suppose that would work, but it doesn't make reopen_MOM_file any safer; if some hypothetical stooge calls it without is_root_pe() guardrails, then the exists flag is one again ambiguous and problematic.

In my opinion, the real problem is the attempt to do an inquire() test when the calling state is ambiguous. The function needs to be reduced in scope to avoid these kind of problems. But that is a larger task that I'd rather tackle after the new IO layers have settled.

@marshallward
Copy link
Member Author

I edited the function to do something closer to your suggestion. It can now be called by either the root PE or a larger PE set containing the root.

It now explicitly does not support threaded IO. I would suggest we leave it for now, and come back to this problem if the need arises.

reopen_MOM_file includes an inquire test to determine whether we are
attempting to reopen an existing file.  If missing, it will attempt to
create the missing file.

The switch from FMS to netCDF I/O exposed a race condition here, where
one rank may create the file, and another delayed rank may incorrectly
identify this new file as already existing.  This resulted in a
segmentation fault.

Unsure why this was not detected before; it could be that FMS was more
resilient to the possibility of missing files.  Regardless, the `exists`
value was volatile and could lead to potential error.

This patch introduces a temporary fix to the issue by checking the root
PE and threading value.  When threading is single-file, only the root PE
participates in the existence test and file creation.  This accounts for
the case where either the root PE or any larger subset containing the
root PE calls the function.  It does not account for the more exotic
case where a non-root PE many wish to create a file.

If threading is set to MULTIPLE (i.e. IO domains) then an error is
raised, since there's currently no safe way to implement an equivalent
`inquire()` test.

We can revisit this function when stronger controls around threaded IO
are introduced.  But for now, I believe that this is the best that we
can do.
Copy link
Member

@Hallberg-NOAA Hallberg-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the revised code safely covers all of the possible use cases and that is should neatly address the issue that prompted these changed. I am approving this, pending a successful pipeline test result once this rises to the top of the PR queue.

@adcroft
Copy link
Member

adcroft commented Feb 28, 2023

This passed testing (done manually)

@adcroft adcroft merged commit 5f62858 into NOAA-GFDL:dev/gfdl Feb 28, 2023
marshallward pushed a commit that referenced this pull request Mar 3, 2023
  Revised write_energy so that only the root_PE attempts to open, reopen or
write to a netcdf file.  Although FMS can handle cases where multiple PEs make
the same calls with internal logic, this change avoids requiring such internal
(hidden) logical tests, and instead is more explicit on what is actually
intended.  This change is complementary to MOM6 dev/gfdl PR #328, which adds
internal logic to handle the case where all PEs are making a call to reopen a
single netcdf file.  All answers and output are bitwise identical.
@marshallward marshallward deleted the reopen_inquire_sync branch April 21, 2023 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants