Add option to sync NetCDF files #34

Merged 1 commit on May 13, 2024

@Sbozzolo Sbozzolo commented May 13, 2024

Sometimes, NetCDF files need syncing (force-writing to disk). This seems to be particularly the case for GPU runs.

This PR adds an option to provide the `NetCDFWriter` with a schedule that determines when to call `NCDatasets.sync`, based on arbitrary conditions.

The default behavior depends on the context. On CPUs, we let `NetCDF` manage its buffered writes. On GPUs, we call `sync` at every time step for those datasets that need to be synced (i.e., those that were written to since the last sync).
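As an illustration of a context-dependent default, here is a minimal sketch in Python (not the Julia code in this PR). It assumes a schedule can be modeled as a predicate on the step number; `every_step`, `never`, `every_n_steps`, and `default_sync_schedule` are hypothetical names chosen for this example and are not part of `NetCDFWriter` or `NCDatasets`.

```python
from typing import Callable

# Assumption: a "schedule" is any callable that takes the current step
# number and returns True when a sync should happen at that step.
Schedule = Callable[[int], bool]


def every_step(step: int) -> bool:
    """Schedule that requests a sync at every time step."""
    return True


def never(step: int) -> bool:
    """Schedule that never requests a sync (buffering is left to the library)."""
    return False


def every_n_steps(n: int) -> Schedule:
    """Example of a custom condition: sync once every n steps."""
    return lambda step: step % n == 0


def default_sync_schedule(device: str) -> Schedule:
    """Hypothetical default: sync every step on GPUs, never on CPUs."""
    return every_step if device == "gpu" else never
```

A caller that wants less frequent syncing on a GPU run could pass something like `every_n_steps(10)` instead of relying on the default.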

In general, we keep track of which files have been recently touched (by adding them to a list internal to the writer), and whenever `sync` is called, `NCDatasets.sync` is called on those files. This ensures that only files that need to be synced are synced, and that calling `sync` twice in a row makes the second call a no-op.
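The bookkeeping described above can be sketched as follows. This is a Python illustration under stated assumptions, not the implementation in this PR: plain binary files stand in for NetCDF datasets, `os.fsync` plays the role of `NCDatasets.sync`, and `TrackingWriter` and `demo` are hypothetical names.

```python
import os
import tempfile


class TrackingWriter:
    """Toy writer that remembers which files were written since the last sync."""

    def __init__(self):
        self._open_files = {}    # path -> open file handle
        self._unsynced = set()   # paths written to since the last sync

    def write(self, path, data: bytes) -> None:
        fh = self._open_files.get(path)
        if fh is None:
            fh = open(path, "ab")
            self._open_files[path] = fh
        fh.write(data)
        self._unsynced.add(path)  # mark the file as needing a sync

    def sync(self) -> int:
        """Force-write only the touched files; return how many were synced."""
        count = 0
        for path in sorted(self._unsynced):
            fh = self._open_files[path]
            fh.flush()               # push Python's buffer to the OS
            os.fsync(fh.fileno())    # ask the OS to write to disk
            count += 1
        self._unsynced.clear()       # nothing is pending after a sync
        return count

    def close(self) -> None:
        for fh in self._open_files.values():
            fh.close()
        self._open_files.clear()
        self._unsynced.clear()


def demo():
    """Write to two files, sync twice, then touch one file and sync again."""
    with tempfile.TemporaryDirectory() as tmp:
        writer = TrackingWriter()
        writer.write(os.path.join(tmp, "a.bin"), b"x")
        writer.write(os.path.join(tmp, "b.bin"), b"y")
        first = writer.sync()    # both files were touched
        second = writer.sync()   # nothing touched since: no-op
        writer.write(os.path.join(tmp, "a.bin"), b"z")
        third = writer.sync()    # only a.bin was touched
        writer.close()
        return first, second, third
```

Using a set means that writing to the same file several times between syncs still results in a single sync of that file.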

Closes #33

@Sbozzolo Sbozzolo force-pushed the gb/nc_sync branch 2 times, most recently from 159b8de to 88cad2a May 13, 2024 19:08
@Sbozzolo Sbozzolo merged commit 9b5282f into main May 13, 2024
6 checks passed
Linked issue: NetCDF files need syncing