Thread safety Part 1 #1373
Comments
That sounds good; I'm going to push through the outstanding PRs as they stand at the moment, but once that's done we can wait for this :) |
No need to hold up most merges. It is only really big ones that may cause trouble. |
Dennis, I look forward to seeing this work develop. Also, how do you plan to test? |
This will be a great feature in NetCDF. I am also looking forward to seeing it. FYI, there is a test program in PnetCDF for testing one-file-per-thread safety. |
Thanks. What is the general state of thread support in pnetcdf? |
Starting from 1.11.0, PnetCDF supports one-file-per-thread safety. |
re: #1373 (partial)
* Mark some global constants as const to make them easier to track.
* Hide direct access to ncrc_globalstate behind a function call.
* Convert dispatch tables to constants (except the user-defined ones). As a consequence, some function arguments also need to be marked const.
* Remove some global fields that are no longer needed.
* Aggregate all the globals in nclog.c.
* Uniformly replace nc_sizevector{0,1} with NC_coord_{zero,one}.
* Uniformly replace nc_ptrdffvector1 with NC_stride_one.
* Remove some obsolete code.
Over on the Rust wrapper (https://github.com/mhiley/rust-netcdf) we are wondering how unsafe the library is when multi-threading.
|
Hi, I understand this is about netcdf-c but for the sake of brevity, I use the following Python snippet for illustration purposes:
With this minimum working example, I can reliably reproduce a segfault. So, even using two distinct datasets seems to be problematic.
Following this path, it shows that libhdf5 itself may not be thread-safe: Following the suggestion from this link and building libhdf5 with the following configure options makes the above minimum working example run without segfault:
However, as stated in the libhdf5 documentation, "--enable-threadsafe" and "--enable-hl" are mutually exclusive and their combination is unsupported. Even so, the combination seems to work, at least for the minimal example above. I hope this information is helpful. Cheers, |
There is (was?) an hdf5 build flag that serialized access to the hdf5 API. The better solution for netcdf-c is to analyze the code for non-constant |
When working on the wrapper for netcdf, we found it was easy to corrupt internal data structures when working with a non-threadsafe A workaround for multi-threaded (read-only) access is direct file-access, as e.g. @gauteh implemented in hidefix. |
It seems that this work must be done in each dispatch layer. For example, for netCDF/HDF5, in libsrc4 and libhdf5, we would protect with a mutex each function that changes any internal data. Is that what you had in mind? If we work together, with you doing classic and opendap and me doing HDF5 and HDF4, we could probably do that pretty quickly. And it would not hurt to do it incrementally (i.e. one dispatch layer at a time). Can we come up with a programming idiom for this? Do you have a first case that you have explored? Presumably this is just going to be some macro that wraps the code to check the mutex, wait for it to be available, and then set it and go, plus another macro that unsets the mutex. Then each function that changes the internal metadata would get something like:
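A minimal sketch of such a lock/unlock macro idiom, assuming POSIX threads. The names `NC_LOCK`, `NC_UNLOCK`, `nc_meta_mutex`, and `example_add_dim` are hypothetical and not part of netcdf-c; `pthread_mutex_lock` already covers the "wait until available, then take it" step.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names: NC_LOCK, NC_UNLOCK and nc_meta_mutex are
 * illustrative only and do not exist in netcdf-c. */
static pthread_mutex_t nc_meta_mutex = PTHREAD_MUTEX_INITIALIZER;

/* pthread_mutex_lock blocks until the mutex is free, then acquires it. */
#define NC_LOCK()   pthread_mutex_lock(&nc_meta_mutex)
#define NC_UNLOCK() pthread_mutex_unlock(&nc_meta_mutex)

/* Stand-in for internal metadata that a dispatch-layer function changes. */
static int example_ndims = 0;

/* A metadata-changing function: lock, mutate, copy the result, unlock. */
int example_add_dim(void)
{
    int ndims;

    NC_LOCK();
    example_ndims++;
    ndims = example_ndims;
    NC_UNLOCK();

    return ndims;
}
```

Every early return in a real function would also need an `NC_UNLOCK()` before it, which is the main maintenance cost of this approach.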
Multi-threaded operation would also be of great use to NOAA HPC codes, and, presumably, to those of other large data producers. |
There are two ways to do this, I think. The simplest is to do what hdf5 did, |
What about my suggestion of protecting all functions that now change metadata? Seems like that would be easier than get/set functions, and would allow multiple threads to read the metadata at one time, as would not be the case if we serialized access to the library. Reads of metadata should be able to happen in multiple threads at the same time. Only if someone wants to change the value do we need to serialize access, right? Let me be more specific. From nc4internal.h we can clearly see which functions just read metadata (find_nc_*()) and which change metadata.
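One way to sketch the "many readers, one writer" idea is a POSIX read-write lock. This is only an illustration under assumptions: `meta_rwlock`, `meta_nvars`, `example_find_nvars`, and `example_add_var` are hypothetical stand-ins, not the real nc4internal.h functions. Readers still take a (shared) lock, so they remain protected against a concurrent writer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical: one read-write lock guarding a stand-in metadata counter. */
static pthread_rwlock_t meta_rwlock = PTHREAD_RWLOCK_INITIALIZER;
static int meta_nvars = 0;

/* Reader, in the spirit of a find_*() function: any number of threads
 * may hold the read lock at the same time. */
int example_find_nvars(void)
{
    int n;

    pthread_rwlock_rdlock(&meta_rwlock);
    n = meta_nvars;
    pthread_rwlock_unlock(&meta_rwlock);

    return n;
}

/* Writer: the write lock excludes all readers and all other writers. */
int example_add_var(void)
{
    int n;

    pthread_rwlock_wrlock(&meta_rwlock);
    n = ++meta_nvars;
    pthread_rwlock_unlock(&meta_rwlock);

    return n;
}
```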
|
But you need to make sure that you can isolate all metadata accesses thru |
Personally my guess is that locking the global data is better than locking the code. |
Well if we lock the global data by struct instead, then the getter functions are the existing find_* functions, right? |
Maybe, but I bet if you look, you find any number of places where structs |
Remember, it is not enough to lock modifications. Reads must also be locked. |
If the file is opened read-only, do reads need to be locked? |
Let me see if an example will help me understand. Here's some code from libhdf4:
In this code we read various fields. In the assert we read h5->format_file_info. A bit later on, we also read fields from the var struct, format_file_info, att, sdsid, etc. You are saying that each of those must be accessed through a get function instead? So each can have its own mutex? |
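A rough sketch of what "a get function with its own mutex" could look like for one struct, assuming POSIX threads. The struct `example_var_info` and its accessors are hypothetical and do not reflect the real netcdf-c variable struct; only the field name `sdsid` is borrowed from the discussion above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical struct, not the real netcdf-c layout. */
typedef struct example_var_info {
    pthread_mutex_t lock;  /* guards the fields below */
    int sdsid;
    int ndims;
} example_var_info;

/* Initialize the struct and its mutex before any other thread sees it. */
void example_var_init(example_var_info *var, int sdsid, int ndims)
{
    pthread_mutex_init(&var->lock, NULL);
    var->sdsid = sdsid;
    var->ndims = ndims;
}

/* Getter: copies the field out while holding the struct's mutex. */
int example_var_get_sdsid(example_var_info *var)
{
    int id;

    pthread_mutex_lock(&var->lock);
    id = var->sdsid;
    pthread_mutex_unlock(&var->lock);

    return id;
}

/* Setter: updates the field while holding the same mutex. */
void example_var_set_sdsid(example_var_info *var, int sdsid)
{
    pthread_mutex_lock(&var->lock);
    var->sdsid = sdsid;
    pthread_mutex_unlock(&var->lock);
}
```

Note that per-field getters only make each individual read safe; code that reads several fields and expects them to be mutually consistent would still need to hold the lock across the whole sequence.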
Yes, chunks are read and decompressed into a non-thread-safe (but size-limited) cache.
On Tue, 2 Jun 2020 at 21:53, Edward Hartnett <[email protected]> wrote:
… If the file is opened read-only, do reads need to be locked?
|
The short answer is yes. The basic problem is that if some other thread |
Well crap, that's a lot of work. |
True. Also about read-only files. The problem there is lazy reading In any case, this is probably why HDF5 went for a single global lock |
The more I think about it, the more I like the read-only file special case as a target. |
Certainly we can turn off lazy reading (that is, force a read of all metadata on file open) in select cases. This could be trivially done with an NC_MULTITHREADED flag, for example. Reading with multithreading would indeed be worth having, even if it only worked for read-only files. But I think the holy grail is going to be to have different threads write data at the same time. Metadata is not a big deal in terms of the total I/O time. What is most important is when they are writing all their data for a timestep. If different threads could write to the file, that would be great. |
Good point. But as long as the metadata is read-only for the duration However, the gotcha still remains. We just do not have enough control |
Well this may be a case where the classic format pulls ahead. ;-) |
Please see my comment: #1495 (comment) |
Since changes to the netcdf-c library appear to be slowing down for the moment,
I am going to create a PR that is an initial step to thread safety support.
Specifically, I am going to start collecting all global variables and computed
constants into a single file.
There are some complexities.
These will need to be protected by a global mutex.
Thread safe access to them will need to be done by that dispatcher's code.
that uses a separate mutex. Once these constants are initialized, they will
be accessible without any synchronization.
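As an illustration of one-time initialization of computed constants, here is a minimal sketch using `pthread_once`. This is an assumption about one possible mechanism, not necessarily what the PR does; `example_coord_one`, `example_get_coord_one`, and the array length of 4 are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define EXAMPLE_MAX_DIMS 4  /* hypothetical size, for illustration only */

/* Hypothetical computed constant: a vector of ones. */
static size_t example_coord_one[EXAMPLE_MAX_DIMS];
static pthread_once_t example_once = PTHREAD_ONCE_INIT;
static int example_init_calls = 0;  /* counts how often the initializer ran */

/* Runs exactly once, no matter how many threads request the constant. */
static void example_init_constants(void)
{
    for (size_t i = 0; i < EXAMPLE_MAX_DIMS; i++)
        example_coord_one[i] = 1;
    example_init_calls++;
}

/* Accessor: guarantees initialization, then hands out a read-only view. */
const size_t *example_get_coord_one(void)
{
    pthread_once(&example_once, example_init_constants);
    return example_coord_one;
}

/* Test helper: reports how many times the initializer has run. */
int example_get_init_calls(void)
{
    return example_init_calls;
}
```

Once a thread holds the pointer returned by `example_get_coord_one`, it can read the array without further locking, because nothing writes to it after initialization.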