From f7a0fba2d6ff93ad237f54b391cbeea903e41ad6 Mon Sep 17 00:00:00 2001 From: Even Rouault Date: Thu, 29 Aug 2024 02:18:57 +0200 Subject: [PATCH] Add RFC101 text: Raster dataset read-only thread-safety --- doc/source/development/rfc/index.rst | 1 + .../rfc101_raster_dataset_threadsafety.rst | 484 ++++++++++++++++++ doc/source/spelling_wordlist.txt | 4 + 3 files changed, 489 insertions(+) create mode 100644 doc/source/development/rfc/rfc101_raster_dataset_threadsafety.rst diff --git a/doc/source/development/rfc/index.rst b/doc/source/development/rfc/index.rst index f0763fe8d0ca..73f24bcf634e 100644 --- a/doc/source/development/rfc/index.rst +++ b/doc/source/development/rfc/index.rst @@ -106,3 +106,4 @@ RFC list rfc97_feature_and_fielddefn_sealing rfc98_build_requirements_gdal_3_9 rfc99_geometry_coordinate_precision + rfc101_raster_dataset_threadsafety diff --git a/doc/source/development/rfc/rfc101_raster_dataset_threadsafety.rst b/doc/source/development/rfc/rfc101_raster_dataset_threadsafety.rst new file mode 100644 index 000000000000..48620c40fa3b --- /dev/null +++ b/doc/source/development/rfc/rfc101_raster_dataset_threadsafety.rst @@ -0,0 +1,484 @@ +.. _rfc-101: + +=================================================================== +RFC 101: Raster dataset read-only thread-safety +=================================================================== + +============== ============================================= +Author: Even Rouault +Contact: even.rouault @ spatialys.com +Started: 2024-Aug-29 +Status: Draft +Target: GDAL 3.10 +============== ============================================= + +Summary +------- + +This RFC enables users to get instances of :cpp:class:`GDALDataset` +(and their related objects such as :cpp:class:`GDALRasterBand`) that are +thread-safe for read-only raster operations, that is such instances can +be safely used from multiple threads without locking. + +Terminology +----------- + +The exact meaning of the terms ``thread-safe`` or ``re-entrant`` is not fully +standardized. We will use here the `QT definitions `__. +In particular, a C function or C++ method is said to be re-entrant if it can +be called simultaneously from multiple threads, *but* only if each invocation +uses its own data/instance. On the contrary, it is thread-safe is if can be +called on the same data/instance (so thread-safe is stronger than re-entrant) + +Motivation +---------- + +A number of raster algorithms can be designed to read chunks of a raster in +an independent and concurrent way, with resulting speed-ups when using +multi-threading. Currently, given a GDALDataset instance is not thread-safe, +this requires either to deal with I/O in a single thread, or through a mutex +to protect against concurrent use, or one needs to open a separate GDALDataset +for each worker thread. Both approaches can complicate the writing of such +algorithms. The enhancement of this RFC aims at providing a special GDALDataset +instance that can be used safely from multiple threads. Internally, it does use +one GDALDataset per thread, but hides this implementation detail to the user. + +C and C++ API extensions +------------------------ + +A new ``GDAL_OF_THREAD_SAFE`` opening flag is added to be specified to +:cpp:func:`GDALOpenEx` / :cpp:func:`GDALDataset::Open`. This flag is for now +mutually exclusive with ``GDAL_OF_VECTOR``, ``GDAL_OF_MULTIDIM_RASTER`` and +``GDAL_OF_UPDATE``. That is this flag is only meant for read-only raster +operations (``GDAL_OF_RASTER | GDAL_OF_THREAD_SAFE``). + +To know if a given dataset can be used in a thread-safe way, the following +C++ method is added to the GDALDataset class: + +.. code-block:: c++ + + /** Return whether this dataset, and its related objects (typically raster + * bands), can be called for the intended scope. + * + * Note that in the current implementation, nScopeFlags should be set to + * GDAL_OF_RASTER, as thread-safety is limited to read-only operations and + * excludes operations on vector layers (OGRLayer) or multidimensional API + * (GDALGroup, GDALMDArray, etc.) + * + * This is the same as the C function GDALDatasetIsThreadSafe(). + * + * @since 3.10 + */ + bool IsThreadSafe(int nScopeFlags) const; + + +The corresponding C function is added: + +.. code-block:: c + + bool GDALDatasetIsThreadSafe(GDALDatasetH hDS, int nScopeFlags, + CSLConstList papszOptions); + + +A new C++ function, GDALCreateThreadSafeDataset, is added with two forms: + +.. code-block:: c++ + + std::unique_ptr GDALCreateThreadSafeDataset(std::unique_ptr poDS, int nScopeFlags); + + std::unique_ptr GDALCreateThreadSafeDataset(GDALDataset* poDS, int nScopeFlags); + +This function accepts a (generally non thread-safe) source dataset and return +a new dataset that is a thread-safe wrapper around it. +The nScopeFlags argument must be compulsory set to GDAL_OF_RASTER to express that +the intended scope is read-only raster operations (other values will result in +an error and a NULL returned dataset). +This function is used internally by GDALOpenEx() when the GDAL_OF_THREAD_SAFE +flag is passed to wrap the dataset returned by the driver. +The first form takes ownership of the source dataset. The second form does not, +but references it internally, and assumes that its lifetime will be longer than +the lifetime of the returned thread-safe dataset. Note that the second form does +increase the reference count on the passed dataset while it is being used, so +patterns like the following one are valid: + +.. code-block:: c++ + + auto poDS = GDALDatase::Open(...); + auto poThreadSafeDS = GDALCreateThreadSafeDataset(poDS, GDAL_OF_RASTER | GDAL_OF_THREAD_SAFE); + poDS->ReleaseRef(); + // ... do something with hThreadSafeDS ... + poThreadSafeDS.reset(); // optional + + +The corresponding C function for the second form is added: + +.. code-block:: c + + GDALDatasetH GDALCreateThreadSafeDataset(GDALDatasetH hDS, int nScopeFlags, CSLConstList papszOptions); + + +Usage examples +-------------- + +Example of a function processing a whole dataset passed as a filename: + +.. code-block:: c++ + + void foo(const char* pszFilename) + { + auto poDS = std::unique_ptr(GDALDataset::Open( + pszFilename, GDAL_OF_RASTER | GDAL_OF_THREAD_SAFE | GDAL_OF_VERBOSE_ERROR)); + if( !poDS ) + { + return; + } + + // TODO: spawn threads using poDS + } + + +Example of a function processing a whole dataset passed as an object: + +.. code-block:: c++ + + void foo(GDALDataset* poDS) + { + std::unique_ptr poThreadSafeDSUniquePtr; // keep in that scope + GDALDataset* poThreadSafeDS; + if( poDS->IsThreadSafe(GDAL_OF_RASTER) ) + { + poThreadSafeDS = poDS; + } + else + { + poThreadSafeDSUniquePtr = GDALCreateThreadSafeDataset(poDS, GDAL_OF_RASTER); + // TODO: check poThreadSafeDSUniquePtr is not null. + poThreadSafeDS = poThreadSafeDSUniquePtr.get(); + } + + if( poThreadSafeDS ) + { + // TODO: spawn threads using poThreadSafeDS + } + else + { + // TODO: Serial version of the algorithm. It can happen if + // poDS is a on-the-fly dataset. + } + } + + +Example of a function processing a single band passed as an object: + +.. code-block:: c++ + + void foo(GDALRasterBand* poBand) + { + std::unique_ptr poThreadSafeDSUniquePtr; // keep in that scope + + GDALRasterBand* poThreadSafeBand = nullptr; + GDALDataset* poDS = poBand->GetDataset(); + // Check that poBand has a matching owing dataset + if( poDS && poDS->GetRasterBand(poBand->GetBand()) == poBand ) + { + GDALDataset* poThreadSafeDS; + if( poDS->IsThreadSafe(GDAL_OF_RASTER) ) + { + poThreadSafeDS = poDS; + } + else + { + poThreadSafeDSUniquePtr = GDALCreateThreadSafeDataset(poDS, GDAL_OF_RASTER); + poThreadSafeDS = poThreadSafeDSUniquePtr.get(); + } + + if( poThreadSafeDS ) + poThreadSafeBand = poThreadSafeDS->GetBand(poBand->GetBand()); + } + + if( poThreadSafeBand ) + { + // TODO: spawn threads using poThreadSafeBand + } + else + { + // TODO: Serial version of the algorithm. It can happen if + // poBand is a on-the-fly band, or a "detached" band, such as a + // mask band, or an overview band as returned by some drivers. + } + } + + +SWIG bindings +------------- + +The new C macro and functions are bound to SWIG as: + +- ``gdal.OF_THREAD_SAFE`` +- :py:func:`Dataset.IsThreadSafe(nScopeFlags)` +- :py:func:`Dataset.CreateThreadSafeDataset(nScopeFlags)`. The Python + implementation of this method takes care of keeping a reference on the source + dataset in the returned thread-safe dataset, so the user does not have to + care about their respective lifetimes. + +Usage and design limitations +---------------------------- + +* As implied by the RFC title, the scope of thread-safety is restricted to + **raster** and **read-only** operations. + +* For GDALDataset instances pointing to a file on the regular filesystem, the + limitation of the maximum number of file descriptor opened by a process + (1024 on most Linux systems) could be hit if working with a sufficiently large + number of worker threads and/or instances of GDALThreadSafeDataset. + +* The generic implementation of GDALCreateThreadSafeDataset assumes that the + source dataset can be re-opened by its name (GetDescription()), which is the + case for datasets opened by GDALOpenEx(). A special implementation is also + made for dataset instances of the MEM driver. But, there is currently no + support for creating a thread-safe dataset wrapper on on-the-fly datasets + returned by some algorithms. + +* Inherent to the selected approach, there is a band block cache per thread, and + thus no sharing of cached blocks between threads. + However, this should not be a too severe limitation for algorithms where + threads process independent regions of the raster, hence reuse of cached blocks + would be inexistent or low. Optimal algorithms will make sure to work on + regions of interest aligned on the block size (this advice also applies for + the current approach of manually opening a dataset for each worker thread). + +* Due to implementation limitations, :cpp:func:`GDALRasterBand::GetDefaultRAT` + on a GDALThreadSafeDataset instance only works if the RAT is an instance of + :cpp:class:`GDALDefaultRasterAttributeTable`. A clean error is emitted if + this is not the case. This could potentially be extended to work with any + subclass of :cpp:class:`GDALRasterAttributeTable` but with significant + additional coding to create a thread-safe wrapper. (GDALDefaultRasterAttributeTable + is intrinsically thread-safe for read-only operations). This is not perceived + as a limitation for the intended use cases of this RFC (reading pixel values + in parallel). + +* Some drivers, like netCDF, and HDF5 in some builds, use a global lock around + each call to their APIs, due to the underlying libraries not being re-entrant. + Obviously scalability of GDALThreadSafeDataset will be limited by such global + lock. + But this is no different than the approach of opening as many dataset as + worker threads. + +Implementation details +---------------------- + +(This section is mostly of interest for developers familiar with GDAL internals +and may be skipped by users of the GDAL API) + +The gist of the implementation lies in a new file ``gcore/gdalthreadsafedataset.cpp`` +which defines several classes (internal details): + +- ``GDALThreadSafeDataset`` extending :cpp:class:`GDALProxyDataset`. + Instances of that class are returned by GDALCreateThreadSafeDataset(). + On instantiation, it creates as many GDALThreadSafeRasterBand instances as + the number of bands of the source dataset. + All virtual methods of GDALDataset are redefined by GDALProxyDataset. + GDALThreadSafeDataset overloads its ReferenceUnderlyingDataset method, so that + a thread-locale dataset is opened the first-time a thread calls a method on + the GDALThreadSafeDataset instance, cached for later use, and method call is + forwarded to it. + +- ``GDALThreadSafeRasterBand`` extending :cpp:class:`GDALProxyRasterBand`. + On instantiation, it creates child GDALThreadSafeRasterBand instances for + band mask and overviews. + Its ReferenceUnderlyingRasterBand method calls ReferenceUnderlyingDataset + on the GDALThreadSafeDataset instance to get a thread-locale dataset, fetches + the appropriate thread-locale band and forwards its the method call. + +- ``GDALThreadLocalDatasetCache``. Instances of that class use thread-local + storage. The main member of such instances is a LRU cache that maps + GDALThreadSafeDataset* instances to a thread specific GDALDataset smart pointer. + On GDALThreadSafeDataset destruction, there's code to iterate over all + alive GDALThreadLocalDatasetCache instances and evict no-longer needed entries + in them, within a per-GDALThreadLocalDatasetCache instance mutex, to avoid + issues when dealing with several instances of GDALThreadLocalDatasetCache... + Note that the existence of this mutex should not cause performance issues, since + contention on it, should be very low in real-world use cases (it could become + a bottleneck if GDALThreadSafeDataset were created and destroyed at a very + high pace) + +Two protected virtual methods are added to GDALDataset for GDALThreadSafeDataset +implementation, and may be overloaded by drivers if needed (but it is not +anticipated that drivers but the MEM driver need to do that) + +- ``bool CanBeCloned(int nScopeFlags, bool bCanShareState) const``. + This method determines if a source dataset can be "cloned" (or re-opened). + It returns true for instances returned by GDALOpenEx, for instances of the MEM + driver if ``nScopeFlags`` == ``GDAL_OF_RASTER`` (and ``bCanShareState`` is + true for instances of the MEM driver) + +- ``std::unique_ptr Clone(int nScopeFlags, bool bCanShareState) const``. + This method returns a "clone" of the dataset on which it is called, and is used + by GDALThreadSafeDataset::ReferenceUnderlyingDataset() when a thread-locale + dataset is needed. + Implementation of that method must be thread-safe. + The base implementation calls GDALOpenEx() reusing the dataset name, open flags + and open option. It is overloaded in the MEM driver to return a new instance + of MEMDataset, but sharing the memory buffers with the source dataset. + +No code in drivers, but the MEM driver, is modified by the candidate +implementation. + +A few existing non-virtual methods of GDALDataset and GDALRasterBand have been +made virtual (and overloaded by GDALProxyDataset and GDALProxyRasterBand), +to avoid modifying state on the GDALThreadSafeRasterBand instance, which +wouldn't be thread-safe. + +- :cpp:func:`GDALDataset::BlockBasedRasterIO`: + it interacts with the block cache +- :cpp:func:`GDALRasterBand::GetLockedBlockRef`: + it interacts with the block cache +- :cpp:func:`GDALRasterBand::TryGetLockedBlockRef`: + it interacts with the block cache +- :cpp:func:`GDALRasterBand::FlushBlock`: + it interacts with the block cache +- :cpp:func:`GDALRasterBand::InterpolateAtPoint`: + it uses a per-band cache +- :cpp:func:`GDALRasterBand::EnablePixelTypeSignedByteWarning`: it should + already have been made virtual for GDALProxyRasterBand needs. + +Performance +----------- + +The existing multireadtest utility that reads a dataset from multiple threads +has been extended with a -thread_safe flag to asks to use GDAL_OF_THREAD_SAFE +when opening the dataset in the main thread and use it in the worker threads, +instead of the default behavior of opening explicitly a dataset in each thread. +The thread-safe mode shows similar scalability as the default mode, sometimes +with a slightly decreased efficiency, but not in a too problematic way. + +For example on a 20x20 raster: + +.. code-block:: shell + + $ time multireadtest -t 4 -i 1000000 20x20.tif + real 0m2.084s + user 0m8.155s + sys 0m0.020s + + vs + + $ time multireadtest -thread_safe -t 4 -i 1000000 20x20.tif + real 0m2.387s + user 0m9.334s + sys 0m0.029s + + +But on a 4096x4096 raster with a number of iterations reduced to 100. the +thread_safe mode is slightly faster: + +.. code-block:: shell + + $ time multireadtest -t 4 -i 100 4096x4096.tif + real 0m4.845s + user 0m18.666s + sys 0m0.084s + + vs + + $ time multireadtest -thread_safe -t 4 -i 100 4096x4096.tif + real 0m4.255s + user 0m16.370s + sys 0m0.096s + + +A Python equivalent of multireadtest has been written. Scalability depends +on how much Python code is executed. If relatively few long-enough calls to +GDAL are done, scalability tends to be good due to the Python Global Interpreter +Lock (GIL) being dropped around them. If many short calls are done, the GIL +itself, or its repeated acquisition and release, becomes the bottleneck. This is +no different than using a GDALDataset per thread. + +Documentation +------------- + +Documentation for the new constant and functions will be added. The +:ref:`multithreading` page will be updated to reflect the new capability +introduced by this RFC. + +Backward compatibility +---------------------- + +No issue anticipated: the C and C++ API are extended. +The C++ ABI is modified due to additions of new virtual methods. + +Testing +------- + +Tests will be added for the new functionality, including stress tests to have +sufficiently high confidence in the correctness of the implementation for common +use cases. + +Risks +----- + +Like all code related to multi-threading, the C++ language and tooling offers +hardly any safety belt against thread-safety programming errors. So it cannot +be excluded that the implementation suffers from bugs in some edge scenarios, +or in the usage of some methods of GDALDataset, GDALRasterBand and related objects +(particularly existing non-virtual methods of those classes that could happen +to have a non thread-safe implementation) + +Open questions +-------------- + +The question in this section needs to be solved during the discussion phase of +the RFC. + +For the unusual situations where a dataset cannot be reopened and thus +GDALCreateThreadSafeDataset() fails, should we provide an additional ``bForce`` +argument to force it to still return a dataset, where calls to the wrapped +dataset are protected by a mutex? This would enable to always write multi-thread +safe code, even if the access to the dataset is serialized. +Similarly we could have a +``std::unique_ptr GDALCreateThreadSafeRasterBand(GDALRasterBand* poBand, int nOpenFlags, bool bForce)`` +function that would try to use GDALCreateThreadSafeDataset() internally if it +manages to identify the dataset to which the band belongs to, and otherwise would +fallback to protecting calls to the wrapped band with a mutex. + +Design discussion +----------------- + +This paragraph discusses a number of thoughts that arose during the writing of +this RFC. + +A significantly different alternative could have consisted in adding native +thread-safety in each driver. But this is not realistic for the following reasons: + +* if that was feasible, it would require considerable development effort to + rework each drivers. So realistically, only a few select drivers would be updated. + +* Even updating a reduced number of drivers would be extremely difficult, in + particular the GeoTIFF one, due to the underlying library not being reentrant, + and deferred loading strategies and many state variables being modified even + by read-only APIs. And this applies to most typical drivers. + +* Due to the inevitable locks, there would be a (small) cost bore by callers + even on single-thread uses of thread-safe native drivers. + +* Some core mechanisms, particularly around the per-band block cache structures, + are not currently thread-safe. + +A variant of the proposed implementation that did not use thread-local storage +has been initially attempted. It stored instead a +``std::map>`` on each GDALThreadSafeDataset +instance. This implementation was simpler, but unfortunately suffered from high +lock contention since a mutex had to be taken around each access to this map, +with the contention increasing with the number of concurrent threads. + +Related issues and PRs +---------------------- + +- Candidate implementation: https://github.com/OSGeo/gdal/compare/master...rouault:gdal:raster_multi_thread?expand=1 + +- https://github.com/OSGeo/gdal/issues/8448: GTiff: Allow concurrent reading of single blocks + +Voting history +-------------- + +TBD diff --git a/doc/source/spelling_wordlist.txt b/doc/source/spelling_wordlist.txt index 46bea3a8339d..fec3027e6fa6 100644 --- a/doc/source/spelling_wordlist.txt +++ b/doc/source/spelling_wordlist.txt @@ -988,6 +988,7 @@ gdalbuildvrt gdalcompare gdalconst gdaldata +GDALDataset gdaldem gdaldriver gdalenhance @@ -1017,6 +1018,7 @@ gdalrasterblock gdalrc gdalsrsinfo gdaltest +GDALThreadLocalDatasetCache gdaltindex gdaltransform gdalvirtualmem @@ -1850,6 +1852,7 @@ MultiPolygon multipolygons multiproc multiprocess +multireadtest multirecords multispectral multistring @@ -2047,6 +2050,7 @@ nRecurseDepth nRefCount nReqOrder nSampleStep +nScopeFlags nSecond nSize nSrcBufferAllocSize