Creating error callbacks #215

ChristianBeilschmidt · 2021-09-07T08:47:53Z

I agree to follow the project's code of conduct.
I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

I've attached two options for setting up the global GDAL error callback. One is thread-safe and comes with a new dependecy and the other one is not.

The reason for storing the callback in a static variable is that we cannot pass the ownership to GDAL. We can access a generic pointer via the pUserData field and cast it to our function type. Therefore, in order not to leak our FnMut, we have to store it somehow and drop it later on.

When we decide on which version to use, I can remove the other one and convert the draft to a PR.

lnicola · 2021-09-07T08:59:37Z

Can you use once_cell instead? It might get folded into std at some point.

ChristianBeilschmidt · 2021-09-07T09:42:18Z

Can you use once_cell instead? It might get folded into std at some point.

Done that in da4c2ac

ChristianBeilschmidt · 2021-09-08T14:22:40Z

Okay, I've done two things:

0111fbd
I removed the non-thread-safe variant.
Moreover, I noticed that the previous version had a potential race condition when we would switch the callback in the mutex and then install the new callback. Thus, we have to first install the new callback to not have the old callback point to a dangling pointer. However, we have to first call it and then move the callback into the mutex without changing the reference that GDAL uses in p_user_data. This is why I wrapped the Box into a Pin<Box<_>>. If you have any improvement on that, I will try it.
6387a29
I moved the error type enum to error.rs

src/config.rs

rmanoka · 2021-09-11T21:25:10Z

@ChristianBeilschmidt Thanks for the implementation. Since concurrency is a bit hard to debug / reason about, I wanted to discuss a few points on the overall impl.

Callback Storage. Currently you're storing the callback both in the static variable, and in GDAL's ErrorHandlerUserData. We should have a single source of truth; I think just using the static variable is simpler.
Function Type. I was a bit worried about the FnMut possibly being called in parallel in error_handler. However, it seems GDAL uses a mutex to ensure the error handler is not called / assigned in parallel. Could you add some comments or GDAL doc links explaining this. The function needs Sync in addition to Send + 'static as it is being accessed from different threads.
Poison Handling. I think the only panic while holding the mutex in your implementation is if CPLSetErrorHandler panics. Thus, the function pointer is valid even if poisoned. Pls. explain this in the code for future reference.
Locking. The mutex code seems good to me. I still think just a Box<ErrorCallbackType> suffices, as the locking mechanism ensures the pointer doesn't change while being accessed.

If you are feeling brave, there is a way to fully avoid using any locks at all. The idea is to use an AtomicPtr as the static variable, initialized to null (Box pointers are never null).

Before setting the handler, use AtomicPtr::swap:
```
// Swap should be done before CPLSetHandler
let old_ptr = ERROR_CALLBACK.swap(Box::into_raw(callback), Ordering::SeqCst);
CPLSetHandler(...)
// Drop logic must happen after CPLSetHandler
if old_addr > 0 {
    // Convert back to Box to destruct properly
    unsafe { Box::from_raw(old_addr as *mut ErrorCallbackType) };
}
```
Removal is similar, swap in null_mut
```
let old_ptr = ERROR_CALLBACK.swap(null_mut(), Ordering::SeqCst);
CPLSetHandler(None);
// Drop old_ptr
```
This is somewhat like the Treiber stack, but much simpler as we just store one value and piggy-back on GDAL's mutex. See also this neat rust exposition. Unlike the treiber stack, we can in fact safely drop the memory as we piggy-back on GDAL's mutex mechanism.

Calling is simple. If our fn error_handler is called, then we know a handler we set is currently active. So we can use AtomicPtr::load to read the ptr, and call the function. If the pointer is not null, we can be sure the pointed memory is not freed as the CPLSetHandler call won't finish before the error handler return. Only edge case is if the handler is called while we're removing the handler, in which case the ptr may be null. We just need to exit without calling in this case.

Edit: fix lock-free logic to not require leaking the box ptr.

ChristianBeilschmidt · 2021-09-14T15:14:49Z

@ChristianBeilschmidt Thanks for the implementation. Since concurrency is a bit hard to debug / reason about, I wanted to discuss a few points on the overall impl.
1. Callback Storage. Currently you're storing the callback both in the static variable, and in GDAL's `ErrorHandlerUserData`. We should have a single source of truth; I think just using the static variable is simpler.

I guess we can't. GDAL doesn't take the ownership of that pointer, it treats it as a reference. Thus, we would leak if we transfer the ownership to GDAL nevertheless and it gets discarded by a call to CPLSetErrorHandler (or whatever) that is not covered by our two functions.

2. Function Type.  I was a bit worried about the `FnMut` possibly being called in parallel in `error_handler`.  However, it seems GDAL [uses a mutex](https://github.com/OSGeo/gdal/blob/543d6ffd39b66a727da49cb398591959a8f644cf/gdal/port/cpl_error.cpp#L355) to ensure the error handler is not called / assigned in parallel.  Could you add some comments or GDAL doc links  explaining this.  The function needs `Sync` in addition to `Send + 'static` as it is being accessed from different threads.

I changed it here: bd4beaa

3. Poison Handling. I think the only panic while holding the mutex in your implementation is if `CPLSetErrorHandler` panics.  Thus, the function pointer is valid even if poisoned.  Pls. explain this in the code for future reference.

I've addressed it here: bd4beaa

4. Locking.  The mutex code seems good to me.  I still think just a `Box<ErrorCallbackType>` suffices, as the locking mechanism ensures the pointer doesn't change while being accessed.

if you only use one Box and you move it into the Mutex, the address changes. The inner Box makes the address stable.
And we need two stable addresses while swapping our new Box into the mutex while simultaneously ensuring that all error callbacks that happen in-between (from actual errors, not our function calls) have a valid pointer.

   If you are feeling brave, there is a way to fully avoid using any locks at all.  The idea is to use an `AtomicPtr` as the static variable, initialized to null (`Box` pointers are never null).
   Before setting the handler, use `AtomicPtr::swap`:
   ```rust
   // Swap should be done before CPLSetHandler
   let old_ptr = ERROR_CALLBACK.swap(Box::into_raw(callback), Ordering::SeqCst);
   CPLSetHandler(...)
   // Drop logic must happen after CPLSetHandler
   if old_addr > 0 {
       // Convert back to Box to destruct properly
       unsafe { Box::from_raw(old_addr as *mut ErrorCallbackType) };
   }
   ```
   
       
         
       
   
         
       
   
       
     
   Removal is similar, swap in `null_mut`
   ```rust
   let old_ptr = ERROR_CALLBACK.swap(null_mut(), Ordering::SeqCst);
   CPLSetHandler(None);
   // Drop old_ptr
   ```
   
   
       
         
       
   
         
       
   
       
     
   This is somewhat like the [Treiber stack](https://en.wikipedia.org/wiki/Treiber_stack), but much simpler as we just store one value and piggy-back on GDAL's mutex.  See also this neat [rust exposition](https://aturon.github.io/blog/2015/08/27/epoch/).  Unlike the treiber stack, we can in fact safely drop the memory as we piggy-back on GDAL's mutex mechanism.
   Calling is simple.   If our `fn error_handler` is called, then we know a handler we set is currently active.  So we can use `AtomicPtr::load` to read the ptr, and call the function.  If the pointer is not null, we can be sure the pointed memory is not freed as the `CPLSetHandler` call won't finish before the error handler return.  Only edge case is if the handler is called while we're removing the handler, in which case the ptr may be null.  We just need to exit without calling in this case.

Edit: fix lock-free logic to not require leaking the box ptr.

I could try it and see if we don't introduce new race conditions. However, the function is most likely not called in huge concurrency. So I don't see the benefit in not locking, here.

jdroenner · 2021-09-15T11:52:07Z

can we merge it like it is or do we need more discussion / changes?

rmanoka · 2021-09-15T15:22:48Z

@ChristianBeilschmidt

Callback Storage. Currently you're storing the callback both in the static variable, and in GDAL's ErrorHandlerUserData. We should have a single source of truth; I think just using the static variable is simpler.

I guess we can't. GDAL doesn't take the ownership of that pointer, it treats it as a reference. Thus, we would leak if we transfer the ownership to GDAL nevertheless and it gets discarded by a call to CPLSetErrorHandler (or whatever) that is not covered by our two functions.

I'm suggesting not using user-data of GDAL at all, and only using our locked structure. So the only pointer being passed to GDAL is the error_handler function which is a static function. It's mostly for code clarity, so optional.

Locking. The mutex code seems good to me. I still think just a Box<ErrorCallbackType> suffices, as the locking mechanism ensures the pointer doesn't change while being accessed.

if you only use one Box and you move it into the Mutex, the address changes. The inner Box makes the address stable.
And we need two stable addresses while swapping our new Box into the mutex while simultaneously ensuring that all error callbacks that happen in-between (from actual errors, not our function calls) have a valid pointer.

If you move a Box into a Mutex, the data pointed to by the Box doesn't move. For instance:

let mut boxed = Box::new(0usize);
let ptr: *mut _ = boxed.as_mut() as *mut _;

let mutex = Mutex::new(boxed);
assert_eq!(
    mutex.lock().unwrap().as_mut() as *mut _,
    ptr
);

If you are feeling brave, there is a way to fully avoid using any locks at all. The idea is to use an AtomicPtr as the static variable, initialized to null (Box pointers are never null).
I could try it and see if we don't introduce new race conditions. However, the function is most likely not called in huge concurrency. So I don't see the benefit in not locking, here.

Ok.

@jdroenner A few minor suggestions, mostly for clarity and remove extra allocations.

ChristianBeilschmidt · 2021-09-16T07:19:36Z

@ChristianBeilschmidt

Callback Storage. Currently you're storing the callback both in the static variable, and in GDAL's ErrorHandlerUserData. We should have a single source of truth; I think just using the static variable is simpler.

I guess we can't. GDAL doesn't take the ownership of that pointer, it treats it as a reference. Thus, we would leak if we transfer the ownership to GDAL nevertheless and it gets discarded by a call to CPLSetErrorHandler (or whatever) that is not covered by our two functions.

I'm suggesting not using user-data of GDAL at all, and only using our locked structure. So the only pointer being passed to GDAL is the error_handler function which is a static function. It's mostly for code clarity, so optional.

I hesitate to switch to AtomicPtr since we then guarantee to not drop the Box<Fn> at the end of our program. But I haven't used it before. If you have an idea for that, I could try it.

Locking. The mutex code seems good to me. I still think just a Box<ErrorCallbackType> suffices, as the locking mechanism ensures the pointer doesn't change while being accessed.

if you only use one Box and you move it into the Mutex, the address changes. The inner Box makes the address stable.
And we need two stable addresses while swapping our new Box into the mutex while simultaneously ensuring that all error callbacks that happen in-between (from actual errors, not our function calls) have a valid pointer.

If you move a Box into a Mutex, the data pointed to by the Box doesn't move. For instance:
let mut boxed = Box::new(0usize);
let ptr: *mut _ = boxed.as_mut() as *mut _;

let mutex = Mutex::new(boxed);
assert_eq!(
    mutex.lock().unwrap().as_mut() as *mut _,
    ptr
);
If you are feeling brave, there is a way to fully avoid using any locks at all. The idea is to use an AtomicPtr as the static variable, initialized to null (Box pointers are never null).
I could try it and see if we don't introduce new race conditions. However, the function is most likely not called in huge concurrency. So I don't see the benefit in not locking, here.

Ok.

In your example, you have a pointer to a usize. However, we have to store a pointer to a Box<dyn Trait>, since we can't have a pointer to a dyn Trait. Thus, we have to wrap this into another Box<_> for which then your examined behavior holds.

rmanoka · 2021-09-16T07:33:33Z

I hesitate to switch to AtomicPtr since we then guarantee to not drop the Box<Fn> at the end of our program. But I haven't used it before. If you have an idea for that, I could try it.

Rust actually never drops any static variable, so even now things won't get dropped when the program exits unless remove_handler is called explicitly. The lock-free stuff is probably unnecessary complexity, so let's stick with the current impl.

In your example, you have a pointer to a usize. However, we have to store a pointer to a Box<dyn Trait>, since we can't have a pointer to a dyn Trait. Thus, we have to wrap this into another Box<_> for which then your examined behavior holds.

See this playground for example of boxing a FnMut into a Mutex and calling it via a raw ptr. Basically the same use-case as here.

Also, pls. add a line in CHANGES.md about this feature.

ChristianBeilschmidt · 2021-09-16T11:43:33Z

I hesitate to switch to AtomicPtr since we then guarantee to not drop the Box<Fn> at the end of our program. But I haven't used it before. If you have an idea for that, I could try it.

Rust actually never drops any static variable, so even now things won't get dropped when the program exits unless remove_handler is called explicitly. The lock-free stuff is probably unnecessary complexity, so let's stick with the current impl.

In your example, you have a pointer to a usize. However, we have to store a pointer to a Box<dyn Trait>, since we can't have a pointer to a dyn Trait. Thus, we have to wrap this into another Box<_> for which then your examined behavior holds.

See this playground for example of boxing a FnMut into a Mutex and calling it via a raw ptr. Basically the same use-case as here.

The playground, unfortunately, disregards one aspect as far as I see:
Currently, we store a pointer to a Box<Fn>. This is a 64-bit pointer type.
However, a pointer to dyn Fn is a 128-bit pointer type. Rust uses a fat pointer to store the length information that it can derive for sized structs.
Thus, I can cast the *mut c_void from pUserData, which is 64-bit, to *mut Box<Fn> but I cannot cast it to *mut dyn Fn.
This is why we need the doubled box.

Also, pls. add a line in CHANGES.md about this feature.

a4df453

rmanoka · 2021-09-16T12:16:49Z

The playground, unfortunately, disregards one aspect as far as I see:
Currently, we store a pointer to a Box<Fn>. This is a 64-bit pointer type.
However, a pointer to dyn Fn is a 128-bit pointer type. Rust uses a fat pointer to store the length information that it can derive for sized structs.
Thus, I can cast the *mut c_void from pUserData, which is 64-bit, to *mut Box<Fn> but I cannot cast it to *mut dyn Fn.
This is why we need the doubled box.

Hmm, that's interesting. Could you add the explanations in the code for future reference? I suppose we can avoid the extra allocation if we don't use user-data at all, and just use the lock within the error handler.

rmanoka · 2021-09-16T13:08:22Z

I'll go ahead and approve the pr as I don't see any major issues. Thanks @ChristianBeilschmidt

ChristianBeilschmidt · 2021-09-16T14:32:20Z

The playground, unfortunately, disregards one aspect as far as I see:
Currently, we store a pointer to a Box<Fn>. This is a 64-bit pointer type.
However, a pointer to dyn Fn is a 128-bit pointer type. Rust uses a fat pointer to store the length information that it can derive for sized structs.
Thus, I can cast the *mut c_void from pUserData, which is 64-bit, to *mut Box<Fn> but I cannot cast it to *mut dyn Fn.
This is why we need the doubled box.

Hmm, that's interesting. Could you add the explanations in the code for future reference? I suppose we can avoid the extra allocation if we don't use user-data at all, and just use the lock within the error handler.

44db095

I'm also learning new things with this FFI stuff. So it is good that you look carefully at these PRs 😄.

two options for creating error callbacks

8b609cc

changed lazy_static to once_cell

da4c2ac

ChristianBeilschmidt added 2 commits September 8, 2021 16:09

thread-safe error handler

0111fbd

move CplErrorType to error module

6387a29

ChristianBeilschmidt marked this pull request as ready for review September 8, 2021 14:22

rmanoka reviewed Sep 8, 2021

View reviewed changes

src/config.rs Outdated Show resolved Hide resolved

src/config.rs Outdated Show resolved Hide resolved

src/config.rs Show resolved Hide resolved

ChristianBeilschmidt added 3 commits September 10, 2021 15:08

removing error callback race-condition

bce691d

fixing doc-comments

40e7fc8

not pinning the box-fn box

4e5e45b

clarifying error callback handling

bd4beaa

added change to CHANGES.md

a4df453

rmanoka approved these changes Sep 16, 2021

View reviewed changes

explanation for double-box

44db095

jdroenner merged commit 638ca0a into georust:master Sep 17, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Creating error callbacks #215

Creating error callbacks #215

ChristianBeilschmidt commented Sep 7, 2021 •

edited

Loading

lnicola commented Sep 7, 2021 •

edited

Loading

ChristianBeilschmidt commented Sep 7, 2021

ChristianBeilschmidt commented Sep 8, 2021

rmanoka commented Sep 11, 2021 •

edited

Loading

ChristianBeilschmidt commented Sep 14, 2021 •

edited

Loading

jdroenner commented Sep 15, 2021

rmanoka commented Sep 15, 2021

ChristianBeilschmidt commented Sep 16, 2021

rmanoka commented Sep 16, 2021

ChristianBeilschmidt commented Sep 16, 2021

rmanoka commented Sep 16, 2021

rmanoka commented Sep 16, 2021

ChristianBeilschmidt commented Sep 16, 2021 •

edited

Loading

Creating error callbacks #215

Creating error callbacks #215

Conversation

ChristianBeilschmidt commented Sep 7, 2021 • edited Loading

lnicola commented Sep 7, 2021 • edited Loading

ChristianBeilschmidt commented Sep 7, 2021

ChristianBeilschmidt commented Sep 8, 2021

rmanoka commented Sep 11, 2021 • edited Loading

ChristianBeilschmidt commented Sep 14, 2021 • edited Loading

jdroenner commented Sep 15, 2021

rmanoka commented Sep 15, 2021

ChristianBeilschmidt commented Sep 16, 2021

rmanoka commented Sep 16, 2021

ChristianBeilschmidt commented Sep 16, 2021

rmanoka commented Sep 16, 2021

rmanoka commented Sep 16, 2021

ChristianBeilschmidt commented Sep 16, 2021 • edited Loading

ChristianBeilschmidt commented Sep 7, 2021 •

edited

Loading

lnicola commented Sep 7, 2021 •

edited

Loading

rmanoka commented Sep 11, 2021 •

edited

Loading

ChristianBeilschmidt commented Sep 14, 2021 •

edited

Loading

ChristianBeilschmidt commented Sep 16, 2021 •

edited

Loading