Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating error callbacks #215

Merged
merged 10 commits into from
Sep 17, 2021
Merged

Conversation

ChristianBeilschmidt
Copy link
Contributor

@ChristianBeilschmidt ChristianBeilschmidt commented Sep 7, 2021

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

I've attached two options for setting up the global GDAL error callback. One is thread-safe and comes with a new dependecy and the other one is not.

The reason for storing the callback in a static variable is that we cannot pass the ownership to GDAL. We can access a generic pointer via the pUserData field and cast it to our function type. Therefore, in order not to leak our FnMut, we have to store it somehow and drop it later on.

When we decide on which version to use, I can remove the other one and convert the draft to a PR.

@lnicola
Copy link
Member

lnicola commented Sep 7, 2021

Can you use once_cell instead? It might get folded into std at some point.

@ChristianBeilschmidt
Copy link
Contributor Author

Can you use once_cell instead? It might get folded into std at some point.

Done that in da4c2ac

@ChristianBeilschmidt
Copy link
Contributor Author

Okay, I've done two things:

  1. 0111fbd
    I removed the non-thread-safe variant.
    Moreover, I noticed that the previous version had a potential race condition when we would switch the callback in the mutex and then install the new callback. Thus, we have to first install the new callback to not have the old callback point to a dangling pointer. However, we have to first call it and then move the callback into the mutex without changing the reference that GDAL uses in p_user_data. This is why I wrapped the Box into a Pin<Box<_>>. If you have any improvement on that, I will try it.
  2. 6387a29
    I moved the error type enum to error.rs

@ChristianBeilschmidt ChristianBeilschmidt marked this pull request as ready for review September 8, 2021 14:22
src/config.rs Outdated Show resolved Hide resolved
src/config.rs Outdated Show resolved Hide resolved
src/config.rs Show resolved Hide resolved
@rmanoka
Copy link
Contributor

rmanoka commented Sep 11, 2021

@ChristianBeilschmidt Thanks for the implementation. Since concurrency is a bit hard to debug / reason about, I wanted to discuss a few points on the overall impl.

  1. Callback Storage. Currently you're storing the callback both in the static variable, and in GDAL's ErrorHandlerUserData. We should have a single source of truth; I think just using the static variable is simpler.

  2. Function Type. I was a bit worried about the FnMut possibly being called in parallel in error_handler. However, it seems GDAL uses a mutex to ensure the error handler is not called / assigned in parallel. Could you add some comments or GDAL doc links explaining this. The function needs Sync in addition to Send + 'static as it is being accessed from different threads.

  3. Poison Handling. I think the only panic while holding the mutex in your implementation is if CPLSetErrorHandler panics. Thus, the function pointer is valid even if poisoned. Pls. explain this in the code for future reference.

  4. Locking. The mutex code seems good to me. I still think just a Box<ErrorCallbackType> suffices, as the locking mechanism ensures the pointer doesn't change while being accessed.

    If you are feeling brave, there is a way to fully avoid using any locks at all. The idea is to use an AtomicPtr as the static variable, initialized to null (Box pointers are never null).

    Before setting the handler, use AtomicPtr::swap:

    // Swap should be done before CPLSetHandler
    let old_ptr = ERROR_CALLBACK.swap(Box::into_raw(callback), Ordering::SeqCst);
    CPLSetHandler(...)
    // Drop logic must happen after CPLSetHandler
    if old_addr > 0 {
        // Convert back to Box to destruct properly
        unsafe { Box::from_raw(old_addr as *mut ErrorCallbackType) };
    }

    Removal is similar, swap in null_mut

    let old_ptr = ERROR_CALLBACK.swap(null_mut(), Ordering::SeqCst);
    CPLSetHandler(None);
    // Drop old_ptr

    This is somewhat like the Treiber stack, but much simpler as we just store one value and piggy-back on GDAL's mutex. See also this neat rust exposition. Unlike the treiber stack, we can in fact safely drop the memory as we piggy-back on GDAL's mutex mechanism.

    Calling is simple. If our fn error_handler is called, then we know a handler we set is currently active. So we can use AtomicPtr::load to read the ptr, and call the function. If the pointer is not null, we can be sure the pointed memory is not freed as the CPLSetHandler call won't finish before the error handler return. Only edge case is if the handler is called while we're removing the handler, in which case the ptr may be null. We just need to exit without calling in this case.

Edit: fix lock-free logic to not require leaking the box ptr.

@ChristianBeilschmidt
Copy link
Contributor Author

ChristianBeilschmidt commented Sep 14, 2021

@ChristianBeilschmidt Thanks for the implementation. Since concurrency is a bit hard to debug / reason about, I wanted to discuss a few points on the overall impl.

1. Callback Storage. Currently you're storing the callback both in the static variable, and in GDAL's `ErrorHandlerUserData`. We should have a single source of truth; I think just using the static variable is simpler.

I guess we can't. GDAL doesn't take the ownership of that pointer, it treats it as a reference. Thus, we would leak if we transfer the ownership to GDAL nevertheless and it gets discarded by a call to CPLSetErrorHandler (or whatever) that is not covered by our two functions.

2. Function Type.  I was a bit worried about the `FnMut` possibly being called in parallel in `error_handler`.  However, it seems GDAL [uses a mutex](https://github.com/OSGeo/gdal/blob/543d6ffd39b66a727da49cb398591959a8f644cf/gdal/port/cpl_error.cpp#L355) to ensure the error handler is not called / assigned in parallel.  Could you add some comments or GDAL doc links  explaining this.  The function needs `Sync` in addition to `Send + 'static` as it is being accessed from different threads.

I changed it here: bd4beaa

3. Poison Handling. I think the only panic while holding the mutex in your implementation is if `CPLSetErrorHandler` panics.  Thus, the function pointer is valid even if poisoned.  Pls. explain this in the code for future reference.

I've addressed it here: bd4beaa

4. Locking.  The mutex code seems good to me.  I still think just a `Box<ErrorCallbackType>` suffices, as the locking mechanism ensures the pointer doesn't change while being accessed.

if you only use one Box and you move it into the Mutex, the address changes. The inner Box makes the address stable.
And we need two stable addresses while swapping our new Box into the mutex while simultaneously ensuring that all error callbacks that happen in-between (from actual errors, not our function calls) have a valid pointer.

   If you are feeling brave, there is a way to fully avoid using any locks at all.  The idea is to use an `AtomicPtr` as the static variable, initialized to null (`Box` pointers are never null).
   Before setting the handler, use `AtomicPtr::swap`:
   ```rust
   // Swap should be done before CPLSetHandler
   let old_ptr = ERROR_CALLBACK.swap(Box::into_raw(callback), Ordering::SeqCst);
   CPLSetHandler(...)
   // Drop logic must happen after CPLSetHandler
   if old_addr > 0 {
       // Convert back to Box to destruct properly
       unsafe { Box::from_raw(old_addr as *mut ErrorCallbackType) };
   }
   ```
   
       
         
       
   
         
       
   
       
     
   Removal is similar, swap in `null_mut`
   ```rust
   let old_ptr = ERROR_CALLBACK.swap(null_mut(), Ordering::SeqCst);
   CPLSetHandler(None);
   // Drop old_ptr
   ```
   
   
       
         
       
   
         
       
   
       
     
   This is somewhat like the [Treiber stack](https://en.wikipedia.org/wiki/Treiber_stack), but much simpler as we just store one value and piggy-back on GDAL's mutex.  See also this neat [rust exposition](https://aturon.github.io/blog/2015/08/27/epoch/).  Unlike the treiber stack, we can in fact safely drop the memory as we piggy-back on GDAL's mutex mechanism.
   Calling is simple.   If our `fn error_handler` is called, then we know a handler we set is currently active.  So we can use `AtomicPtr::load` to read the ptr, and call the function.  If the pointer is not null, we can be sure the pointed memory is not freed as the `CPLSetHandler` call won't finish before the error handler return.  Only edge case is if the handler is called while we're removing the handler, in which case the ptr may be null.  We just need to exit without calling in this case.

Edit: fix lock-free logic to not require leaking the box ptr.

I could try it and see if we don't introduce new race conditions. However, the function is most likely not called in huge concurrency. So I don't see the benefit in not locking, here.

@jdroenner
Copy link
Member

can we merge it like it is or do we need more discussion / changes?

@rmanoka
Copy link
Contributor

rmanoka commented Sep 15, 2021

@ChristianBeilschmidt

  1. Callback Storage. Currently you're storing the callback both in the static variable, and in GDAL's ErrorHandlerUserData. We should have a single source of truth; I think just using the static variable is simpler.

I guess we can't. GDAL doesn't take the ownership of that pointer, it treats it as a reference. Thus, we would leak if we transfer the ownership to GDAL nevertheless and it gets discarded by a call to CPLSetErrorHandler (or whatever) that is not covered by our two functions.

I'm suggesting not using user-data of GDAL at all, and only using our locked structure. So the only pointer being passed to GDAL is the error_handler function which is a static function. It's mostly for code clarity, so optional.

  1. Locking. The mutex code seems good to me. I still think just a Box<ErrorCallbackType> suffices, as the locking mechanism ensures the pointer doesn't change while being accessed.

if you only use one Box and you move it into the Mutex, the address changes. The inner Box makes the address stable.
And we need two stable addresses while swapping our new Box into the mutex while simultaneously ensuring that all error callbacks that happen in-between (from actual errors, not our function calls) have a valid pointer.

If you move a Box into a Mutex, the data pointed to by the Box doesn't move. For instance:

let mut boxed = Box::new(0usize);
let ptr: *mut _ = boxed.as_mut() as *mut _;

let mutex = Mutex::new(boxed);
assert_eq!(
    mutex.lock().unwrap().as_mut() as *mut _,
    ptr
);

If you are feeling brave, there is a way to fully avoid using any locks at all. The idea is to use an AtomicPtr as the static variable, initialized to null (Box pointers are never null).
I could try it and see if we don't introduce new race conditions. However, the function is most likely not called in huge concurrency. So I don't see the benefit in not locking, here.

Ok.

@jdroenner A few minor suggestions, mostly for clarity and remove extra allocations.

@ChristianBeilschmidt
Copy link
Contributor Author

@ChristianBeilschmidt

  1. Callback Storage. Currently you're storing the callback both in the static variable, and in GDAL's ErrorHandlerUserData. We should have a single source of truth; I think just using the static variable is simpler.

I guess we can't. GDAL doesn't take the ownership of that pointer, it treats it as a reference. Thus, we would leak if we transfer the ownership to GDAL nevertheless and it gets discarded by a call to CPLSetErrorHandler (or whatever) that is not covered by our two functions.

I'm suggesting not using user-data of GDAL at all, and only using our locked structure. So the only pointer being passed to GDAL is the error_handler function which is a static function. It's mostly for code clarity, so optional.

I hesitate to switch to AtomicPtr since we then guarantee to not drop the Box<Fn> at the end of our program. But I haven't used it before. If you have an idea for that, I could try it.

  1. Locking. The mutex code seems good to me. I still think just a Box<ErrorCallbackType> suffices, as the locking mechanism ensures the pointer doesn't change while being accessed.

if you only use one Box and you move it into the Mutex, the address changes. The inner Box makes the address stable.
And we need two stable addresses while swapping our new Box into the mutex while simultaneously ensuring that all error callbacks that happen in-between (from actual errors, not our function calls) have a valid pointer.

If you move a Box into a Mutex, the data pointed to by the Box doesn't move. For instance:

let mut boxed = Box::new(0usize);
let ptr: *mut _ = boxed.as_mut() as *mut _;

let mutex = Mutex::new(boxed);
assert_eq!(
    mutex.lock().unwrap().as_mut() as *mut _,
    ptr
);

If you are feeling brave, there is a way to fully avoid using any locks at all. The idea is to use an AtomicPtr as the static variable, initialized to null (Box pointers are never null).
I could try it and see if we don't introduce new race conditions. However, the function is most likely not called in huge concurrency. So I don't see the benefit in not locking, here.

Ok.

In your example, you have a pointer to a usize. However, we have to store a pointer to a Box<dyn Trait>, since we can't have a pointer to a dyn Trait. Thus, we have to wrap this into another Box<_> for which then your examined behavior holds.

@rmanoka
Copy link
Contributor

rmanoka commented Sep 16, 2021

I hesitate to switch to AtomicPtr since we then guarantee to not drop the Box<Fn> at the end of our program. But I haven't used it before. If you have an idea for that, I could try it.

Rust actually never drops any static variable, so even now things won't get dropped when the program exits unless remove_handler is called explicitly. The lock-free stuff is probably unnecessary complexity, so let's stick with the current impl.

In your example, you have a pointer to a usize. However, we have to store a pointer to a Box<dyn Trait>, since we can't have a pointer to a dyn Trait. Thus, we have to wrap this into another Box<_> for which then your examined behavior holds.

See this playground for example of boxing a FnMut into a Mutex and calling it via a raw ptr. Basically the same use-case as here.

Also, pls. add a line in CHANGES.md about this feature.

@ChristianBeilschmidt
Copy link
Contributor Author

I hesitate to switch to AtomicPtr since we then guarantee to not drop the Box<Fn> at the end of our program. But I haven't used it before. If you have an idea for that, I could try it.

Rust actually never drops any static variable, so even now things won't get dropped when the program exits unless remove_handler is called explicitly. The lock-free stuff is probably unnecessary complexity, so let's stick with the current impl.

In your example, you have a pointer to a usize. However, we have to store a pointer to a Box<dyn Trait>, since we can't have a pointer to a dyn Trait. Thus, we have to wrap this into another Box<_> for which then your examined behavior holds.

See this playground for example of boxing a FnMut into a Mutex and calling it via a raw ptr. Basically the same use-case as here.

The playground, unfortunately, disregards one aspect as far as I see:
Currently, we store a pointer to a Box<Fn>. This is a 64-bit pointer type.
However, a pointer to dyn Fn is a 128-bit pointer type. Rust uses a fat pointer to store the length information that it can derive for sized structs.
Thus, I can cast the *mut c_void from pUserData, which is 64-bit, to *mut Box<Fn> but I cannot cast it to *mut dyn Fn.
This is why we need the doubled box.

Also, pls. add a line in CHANGES.md about this feature.

a4df453

@rmanoka
Copy link
Contributor

rmanoka commented Sep 16, 2021

The playground, unfortunately, disregards one aspect as far as I see:
Currently, we store a pointer to a Box<Fn>. This is a 64-bit pointer type.
However, a pointer to dyn Fn is a 128-bit pointer type. Rust uses a fat pointer to store the length information that it can derive for sized structs.
Thus, I can cast the *mut c_void from pUserData, which is 64-bit, to *mut Box<Fn> but I cannot cast it to *mut dyn Fn.
This is why we need the doubled box.

Hmm, that's interesting. Could you add the explanations in the code for future reference? I suppose we can avoid the extra allocation if we don't use user-data at all, and just use the lock within the error handler.

@rmanoka
Copy link
Contributor

rmanoka commented Sep 16, 2021

I'll go ahead and approve the pr as I don't see any major issues. Thanks @ChristianBeilschmidt

@ChristianBeilschmidt
Copy link
Contributor Author

ChristianBeilschmidt commented Sep 16, 2021

The playground, unfortunately, disregards one aspect as far as I see:
Currently, we store a pointer to a Box<Fn>. This is a 64-bit pointer type.
However, a pointer to dyn Fn is a 128-bit pointer type. Rust uses a fat pointer to store the length information that it can derive for sized structs.
Thus, I can cast the *mut c_void from pUserData, which is 64-bit, to *mut Box<Fn> but I cannot cast it to *mut dyn Fn.
This is why we need the doubled box.

Hmm, that's interesting. Could you add the explanations in the code for future reference? I suppose we can avoid the extra allocation if we don't use user-data at all, and just use the lock within the error handler.

44db095

I'm also learning new things with this FFI stuff. So it is good that you look carefully at these PRs 😄.

@jdroenner jdroenner merged commit 638ca0a into georust:master Sep 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants