Reloading default HighThroughputExecutor after parsl.clear() results in KeyError: 'block_id' exception
#2871
Comments
one thing in your example that sticks out to me is that you can't use the same Config object twice. In the case of the thread pool executor, this probably doesn't matter, but it's likely to be more of a problem with htex. There's a style for re-loading "the same" config that is used in (for example) the test suite and some applications that I work on, which looks something like this:
Probably we should make this more enforced - it wasn't part of the initial design of how Config was meant to work, but it's turned out since then that being single-use is a property of Config objects (and the subobjects contained inside a Config). Can you try something like the above and see how things go?
I see in your comment on the CytoTable issue cytomining/CytoTable#90 (comment) that you have that sort of behaviour as a recommended change: for a user to generate a new config in each for-loop iteration. Either that, or (I'm not sure how your code is organised for this) make it so parsl/the DFK can be initialised outside of the loop and not reloaded in each iteration of that loop. I've found it a bit easier to understand why things work that way by thinking of a ...
Thank you @benclifford for the recommendations and clarity here! Is there a way to compare Parsl Config objects such that one could avoid an "accidental" reload of configurations? I recognize deep object comparisons might accomplish this, but I wondered if there was something like a GUID or similar assigned to each configuration which could provide this capability.
so if you want to say "we have a DFK loaded right now, and someone just gave me a config object ... is it the same config object? and if not, do a parsl shutdown/restart with the new config", then I would say you won't be able to compare any better than simple Python object identity, using the `is` operator. For the example in CytoTable issue 90, probably for that to work, the user should create one Config object and pass that same object in on each call, so that the cytotable code can compare the parsl_config just passed in with the config object of the currently active parsl, and switch configs if the user has supplied a new one on a subsequent call. If you avoid shutting down and restarting parsl each time, I think you'll also get some speedup from not paying the startup and shutdown costs of parsl for each loop iteration - depending on circumstances that might be significant, or might not. However, this style feels a bit weird to me and I'd kinda encourage you to think about whether you can also initialise parsl outside such a loop - I guess mostly it depends on how much you expect your users to be changing configs as a loop progresses.
Many thanks @benclifford for the recommendations here! I'll move forward with using Parsl as you described.
Clarifies problems users found in the documentation:

- "error in parallel documentation" #2915: decorators are not needed for ParslPoolExecutor.map
- "Reloading default HighThroughputExecutor after parsl.clear() results in KeyError: 'block_id' exception" #2871: Configs may not be re-used
- Fixes "Docs: Further clarification on how serialization impacts where imports need to be" #2918: explain when functions need imports
- Fixes "Allowing for inputs and outputs to take immutable types" #2925: explain mutable types are undesirable

Thanks, @Andrew-S-Rosen
Describe the bug
Thanks for the great work on Parsl - it's an awesome tool! I noticed that when attempting to load a HighThroughputExecutor configuration after using parsl.clear(), an exception occurs which states: KeyError: 'block_id'. This doesn't seem to happen when using ThreadPoolExecutor. I'm unsure if this is expected behavior, so I wanted to raise a bug just in case it helps diagnose or address this issue.
To Reproduce
Run the following Python snippet:
Expected behavior
I'd expect something like what is witnessed with the ThreadPoolExecutor in the following example. If this isn't possible or is unreasonable to expect, I'd expect documentation or an upfront message in the logs about workarounds.
Environment
Distributed Environment