-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[minor] Use factories in existing nodes #303
Conversation
…structor Instead of "Meta" nodes
The object isn't holding instance level state and older versions of python bork here.
It's superceded by the snippets.factory stuff
If it was explicitly set downstream, leave that. But if the user left it empty, still default it back to the factory function's module
With this, everything is pickleable (unless you slap something unpickleable on top, or define it in a place that can't be reached by pickle like inside a local function scope). The big downside is that `h5io` storage is now basically useless, since all our nodes come from custom reconstructors. Similarly, for the node job `DataContainer` can no longer store the input node. The `tinybase` backend is still working ok, so I made it the default, and I got the node job working again by forcing it to cloudpickle the input node on saving. These are some ugly hacks, but since storage is an alpha feature right now anyhow, I'd prefer to push ahead with pickleability.
And reformat tests in the vein of usage in Function and Macro
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferencesCodacy will stop sending the deprecated coverage status from June 5th, 2024. Learn more |
Pull Request Test Coverage Report for Build 8886092588Details
💛 - Coveralls |
* Make node_function and type hints available at the _class level_ And fix one error type on the storage tests. * Pull the same trick with Macro But use an actual intermediate class instead of a function, so that we retain some of the `Composite` class methods and the IDE doesn't complain that we're capitalizing the name * Make the Function misdirection a class too * Refactor: Pull out a HasCreator interface So that `Composite` and the not-actually-a-`type[Node]` `Macro` can avoid duplicating code * Update docstrings * Format black * Codacy: don't import unused wraps * Make many AbstractFunction methods classmethods * Make output channel info available as a Function class attribute So channel labels and type hints (and later any ontological hints) can be viewed without ever instantiating a node. That means `output_labels` is no longer available at the class instance level, but only when defining a new class! * Make input channel info available as a Function class attribute This was much easier because we never need to inspect the source code manually (as we do sometimes to check return values). * Add/extend public method docstrings * Refactor: slide * Remove unused private attribute * Fix docstrings * Format black * Remove unused imports Just some housekeeping in the files we're touching anyhow * Fix docstring * Make macro output labels a class attribute * Format black * Bump typeguard from 4.1.5 to 4.2.1 (#262) * Bump typeguard from 4.1.5 to 4.2.1 Bumps [typeguard](https://github.com/agronholm/typeguard) from 4.1.5 to 4.2.1. - [Release notes](https://github.com/agronholm/typeguard/releases) - [Changelog](https://github.com/agronholm/typeguard/blob/master/docs/versionhistory.rst) - [Commits](agronholm/typeguard@4.1.5...4.2.1) --- updated-dependencies: - dependency-name: typeguard dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * [dependabot skip] Update environment * [dependabot skip] Update env file --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: pyiron-runner <[email protected]> * Revert warning and test for the graphviz call+windows combination (#275) Whatever was broken in the upstream dependency or github image seems to be fixed. * [patch] Update dependencies (#274) * Update dependencies * [dependabot skip] Update env file * Fix browser version typo * [dependabot skip] Update env file --------- Co-authored-by: pyiron-runner <[email protected]> * Revert save overload (#234) * Prepend instead of appending new connections Fetch looks for the first available data in connection lists, so prepend on connection formation so that newer connections get higher priority. * Add RandomFloat to the standard library * Use standard library for for and while loop docs examples * Fix type hint in while loop Now that `Function` is not a node and we automate the functionality of `SingleValueNode` * Don't use maps in the README example * Disallow IO maps on `AbstractMacro` I tried to break this down a reasonable amount, but I also want the tests passing at each commit. In principle, I could duplicate code instead of simply moving it to break this apart a bit further, but in practice I'm not expecting anyone to actually review this PR so that seems like needless effort. Here, I * Move `inputs/outputs_map` from `Composite` down to `Workflow` * Disable automatic IO generation on `AbstractMacro` -- you _need_ to define it by labels and the graph creator signature/returns * Add IO previewing and class-level stuff analogous to `AbstractFunction` onto `AbstractMacro` now that its IO can't be remapped * Update tests and examples accordingly * Finally, re-write the while- and for-loop meta nodes, which depended heavily on the presence of maps Note that although the for- and while-loops are totally different under the hood, the only user-facing changes are (a) you can only wrap nodes that are importable, and (b) the IO maps for the while-loop are mandatory instead of optional. * Fix atomistics macros We duplicate the task node signature for the macros now, but I am not currently stressed about this; there are ways around it, I just don't see the abstraction being worthwhile for three nodes. * Make the pyiron_atomistics Bulk node importable * Provide values for all for-loop input We don't pass the loop body defaults and we explicitly create input channels now, so we (sadly) need to provide all input. I played around scraping the defaults and hints from the body node, but the string representation vs code representation is just too much of a headache. Ultimately I think we will need to get away from this executed string somehow. * Format black * Prune macro IO nodes that don't get forked For the sake of computational efficiency, have macro IO value-link directly to child node IO wherever possible. * Ignore the notebook testing script I just created this so that I can check that the notebooks pass locally before pushing to github (just from my IDE without actually running the notebooks). For the CI we have a specific job that runs the notebooks with a different environment, so we don't want the remote CI discovering these tests. * Give the `Function` creator class a function-like name * Make it an actual function! * Rename `AbstractFunction` back to the simpler `Function` Now that that name is free again * Un-parent `Macro` from `HasCreator` It was easy enough to directly reference `AbstractMacro` in all the places in tests and documentation where we were previously exploiting this parentage. * Give the `Macro` creator class a function-like name * Make it a function! * Remove unused import * Rename `AbstractMacro` back to `Macro` * Update some text in the deepdive * Rename `wrap_as` to `wrap` In the process of getting nicer phrasing * Refactor: rename * Refactor: rename * Refactor: rename * Correct some mis-names that slipped into strings * Refactor: rename * Add back README stuff A chunk got lost in the mix, but thankfully I noticed reviewing the diff * Tweak docstrings * Remove line break from example output * Don't give `Function` special treatment for `self` * Add a test for protected names * Make the graph creator an abstract(static) method This was just an oversight. No impact on behaviour, but makes things clearer to devs. * Let Macro nodes scrape their output labels from their return statement Just like Function nodes do, but additionally left-stripping `macro.` (or `self.` or `wf.` or whatever the `graph_creator`'s first self-like argument is called) * Disallow degenerate output labels * Refactor: extract shared behaviour for scraping IO from a class method * Refactor: rename the dev-facing attribute for manually setting output labels * Refactor: rename classmethod getters * Refactor: simplify if * Refactor: more efficient if * Only build channel previews once * Refactor validation and apply it to both Macro and Function * Add examples to the deepdive of turning off output validation * Improve error messages * Refactor: move the preview stuff to its own file * Refactor: rename public facing preview methods * Introduce an interface class To separate the ideas of having IO information available at the class level, and scraping such data from a class method. * Refactor: slide public method up * Change class method to simple attribute * Introduce new abstract classes and reparent Macro and Function accordingly * Replace Macro and Function decorators With calls to a centralized decorator factory * Refactor: slide creators below the decorators they use * Extend docstrings * Move degeneracy validation inside the try loop As it will trigger an OSError off the same un-inspectable case * Introduce a custom warning type So that we can test whether or not we're catching _our_ warning * Test new IO classes explicitly and remove duplication downstream * Update notebooks * Format black * Add a helper to see both previews at once * Format black * Use `self` canonically as the first macro argument in the source code * Remove unused imports * Update readme * Update quickstart * Update deepdive * Run all starters when pulling * Refactor: break out earlier when the node in question is the starter * Refactor: immediately convert to list * Run parented pulls by running the parent By using these specially constructed execution signals * Track post-facto provenance * Remove recursion limit adjustment The stack always goes directly up to the parent in macros now, instead of through all previous calls, so we don't need hit against this limit anymore. * Format black * Fix bad type hint * Remove abstract method from Composite In Workflow the method just returns the first argument, so a method is entirely unnecessary; in Macro we legitimately need it, but now that this is the only place where it's implemented/used, we can give it an even more specific name. * Pull IO creation up to parent class * Remove unused import * Formatting * Format black * Use lru_cache * Move for- and while-loops to their own submodule * Give label a default in Node.__init__ * Don't pass type_hint directly to a node * Make HasIO.set_input_values accept args * Fix notebooks * Make all nodes parse kwargs as input in __post__ * Stop using __post__ Replace it with a system where `Node.__init__` calls `_setup_node` and then `_after_node_setup`, so that children of `Node` can use a combination of (a) doing stuff before calling `super().__init__` and (b) overriding `_node_setup` in order to be sure they accomplish what is needed prior to the post-setup stuff happening. This is a bit more cognitive overhead for node class writers, but simplifies the actual technology. * Format black * Allow run and run callbacks to take (optional!) arguments * Let run and related methods on Node take args as well as kwargs * Update pyiron-verse dependencies (#302) * Update pyiron-verse dependencies Synchronized to avoid headaches with patch number upper limits, so we can get it all done in a single pull * [dependabot skip] Update env file * Format black * Bypass failure point Everything still looks totally fine locally. Let's skip the failing doctest line and see if 3.10 has any problems at the unit test stage. * Add it back with some debug * Reduce debug `instance.result` exists and is the expected `DynamicFoo` object --------- Co-authored-by: pyiron-runner <[email protected]> * [minor] Factories instead of meta nodes, and use them for transformer nodes (#293) * Replace evaluated code to-/from-list nodes with actual classes Leveraging custom constructors * Develop meta abstraction and make a new home for transforming meta nodes * Accept node args too * Introduce a decorator for building the class-IO * Change paradigm to whether or not the node uses __reduced__ and a constructor Instead of "Meta" nodes * Allow direct use of Constructed children * Move and update constructed stuff * Add new singleton behaviour so factory-produced classes can pass is-tests * Apply constructed and class registration updates to the transformers * Remove (now unused) meta module * PEP8 newline * Remove unnecessary __getstate__ The object isn't holding instance level state and older versions of python bork here. * Add constructed __*state__ compatibility for older versions * 🐛 add missing `return` * Format black * Introduce a new factory pattern At the cost of requiring factory functions to forgo kwargs, we get object matching for factories and classes, and picklability for instances. Still need to check some edge cases around the use of stuff with a non-trivial qualname. * Test factories as methods * Test local scoping * Add docstrings and hide a function * Simplify super * Give demo class in tests have a more generic __init__ * Test and document multiple inheritance * Stop storing __init__ args By leveraging `__new__` and `__getnewargs_ex__` * Add clearing methods In case you want to change some constructor behaviour and clear the access cache. Not sure what should be cached at the end of the day, but for now just give users a shortcut for clearing it. * Use snippets.factory.classfactory and typing.ClassVar for transformers * Revert singleton * Format black * Remove constructed It's superceded by the snippets.factory stuff * Gently rename things for when the factory comes from a decorator * Allow factory made classes to also come from decorators * Format black --------- Co-authored-by: pyiron-runner <[email protected]> * [minor] Use factories in existing nodes (#303) * Change paradigm to whether or not the node uses __reduced__ and a constructor Instead of "Meta" nodes * Allow direct use of Constructed children * Move and update constructed stuff * Add new singleton behaviour so factory-produced classes can pass is-tests * PEP8 newline * Remove unnecessary __getstate__ The object isn't holding instance level state and older versions of python bork here. * Add constructed __*state__ compatibility for older versions * 🐛 add missing `return` * Format black * Revert singleton * Remove constructed It's superceded by the snippets.factory stuff * Format black * Let the factory clear method take specific names * Don't override __module__ to the factory function If it was explicitly set downstream, leave that. But if the user left it empty, still default it back to the factory function's module * Clean up storage if job tests fail * Make tinybase the default storage backend * Switch Function and Macro over to using classfactory With this, everything is pickleable (unless you slap something unpickleable on top, or define it in a place that can't be reached by pickle like inside a local function scope). The big downside is that `h5io` storage is now basically useless, since all our nodes come from custom reconstructors. Similarly, for the node job `DataContainer` can no longer store the input node. The `tinybase` backend is still working ok, so I made it the default, and I got the node job working again by forcing it to cloudpickle the input node on saving. These are some ugly hacks, but since storage is an alpha feature right now anyhow, I'd prefer to push ahead with pickleability. * Remove unused decorator And reformat tests in the vein of usage in Function and Macro * Format black --------- Co-authored-by: pyiron-runner <[email protected]> * [minor] Executor compliance (#304) * Change paradigm to whether or not the node uses __reduced__ and a constructor Instead of "Meta" nodes * Allow direct use of Constructed children * Move and update constructed stuff * Add new singleton behaviour so factory-produced classes can pass is-tests * PEP8 newline * Remove unnecessary __getstate__ The object isn't holding instance level state and older versions of python bork here. * Add constructed __*state__ compatibility for older versions * 🐛 add missing `return` * Format black * Revert singleton * Remove constructed It's superceded by the snippets.factory stuff * Format black * Let the factory clear method take specific names * Don't override __module__ to the factory function If it was explicitly set downstream, leave that. But if the user left it empty, still default it back to the factory function's module * Clean up storage if job tests fail * Make tinybase the default storage backend * Switch Function and Macro over to using classfactory With this, everything is pickleable (unless you slap something unpickleable on top, or define it in a place that can't be reached by pickle like inside a local function scope). The big downside is that `h5io` storage is now basically useless, since all our nodes come from custom reconstructors. Similarly, for the node job `DataContainer` can no longer store the input node. The `tinybase` backend is still working ok, so I made it the default, and I got the node job working again by forcing it to cloudpickle the input node on saving. These are some ugly hacks, but since storage is an alpha feature right now anyhow, I'd prefer to push ahead with pickleability. * Remove unused decorator And reformat tests in the vein of usage in Function and Macro * Format black * Expose concurrent.futures executors on the creator * Only expose the base Executor from pympipool Doesn't hurt us now and prepares for the version bump * Extend `Runnable` to use a non-static method This is significant. `on_run` is no longer a property returning a staticmethod that will be shipped off, but we directly ship off `self.on_run` so `self` goes with it to remote processes. Similarly, `run_args` gets extended to be `tuple[tuple, dict]` so positional arguments can be sent too. Stacked on top of pickleability, this means we can now use standard `concurrent.futures.ProcessPoolExecutor` -- as long as the nodes are all defined somewhere importable, i.e. not in `__main__`. Since working in notebooks is pretty common, the more flexible `pympipool.Executor` is left as the default `Workflow.create.Executor`. This simplifies some stuff under the hood too, e.g. `Function` and `Composite` now just directly do their thing in `on_run` instead of needing the misdirection of returning their own static methods. * Format black * Expose concurrent.futures executors on the creator * Only expose the base Executor from pympipool Doesn't hurt us now and prepares for the version bump * Extend `Runnable` to use a non-static method This is significant. `on_run` is no longer a property returning a staticmethod that will be shipped off, but we directly ship off `self.on_run` so `self` goes with it to remote processes. Similarly, `run_args` gets extended to be `tuple[tuple, dict]` so positional arguments can be sent too. Stacked on top of pickleability, this means we can now use standard `concurrent.futures.ProcessPoolExecutor` -- as long as the nodes are all defined somewhere importable, i.e. not in `__main__`. Since working in notebooks is pretty common, the more flexible `pympipool.Executor` is left as the default `Workflow.create.Executor`. This simplifies some stuff under the hood too, e.g. `Function` and `Composite` now just directly do their thing in `on_run` instead of needing the misdirection of returning their own static methods. * Format black * Format black --------- Co-authored-by: pyiron-runner <[email protected]> * [patch] Make factory made objects cloudpickleable (#305) * Change paradigm to whether or not the node uses __reduced__ and a constructor Instead of "Meta" nodes * Allow direct use of Constructed children * Move and update constructed stuff * Add new singleton behaviour so factory-produced classes can pass is-tests * PEP8 newline * Remove unnecessary __getstate__ The object isn't holding instance level state and older versions of python bork here. * Add constructed __*state__ compatibility for older versions * 🐛 add missing `return` * Format black * Revert singleton * Remove constructed It's superceded by the snippets.factory stuff * Format black * Let the factory clear method take specific names * Don't override __module__ to the factory function If it was explicitly set downstream, leave that. But if the user left it empty, still default it back to the factory function's module * Clean up storage if job tests fail * Make tinybase the default storage backend * Switch Function and Macro over to using classfactory With this, everything is pickleable (unless you slap something unpickleable on top, or define it in a place that can't be reached by pickle like inside a local function scope). The big downside is that `h5io` storage is now basically useless, since all our nodes come from custom reconstructors. Similarly, for the node job `DataContainer` can no longer store the input node. The `tinybase` backend is still working ok, so I made it the default, and I got the node job working again by forcing it to cloudpickle the input node on saving. These are some ugly hacks, but since storage is an alpha feature right now anyhow, I'd prefer to push ahead with pickleability. * Remove unused decorator And reformat tests in the vein of usage in Function and Macro * Format black * Expose concurrent.futures executors on the creator * Only expose the base Executor from pympipool Doesn't hurt us now and prepares for the version bump * Extend `Runnable` to use a non-static method This is significant. `on_run` is no longer a property returning a staticmethod that will be shipped off, but we directly ship off `self.on_run` so `self` goes with it to remote processes. Similarly, `run_args` gets extended to be `tuple[tuple, dict]` so positional arguments can be sent too. Stacked on top of pickleability, this means we can now use standard `concurrent.futures.ProcessPoolExecutor` -- as long as the nodes are all defined somewhere importable, i.e. not in `__main__`. Since working in notebooks is pretty common, the more flexible `pympipool.Executor` is left as the default `Workflow.create.Executor`. This simplifies some stuff under the hood too, e.g. `Function` and `Composite` now just directly do their thing in `on_run` instead of needing the misdirection of returning their own static methods. * Format black * Expose concurrent.futures executors on the creator * Only expose the base Executor from pympipool Doesn't hurt us now and prepares for the version bump * Extend `Runnable` to use a non-static method This is significant. `on_run` is no longer a property returning a staticmethod that will be shipped off, but we directly ship off `self.on_run` so `self` goes with it to remote processes. Similarly, `run_args` gets extended to be `tuple[tuple, dict]` so positional arguments can be sent too. Stacked on top of pickleability, this means we can now use standard `concurrent.futures.ProcessPoolExecutor` -- as long as the nodes are all defined somewhere importable, i.e. not in `__main__`. Since working in notebooks is pretty common, the more flexible `pympipool.Executor` is left as the default `Workflow.create.Executor`. This simplifies some stuff under the hood too, e.g. `Function` and `Composite` now just directly do their thing in `on_run` instead of needing the misdirection of returning their own static methods. * Format black * Compute qualname if not provided * Fail early if there is a <locals> function in the factory made hierarchy * Skip the factory fanciness if you see <locals> This enables _FactoryMade objects to be cloudpickled, even when they can't be pickled, while still not letting the mere fact that they are dynamic classes stand in the way of pickling. Nicely lifts our constraint on the node job interaction with pyiron base, which was leveraging cloudpickle * Format black * Test ClassFactory this way too --------- Co-authored-by: pyiron-runner <[email protected]> * [patch] More transformers (#306) * Change paradigm to whether or not the node uses __reduced__ and a constructor Instead of "Meta" nodes * Allow direct use of Constructed children * Move and update constructed stuff * Add new singleton behaviour so factory-produced classes can pass is-tests * PEP8 newline * Remove unnecessary __getstate__ The object isn't holding instance level state and older versions of python bork here. * Add constructed __*state__ compatibility for older versions * 🐛 add missing `return` * Format black * Revert singleton * Remove constructed It's superceded by the snippets.factory stuff * Format black * Let the factory clear method take specific names * Don't override __module__ to the factory function If it was explicitly set downstream, leave that. But if the user left it empty, still default it back to the factory function's module * Clean up storage if job tests fail * Make tinybase the default storage backend * Switch Function and Macro over to using classfactory With this, everything is pickleable (unless you slap something unpickleable on top, or define it in a place that can't be reached by pickle like inside a local function scope). The big downside is that `h5io` storage is now basically useless, since all our nodes come from custom reconstructors. Similarly, for the node job `DataContainer` can no longer store the input node. The `tinybase` backend is still working ok, so I made it the default, and I got the node job working again by forcing it to cloudpickle the input node on saving. These are some ugly hacks, but since storage is an alpha feature right now anyhow, I'd prefer to push ahead with pickleability. * Remove unused decorator And reformat tests in the vein of usage in Function and Macro * Format black * Expose concurrent.futures executors on the creator * Only expose the base Executor from pympipool Doesn't hurt us now and prepares for the version bump * Extend `Runnable` to use a non-static method This is significant. `on_run` is no longer a property returning a staticmethod that will be shipped off, but we directly ship off `self.on_run` so `self` goes with it to remote processes. Similarly, `run_args` gets extended to be `tuple[tuple, dict]` so positional arguments can be sent too. Stacked on top of pickleability, this means we can now use standard `concurrent.futures.ProcessPoolExecutor` -- as long as the nodes are all defined somewhere importable, i.e. not in `__main__`. Since working in notebooks is pretty common, the more flexible `pympipool.Executor` is left as the default `Workflow.create.Executor`. This simplifies some stuff under the hood too, e.g. `Function` and `Composite` now just directly do their thing in `on_run` instead of needing the misdirection of returning their own static methods. * Format black * Expose concurrent.futures executors on the creator * Only expose the base Executor from pympipool Doesn't hurt us now and prepares for the version bump * Extend `Runnable` to use a non-static method This is significant. `on_run` is no longer a property returning a staticmethod that will be shipped off, but we directly ship off `self.on_run` so `self` goes with it to remote processes. Similarly, `run_args` gets extended to be `tuple[tuple, dict]` so positional arguments can be sent too. Stacked on top of pickleability, this means we can now use standard `concurrent.futures.ProcessPoolExecutor` -- as long as the nodes are all defined somewhere importable, i.e. not in `__main__`. Since working in notebooks is pretty common, the more flexible `pympipool.Executor` is left as the default `Workflow.create.Executor`. This simplifies some stuff under the hood too, e.g. `Function` and `Composite` now just directly do their thing in `on_run` instead of needing the misdirection of returning their own static methods. * Format black * Compute qualname if not provided * Fail early if there is a <locals> function in the factory made hierarchy * Skip the factory fanciness if you see <locals> This enables _FactoryMade objects to be cloudpickled, even when they can't be pickled, while still not letting the mere fact that they are dynamic classes stand in the way of pickling. Nicely lifts our constraint on the node job interaction with pyiron base, which was leveraging cloudpickle * Format black * Test ClassFactory this way too * Test existing list nodes * Rename length base class * Refactor transformers to use on_run and run_args more directly * Introduce an inputs-to-dict transformer * Preview IO as a separate step To guarantee IO construction happens as early as possible in case it fails * Add dataframe transformer * Remove prints 🤦 * Add dataframe transformer tests * Add transformers to the create menu * Format black --------- Co-authored-by: pyiron-runner <[email protected]> * [patch] 🧹 be more consistent in caching/shortcuts (#307) * Change paradigm to whether or not the node uses __reduced__ and a constructor Instead of "Meta" nodes * Allow direct use of Constructed children * Move and update constructed stuff * Add new singleton behaviour so factory-produced classes can pass is-tests * PEP8 newline * Remove unnecessary __getstate__ The object isn't holding instance level state and older versions of python bork here. * Add constructed __*state__ compatibility for older versions * 🐛 add missing `return` * Format black * Revert singleton * Remove constructed It's superceded by the snippets.factory stuff * Format black * Let the factory clear method take specific names * Don't override __module__ to the factory function If it was explicitly set downstream, leave that. But if the user left it empty, still default it back to the factory function's module * Clean up storage if job tests fail * Make tinybase the default storage backend * Switch Function and Macro over to using classfactory With this, everything is pickleable (unless you slap something unpickleable on top, or define it in a place that can't be reached by pickle like inside a local function scope). The big downside is that `h5io` storage is now basically useless, since all our nodes come from custom reconstructors. Similarly, for the node job `DataContainer` can no longer store the input node. The `tinybase` backend is still working ok, so I made it the default, and I got the node job working again by forcing it to cloudpickle the input node on saving. These are some ugly hacks, but since storage is an alpha feature right now anyhow, I'd prefer to push ahead with pickleability. * Remove unused decorator And reformat tests in the vein of usage in Function and Macro * Format black * Expose concurrent.futures executors on the creator * Only expose the base Executor from pympipool Doesn't hurt us now and prepares for the version bump * Extend `Runnable` to use a non-static method This is significant. `on_run` is no longer a property returning a staticmethod that will be shipped off, but we directly ship off `self.on_run` so `self` goes with it to remote processes. Similarly, `run_args` gets extended to be `tuple[tuple, dict]` so positional arguments can be sent too. Stacked on top of pickleability, this means we can now use standard `concurrent.futures.ProcessPoolExecutor` -- as long as the nodes are all defined somewhere importable, i.e. not in `__main__`. Since working in notebooks is pretty common, the more flexible `pympipool.Executor` is left as the default `Workflow.create.Executor`. This simplifies some stuff under the hood too, e.g. `Function` and `Composite` now just directly do their thing in `on_run` instead of needing the misdirection of returning their own static methods. * Format black * Expose concurrent.futures executors on the creator * Only expose the base Executor from pympipool Doesn't hurt us now and prepares for the version bump * Extend `Runnable` to use a non-static method This is significant. `on_run` is no longer a property returning a staticmethod that will be shipped off, but we directly ship off `self.on_run` so `self` goes with it to remote processes. Similarly, `run_args` gets extended to be `tuple[tuple, dict]` so positional arguments can be sent too. Stacked on top of pickleability, this means we can now use standard `concurrent.futures.ProcessPoolExecutor` -- as long as the nodes are all defined somewhere importable, i.e. not in `__main__`. Since working in notebooks is pretty common, the more flexible `pympipool.Executor` is left as the default `Workflow.create.Executor`. This simplifies some stuff under the hood too, e.g. `Function` and `Composite` now just directly do their thing in `on_run` instead of needing the misdirection of returning their own static methods. * Format black * Compute qualname if not provided * Fail early if there is a <locals> function in the factory made hierarchy * Skip the factory fanciness if you see <locals> This enables _FactoryMade objects to be cloudpickled, even when they can't be pickled, while still not letting the mere fact that they are dynamic classes stand in the way of pickling. Nicely lifts our constraint on the node job interaction with pyiron base, which was leveraging cloudpickle * Format black * Test ClassFactory this way too * Test existing list nodes * Rename length base class * Refactor transformers to use on_run and run_args more directly * Introduce an inputs-to-dict transformer * Preview IO as a separate step To guarantee IO construction happens as early as possible in case it fails * Add dataframe transformer * Remove prints 🤦 * Add dataframe transformer tests * Add transformers to the create menu * Format black * 🧹 be more consistent in caching/shortcuts Instead of always defining private holders by hand * Format black --------- Co-authored-by: pyiron-runner <[email protected]> * [patch] Dataclass transformer (#308) * Change paradigm to whether or not the node uses __reduced__ and a constructor Instead of "Meta" nodes * Allow direct use of Constructed children * Move and update constructed stuff * Add new singleton behaviour so factory-produced classes can pass is-tests * PEP8 newline * Remove unnecessary __getstate__ The object isn't holding instance level state and older versions of python bork here. * Add constructed __*state__ compatibility for older versions * 🐛 add missing `return` * Format black * Revert singleton * Remove constructed It's superceded by the snippets.factory stuff * Format black * Let the factory clear method take specific names * Don't override __module__ to the factory function If it was explicitly set downstream, leave that. But if the user left it empty, still default it back to the factory function's module * Clean up storage if job tests fail * Make tinybase the default storage backend * Switch Function and Macro over to using classfactory With this, everything is pickleable (unless you slap something unpickleable on top, or define it in a place that can't be reached by pickle like inside a local function scope). The big downside is that `h5io` storage is now basically useless, since all our nodes come from custom reconstructors. Similarly, for the node job `DataContainer` can no longer store the input node. The `tinybase` backend is still working ok, so I made it the default, and I got the node job working again by forcing it to cloudpickle the input node on saving. These are some ugly hacks, but since storage is an alpha feature right now anyhow, I'd prefer to push ahead with pickleability. * Remove unused decorator And reformat tests in the vein of usage in Function and Macro * Format black * Expose concurrent.futures executors on the creator * Only expose the base Executor from pympipool Doesn't hurt us now and prepares for the version bump * Extend `Runnable` to use a non-static method This is significant. `on_run` is no longer a property returning a staticmethod that will be shipped off, but we directly ship off `self.on_run` so `self` goes with it to remote processes. Similarly, `run_args` gets extended to be `tuple[tuple, dict]` so positional arguments can be sent too. Stacked on top of pickleability, this means we can now use standard `concurrent.futures.ProcessPoolExecutor` -- as long as the nodes are all defined somewhere importable, i.e. not in `__main__`. Since working in notebooks is pretty common, the more flexible `pympipool.Executor` is left as the default `Workflow.create.Executor`. This simplifies some stuff under the hood too, e.g. `Function` and `Composite` now just directly do their thing in `on_run` instead of needing the misdirection of returning their own static methods. * Format black * Expose concurrent.futures executors on the creator * Only expose the base Executor from pympipool Doesn't hurt us now and prepares for the version bump * Extend `Runnable` to use a non-static method This is significant. `on_run` is no longer a property returning a staticmethod that will be shipped off, but we directly ship off `self.on_run` so `self` goes with it to remote processes. Similarly, `run_args` gets extended to be `tuple[tuple, dict]` so positional arguments can be sent too. Stacked on top of pickleability, this means we can now use standard `concurrent.futures.ProcessPoolExecutor` -- as long as the nodes are all defined somewhere importable, i.e. not in `__main__`. Since working in notebooks is pretty common, the more flexible `pympipool.Executor` is left as the default `Workflow.create.Executor`. This simplifies some stuff under the hood too, e.g. `Function` and `Composite` now just directly do their thing in `on_run` instead of needing the misdirection of returning their own static methods. * Format black * Compute qualname if not provided * Fail early if there is a <locals> function in the factory made hierarchy * Skip the factory fanciness if you see <locals> This enables _FactoryMade objects to be cloudpickled, even when they can't be pickled, while still not letting the mere fact that they are dynamic classes stand in the way of pickling. Nicely lifts our constraint on the node job interaction with pyiron base, which was leveraging cloudpickle * Format black * Test ClassFactory this way too * Test existing list nodes * Rename length base class * Refactor transformers to use on_run and run_args more directly * Introduce an inputs-to-dict transformer * Preview IO as a separate step To guarantee IO construction happens as early as possible in case it fails * Add dataframe transformer * Remove prints 🤦 * Add dataframe transformer tests * Add transformers to the create menu * Format black * 🧹 be more consistent in caching/shortcuts Instead of always defining private holders by hand * Introduce a dataclass node * Give the dataclass node a simpler name Since we can inject attribute access, I don't anticipate ever needing the reverse dataclass-to-outputs node, so let's simplify the naming here. * Remove unused import * Set the output type hint automatically * Add docs * Add tests * Format black * PEP8 newline * Format black --------- Co-authored-by: pyiron-runner <[email protected]> * [minor] Introduce for-loop (#309) * Change paradigm to whether or not the node uses __reduced__ and a constructor Instead of "Meta" nodes * Allow direct use of Constructed children * Move and update constructed stuff * Add new singleton behaviour so factory-produced classes can pass is-tests * PEP8 newline * Remove unnecessary __getstate__ The object isn't holding instance level state and older versions of python bork here. * Add constructed __*state__ compatibility for older versions * 🐛 add missing `return` * Format black * Revert singleton * Remove constructed It's superceded by the snippets.factory stuff * Format black * Let the factory clear method take specific names * Don't override __module__ to the factory function If it was explicitly set downstream, leave that. But if the user left it empty, still default it back to the factory function's module * Clean up storage if job tests fail * Make tinybase the default storage backend * Switch Function and Macro over to using classfactory With this, everything is pickleable (unless you slap something unpickleable on top, or define it in a place that can't be reached by pickle like inside a local function scope). The big downside is that `h5io` storage is now basically useless, since all our nodes come from custom reconstructors. Similarly, for the node job `DataContainer` can no longer store the input node. The `tinybase` backend is still working ok, so I made it the default, and I got the node job working again by forcing it to cloudpickle the input node on saving. These are some ugly hacks, but since storage is an alpha feature right now anyhow, I'd prefer to push ahead with pickleability. * Remove unused decorator And reformat tests in the vein of usage in Function and Macro * Format black * Expose concurrent.futures executors on the creator * Only expose the base Executor from pympipool Doesn't hurt us now and prepares for the version bump * Extend `Runnable` to use a non-static method This is significant. `on_run` is no longer a property returning a staticmethod that will be shipped off, but we directly ship off `self.on_run` so `self` goes with it to remote processes. Similarly, `run_args` gets extended to be `tuple[tuple, dict]` so positional arguments can be sent too. Stacked on top of pickleability, this means we can now use standard `concurrent.futures.ProcessPoolExecutor` -- as long as the nodes are all defined somewhere importable, i.e. not in `__main__`. Since working in notebooks is pretty common, the more flexible `pympipool.Executor` is left as the default `Workflow.create.Executor`. This simplifies some stuff under the hood too, e.g. `Function` and `Composite` now just directly do their thing in `on_run` instead of needing the misdirection of returning their own static methods. * Format black * Expose concurrent.futures executors on the creator * Only expose the base Executor from pympipool Doesn't hurt us now and prepares for the version bump * Extend `Runnable` to use a non-static method This is significant. `on_run` is no longer a property returning a staticmethod that will be shipped off, but we directly ship off `self.on_run` so `self` goes with it to remote processes. Similarly, `run_args` gets extended to be `tuple[tuple, dict]` so positional arguments can be sent too. Stacked on top of pickleability, this means we can now use standard `concurrent.futures.ProcessPoolExecutor` -- as long as the nodes are all defined somewhere importable, i.e. not in `__main__`. Since working in notebooks is pretty common, the more flexible `pympipool.Executor` is left as the default `Workflow.create.Executor`. This simplifies some stuff under the hood too, e.g. `Function` and `Composite` now just directly do their thing in `on_run` instead of needing the misdirection of returning their own static methods. * Format black * Compute qualname if not provided * Fail early if there is a <locals> function in the factory made hierarchy * Skip the factory fanciness if you see <locals> This enables _FactoryMade objects to be cloudpickled, even when they can't be pickled, while still not letting the mere fact that they are dynamic classes stand in the way of pickling. Nicely lifts our constraint on the node job interaction with pyiron base, which was leveraging cloudpickle * Format black * Test ClassFactory this way too * Test existing list nodes * Rename length base class * Refactor transformers to use on_run and run_args more directly * Introduce an inputs-to-dict transformer * Preview IO as a separate step To guarantee IO construction happens as early as possible in case it fails * Add dataframe transformer * Remove prints 🤦 * Add dataframe transformer tests * Add transformers to the create menu * Format black * 🧹 be more consistent in caching/shortcuts Instead of always defining private holders by hand * Introduce a dataclass node * Give the dataclass node a simpler name Since we can inject attribute access, I don't anticipate ever needing the reverse dataclass-to-outputs node, so let's simplify the naming here. * Remove unused import * Set the output type hint automatically * Add docs * Add tests * Format black * PEP8 newline * Introduce for-loop * Refactor: break _build_body into smaller functions * Resolve dataframe column name conflicts When a body node has the same labels for looped input as for output * Update docstrings * Refactor: rename file * Don't break when one of iter or zipped is empty * 🐛 pass body hint, not hint and default, to row collector input hint * Silence disconnection warning Since(?) disconnection is reciprocal it was firing left right and centre * Update deepdive * Remove old for loop * Remove unused import * Add tests * Remove unused attributes * Add a shortcut for assigning an executor to all body nodes * Format black * Format black --------- Co-authored-by: pyiron-runner <[email protected]> * Bump pyiron ecosystem (#313) * Bump pyiron ecosystem * [dependabot skip] Update env file * Bumpy pympipool in conda env * Be more gracious in parallel for-loop speed test * [dependabot skip] Update env file * Bump base in the conda env * [dependabot skip] Update env file * Downgrade pympipool pylammpsmpi, and thus pyiron_atomistics is still a patch behind, and to my chagrin patches remain hard-pinned in pyiron_atomistics * Update deepdive notebook to use changed pympipool signature * [dependabot skip] Update env file --------- Co-authored-by: pyiron-runner <[email protected]> * Move pandas to main env * Add matgl to notebooks env * [dependabot skip] Update env file * Introduce new access points to zip and iter loop dataframes right on the node * Remove Joerg's iter * Update decorator calls * 🔥 🐛 Don't pull on input nodes (#318) * 🔥 🐛 Don't pull on input nodes A simple call invokes a pull, including pulling the parent tree -- this blows up when the loop is inside a parent scope instead of alone. Since we _know_ these are just user input nodes with direct value connections to the loop macro's scope, we never need to run the pull, or indeed even a fetch the value. * Format black --------- Co-authored-by: pyiron-runner <[email protected]> * Make databases a module * Expose output maps on for-loop convenience access points * Get the notebooks running * Rewiring some stuff * Using the new iter method * Formatting nodes with PascalCase * Whatever else was needed to get the notebooks to execute * Format black --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: pyiron-runner <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This leverages the
@classfactory
tools from the last PR to make existingFunction
andMacro
nodes pickleable. That means that the entire standard (need to check for- and while- loops still) ecosystem can now be pickled:This had one giant, nasty side effect: storing with the
h5io
back-end is almost completely useless.h5io
can't handle anything that returns a non-standard reconstructor in its reduced value, which is now almost all of the usual nodes. For what I suspect is a similar reason, we can't just slap a node onto aDataContainer
and to-hdf it.To mitigate this pain, I switched the default back-end over to
tinybase
(which works fine, I believe due to its fallback ontocloudpickle
-- kudos to @pmrv), and simply pickle the input node on theNodeOutputJob
interface withpyiron_base
. I'll also stack on top of this apickle
(and probablycloudpickle
, in case the nodes carry unpickleable data) back-end and make that default. Hierarchical and browsable long-term storage is still a key objective (👍 @niklassiemer), but I want to delay it until we have a storage tool that actually supports such storage in a pickle-compliant way. I.e. I want to start with a foundation of being pickle-compliant, and then find (or more likely write) a storage tool that is also pickle-compliant, than to bend over backwards to get long-term storage now.