
Joerg's lammps nodes #33

Closed
wants to merge 114 commits into from

Conversation

liamhuber
Member

Just directly pulled over from the contrib PR plus updating the workflow import paths.

@github-actions

Binder 👈 Launch a binder notebook on branch pyiron/pyiron_workflow/JNmpi_lammps_nodes

@codacy-production

codacy-production bot commented Oct 16, 2023

Coverage summary from Codacy

See diff coverage on Codacy

| Coverage variation | Diff coverage |
| --- | --- |
| -23.23% (target: -1.00%) | 0.41% |

Coverage variation details

| | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor commit (63d9288) | 2600 | 2222 | 85.46% |
| Head commit (d50d3ad) | 3577 (+977) | 2226 (+4) | 62.23% (-23.23%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>
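(For this PR: 62.23% - 85.46% = -23.23%.)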

Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#33) | 977 | 4 | 0.41% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%
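(For this PR: 4 / 977 * 100% ≈ 0.41%.)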


You may notice some variations in coverage metrics with the latest Coverage engine update. For more details, visit the documentation

@github-actions

github-actions bot commented Oct 21, 2023

Pull Request Test Coverage Report for Build 7391309890

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-4.4%) to 85.828%

Totals Coverage Status
Change from base Build 7391222931: -4.4%
Covered Lines: 4536
Relevant Lines: 5285

💛 - Coveralls

JNmpi added 3 commits October 21, 2023 20:38
several new node_library modules
several routines to make registering and replacing modules more convenient
added notebook to explore how close we can get to pyiron syntax
@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@liamhuber
Member Author

Notebooks test is failing because I disabled registering new packages in order to get macros working on executors. Getting registration working again is my second priority (after a pull method), but I'll hotfix this branch later tonight to give the lammps nodes the same special favour that the atomistics and standard nodes have, so that they're available.

@liamhuber liamhuber mentioned this pull request Oct 23, 2023
@liamhuber
Member Author

I'm also running into some surprising (to me) errors:

Starting with

from pyiron_workflow import Workflow

wf = Workflow("tmp")
wf.p1 = Workflow.create.standard.Add(1)

Running linearly works fine, but once I introduce multiple workers,

wf.iter(p1__other=[1,2,3,4], max_workers=2, executor=True)

gives `TypeError: Workflow.__init__() got an unexpected keyword argument 'p1__other'`; my guess is that this is because the re-instantiated workflow doesn't have a p1 node at all, although I'm not certain yet.

And

wf.p1.iter(other=[1,2,3,4], max_workers=2, executor=True)

gives

ReadinessError: Add received a run command but is not ready. The node should be neither running nor failed, and all input values should conform to type hints.
Add readiness: False
STATE:
running: False
failed: False
INPUTS:
obj ready: False
other ready: True

Almost certainly because the first argument value is not getting carried along to the new copies. wf.p1.iter(obj=[1], other=[1,2,3,4], max_workers=2, executor=True) fails, but with the file error related to trying to merge storage in, which means the input is passing through just fine and confirms my suspicion.

@liamhuber liamhuber mentioned this pull request Mar 28, 2024
JNmpi and others added 5 commits March 29, 2024 08:17
# Conflicts:
#	.binder/environment.yml
#	.ci_support/environment-notebooks.yml
#	notebooks/deepdive.ipynb
#	notebooks/quickstart.ipynb
#	pyiron_workflow/node.py
#	setup.py
* Make node_function and type hints available at the _class level_

And fix one error type on the storage tests.

* Pull the same trick with Macro

But use an actual intermediate class instead of a function, so that we retain some of the `Composite` class methods and the IDE doesn't complain that we're capitalizing the name

* Make the Function misdirection a class too

* Refactor: Pull out a HasCreator interface

So that `Composite` and the not-actually-a-`type[Node]` `Macro` can avoid duplicating code

* Update docstrings

* Format black

* Codacy: don't import unused wraps

* Make many AbstractFunction methods classmethods

* Make output channel info available as a Function class attribute

So channel labels and type hints (and later any ontological hints) can be viewed without ever instantiating a node. That means `output_labels` is no longer available at the class instance level, but only when defining a new class!

* Make input channel info available as a Function class attribute

This was much easier because we never need to inspect the source code manually (as we do sometimes to check return values).

* Add/extend public method docstrings

* Refactor: slide

* Remove unused private attribute

* Fix docstrings

* Format black

* Remove unused imports

Just some housekeeping in the files we're touching anyhow

* Fix docstring

* Make macro output labels a class attribute

* Format black

* Bump typeguard from 4.1.5 to 4.2.1 (#262)

* Bump typeguard from 4.1.5 to 4.2.1

Bumps [typeguard](https://github.com/agronholm/typeguard) from 4.1.5 to 4.2.1.
- [Release notes](https://github.com/agronholm/typeguard/releases)
- [Changelog](https://github.com/agronholm/typeguard/blob/master/docs/versionhistory.rst)
- [Commits](agronholm/typeguard@4.1.5...4.2.1)

---
updated-dependencies:
- dependency-name: typeguard
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* [dependabot skip] Update environment

* [dependabot skip] Update env file

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: pyiron-runner <[email protected]>

* Revert warning and test for the graphviz call+windows combination (#275)

Whatever was broken in the upstream dependency or github image seems to be fixed.

* [patch] Update dependencies (#274)

* Update dependencies

* [dependabot skip] Update env file

* Fix browser version typo

* [dependabot skip] Update env file

---------

Co-authored-by: pyiron-runner <[email protected]>

* Revert save overload (#234)

* Prepend instead of appending new connections

Fetch looks for the first available data in connection lists, so prepend on connection formation so that newer connections get higher priority.

* Add RandomFloat to the standard library

* Use standard library for for and while loop docs examples

* Fix type hint in while loop

Now that `Function` is not a node and we automate the functionality of `SingleValueNode`

* Don't use maps in the README example

* Disallow IO maps on `AbstractMacro`

I tried to break this down a reasonable amount, but I also want the tests passing at each commit. In principle, I could duplicate code instead of simply moving it to break this apart a bit further, but in practice I'm not expecting anyone to actually review this PR so that seems like needless effort. Here, I

* Move `inputs/outputs_map` from `Composite` down to `Workflow`
* Disable automatic IO generation on `AbstractMacro` -- you _need_ to define it by labels and the graph creator signature/returns
* Add IO previewing and class-level stuff analogous to `AbstractFunction` onto `AbstractMacro` now that its IO can't be remapped
* Update tests and examples accordingly
* Finally, re-write the while- and for-loop meta nodes, which depended heavily on the presence of maps

Note that although the for- and while-loops are totally different under the hood, the only user-facing changes are (a) you can only wrap nodes that are importable, and (b) the IO maps for the while-loop are mandatory instead of optional.
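To illustrate defining macro IO this way, here is a minimal sketch, assuming the `Workflow.wrap.as_macro_node` decorator naming that appears later in this history and the `standard.Add` node (with its `obj`/`other` channels) used elsewhere in this thread; the labels and body are illustrative, not taken from the actual PR:

```python
from pyiron_workflow import Workflow

# Input channels come from the graph-creator signature (here: x); the output label
# "result" is stated explicitly on the decorator instead of via inputs_map/outputs_map.
@Workflow.wrap.as_macro_node("result")
def AddTwice(self, x):
    self.first = Workflow.create.standard.Add(obj=x, other=1)
    self.second = Workflow.create.standard.Add(obj=self.first, other=1)
    return self.second
```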

* Fix atomistics macros

We duplicate the task node signature for the macros now, but I am not currently stressed about this; there are ways around it, I just don't see the abstraction being worthwhile for three nodes.

* Make the pyiron_atomistics Bulk node importable

* Provide values for all for-loop input

We don't pass the loop body defaults and we explicitly create input channels now, so we (sadly) need to provide all input. I played around with scraping the defaults and hints from the body node, but the string representation vs. code representation is just too much of a headache. Ultimately I think we will need to get away from this executed string somehow.

* Format black

* Prune macro IO nodes that don't get forked

For the sake of computational efficiency, have macro IO value-link directly to child node IO wherever possible.

* Ignore the notebook testing script

I just created this so that I can check that the notebooks pass locally before pushing to github (just from my IDE without actually running the notebooks). For the CI we have a specific job that runs the notebooks with a different environment, so we don't want the remote CI discovering these tests.

* Give the `Function` creator class a function-like name

* Make it an actual function!

* Rename `AbstractFunction` back to the simpler `Function`

Now that that name is free again

* Un-parent `Macro` from `HasCreator`

It was easy enough to directly reference `AbstractMacro` in all the places in tests and documentation where we were previously exploiting this parentage.

* Give the `Macro` creator class a function-like name

* Make it a function!

* Remove unused import

* Rename `AbstractMacro` back to `Macro`

* Update some text in the deepdive

* Rename `wrap_as` to `wrap`

In the process of getting nicer phrasing

* Refactor: rename

* Refactor: rename

* Refactor: rename

* Correct some mis-names that slipped into strings

* Refactor: rename

* Add back README stuff

A chunk got lost in the mix, but thankfully I noticed reviewing the diff

* Tweak docstrings

* Remove line break from example output

* Don't give `Function` special treatment for `self`

* Add a test for protected names

* Make the graph creator an abstract(static) method

This was just an oversight. No impact on behaviour, but makes things clearer to devs.

* Let Macro nodes scrape their output labels from their return statement

Just like Function nodes do, but additionally left-stripping `macro.` (or `self.` or `wf.` or whatever the `graph_creator`'s first self-like argument is called)
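A minimal sketch of that scraping, again assuming the `Workflow.wrap.as_macro_node` decorator and `standard.Add` node referenced elsewhere in this thread (names illustrative): with no explicit labels, the single output channel gets its label from the return statement.

```python
from pyiron_workflow import Workflow

@Workflow.wrap.as_macro_node
def SummedInput(self, a, b):
    self.total = Workflow.create.standard.Add(obj=a, other=b)
    return self.total  # "self." is stripped, so the scraped output label is "total"
```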

* Disallow degenerate output labels

* Refactor: extract shared behaviour for scraping IO from a class method

* Refactor: rename the dev-facing attribute for manually setting output labels

* Refactor: rename classmethod getters

* Refactor: simplify if

* Refactor: more efficient if

* Only build channel previews once

* Refactor validation and apply it to both Macro and Function

* Add examples to the deepdive of turning off output validation

* Improve error messages

* Refactor: move the preview stuff to its own file

* Refactor: rename public facing preview methods

* Introduce an interface class

To separate the ideas of having IO information available at the class level, and scraping such data from a class method.

* Refactor: slide public method up

* Change class method to simple attribute

* Introduce new abstract classes and reparent Macro and Function accordingly

* Replace Macro and Function decorators

With calls to a centralized decorator factory

* Refactor: slide creators below the decorators they use

* Extend docstrings

* Move degeneracy validation inside the try loop

As it will trigger an OSError off the same un-inspectable case

* Introduce a custom warning type

So that we can test whether or not we're catching _our_ warning

* Test new IO classes explicitly and remove duplication downstream

* Update notebooks

* Format black

* Add a helper to see both previews at once

* Format black

* Use `self` canonically as the first macro argument in the source code

* Remove unused imports

* Update readme

* Update quickstart

* Update deepdive

* Run all starters when pulling

* Refactor: break out earlier when the node in question is the starter

* Refactor: immediately convert to list

* Run parented pulls by running the parent

By using these specially constructed execution signals

* Track post-facto provenance

* Remove recursion limit adjustment

The stack always goes directly up to the parent in macros now, instead of through all previous calls, so we no longer run up against this limit.

* Format black

* Fix bad type hint

* Remove abstract method from Composite

In Workflow the method just returns the first argument, so a method is entirely unnecessary; in Macro we legitimately need it, but now that this is the only place where it's implemented/used, we can give it an even more specific name.

* Pull IO creation up to parent class

* Remove unused import

* Formatting

* Format black

* Use lru_cache

* Move for- and while-loops to their own submodule

* Give label a default in Node.__init__

* Don't pass type_hint directly to a node

* Make HasIO.set_input_values accept args

* Fix notebooks

* Make all nodes parse kwargs as input in __post__

* Stop using __post__

Replace it with a system where `Node.__init__` calls `_setup_node` and then `_after_node_setup`, so that children of `Node` can use a combination of (a) doing stuff before calling `super().__init__` and (b) overriding `_setup_node` in order to be sure they accomplish what is needed prior to the post-setup stuff happening. This is a bit more cognitive overhead for node class writers, but simplifies the actual technology.
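A generic, self-contained sketch of that setup-hook pattern in plain Python (these are not the library's actual classes; names are illustrative):

```python
class Base:
    def __init__(self, label):
        self.label = label
        self._setup()        # analogous to _setup_node
        self._after_setup()  # analogous to _after_node_setup

    def _setup(self):
        pass

    def _after_setup(self):
        print(f"{self.label}: setup complete")


class Child(Base):
    def __init__(self, label, flag=False):
        self.flag = flag      # (a) work done before calling super().__init__
        super().__init__(label)

    def _setup(self):         # (b) override the setup hook
        super()._setup()
        self.mode = "special" if self.flag else "plain"


Child("demo", flag=True)  # prints "demo: setup complete"
```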

* Format black

* Allow run and run callbacks to take (optional!) arguments

* Let run and related methods on Node take args as well as kwargs

* Update pyiron-verse dependencies (#302)

* Update pyiron-verse dependencies

Synchronized to avoid headaches with patch number upper limits, so we can get it all done in a single pull

* [dependabot skip] Update env file

* Format black

* Bypass failure point

Everything still looks totally fine locally. Let's skip the failing doctest line and see if 3.10 has any problems at the unit test stage.

* Add it back with some debug

* Reduce debug

`instance.result` exists and is the expected `DynamicFoo` object

---------

Co-authored-by: pyiron-runner <[email protected]>

* [minor] Factories instead of meta nodes, and use them for transformer nodes (#293)

* Replace evaluated code to-/from-list nodes with actual classes

Leveraging custom constructors

* Develop meta abstraction and make a new home for transforming meta nodes

* Accept node args too

* Introduce a decorator for building the class-IO

* Change paradigm to whether or not the node uses __reduced__ and a constructor

Instead of "Meta" nodes

* Allow direct use of Constructed children

* Move and update constructed stuff

* Add new singleton behaviour so factory-produced classes can pass is-tests

* Apply constructed and class registration updates to the transformers

* Remove (now unused) meta module

* PEP8 newline

* Remove unnecessary __getstate__

The object isn't holding instance level state and older versions of python bork here.

* Add constructed __*state__ compatibility for older versions

* 🐛 add missing `return`

* Format black

* Introduce a new factory pattern

At the cost of requiring factory functions to forgo kwargs, we get object matching for factories and classes, and picklability for instances. Still need to check some edge cases around the use of stuff with a non-trivial qualname.
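A self-contained sketch of that idea in plain Python (this is not the actual `snippets.factory` code; names are illustrative): a module-level factory takes only positional arguments, caches the classes it builds so repeated calls match, and a module-level reconstructor lets instances round-trip through pickle.

```python
import pickle

_class_cache = {}

def make_scaler_class(factor):
    if factor not in _class_cache:
        _class_cache[factor] = type(
            f"Scaler{factor}",
            (),
            {
                "factor": factor,
                "scale": lambda self, x: x * self.factor,
                # Point pickle at a module-level reconstructor instead of the dynamic class
                "__reduce__": lambda self: (_instantiate_scaler, (self.factor,)),
            },
        )
    return _class_cache[factor]

def _instantiate_scaler(factor):
    return make_scaler_class(factor)()

Double = make_scaler_class(2)
assert Double is make_scaler_class(2)            # repeated factory calls give the same class
restored = pickle.loads(pickle.dumps(Double()))  # instances survive pickling
assert restored.scale(3) == 6
```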

* Test factories as methods

* Test local scoping

* Add docstrings and hide a function

* Simplify super

* Give the demo class in tests a more generic __init__

* Test and document multiple inheritance

* Stop storing __init__ args

By leveraging `__new__` and `__getnewargs_ex__`

* Add clearing methods

In case you want to change some constructor behaviour and clear the access cache. Not sure what should be cached at the end of the day, but for now just give users a shortcut for clearing it.

* Use snippets.factory.classfactory and typing.ClassVar for transformers

* Revert singleton

* Format black

* Remove constructed

It's superseded by the snippets.factory stuff

* Gently rename things for when the factory comes from a decorator

* Allow factory made classes to also come from decorators

* Format black

---------

Co-authored-by: pyiron-runner <[email protected]>

* [minor] Use factories in existing nodes (#303)

* Let the factory clear method take specific names

* Don't override __module__ to the factory function

If it was explicitly set downstream, leave that. But if the user left it empty, still default it back to the factory function's module

* Clean up storage if job tests fail

* Make tinybase the default storage backend

* Switch Function and Macro over to using classfactory

With this, everything is pickleable (unless you slap something unpickleable on top, or define it in a place that can't be reached by pickle like inside a local function scope). The big downside is that `h5io` storage is now basically useless, since all our nodes come from custom reconstructors. Similarly, for the node job `DataContainer` can no longer store the input node. The `tinybase` backend is still working ok, so I made it the default, and I got the node job working again by forcing it to cloudpickle the input node on saving. These are some ugly hacks, but since storage is an alpha feature right now anyhow, I'd prefer to push ahead with pickleability.
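A hedged sketch of that pickleability claim, reusing the `standard.Add` node and its `obj`/`other` channels seen earlier in this thread; exact behaviour depends on the pyiron_workflow version:

```python
import pickle
from pyiron_workflow import Workflow

n = Workflow.create.standard.Add(obj=1, other=2)
n.run()
m = pickle.loads(pickle.dumps(n))  # plain pickle, no cloudpickle needed
print(m.label)
```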

* Remove unused decorator

And reformat tests in the vein of usage in Function and Macro

* Format black

---------

Co-authored-by: pyiron-runner <[email protected]>

* [minor] Executor compliance (#304)

* Expose concurrent.futures executors on the creator

* Only expose the base Executor from pympipool

Doesn't hurt us now and prepares for the version bump

* Extend `Runnable` to use a non-static method

This is significant. `on_run` is no longer a property returning a staticmethod that will be shipped off, but we directly ship off `self.on_run` so `self` goes with it to remote processes. Similarly, `run_args` gets extended to be `tuple[tuple, dict]` so positional arguments can be sent too.

Stacked on top of pickleability, this means we can now use standard `concurrent.futures.ProcessPoolExecutor` -- as long as the nodes are all defined somewhere importable, i.e. not in `__main__`. Since working in notebooks is pretty common, the more flexible `pympipool.Executor` is left as the default `Workflow.create.Executor`.

This simplifies some stuff under the hood too, e.g. `Function` and `Composite` now just directly do their thing in `on_run` instead of needing the misdirection of returning their own static methods.
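A hedged sketch of what that enables; the `executor` attribute and the future returned by `run()` are assumptions based on this description, not verified API details:

```python
from concurrent.futures import ProcessPoolExecutor
from pyiron_workflow import Workflow

if __name__ == "__main__":
    node = Workflow.create.standard.Add(obj=1, other=2)  # importable, so plain pickling applies
    node.executor = ProcessPoolExecutor()                 # assumed attachment point
    future = node.run()                                   # assumed to return a concurrent.futures.Future
    print(future.result())
```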

* Format black

---------

Co-authored-by: pyiron-runner <[email protected]>

* [patch] Make factory made objects cloudpickleable (#305)

* Compute qualname if not provided

* Fail early if there is a <locals> function in the factory made hierarchy

* Skip the factory fanciness if you see <locals>

This enables _FactoryMade objects to be cloudpickled, even when they can't be pickled, while still not letting the mere fact that they are dynamic classes stand in the way of pickling.

Nicely lifts our constraint on the node job interaction with pyiron base, which was leveraging cloudpickle

* Format black

* Test ClassFactory this way too

---------

Co-authored-by: pyiron-runner <[email protected]>

* [patch] More transformers (#306)

* Test existing list nodes

* Rename length base class

* Refactor transformers to use on_run and run_args more directly

* Introduce an inputs-to-dict transformer

* Preview IO as a separate step

To guarantee IO construction happens as early as possible in case it fails

* Add dataframe transformer

* Remove prints 🤦

* Add dataframe transformer tests

* Add transformers to the create menu

* Format black

---------

Co-authored-by: pyiron-runner <[email protected]>

* [patch] 🧹 be more consistent in caching/shortcuts (#307)

* 🧹 be more consistent in caching/shortcuts

Instead of always defining private holders by hand

* Format black

---------

Co-authored-by: pyiron-runner <[email protected]>

* [patch] Dataclass transformer (#308)

* Introduce a dataclass node

* Give the dataclass node a simpler name

Since we can inject attribute access, I don't anticipate ever needing the reverse dataclass-to-outputs node, so let's simplify the naming here.

* Remove unused import

* Set the output type hint automatically

* Add docs

* Add tests

* Format black

* PEP8 newline

* Format black

---------

Co-authored-by: pyiron-runner <[email protected]>

* [minor] Introduce for-loop (#309)

* Introduce for-loop

* Refactor: break _build_body into smaller functions

* Resolve dataframe column name conflicts

When a body node has the same labels for looped input as for output

* Update docstrings

* Refactor: rename file

* Don't break when one of iter or zipped is empty

* 🐛 pass body hint, not hint and default, to row collector input hint

* Silence disconnection warning

Since(?) disconnection is reciprocal, it was firing left, right, and centre

* Update deepdive

* Remove old for loop

* Remove unused import

* Add tests

* Remove unused attributes

* Add a shortcut for assigning an executor to all body nodes

* Format black

* Format black

---------

Co-authored-by: pyiron-runner <[email protected]>

* Bump pyiron ecosystem (#313)

* Bump pyiron ecosystem

* [dependabot skip] Update env file

* Bump pympipool in conda env

* Be more gracious in parallel for-loop speed test

* [dependabot skip] Update env file

* Bump base in the conda env

* [dependabot skip] Update env file

* Downgrade pympipool

pylammpsmpi, and thus pyiron_atomistics, is still a patch behind, and to my chagrin patches remain hard-pinned in pyiron_atomistics

* Update deepdive notebook to use changed pympipool signature

* [dependabot skip] Update env file

---------

Co-authored-by: pyiron-runner <[email protected]>

* Move pandas to main env

* Add matgl to notebooks env

* [dependabot skip] Update env file

* Introduce new access points to zip and iter loop dataframes right on the node
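A hedged sketch of that access point, reusing the `standard.Add` node from earlier in the thread; the exact signature and the dataframe return are assumed from this description:

```python
from pyiron_workflow import Workflow

add = Workflow.create.standard.Add(obj=0)
df = add.iter(other=[1, 2, 3, 4])  # one row per looped value (assumed)
print(df)
```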

* Remove Joerg's iter

* Update decorator calls

* 🔥 🐛 Don't pull on input nodes (#318)

* 🔥 🐛 Don't pull on input nodes

A simple call invokes a pull, including pulling the parent tree -- this blows up when the loop is inside a parent scope instead of alone. Since we _know_ these are just user input nodes with direct value connections to the loop macro's scope, we never need to run the pull, or indeed even fetch the value.

* Format black

---------

Co-authored-by: pyiron-runner <[email protected]>

* Make databases a module

* Expose output maps on for-loop convenience access points

* Get the notebooks running

* Rewiring some stuff
* Using the new iter method
* Formatting nodes with PascalCase
* Whatever else was needed to get the notebooks to execute

* Format black

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: pyiron-runner <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

codacy-production bot commented May 9, 2024

Coverage summary from Codacy

See diff coverage on Codacy

| Coverage variation | Diff coverage |
| --- | --- |
| -16.53% (target: -1.00%) | 0.00% |

Coverage variation details

| | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor commit (7bd7c3e) | 3946 | 3524 | 89.31% |
| Head commit (044598e) | 4842 (+896) | 3524 (+0) | 72.78% (-16.53%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#33) | 896 | 0 | 0.00% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


Codacy will stop sending the deprecated coverage status from June 5th, 2024. Learn more

@liamhuber liamhuber mentioned this pull request May 29, 2024
@liamhuber
Member Author

This content has been migrated to https://github.com/pyiron/node_library, where it currently depends on pyiron_workflow-0.8.0. I want to leave this branch around, but I'm going to close the PR.

@liamhuber liamhuber closed this May 30, 2024
@liamhuber liamhuber deleted the JNmpi_lammps_nodes branch September 7, 2024 19:27