Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example for experimental workflows #38

Closed
SteffenBrinckmann opened this issue Oct 19, 2023 · 5 comments
Closed

Example for experimental workflows #38

SteffenBrinckmann opened this issue Oct 19, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@SteffenBrinckmann
Copy link

Joerg and Sarath introduced me to the workflow engine. Very nice work and useful. Below are some feedback that you might consider along with my example

  • I find the requirements a little much: if I only install workflows, I don't understand why I need e.g. SQL stuff
  • can the workflow.run parameters become easier
wf.step.inputs.name  = 'sem'              #step 3
wf.step.inputs.param = {'voltage':'30'}
out = wf.run()

could become one line:

out = wf.run( {'step':{name:'sem', 'param':{'voltage':30}}}) 
  • the out is also not beautiful (see below) because I need to return 3 values, ... have to pack them into a list, ...
print('Output filename',out['step__[sample, fileName, metadata]'][1])
  • a draw function that is build on matplotlib as I don't use jupyter but a conventional python script.

Full example (I can supply the Storage and Sample classes on request)

from pathlib import Path
from pyiron_workflow.workflow import Workflow
from experimental_workflows.storage import Storage, step
from experimental_workflows.sample import Sample

wf = Workflow('example_workflow')         #name
folder = Path(__file__).parent.parent/'procedures'
wf.step = step(storage=Storage(folder))   #define step and link to storage for procedures

sample = Sample('FeAl')
wf.step.inputs.sample = sample            #since only use one sample, define only once

wf.step.inputs.name  = 'polish'           #step 1
out = wf.run()

wf.step.inputs.name  = 'light microscopy' #step 2
out = wf.run()
print('Output filename',out['step__[sample, fileName, metadata]'][1])

wf.step.inputs.name  = 'sem'              #step 3
wf.step.inputs.param = {'voltage':'30'}
out = wf.run()
print('Output filename',out['step__[sample, fileName, metadata]'][1])
@SteffenBrinckmann SteffenBrinckmann added the enhancement New feature or request label Oct 19, 2023
@liamhuber
Copy link
Member

Hey @SteffenBrinckmann,

Glad to hear you're working with it and thanks for the feedback!

I find the requirements a little much: if I only install workflows, I don't understand why I need e.g. SQL stuff

💯

We currently depend directly on pyiron_base, which has a bunch of these dependencies, but we're really only using a couple of simple utility snippets that can be pulled out.
We also depend on pyiron_atomistics (and other modules) and that entire dependency stack, but this is only for the specific "node packages". The idea is to (a) move imports inside the node definitions themselves, making the dependency optional, and (b) eventually move the node packages off to their own repository(ies).

I don't have a timeframe right now, but it is correct criticism and there's a plan in place for resolving it.

can the workflow.run parameters become easier
...
could become one line:

 out = wf.run( {'step':{name:'sem', 'param':{'voltage':30}}}) 

The syntax is still evolving, but it's true that right now the .run() method doesn't accept any arguments.

However, you can accomplish this in one line by calling the node/workflow itself -- which starts by updating the input based on the arguments, and then runs it. For workflows the arguments have to be keyword, but for function nodes positional arguments are ok too.

In your example I guess this would look like

out = wf(step__name='sem', step__param={'voltage': 30})

For workflows and macros you can define inputs_map and outputs_map to give these arguments (and indeed the IO channels) better names, but by default it's just a double-underscore connection of the node label and the channel label as above.

the out is also not beautiful (see below) because I need to return 3 values, ... have to pack them into a list, ...

So I definitely agree there's lots of room for improving output beauty, but if I understand correctly here it's just annoying for you that it's returned as a dictionary rather than list? I'm open to providing a shortcut on the returned object to "listify" it more easily, but transforming it to a list can be done pretty easily without making reference the individual keys:

from pyiron_workflow import Workflow

@Workflow.wrap_as.function_node()
def multiple_returns(a, b, c):
    return a+1, b+2, c+3

wf = Workflow("io_demo")
wf.node = multiple_returns()
out = wf(node__a=0, node__b=1, node__c=2)
print(
    "Returned to list:",
    list(out.values())
)
>>> Returned to list: [1, 3, 5]
# or  access it later
print(
    "Workflow output dict to list", 
    list(wf.outputs.to_value_dict().values())
)
>>> Workflow output dict to list [1, 3, 5]

Is that sufficient?

a draw function that is build on matplotlib as I don't use jupyter but a conventional python script.

Working with .py files should be totally fine! The .draw() method returns a graphviz object that jupyter knows how to render, but it's not a problem to render it yourself using the graphviz render method

I copied my example above into a .py script and added wf.draw().render(view=True) at the end to have the graph plot automatically pop up, or wf.draw().render(filename="io_demo", format="png") to save it to file (attached to this comment below).

Like the conversion to a list, I think there's probably tweaks we can make so that the process there is smoother, but I hope that at least is doing the job you're looking for.

io_demo

@SteffenBrinckmann
Copy link
Author

@liamhuber THANKS for the fast and compete answer

I currently use steps as and that is amazing

wf.step3 = step(storage, sample, 'sem', {'voltage':'30'})

Return values via .values() & list() is better but not ideal. But it works.

Draw works great:

  • could you create all your temporary files in /tmp (python has lib)
  • please no color-gradients, we are not 1990 ;-)
  • ability to omit certain elements (everything orange)

But the greatest issue for me is the design:

  • currently one defines all the steps; then the workflow is run
  • could one choose to run steps immediately and use their output to identify the next steps?

This was referenced Oct 20, 2023
@liamhuber
Copy link
Member

I'm glad someone's using it 😂

I currently use steps as and that is amazing

Great!

Return values via .values() & list() is better but not ideal. But it works.

I opened a separate issue for this (#40). A shortcut will show up eventually in a future release.

Draw works great:
...

The 90's are back baby! 🤣 But seriously, the left-right gradients are actually intentional to help distinguish input from output so I wouldn't want to get rid of those without some other visual queue in mind; the up-down gradients are just me showing my age I guess, haha. The aesthetic feedback is certainly welcome, but pretty low on my priority list so for now just hum the Friends theme and try to tolerate it 😉

For the more functional changes I'll apply at little higher priority although there are some core functionality changes to get out of the way first; I opened a separate issue so I don't lost track of it (#41).

But the greatest issue for me is the design:

Yeah, this is a real issue. In earlier iterations, things were actually extremely greedy: nodes always pushed their data off downstream, and also nodes always updated their output whenever they got new input (and all their input was of valid type).

Of course not every node could be this way, because some were expensive and you didn't want them running all the time. So then we had this combination of "fast" and "slow" nodes.

Joerg and I have discussed it lots, and ultimately we settled on a fully-delayed framework, where you can instantiate, inspect, and give new input to nodes and they don't run unless they receive some sort of explicit instruction to do so. AFAIK this is similar behaviour to DASK and most (but not all) other frameworks. I really do think this is the best choice and don't anticipate going back and changing it.

One thing (my current primary goal) is to get a pull() method working, so that you can get up-to-date information from a particular node while only running the upstream nodes necessary for that node; and this should work both in a workflow context as well as with nodes that are connected but have no workflow parent. This is discussed in #10, #12, and #35. Still, I think it's not quite what you're looking for.

While I don't want to switch from a delayed to a greedy paradigm, I do see room to add something like a run_on_instantiation: bool = False flag to the signature of Function so that it runs itself right away. With existing infrastructure you could then change the default value from False to True right in the decorator. After instantiation it would still be fully delayed and wait for instructions to run, but you'd get that immediate feedback. I don't see a way to actually return the result (you can't return something from __init__), but the output would get populated. I'm picturing something like this:

from pyiron_workflow import Workflow

@Workflow.wrap_as.function_node(run_on_instantiation=True)
def update_immediately(x=0, y=1):
    return x+1, y+2

default_aggressive = update_immediately(y=3)
print(default_aggressive.outputs.to_value_dict())
>>> {"x+1": 1, "y+2": 5}

# Or
@Workflow.wrap_as.function_node()
def normal(x=0, y=1):
    return x-1, y-2

explicit_instruction_to_be_agressive = normal(x=10, run_on_instantiation=True)
print(explicit_instruction_to_be_agressive.outputs.to_value_dict())
>>> {"x-1": 9, "y-2": -1}

Would that be satisfactory?

@SteffenBrinckmann
Copy link
Author

That would be very great and fully satisfactory. I fully understand why you want to have delayed execution as default and that it is almost essential for long duration calculations.

Colors: of course they are low priority, I just have to give you a hard time.

Thanks

@liamhuber
Copy link
Member

Super! I've opened the final item as a separate issue in #42. Now that these are all broken into smaller tasks, I'll close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants