Example for experimental workflows #38

SteffenBrinckmann · 2023-10-19T18:56:47Z

Joerg and Sarath introduced me to the workflow engine. Very nice work and useful. Below are some feedback that you might consider along with my example

I find the requirements a little much: if I only install workflows, I don't understand why I need e.g. SQL stuff
can the workflow.run parameters become easier

wf.step.inputs.name  = 'sem'              #step 3
wf.step.inputs.param = {'voltage':'30'}
out = wf.run()

could become one line:

out = wf.run( {'step':{name:'sem', 'param':{'voltage':30}}})

the out is also not beautiful (see below) because I need to return 3 values, ... have to pack them into a list, ...

print('Output filename',out['step__[sample, fileName, metadata]'][1])

a draw function that is build on matplotlib as I don't use jupyter but a conventional python script.

Full example (I can supply the Storage and Sample classes on request)

from pathlib import Path
from pyiron_workflow.workflow import Workflow
from experimental_workflows.storage import Storage, step
from experimental_workflows.sample import Sample

wf = Workflow('example_workflow')         #name
folder = Path(__file__).parent.parent/'procedures'
wf.step = step(storage=Storage(folder))   #define step and link to storage for procedures

sample = Sample('FeAl')
wf.step.inputs.sample = sample            #since only use one sample, define only once

wf.step.inputs.name  = 'polish'           #step 1
out = wf.run()

wf.step.inputs.name  = 'light microscopy' #step 2
out = wf.run()
print('Output filename',out['step__[sample, fileName, metadata]'][1])

wf.step.inputs.name  = 'sem'              #step 3
wf.step.inputs.param = {'voltage':'30'}
out = wf.run()
print('Output filename',out['step__[sample, fileName, metadata]'][1])

liamhuber · 2023-10-19T19:57:18Z

Hey @SteffenBrinckmann,

Glad to hear you're working with it and thanks for the feedback!

I find the requirements a little much: if I only install workflows, I don't understand why I need e.g. SQL stuff

💯

We currently depend directly on pyiron_base, which has a bunch of these dependencies, but we're really only using a couple of simple utility snippets that can be pulled out.
We also depend on pyiron_atomistics (and other modules) and that entire dependency stack, but this is only for the specific "node packages". The idea is to (a) move imports inside the node definitions themselves, making the dependency optional, and (b) eventually move the node packages off to their own repository(ies).

I don't have a timeframe right now, but it is correct criticism and there's a plan in place for resolving it.

can the workflow.run parameters become easier
...
could become one line:
 out = wf.run( {'step':{name:'sem', 'param':{'voltage':30}}}) 

The syntax is still evolving, but it's true that right now the .run() method doesn't accept any arguments.

However, you can accomplish this in one line by calling the node/workflow itself -- which starts by updating the input based on the arguments, and then runs it. For workflows the arguments have to be keyword, but for function nodes positional arguments are ok too.

In your example I guess this would look like

out = wf(step__name='sem', step__param={'voltage': 30})

For workflows and macros you can define inputs_map and outputs_map to give these arguments (and indeed the IO channels) better names, but by default it's just a double-underscore connection of the node label and the channel label as above.

the out is also not beautiful (see below) because I need to return 3 values, ... have to pack them into a list, ...

So I definitely agree there's lots of room for improving output beauty, but if I understand correctly here it's just annoying for you that it's returned as a dictionary rather than list? I'm open to providing a shortcut on the returned object to "listify" it more easily, but transforming it to a list can be done pretty easily without making reference the individual keys:

from pyiron_workflow import Workflow

@Workflow.wrap_as.function_node()
def multiple_returns(a, b, c):
    return a+1, b+2, c+3

wf = Workflow("io_demo")
wf.node = multiple_returns()
out = wf(node__a=0, node__b=1, node__c=2)
print(
    "Returned to list:",
    list(out.values())
)
>>> Returned to list: [1, 3, 5]
# or  access it later
print(
    "Workflow output dict to list", 
    list(wf.outputs.to_value_dict().values())
)
>>> Workflow output dict to list [1, 3, 5]

Is that sufficient?

a draw function that is build on matplotlib as I don't use jupyter but a conventional python script.

Working with .py files should be totally fine! The .draw() method returns a graphviz object that jupyter knows how to render, but it's not a problem to render it yourself using the graphviz render method

I copied my example above into a .py script and added wf.draw().render(view=True) at the end to have the graph plot automatically pop up, or wf.draw().render(filename="io_demo", format="png") to save it to file (attached to this comment below).

Like the conversion to a list, I think there's probably tweaks we can make so that the process there is smoother, but I hope that at least is doing the job you're looking for.

SteffenBrinckmann · 2023-10-20T08:19:55Z

@liamhuber THANKS for the fast and compete answer

I currently use steps as and that is amazing

wf.step3 = step(storage, sample, 'sem', {'voltage':'30'})

Return values via .values() & list() is better but not ideal. But it works.

Draw works great:

could you create all your temporary files in /tmp (python has lib)
please no color-gradients, we are not 1990 ;-)
ability to omit certain elements (everything orange)

But the greatest issue for me is the design:

currently one defines all the steps; then the workflow is run
could one choose to run steps immediately and use their output to identify the next steps?

liamhuber · 2023-10-20T15:42:13Z

I'm glad someone's using it 😂

I currently use steps as and that is amazing

Great!

Return values via .values() & list() is better but not ideal. But it works.

I opened a separate issue for this (#40). A shortcut will show up eventually in a future release.

Draw works great:
...

The 90's are back baby! 🤣 But seriously, the left-right gradients are actually intentional to help distinguish input from output so I wouldn't want to get rid of those without some other visual queue in mind; the up-down gradients are just me showing my age I guess, haha. The aesthetic feedback is certainly welcome, but pretty low on my priority list so for now just hum the Friends theme and try to tolerate it 😉

For the more functional changes I'll apply at little higher priority although there are some core functionality changes to get out of the way first; I opened a separate issue so I don't lost track of it (#41).

But the greatest issue for me is the design:

Yeah, this is a real issue. In earlier iterations, things were actually extremely greedy: nodes always pushed their data off downstream, and also nodes always updated their output whenever they got new input (and all their input was of valid type).

Of course not every node could be this way, because some were expensive and you didn't want them running all the time. So then we had this combination of "fast" and "slow" nodes.

Joerg and I have discussed it lots, and ultimately we settled on a fully-delayed framework, where you can instantiate, inspect, and give new input to nodes and they don't run unless they receive some sort of explicit instruction to do so. AFAIK this is similar behaviour to DASK and most (but not all) other frameworks. I really do think this is the best choice and don't anticipate going back and changing it.

One thing (my current primary goal) is to get a pull() method working, so that you can get up-to-date information from a particular node while only running the upstream nodes necessary for that node; and this should work both in a workflow context as well as with nodes that are connected but have no workflow parent. This is discussed in #10, #12, and #35. Still, I think it's not quite what you're looking for.

While I don't want to switch from a delayed to a greedy paradigm, I do see room to add something like a run_on_instantiation: bool = False flag to the signature of Function so that it runs itself right away. With existing infrastructure you could then change the default value from False to True right in the decorator. After instantiation it would still be fully delayed and wait for instructions to run, but you'd get that immediate feedback. I don't see a way to actually return the result (you can't return something from __init__), but the output would get populated. I'm picturing something like this:

from pyiron_workflow import Workflow

@Workflow.wrap_as.function_node(run_on_instantiation=True)
def update_immediately(x=0, y=1):
    return x+1, y+2

default_aggressive = update_immediately(y=3)
print(default_aggressive.outputs.to_value_dict())
>>> {"x+1": 1, "y+2": 5}

# Or
@Workflow.wrap_as.function_node()
def normal(x=0, y=1):
    return x-1, y-2

explicit_instruction_to_be_agressive = normal(x=10, run_on_instantiation=True)
print(explicit_instruction_to_be_agressive.outputs.to_value_dict())
>>> {"x-1": 9, "y-2": -1}

Would that be satisfactory?

SteffenBrinckmann · 2023-10-20T17:44:57Z

That would be very great and fully satisfactory. I fully understand why you want to have delayed execution as default and that it is almost essential for long duration calculations.

Colors: of course they are low priority, I just have to give you a hard time.

Thanks

liamhuber · 2023-10-20T17:47:46Z

Super! I've opened the final item as a separate issue in #42. Now that these are all broken into smaller tasks, I'll close this.

SteffenBrinckmann added the enhancement New feature or request label Oct 19, 2023

This was referenced Oct 20, 2023

Results to list easier #40

Closed

Add more options to .draw() #41

Closed

liamhuber mentioned this issue Oct 20, 2023

Let nodes run on instantiation #42

Closed

liamhuber closed this as completed Oct 20, 2023

liamhuber mentioned this issue Nov 27, 2023

Expose dict values as list #93

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Example for experimental workflows #38

Example for experimental workflows #38

SteffenBrinckmann commented Oct 19, 2023

liamhuber commented Oct 19, 2023

SteffenBrinckmann commented Oct 20, 2023

liamhuber commented Oct 20, 2023

SteffenBrinckmann commented Oct 20, 2023

liamhuber commented Oct 20, 2023

Example for experimental workflows #38

Example for experimental workflows #38

Comments

SteffenBrinckmann commented Oct 19, 2023

liamhuber commented Oct 19, 2023

SteffenBrinckmann commented Oct 20, 2023

liamhuber commented Oct 20, 2023

SteffenBrinckmann commented Oct 20, 2023

liamhuber commented Oct 20, 2023