Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFE: enhancements for the future of the project #1

Open
25 of 33 tasks
ankostis opened this issue Oct 19, 2019 · 7 comments
Open
25 of 33 tasks

RFE: enhancements for the future of the project #1

ankostis opened this issue Oct 19, 2019 · 7 comments

Comments

@ankostis
Copy link

ankostis commented Oct 19, 2019

(Cloned from yahoo#22)
Collecting here enhancements that would nice to have on this project going forward:

Language & Build

2. Functionality

  • 2.1 drop "backward compatibility" Operation, clean up API
    (e.g. FunctionalOperation.compute() raises not impl, and network using private _compute() instead).
  • 2.2 (ENH: annotate graph edges as optionals yahoo/graphkit#20) store node data in networkx and not in classes
  • 2.3 add weights
  • 2.4 produce deterministic results & failures:
  • 2.5 support optional outputs.
  • 2.6 support combination of modifiers (e.g. optional-sideffect) and merge with _DataNode.
  • 2.7 assign "colors" to operations, and compute selectively on sub-sets of colors (idea from @syamajala's https://github.com/slac-lcls/networkfox)
  • 2.8 parallel execution:
    • 2.8.1 executor must accept an external "shared" pool from contextvars
      (now uses the one from Network).
    • 2.8.2 enable parallelism with configs alone - don't create the pool by default.
    • 2.8.3 enhance parallelism to support also Executors & Async coroutines.
  • 2.9 support functions with:
    • 2.9.1 *args (non-kwarg optionals)
    • 2.9.2 **kwargs (to receive the whole solution).
  • 2.10 add config-methods to top-level module.
  • 2.11 use a "start-node" to insert input-values in solution.
  • 2.12 assimilate netop.compute()/netop.__call__() to FunctionalOperation: only compute() should accept inputs dict, __call__() should accept **kwargs.
  • 2.13 keep executing as many nodes as possible, despite errors

3. diagrams:

@syamajala
Copy link

As far as colors go, I was actually thinking about this, what if it were possible to attach arbitrary metadata to nodes and pass a predicate function which looks at the metadata and returns true or false which tells you whether to evaluate the node or not?

@ankostis
Copy link
Author

I agree, this is a flexible solution, a superset of "colors", since it is more "dynamic" - it can decide the nodes to consider after the DAG has formed.

Implementation wise, the operation_predicate() function is pretty simple, but i would place it in the execution_config ContextVar.
An alternative for the node-predicate function would be to expose the DAG from a new "hook" before_compute(), and allow arbitrary calls to networkx.subgraph() - that would filter nodes in a single step, fully under User control. A drawback is that the user may destroy the DAG (remove data-nodes needed by the underlying functions).
I like exposing the power of networkx to the users.

Finally we need the metadata to be in the DAG - graphtik has this mostly implemented.

A remark on performance,
if this predicates run right before every compute() call, the effect would be (in big-O notation)

where is the number of operations in the network-operation node (assuming utilizes predicates). For simple graph-operation, m = <total-number of operation nodes>, n=1.

ankostis added a commit that referenced this issue Dec 11, 2019
e.g. assign "colors" to nodes, and solve a subset each time.
ankostis added a commit that referenced this issue Dec 11, 2019
e.g. assign "colors" to nodes, and solve a subset each time.
@ankostis
Copy link
Author

@syamajala the just released 4.0.0 supports node-attributes and and a predicate function to prune arbitrary nodes before computing results (check CHANGES and NetworkOperation.pruned() method).

@ankostis ankostis pinned this issue Jan 10, 2020
@syamajala
Copy link

Are there any plans to implement control flow nodes? Things like If, ElseIf, Else, and maybe For?

@ankostis
Copy link
Author

ankostis commented Dec 9, 2020

I had thought that initially, and had precluded it, because of all the complication such a UX would imply, to drive conditional logic in the DAG level.
But since i also needed if-then-else functionality I decided that all "conditional" code should reside inside regular python operations, as can be seen in the plotted diagram of the landing page.

Specifically, if some function wants to cancel its downstream execution, or produce a subset of its outputs (and implement if-then-else logic), can do that with endured operations or partial outputs. A reschedule point kicks-in after such operations.

I cannot think of a way to workaround also for-loops with the same machinery, because of the following architectural reasons:

  • The conceptual model of this project cannot have a goto equivalent, all scheduling decisions are based on the existence of data (there are no direct operation-dependencies). I don't want this to change.
  • The execution DAGs cannot have loops (they are acyclic after all), planning resolves any loops in the pipeline-graph before execution (see for eg. sideffects that are used to bypass any data-loops). It would be rather difficult to change this.
  • The solution expects each operation to execute only once. That may change, but it is not enough for having loops.

@syamajala
Copy link

syamajala commented Dec 9, 2020

Based on what you said I have some vague idea of how to do something like this outside of graphtik. Note: I havent tried this out yet.

What you do is have some higher level functions/nodes.

graph = graph_composer(name='graph')(
If(name='val_greater_than_10', condition_needs=['val'], condition=lambda val: val > 10, needs=['a'], provides=['c'], group='if1')(
    operation(name='a_plus_10', needs=['a'], provides=['b'])(lambda a: a+10),
    operation(name='b_divide_5', needs['b'], provides=['c'])(lambda b: b/10)),
ElseIf(name='val_equal_10', condition_needs=['val'], condition=lambda val: val == 10, needs=['a'], provides=['c'], group='if1')(
    operation(name='a_minus_10', needs=['a'], provides=['c'])(lambda a: a-10))
Else(name='val_less_than_10', condition_needs=['val'], needs=['a'], provides=['c'], group='if1')(
    operation(name='a_divide_10', condition_needs=['val'], needs=['a'], provides=['c'])(lambda a: a/10)))

What graph_composer would do then is walk the list of nodes and return an operation that looks something like this:

def if1(val, a, b):
   if val > 10:
       return val_greater_than_10_graph(a ,b)
   elif val == 10:
        return val_equal_10_graph(a, b)
   else:
       return val_less_than_10_graph(a, b)

operation(name='if1', needs=['val', 'a', 'b'], provides=['c'])(if1)

You could probably clean this up by generating a class that holds the graphs as attributes and implements the if1 function as __call__

I think instead of For what I would really like is a Map where maybe what it does is loop over lists and just applies the subgraph to each entry.

graph = graph_composer(name='graph')(
Map(name='map', needs=['a', 'b'], provides=['c'])(
    operation(name='a_times_b', needs=['a', 'b'], provides=['c'])(lambda a, b: a*b)))

graph({'i': 0, 'a': [1, 2, 3], 'b': [11, 12, 13]}) -> {'c': [11, 24, 39]}

It would work in a similar method.

Also I should say my use case for something like this is that I provide users with libraries of operations and they build the graph themselves, so putting the logic inside of the operation itself is not really feasible for me because I dont know the conditions they want to evaluate on.

@ankostis
Copy link
Author

ankostis commented Dec 9, 2020

If you auto-generate a function that returns_dict, you can forego the x3 sub-pipelines, and put all operations-nodes in the same graph:

def if1(val, a, b):
   if val > 10:
       return {"if1_if"': True}
   elif val == 10:
        return {"if1_elif"': True}
   else:
       return {"if1_else"': True}

operation(if1,
          needs="val", 
          provides=["if1_if", "if1_elif", "if1_else"], 
          returns_dict=True)

The returns_dict functions facilitate composing operations with partial-outputs.

You also need to auto-generate 3 branch-operations downstream, each one receiving one of the 3 dependencies provided above "if1_if", "if1_elif", "if1_else" as their needs.

Does that makes sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants