Tracking Issues for removing operators #474

Open
haixuanTao opened this issue Apr 17, 2024 · 10 comments
Labels
python Python API


@haixuanTao
Collaborator

haixuanTao commented Apr 17, 2024

Context

Dora-rs operators were built to enable intra-process communication, making it possible to run multiple operators within the same process. This can reduce the number of processes and lets operators use green threads instead of OS threads.

Challenges

The problem is that the implementation and abstraction that come with operators are large, and the further we advance in dora, the clearer the drawbacks become:

  • People are confused by custom nodes
  • People are confused about how to program operators
  • Operators are very verbose
  • Operators add an extra level of hierarchy to the dataflow
  • Running multiple Python operators in one process does not work well with the GIL
  • Rust operators built as shared libraries are hard to get right and add a lot of complexity
  • The same goes for C/C++, which also leads to a complex build step because C/C++ operators have to be compiled

Meanwhile, we don't see people relying on intra-process communication or using the deadline time-management functionality.

So we are considering deprecating operators in favor of nodes, i.e. what are currently called custom nodes.

We'll provide a migration guide and release a minor version for this change.

We will port the functionality unique to operators, such as hot-reloading, to nodes.

API

The Python API will look as follows:

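```python
import pyarrow as pa
from dora import Node

state = "XYZ"

if __name__ == "__main__":
    # `hot_reload_states` is the proposed parameter for keeping
    # state across hot reloads; it does not exist yet.
    node = Node(hot_reload_states=[state])

    for event in node:
        ...
        # send_output takes the output id declared in the graph plus the data
        node.send_output("output_1", pa.array([]))
```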

with the graph:

```yaml
nodes:
    - id: node_1
      path: something.py
      inputs:
        - input_1: "image"
        - input_2: "audio"
      outputs:
        - output_1
        - output_2
```
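The loop body above is left as `...`. A minimal sketch of what a user might put there, routing each input declared in the graph to one of the declared outputs, is below. To keep it runnable without dora installed, it consumes any iterable of event dicts and takes `send_output` as a callable; the `type`/`id`/`value` keys and the `"INPUT"`/`"STOP"` event types are assumptions modeled on the events dora's Python `Node` yields, and `ROUTES`, `process` and `run` are hypothetical names.

```python
from typing import Any, Callable, Dict, Iterable

# Assumption: events are dicts with "type", "id" and "value" keys, modeled on
# dora's Python Node events; only "INPUT" and "STOP" are handled here.
Event = Dict[str, Any]
SendOutput = Callable[[str, Any], None]

# Hypothetical routing table mirroring the graph: which output each input feeds.
ROUTES = {"input_1": "output_1", "input_2": "output_2"}


def process(value: Any) -> Any:
    """Placeholder for the user's own logic (e.g. image or audio processing)."""
    return value


def run(events: Iterable[Event], send_output: SendOutput) -> int:
    """Handle events until STOP and return how many inputs were forwarded."""
    handled = 0
    for event in events:
        if event["type"] == "STOP":
            break
        if event["type"] != "INPUT":
            continue  # other event kinds are ignored in this sketch
        output_id = ROUTES.get(event["id"])
        if output_id is None:
            continue  # input id not declared in the graph
        send_output(output_id, process(event["value"]))
        handled += 1
    return handled
```

In a real node one would presumably call `run(node, node.send_output)` with `node = Node()`; the values would then be PyArrow arrays rather than plain lists.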

The rest is left for users to define as they like.

For C/C++/Rust, the API will be the current custom node API, and support for operators will be removed.

What's next

This should make dora much simpler to use.

It should also reduce the burden on maintainers. We will then focus more on making IPC as efficient as possible, for example by making GPU IPC available.

Follow-up TODO:

  • Make hot-reloading available for Python custom nodes
  • Remove Runtime and Operators from the code base.
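The issue does not specify how hot-reloading for Python nodes will work. As an illustration only, here is a minimal sketch of one common approach: poll the source file's modification time and re-import the module with `importlib.reload` when it changes, while keeping state in the caller so it survives the reload (roughly the role the proposed `hot_reload_states` parameter seems meant to play). `HotReloader` and `maybe_reload` are hypothetical names, not part of dora.

```python
import importlib
import os
from types import ModuleType


class HotReloader:
    """Hypothetical helper: re-import a module whenever its source file changes.

    Only the code is swapped; any state the caller keeps outside the module
    (for example a dict passed into the module's functions) survives reloads.
    """

    def __init__(self, module: ModuleType) -> None:
        if module.__file__ is None:
            raise ValueError("module has no source file to watch")
        self.module = module
        self._path = module.__file__
        self._mtime = os.stat(self._path).st_mtime

    def maybe_reload(self) -> bool:
        """Reload the module if its file's modification time changed."""
        mtime = os.stat(self._path).st_mtime
        if mtime == self._mtime:
            return False
        self._mtime = mtime
        self.module = importlib.reload(self.module)
        return True
```

In a node loop, one might call `reloader.maybe_reload()` before handling each event and then dispatch through `reloader.module`, passing in a state dict that lives outside the module. Instances of classes defined before a reload keep pointing at the old class, which is why the sketch keeps state as plain data.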
github-actions bot added the python (Python API) label on Apr 17, 2024
@haixuanTao
Collaborator Author

Additional note: for people who would like to build heavily green-threaded Rust applications, we believe the way to go is to use native thread pools such as tokio, rayon, ... We think this is a more direct and intuitive approach to multithreading.
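The note is about Rust, where tokio or rayon would be the tool. Since the examples here are in Python, a rough analogue is sketched with the standard library's `concurrent.futures`: the node itself owns an ordinary thread pool instead of relying on a runtime to multiplex several operators. `process_batch` is a hypothetical helper. As with the GIL issue listed above, threads only give real parallelism for I/O-bound work or native code that releases the GIL.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_batch(
    items: Iterable[T], work: Callable[[T], R], max_workers: int = 4
) -> List[R]:
    """Run `work` over `items` on a standard thread pool, preserving input order.

    Pure-Python CPU-bound work is still serialized by the GIL; the pool pays
    off for I/O-bound calls or native code that releases the GIL.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(work, items))
```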

@phil-opp
Collaborator

How about we create a branch called `next` that all breaking changes target? This way, we could implement this step-by-step across multiple PRs and do the breaking release once everything is ready (implementation, testing, migration guides, docs, etc.).

@haixuanTao
Collaborator Author

The thing is that we don't have to remove operators right away. We can probably keep them as-is in the codebase, with a warning, and retire them in a couple of versions, while making the necessary changes at the node level for hot-reloading.

I tend not to be a fan of big releases, as they are always a bit stressful and make fixes hard to deal with. I think this is more in line with the current gitflow paradigm, but I'm open to discussion.

@phil-opp
Collaborator

Good point! Let's try to keep things backwards-compatible for now then.

@heyong4725
Collaborator

Agreed. We should keep things backward compatible when introducing new programming patterns. There were good reasons for introducing the concept of operators for complex use cases, such as stateless operators, stateful operators, fault tolerance/redundancy, etc.

phil-opp added a commit that referenced this issue Apr 18, 2024
We plan to (soft-)remove operators and simplify the dataflow YAML file by removing the additional nesting caused by the `custom` field. This commit prepares for that. See #474 for context.
@phil-opp
Collaborator

I created PR #478 to implement the new dataflow parsing logic without the extra nesting under the `custom` field. I was able to implement this in a backwards-compatible way, so existing dataflow definitions should continue to work. We can then deprecate the old format at some point in the future.
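The backwards-compatible parsing can be illustrated with a small sketch that lifts the fields nested under `custom` up to the node level and leaves already-flat nodes untouched. This is not the implementation from #478; `normalize_node` is a hypothetical Python helper working on already-parsed YAML dicts, and it copies field names as-is without handling any renames between the old and new formats.

```python
from typing import Any, Dict


def normalize_node(node: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: flatten the legacy `custom` nesting of one node.

    Fields under `custom` are moved to the node level; a node without
    `custom` is returned as a shallow copy, so both formats are accepted.
    """
    flat = {key: value for key, value in node.items() if key != "custom"}
    for key, value in (node.get("custom") or {}).items():
        if key in flat:
            # Ambiguous definition: the same field appears at both levels.
            raise ValueError(f"field {key!r} is set both inside and outside `custom`")
        flat[key] = value
    return flat
```

A parser would apply this to every entry of the dataflow's `nodes` list before further validation, which is what lets old and new definitions coexist until the old format is deprecated.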

@Michael-J-Ward
Contributor

We will then focus more on making IPC as efficient as possible in the likes of making GPU IPC available.

Just curious because (again) I'm new to this space.

Are the latency / throughput requirements explicit? Something like "Must be faster than X else no one will use dora, but ideally target of Y to differentiate".

@haixuanTao
Collaborator Author

Hey @Michael-J-Ward, they're not 😅 but in general, I tend to think that being able to achieve low latency means you're able to be lean and not have too much slack.

Kind of lean management applied to software production.

In the case of GPU IPC, it's something a couple of people have shown interest in, and it would definitely push the industry forward, so let's do it 🔥

@Michael-J-Ward
Contributor

Looking deeper into the code, I understand the desire to simplify things by removing operators.

There are some good reasons when we introduce concepts of operators for complex use cases, such as stateless operator, stateful operators, fault tolerance/redundancy, etc.

@heyong4725 - Could you elaborate? If there are important use cases that can only be implemented with operators, then that would prevent their removal, right?

If they can be removed, would a path forward look like this:

  • update any examples / docs to use nodes instead of operators
  • a new release with a deprecation warning on the operator API
  • then start nuking operator code in a future release
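The deprecation-warning step could look roughly like the sketch below, which scans an already-parsed dataflow and warns once per node that still uses operators. The `operator`/`operators` keys are an assumption about how operator nodes are declared in the YAML, and `check_for_operators` is a hypothetical name; dora's actual deprecation mechanism may differ.

```python
import warnings
from typing import Any, Dict, List

# Assumption: a node that uses operators carries an `operator` or `operators` key.
OPERATOR_KEYS = ("operator", "operators")


def check_for_operators(dataflow: Dict[str, Any]) -> List[str]:
    """Warn once per node that still uses operators and return those node ids."""
    flagged = []
    for node in dataflow.get("nodes", []):
        if any(key in node for key in OPERATOR_KEYS):
            flagged.append(node["id"])
            warnings.warn(
                f"node {node['id']!r} uses the deprecated operator API; "
                "consider migrating it to a node",
                DeprecationWarning,
                stacklevel=2,
            )
    return flagged
```

Python hides `DeprecationWarning` by default unless it is triggered from `__main__`, so a CLI that wants end users to see the message might use `FutureWarning` or print it directly instead.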

Michael-J-Ward added a commit to Michael-J-Ward/dora that referenced this issue Apr 30, 2024
Minimal conversion from previous operator api to node api.

Ref dora-rs#474
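The commit's details are not shown here, but one low-effort migration path is an adapter that drives an existing operator-style object from a node-style event loop, so the per-event logic can move over unchanged before being inlined. In the sketch below, `Status` is a local stand-in for dora's `DoraStatus`, the event dicts and the `on_event(event, send_output)` signature are assumptions modeled on dora's Python operator API, and `run_operator_as_node` and `Doubler` are hypothetical names.

```python
from enum import Enum
from typing import Any, Callable, Dict, Iterable


class Status(Enum):
    """Local stand-in for dora's DoraStatus so the sketch runs without dora."""

    CONTINUE = 0
    STOP = 1


Event = Dict[str, Any]


def run_operator_as_node(
    operator: Any, events: Iterable[Event], send_output: Callable[..., None]
) -> None:
    """Feed node-style events to an operator-style `on_event` until a stop."""
    for event in events:
        if event["type"] == "STOP":
            break
        if operator.on_event(event, send_output) is Status.STOP:
            break


class Doubler:
    """Example operator-style class: doubles every value of each input."""

    def on_event(self, event: Event, send_output: Callable[..., None]) -> Status:
        if event["type"] == "INPUT":
            send_output("doubled", [2 * x for x in event["value"]])
        return Status.CONTINUE
```

With dora installed, the node side would presumably be `node = Node()` followed by `run_operator_as_node(MyOperator(), node, node.send_output)` (with `MyOperator` standing for an existing operator class), after which the class can be inlined into the loop and removed.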
@heyong4725
Collaborator

@Michael-J-Ward Originally, we designed dora operators so that we could potentially offload complex use cases from developers to the dora framework through the dora daemon/runtime. However, I agree that we can demote operators, which depend on the dora framework/daemon, and promote the node API to keep things simple. We may revisit the operator design in the future.

phil-opp added a commit that referenced this issue May 22, 2024
We plan to (soft-)remove operators and simplify the dataflow YAML file by removing the additional nesting caused by the `custom` field. This commit prepares for that. See #474 for context.

4 participants