
dora-daemon and dora-coordinator are background processes that do not see CLI environment variables #363

Open
haixuanTao opened this issue Oct 18, 2023 · 11 comments
Labels
bug, cli, coordinator, daemon

Comments

@haixuanTao
Collaborator

Describe the bug
Sometimes an environment variable is updated before using the dora CLI. But because the coordinator and the daemon are background processes, they do not see the updated environment variable. This can generate some confusion.

To Reproduce
Steps to reproduce the behavior:

  1. Start the daemon: dora up
  2. Change an environment variable
  3. Start a new dataflow: dora start dataflow.yaml
  4. Expect the environment variable to have its latest value, but see the previous one or an empty one
  5. Stop the dataflow: dora stop
  6. Destroy: dora destroy
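
A minimal shell sketch of steps 1-3 (CAMERA_ID is just an illustrative variable name):

    dora up                    # daemon and coordinator capture the environment as it is now
    export CAMERA_ID=1         # changed after the daemon was spawned
    dora start dataflow.yaml   # the dataflow still sees the old (or empty) CAMERA_ID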

Expected behavior
I would expect the CLI to pass the latest value to the daemon and coordinator.

@github-actions bot added the bug, cli, coordinator, daemon labels on Oct 18, 2023
@phil-opp
Collaborator

> I would expect the CLI to pass the latest value to the daemon and coordinator.

We need to keep deployment to remote machines in mind. We probably don't want the CLI to pass the local env variables to the remote deploy machine.

@phil-opp
Collaborator

We already support setting environment variables through the dataflow yaml file, right? Would this be a possible alternative for your use case?
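
For reference, a sketch of what this looks like in the dataflow YAML (assuming the per-node env key; CAMERA_ID is an illustrative name):

nodes:
  - id: camera
    path: camera.py
    env:
      CAMERA_ID: 1  # set explicitly in the YAML instead of inherited from the CLI environment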

@haixuanTao
Collaborator Author

So this is the thing: when we use environment variables within the YAML graph, it uses the old env variables.

If we decide that environment variables should not be shared in a distributed environment, which I can understand, then maybe we should change our approach to using variables.
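
To illustrate with a hypothetical variable (the daemon expands it with the environment it was started with):

nodes:
  - id: camera
    path: camera.py
    env:
      CAMERA_ID: $CAMERA_ID  # expanded from the daemon's (stale) environment, not from the CLI's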

@haixuanTao
Collaborator Author

One alternative would be to add a --env argument that enables users to pass env variables when running dataflows, in the manner of Docker.
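
Something like this (a hypothetical flag, not implemented):

    dora start dataflow.yaml --env CAMERA_ID=1 --env MODEL_PATH=/models/yolo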

@phil-opp
Collaborator

Ah, understood! So I guess the main issue is the env expansion that we introduced in 933dadc. For some cases (such as the $HOME example in the commit description) we want the expansion to happen on the target machine, but in other cases we want it to happen on the CLI machine.

I'm fine with changing the behavior to do the env expansion on the CLI machine instead, but I fear that there is always some chance of confusion, depending on what you expect. Not sure what we can do to avoid this, though, aside from documenting it clearly.
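
A sketch of the ambiguity (illustrative paths):

nodes:
  - id: camera
    # $HOME might expand to /home/alice (CLI machine) or /home/deploy (target machine);
    # which one is wanted depends on the use case
    path: $HOME/nodes/camera.py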

@Hennzau
Collaborator

Hennzau commented Sep 4, 2024

🚀 Proposal: Enhancing Dora for Better Usability

Hello! I believe this unresolved issue is crucial for broader adoption of Dora, especially among less experienced users. At first glance, one might expect our nodes and the daemon to have access to the CLI's environment variables. However, it's completely understandable that this could lead to a lot of issues. I think we can find an alternative solution.

🛠️ Current Build Process

We already have a build process that can be assigned to nodes, which activates when we run dora build. This procedure is particularly useful when we need to install requirements or compile a program, as those steps are usually slow.
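
For reference, a node with such a build step might look like this (paths illustrative, mirroring the example later in this thread):

nodes:
  - id: terminal-print
    build: cargo build -p terminal-print  # runs on dora build, not on dora start
    path: ../target/debug/terminal-print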

💡 New Proposal: preprocess Procedure

It could be interesting to introduce another procedure called preprocess, which would contain a list of scripts to execute when running dora start, before trying to resolve and launch nodes. This would help set up the environment of our dataflow or of specific nodes. For example:

preprocess:
  - id: activate_python
    path: .venv/bin/activate.sh  # Will activate a Python environment
  - id: load_env_var_ext
    path: .my_env_var.sh  # Will export useful custom environment variables

nodes:
  - id: camera
    path: camera.py # will be launched in the Python env, regardless of where the daemon or CLI was spawned

  - id: plot
    path: plot.py # same

🔄 Node-Specific Environment Variables

If we need different environment variables for a particular node, we could add them like this:

preprocess:
  - id: activate_python
    path: .venv/bin/activate.sh
  - id: load_env_var_ext
    path: .my_env_var.sh

nodes:
  - id: camera
    path: camera.py

  - id: plot
    path: plot.py

  - id: weird_script
    preprocess:
      - id: activate_another_python
        path: .another_venv/bin/activate.sh
        path: weird_script.py # will be launched in this other Python env

🌐 Distributed Dataflow Consideration

In the case of a distributed dataflow, absolute paths will need to be specified instead of relative ones, just like with everything else.
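
For example (illustrative path):

preprocess:
  - id: activate_python
    path: /home/user/project/.venv/bin/activate.sh  # absolute, so it resolves on the remote machine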

Let me know what you think! 😊

@haixuanTao
Collaborator Author

haixuanTao commented Sep 9, 2024

So I definitely feel there is something to be done with running commands before running the full nodes.

I think the preprocess idea is really interesting!

I would maybe make a couple of changes.

Maybe what we could do is be slightly more GitHub Actions-like and make it possible to have a multiline shell script, along the lines of:

        run: |
          source /opt/ros/humble/setup.bash 
          cargo run --example cxx-ros2-dataflow --features="ros2-examples"

It would be a bit more readable, as people wouldn't need to search for the shell script, and it would also be easier to get started with for anyone who has used GitHub Actions before.

Currently we should be able to (or could manage to) do something like this:

      - id: terminal-print
        build: cargo build -p terminal-print
        path: shell
        args: |
          source /opt/ros/humble/setup.bash 
          cargo run --example cxx-ros2-dataflow --features="ros2-examples"

What do you think of having something like this?

P.S.: the preprocess should probably be per node, as that makes dependent steps easier to track and reduces conflicts.

@Hennzau
Collaborator

Hennzau commented Sep 9, 2024

I'm okay with using multi-line commands inside the run parameter, since filling both the path and args parameters may not be easy :)

@haixuanTao
Collaborator Author

In all honesty, I'm drawn to using run as well, as it is closer to GitHub Actions, but I think it makes things harder to maintain since it means adding another key, and this would not be backward compatible. Also, there might be errors when both path and run are set.

@Hennzau
Collaborator

Hennzau commented Sep 9, 2024

OK, that's a good point. So we should stay with this pattern (path + args) and document it, until there is a real need for something different.

@haixuanTao
Collaborator Author

I think one of the issues is that if we only have run, we don't really have a way to control which shell the script itself is spawned with. Is it using bash or zsh or PowerShell or cmd, ...?

Those things we could manage with something like:

      - id: terminal-print
        build: cargo build -p terminal-print
        path: /bin/zsh
        args: |
          source /opt/ros/humble/setup.bash 
          cargo run --example cxx-ros2-dataflow --features="ros2-examples"
