Enhancement: Display statistics of computational jobs together with their parent nodes #5816

Closed
10 tasks done
sanderegg opened this issue May 14, 2024 · 7 comments
Labels
  • a:apiserver (api-server service)
  • a:dask-service (any of the dask services: dask-scheduler/sidecar or worker)
  • a:director-v2 (issue related with the director-v2 service)
  • a:resource-usage-tracker (resource usage tracker service)

Comments

sanderegg (Member) commented May 14, 2024

Context

Now that the public API is available, it is possible to run any number of computational jobs from a running dynamic service.
A user's usage statistics currently display computational jobs separately from their "parent" dynamic service.

Goal

Display the statistics of computational jobs linked to their parent service.

Needed changes

  1. Important note: we should keep the API backward compatible and, for sim4life.io, keep the option to pass the parent node ID via metadata, at least for a while.
  2. Modify the oSparc API so that computational jobs can be created/run from a running dynamic service by passing the "parent" node ID (ideally defined automatically; if not, the API shall be modified). Use cases: sim4life, meta-modeling, jupyterlabs, ...
  3. The parent node ID is passed all the way to the computational backend (already exists, needs to be modified based on 1.)
  4. Using the parent node ID, the logs are sent back to the parent project/node ID if it exists (already exists, needs to be modified based on 1.)
  5. The resource usage tracker shall keep track of the parent node ID if it exists.
  6. The frontend shall display the usage of services together with their child jobs.
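
To illustrate items 5 and 6, here is a minimal sketch of how tracked runs could be grouped under their parent once the parent node ID is recorded. `UsageRecord`, its fields, and both helper functions are hypothetical names chosen for illustration; they are not the resource usage tracker's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical shape of one tracked run, not the real tracker schema.
    node_id: str
    credits: float
    parent_node_id: Optional[str] = None  # None for top-level services


def group_by_parent(records: list[UsageRecord]) -> dict[str, list[UsageRecord]]:
    """Map each parent node ID to the child jobs that reference it."""
    children: dict[str, list[UsageRecord]] = defaultdict(list)
    for record in records:
        if record.parent_node_id is not None:
            children[record.parent_node_id].append(record)
    return dict(children)


def total_credits(parent: UsageRecord, records: list[UsageRecord]) -> float:
    """Credits of a service plus those of all jobs started from it."""
    kids = group_by_parent(records).get(parent.node_id, [])
    return parent.credits + sum(kid.credits for kid in kids)
```

Records without a parent stay top-level entries, so existing statistics would keep displaying as before.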

Tasks

  1. mguidon
  2. Konohana0608 pcrespov
    sanderegg
  3. a:dask-service a:director-v2 a:webserver
    sanderegg
  4. 3 of 3
    a:resource-usage-tracker
    matusdrobuliak66 sanderegg
  5. a:director-v2
    sanderegg
  6. 1 of 2
    a:frontend
    odeimaiz
  7. 1 of 2
    a:apiserver a:webserver
    bisgaard-itis sanderegg
  8. a:director-v2 a:pipeline-services
    pcrespov
  9. a:director-v2
    pcrespov
  10. bisgaard-itis
sanderegg transferred this issue from ITISFoundation/osparc-issues May 14, 2024
sanderegg added the a:apiserver, a:director-v2, a:dask-service, and a:resource-usage-tracker labels May 14, 2024
matusdrobuliak66 modified the milestone: Leeroy Jenkins May 14, 2024
sanderegg added this to the Leeroy Jenkins milestone May 14, 2024
sanderegg changed the title from "pass parent node ID in a structured way to the computational backend, also for cost display, logs @sanderegg @matusdrobuliak66 @mguidon" to "Display statistics of computational jobs together with their parent nodes" May 14, 2024
sanderegg (Member, Author) commented

After discussion with @bisgaard-itis:

Proposal to modify the osparc python client:

  • modify the API call that creates a computational job to accept an optional header containing at least the parent node ID
  • based on the ENV variables `OSPARC_NODE_ID` and possibly `OSPARC_STUDY_ID` set in the dynamic service, the client can automatically fill in the headers
    -> Users that use the python client in their code will get that feature for free
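
A minimal sketch of the client-side part of this proposal, assuming the two environment variables named above. The header names and the `parent_headers` helper are hypothetical; the issue does not specify them.

```python
import os
from typing import Mapping, Optional

# Hypothetical header names; the issue does not define them.
PARENT_NODE_HEADER = "x-parent-node-id"
PARENT_STUDY_HEADER = "x-parent-study-id"


def parent_headers(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Build the optional parent headers from the service's environment."""
    env = os.environ if env is None else env
    headers: dict[str, str] = {}
    node_id = env.get("OSPARC_NODE_ID")
    if node_id:
        headers[PARENT_NODE_HEADER] = node_id
    study_id = env.get("OSPARC_STUDY_ID")
    if study_id:
        headers[PARENT_STUDY_HEADER] = study_id
    return headers
```

Outside a dynamic service neither variable is set, so the function returns an empty dict and the request is sent exactly as it is today.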

mguidon (Member) commented May 15, 2024

So this is to avoid having it in the not-validated metadata?

sanderegg (Member, Author) commented

> So this is to avoid having it in the not-validated metadata?

As discussed, no. This is to generalize this usage and to ensure we always get that info, so that the billing center looks nice.

Also as discussed, both ways (the sim4life.io way and the new one) should work, at least for a while.

bisgaard-itis (Contributor) commented

After thinking a bit more about this, I have the following modified proposal. Since this approach relies on the client "picking up" the node_id and sending it to the api-server, I suggest simply overriding the `create_solver_job` method in the osparc python client so that it first calls the api-server endpoint that creates the job and afterwards calls the PATCH endpoint with the metadata picked up from the environment variables. That way we do not have to modify anything on the server, so all existing functionality continues to work, and we simply "package" the endpoints into user-friendly functions on the client side.
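
The create-then-patch flow could be sketched roughly as follows. The `JobsApi` interface, its method names, the `node_id` metadata key, and the in-memory stand-in are all assumptions made for illustration; they are not the generated client's real signatures.

```python
import os
from typing import Any, Mapping, Optional, Protocol


class JobsApi(Protocol):
    # Hypothetical minimal interface standing in for the generated client.
    def create_job(self, inputs: Mapping[str, Any]) -> str: ...
    def patch_job_metadata(self, job_id: str, metadata: Mapping[str, str]) -> None: ...


def create_job_with_parent(
    api: JobsApi,
    inputs: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Create a job, then attach the parent node ID found in the environment."""
    env = os.environ if env is None else env
    job_id = api.create_job(inputs)
    node_id = env.get("OSPARC_NODE_ID")
    if node_id:
        # "node_id" is a hypothetical metadata key.
        api.patch_job_metadata(job_id, {"node_id": node_id})
    return job_id


class InMemoryJobsApi:
    """Tiny stand-in used only to exercise the wrapper locally."""

    def __init__(self) -> None:
        self.metadata: dict[str, dict[str, str]] = {}

    def create_job(self, inputs: Mapping[str, Any]) -> str:
        job_id = f"job-{len(self.metadata) + 1}"
        self.metadata[job_id] = {}
        return job_id

    def patch_job_metadata(self, job_id: str, metadata: Mapping[str, str]) -> None:
        self.metadata[job_id].update(metadata)
```

Note that the two calls are not atomic: if the second one fails, the job exists without its parent node ID.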

sanderegg (Member, Author) commented

@bisgaard-itis OK, but will this also work if the user (as in sim4life.io) also calls the PATCH endpoint? Will that not overwrite whatever was in there? Also, I would prefer that the parent node ID is not just some JSON field, but a defined one.
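
Whether a later user call wipes the parent node ID depends on how the endpoint applies an update, which this thread does not state. The two hypothetical functions below only contrast the possible semantics; the `node_id` key is likewise an assumption.

```python
def replace_metadata(stored: dict[str, str], update: dict[str, str]) -> dict[str, str]:
    """Replace semantics: the stored document becomes exactly the update."""
    return dict(update)


def merge_metadata(stored: dict[str, str], update: dict[str, str]) -> dict[str, str]:
    """Merge semantics: keys in the update win, all other keys are kept."""
    return {**stored, **update}


stored = {"node_id": "n1"}        # written by the client wrapper
user_update = {"label": "run 7"}  # written later by the user

lost = replace_metadata(stored, user_update)  # parent node ID is gone
kept = merge_metadata(stored, user_update)    # parent node ID survives
```

Even with merge semantics, a user who happens to write the same key would still overwrite the value.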

bisgaard-itis (Contributor) commented

This basically delegates all responsibility for setting the parent node_id to the client. Essentially, the idea is to do in the python osparc client exactly what Manuel already does in the C++ client he uses from sim4life.io, and to wrap it in a user-friendly function that picks up the node_id from the env. I am not sure I understand exactly what you mean by a "defined field". In the end, I guess it will be added to the metadata in the db in the same way Manuel currently does it, no?

sanderegg (Member, Author) commented

@bisgaard-itis The project metadata that Manuel is using are owned by the user; we currently extract the parent NodeID from them as a hack. If your solution does not imply that the user may inadvertently remove the parent NodeID by explicitly calling the endpoint, then I am OK with it.
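
One way to read the "defined field" requirement is to store the parent node ID outside the user-owned metadata, so that no user call can touch it. The sketch below uses a hypothetical `Job` model and helper, not the actual database schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    # Hypothetical model, not the actual schema.
    job_id: str
    parent_node_id: Optional[str] = None  # platform-owned, set at creation
    user_metadata: dict[str, str] = field(default_factory=dict)  # user-owned


def patch_user_metadata(job: Job, update: dict[str, str]) -> None:
    """Replace the user-owned metadata without touching platform fields."""
    job.user_metadata = dict(update)


job = Job(job_id="job-1", parent_node_id="n1")
patch_user_metadata(job, {"label": "run 7"})
patch_user_metadata(job, {})  # the user clears their own metadata
```

After both calls `job.parent_node_id` is still `"n1"`, since the user-facing update never reaches that field.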

sanderegg modified the milestones: South Island Iced Tea, Tom Bombadil Jul 8, 2024
GitHK closed this as completed Jul 15, 2024
GitHK reopened this Jul 15, 2024
sanderegg modified the milestones: Tom Bombadil, Eisbock Aug 13, 2024
mrnicegyu11 changed the title from "Display statistics of computational jobs together with their parent nodes" to "Enhancement: Display statistics of computational jobs together with their parent nodes" Aug 19, 2024
sanderegg modified the milestones: Eisbock, Doppelbock Sep 13, 2024
6 participants