Endpoint for applications list in UHS (not YK) returns 'null' when UHS db shows many apps #358

Closed
richscott opened this issue Dec 2, 2024 · 1 comment · Fixed by #361
richscott commented Dec 2, 2024

When UHS collects applications from YK, the UHS database contains those entries, but the /api/v1/partition/:partition/queue/:queue/applications endpoint returns only null.

To reproduce, go to a copy of the UHS repo:

$ make kind-all
$ make run

Then submit the attached jobs file:

$ kubectl apply -f denis-jobs.yml

(This is just a regular, decent-sized list of jobs to submit to YuniKorn; almost any other job submission file will work.)
Verify that the YuniKorn web UI shows the applications as queued and/or running by visiting http://localhost:30001/#/dashboard in your browser.

Optionally, you can also check the UHS database by running a psql client in the cluster, e.g.

$ kubectl run postgresql-client --rm --tty -i --restart='Never' --namespace yunikorn --image docker.io/bitnami/postgresql:17.2.0-debian-12-r1 --env="PGPASSWORD=psw" --command -- psql --host postgresql -U postgres -d uhs -p 5432
uhs=# select count(*) from applications;
81
\q
$

The actual bug is in the applications endpoint on UHS. Query it with:

$ curl http://localhost:8989/api/v1/partition/default/queue/root/applications
null
$

This should instead return a large JSON list of the applications that UHS detected from YK.
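
To make the reproduction scriptable, the check can be wrapped in a small shell helper. This is only a sketch: `is_null_body` is a hypothetical name, not part of UHS, and it assumes the endpoint returns the bare JSON literal `null` (possibly with surrounding whitespace) when the bug is present.

```shell
# Hypothetical helper (not part of UHS): succeeds when the response body
# is the bare JSON literal null, ignoring surrounding whitespace.
is_null_body() {
  body="$(printf '%s' "$1" | tr -d '[:space:]')"
  [ "$body" = "null" ]
}

# Example use against a running UHS (URL taken from the report above):
#   body="$(curl -s http://localhost:8989/api/v1/partition/default/queue/root/applications)"
#   if is_null_body "$body"; then echo "bug reproduced: endpoint returned null"; fi
```

A non-empty JSON list, which is the expected result, makes the helper return a non-zero status.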

@sudiptob2

Thank you for providing detailed information on the ticket!

After further investigation, I identified the root cause of the issue:

  1. The URL used to fetch applications is slightly incorrect: it must contain the partition_id and queue_id, not the partition_name and queue_name.
    Example:
    curl http://localhost:8989/api/v1/partition/01JE4M3K3KK3ZB2JPVH6A28B0F/queue/01JE4M3K3K5FF5WY2J4YWKTMXA/applications

  2. The queue_id is not included in the YuniKorn-Core response, so it is always stored as null in the queue_id field of the application table. Consequently, even when the correct ID parameters are provided, the query still returns null.
    Reference: YuniKorn-Core Code (Lines 335–339)
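
As a rough illustration of point 1, the ID-based URL can be assembled with a small shell function. `uhs_apps_url` and the `UHS_BASE_URL` variable are hypothetical names introduced here; the default base URL and the path layout are taken from the commands in this thread.

```shell
# Hypothetical helper: builds the applications URL from a partition ID and a
# queue ID (not names). UHS_BASE_URL is an assumed override; the default
# matches the address used elsewhere in this issue.
uhs_apps_url() {
  base="${UHS_BASE_URL:-http://localhost:8989}"
  printf '%s/api/v1/partition/%s/queue/%s/applications\n' "$base" "$1" "$2"
}

# Example use against a running UHS, with the IDs from the example above:
#   curl -s "$(uhs_apps_url 01JE4M3K3KK3ZB2JPVH6A28B0F 01JE4M3K3K5FF5WY2J4YWKTMXA)"
```

Point 2 could be confirmed from the psql prompt shown earlier with a query such as `select count(*) from applications where queue_id is null;`. The column name is taken from the description above and has not been checked against the actual schema.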

Solution: Added queueID in YuniKorn-Core. Once this PR is merged, the applications endpoint should return the desired result.
PR: G-Research/yunikorn-core#11
