Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add extra metadata to workflow runs exported files #364

Merged
merged 12 commits into from
Dec 5, 2023

Conversation

JaimeSeqLabs
Copy link
Contributor

@JaimeSeqLabs JaimeSeqLabs commented Nov 21, 2023

Description

Closes #361 #357 #356 #355

Introduces a new export file specific to workflow metadata and aggregated data that does not belong to Workflow entity.

This new file workflow-metadata.json generated by tw runs dump command containing the following fields:

  • pipelineId
  • workspaceId
  • workspaceName
  • userId
  • runUrl
  • labels

Guidelines for testing

  1. Launch a pipeline
  2. Copy the workflow run ID
  3. Run tw runs dump -w <your_workspace> -i <workflow_run_id> -o test-runs-dump.tar.gz
  4. The expected console output is:
- Tower info
- Workflow details
- Task details

  Pipeline run '2sk2IPkODYQqUY' at [Org / Wsp] workspace details dump at 'test-runs-dump.tar.gz'

  1. Inside the tar file, the workflow-metadata.json file should contain the following workflow metadata fields:
{
  "pipelineId": ...,
  "workspaceId": ...,
  "workspaceName": ...,
  "userId": ...,
  "runUrl": ...,
  "labels": [
    ...
  ]
}

@JaimeSeqLabs JaimeSeqLabs linked an issue Nov 21, 2023 that may be closed by this pull request
4 tasks
@JaimeSeqLabs JaimeSeqLabs self-assigned this Nov 21, 2023
@JaimeSeqLabs JaimeSeqLabs added the enhancement New feature or request label Nov 21, 2023
@JaimeSeqLabs JaimeSeqLabs added this to the v0.9.1 milestone Nov 21, 2023
@Test
void testDumpRuns(MockServerClient mock) throws IOException {

mock.when(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for the test, I think it would be nice to abstract the duplicated code into a single method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean setting up the test endpoints in a separate method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so. I see that this code is the same every time, except a couple of parameters that might change. Since we are just setting up the mock, I think that we can abstract it in a method.

Suggested change
mock.when(
private void configMockWith(MockServerClient mock, String method, String path, int statusCode, int count, String resourcePath) {
mock.when(
request().withMethod(method).withPath(path), exactly(count)
).respond(
response().withStatusCode(statusCode).withBody(loadResource(resourcePath)).withContentType(MediaType.APPLICATION_JSON)
);
}

Feel free to also extract the MediaType as a parameter, if it can change.

// Launch info
Launch launch = null;
Long pipelineId = null;
if (workflow.getLaunchId() != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we create a separate method with a descriptive name of what is happening here, instead of directly put the if clause?

launch = launchById(wspId, workflow.getLaunchId());

DescribeWorkflowLaunchResponse wfLaunchResponse = workflowLaunchById(wspId, workflow.getId());
if (wfLaunchResponse != null && wfLaunchResponse.getLaunch() != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. What is the "if" checking? I think it would be better to create a separate method that returns a boolean, and has a descriptive name of what we are actually checking. Example: isLaunchResponseValid() or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the if condition is clear. We check if we got a response and if the response contains a Launch object (we don't have the null chain operator in Java).
Creating a separate method just for this simple check is not justified in my opinion, although It would be if the condition check is re-used multiple times in the class.

Long userId = workflow.getOwnerId();
String userMail = null;
String userName = workflow.getUserName();
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. This Try catch could be hidden on a separate method with a descriptive name. That way all this method becomes way simpler and easy to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that moving fragments of the code to a separate method makes sense if said method is getting reused more than once or if the fragment is overly complex. Otherwise we are just scattering the code around without actually reducing complexity.
I'll add some comments given that this filtering fragment is a bit convoluted. If we find an easier approach I'll update it.

Copy link
Contributor

@pgeadas pgeadas Nov 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that there are other advantages in splitting the code in smaller methods, rather than just for code reuse:

  • Single responsibility principle: a class should be only responsible of doing one thing. The same thing can be applied to methods.
  • Testability: smaller methods can be unit tested easily since they are only responsible of doing one thing.
  • Legibility: if we have one main method that calls other methods, the code will be way more understandable and, in many cases, you don't even need to look at the smaller methods' code in order to understand what the code intends to do.
  • Reduce comments: if the smaller methods have descriptive names, you avoid using comments that do not add much and only pollute the code base.

I will give you an example:
image

Both methods that were extracted are only used once, but in your opinion:
Which one looks cleaner? Which one takes less time to understand? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments are not pollution in the code base. We strongly encourage adding comments wherever is necessary as long as they explain the "why" rather than the "what" in the code.

Which one looks cleaner?

This is subjective, sure the wrapper method is 3 lines but then you have moved the same code into two methods, so you end with N+3 lines and additional jumps that makes debugging more difficult.

P.S: Thanks a lot for the recommendation, I enjoyed reading that book back in the day. Let me return the favor and recommend you another book.
If you are more curious about this and other pragmatic principles please let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved the same code into two methods

Not really, what happens is that both methods will call a single method where all the duplicate code is. However, that part is not shown in the screenshot.

makes debugging more difficult
How so?

Thanks for the book recommendation 👍🏻

@JaimeSeqLabs JaimeSeqLabs marked this pull request as draft November 27, 2023 15:59
@JaimeSeqLabs JaimeSeqLabs marked this pull request as ready for review November 28, 2023 10:26
@JaimeSeqLabs
Copy link
Contributor Author

I linked all the three issues related to workflow metadata in this PR, their implementations are similar one to another.

@drpatelh
Copy link
Member

drpatelh commented Nov 28, 2023

Thank you for this guys! 🙏🏽

With regard to this issue #357

Would we be able to add pipeline labels to the default JSON file created by tw -o runs view too please?

At the moment, resource labels are dumped but we need the pipeline labels to filter runs for our QuickLaunch implementations:

$ tw -o json runs view --workspace=community/showcase --id=1v9SlFbY4LR7wl       
{
  "general" : {
    "id" : "1v9SlFbY4LR7wl",
    "operationId" : "f2c38f50-a801-461c-9527-bb4be471359e",
    "runName" : "ridiculous_ritchie",
    "startingDate" : "2023-11-07T16:53:43Z",
    "commitId" : "3bec2331cac2b5ff88a1dc71a21fab6529b57a0f",
    "sessionId" : "e6b26621-e067-47a4-9523-ff46b98fb74f",
    "username" : "jonathanyoung305",
    "workdir" : "s3://nf-tower-bucket/scratch/1v9SlFbY4LR7wl",
    "container" : "",
    "executors" : "awsbatch",
    "computeEnv" : "AWS_Batch_Ireland_FusionV2_NVMe",
    "nextflowVersion" : "23.08.0-edge",
    "status" : "SUCCEEDED",
    "labels" : "owner=seqera,rnaseq,workspace=showcase"
  }
}

It just means we don't need to download the entire tar archive first to extract this info.

cc @ejseqera

@drpatelh
Copy link
Member

Actually scratch the request in my former comment. Looks like you can do this already. I tested with a pipeline that didn't have pipeline labels 🤦🏽

$ tw -o json runs view --workspace=seqeralabs/showcase --id=1bGQVwwOs1zHt6
{
  "general" : {
    "id" : "1bGQVwwOs1zHt6",
    "operationId" : "10e64152-4799-46a3-bd3c-e2e45a6aac28",
    "runName" : "intergalactic_torvalds",
    "startingDate" : "2023-11-28T17:17:52Z",
    "commitId" : "94bd58250591c29e23d9afbb9deedf95800fe3fc",
    "sessionId" : "c1499562-9513-4d67-b3fe-0e86777603aa",
    "username" : "drpatelhh",
    "workdir" : "s3://seqeralabs-showcase/scratch/1bGQVwwOs1zHt6",
    "executors" : "awsbatch",
    "computeEnv" : "seqera_aws_ireland_fusionv2_nvme",
    "nextflowVersion" : "23.11.0-edge",
    "status" : "SUCCEEDED",
    "labels" : "gpu,owner=harshil,structure-prediction,workspace=showcase"
  }
}

Would be nice however, if we can disambiguate between pipeline labels and resource labels rather than having everything in one labels field. But that can come as a separate feature request.

@JaimeSeqLabs
Copy link
Contributor Author

Would be nice however, if we can disambiguate between pipeline labels and resource labels rather than having everything in one labels field. But that can come as a separate feature request.

If we really needed it can be separated into a feature, sure.
But is it justified? resource vs non-resource can be distinguished by filtering the labels containing '='.

@JaimeSeqLabs
Copy link
Contributor Author

@ejseqera , @drpatelh I'm trying to refactor the method to obtain the launch user email because the current approach feels very hacky.

We are taking the userEmail from the workspace participant list by matching the userName from the workflow to the participant userName.

This approach doesn't work if we dump the runs in the user context (there is no workspace nor participants) so in that case there is no userMail field in the json file.

Are you extracting the userMail field with a different technique? Otherwise I'll need to make changes on Seqera Platform side to include the mail in the wokflow response.

@JaimeSeqLabs
Copy link
Contributor Author

Hi @ejseqera @drpatelh ,
After some discussions in the backend team we concluded that there is no safe way of exposing the launcher user email without extensive modifications or risking personal data (email) leaks.
The closest that we can get to the requested functionality is the workspace participant scan which doesn't work for user contexts.

Sadly we have to de-scope the email metadata.

@drpatelh
Copy link
Member

drpatelh commented Dec 5, 2023

Thanks alot @JaimeSeqLabs. This all looks great from our side! We will test it out after the release and let you know if we have any more feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
6 participants