Add extra metadata to workflow runs exported files #364

JaimeSeqLabs · 2023-11-21T17:03:44Z

Description

Introduces a new export file specific to workflow metadata and aggregated data that does not belong to the Workflow entity.

This new file, workflow-metadata.json, is generated by the tw runs dump command and contains the following fields:

pipelineId
workspaceId
workspaceName
userId
runUrl
labels

Guidelines for testing

Launch a pipeline
Copy the workflow run ID
Run tw runs dump -w <your_workspace> -i <workflow_run_id> -o test-runs-dump.tar.gz
The expected console output is:

- Tower info
- Workflow details
- Task details

  Pipeline run '2sk2IPkODYQqUY' at [Org / Wsp] workspace details dump at 'test-runs-dump.tar.gz'

Inside the tar file, the workflow-metadata.json file should contain the following workflow metadata fields:

{
  "pipelineId": ...,
  "workspaceId": ...,
  "workspaceName": ...,
  "userId": ...,
  "runUrl": ...,
  "labels": [
    ...
  ]
}

src/main/java/io/seqera/tower/cli/commands/runs/DumpCmd.java

src/main/java/io/seqera/tower/cli/shared/WorkflowDumpExportFormat.java

pgeadas · 2023-11-22T14:17:02Z

src/test/java/io/seqera/tower/cli/runs/RunsCmdTest.java

+    @Test
+    void testDumpRuns(MockServerClient mock) throws IOException {
+
+        mock.when(


Same for the test, I think it would be nice to abstract the duplicated code into a single method.

Do you mean setting up the test endpoints in a separate method?

I guess so. I see that this code is the same every time, except a couple of parameters that might change. Since we are just setting up the mock, I think that we can abstract it in a method.

Suggested change

-        mock.when(
+    private void configMockWith(MockServerClient mock, String method, String path, int statusCode, int count, String resourcePath) {
+        mock.when(
+                request().withMethod(method).withPath(path), exactly(count)
+        ).respond(
+                response().withStatusCode(statusCode).withBody(loadResource(resourcePath)).withContentType(MediaType.APPLICATION_JSON)
+        );
+    }

Feel free to also extract the MediaType as a parameter, if it can change.

src/test/java/io/seqera/tower/cli/shared/WorkflowDumpExportFormatTest.java

src/main/java/io/seqera/tower/cli/commands/runs/DumpCmd.java

pgeadas · 2023-11-27T10:50:13Z

src/main/java/io/seqera/tower/cli/commands/runs/DumpCmd.java

+        // Launch info
+        Launch launch = null;
+        Long pipelineId = null;
+        if (workflow.getLaunchId() != null) {


Can't we create a separate method with a descriptive name for what is happening here, instead of putting the if clause inline?

pgeadas · 2023-11-27T10:52:32Z

src/main/java/io/seqera/tower/cli/commands/runs/DumpCmd.java

+            launch = launchById(wspId, workflow.getLaunchId());
+
+            DescribeWorkflowLaunchResponse wfLaunchResponse = workflowLaunchById(wspId, workflow.getId());
+            if (wfLaunchResponse != null && wfLaunchResponse.getLaunch() != null) {


Same here. What is the "if" checking? I think it would be better to create a separate method that returns a boolean, and has a descriptive name of what we are actually checking. Example: isLaunchResponseValid() or something.

I think the if condition is clear. We check whether we got a response and whether the response contains a Launch object (we don't have the null chain operator in Java).
Creating a separate method just for this simple check is not justified in my opinion, although it would be if the condition check were reused multiple times in the class.
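
As a side note, java.util.Optional can express the same null-safe chain without a dedicated operator. A minimal sketch, assuming simplified stand-in classes (LaunchResponse, Launch and LaunchLookup are hypothetical and are not the SDK types used in DumpCmd.java):

```java
import java.util.Optional;

final class LaunchLookup {

    // Same meaning as: response != null && response.getLaunch() != null.
    // Optional.map yields an empty Optional when the mapper returns null.
    static Optional<Launch> launchOf(LaunchResponse response) {
        return Optional.ofNullable(response).map(LaunchResponse::getLaunch);
    }

    public static void main(String[] args) {
        System.out.println(launchOf(null).isPresent());
        System.out.println(launchOf(new LaunchResponse(null)).isPresent());
        System.out.println(launchOf(new LaunchResponse(new Launch("abc"))).isPresent());
    }
}

// Hypothetical stand-in for the launch response returned by the API client.
final class LaunchResponse {
    private final Launch launch;
    LaunchResponse(Launch launch) { this.launch = launch; }
    Launch getLaunch() { return launch; }
}

// Hypothetical stand-in for the launch object.
final class Launch {
    private final String id;
    Launch(String id) { this.id = id; }
    String getId() { return id; }
}
```

Whether this reads better than the plain if is a matter of taste; it mainly helps when the chain is more than two levels deep.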

pgeadas · 2023-11-27T10:53:46Z

src/main/java/io/seqera/tower/cli/commands/runs/DumpCmd.java

+        Long userId = workflow.getOwnerId();
+        String userMail = null;
+        String userName = workflow.getUserName();
+        try {


Same here. This try/catch could be hidden in a separate method with a descriptive name. That way this whole method becomes much simpler and easier to understand.

I think that moving fragments of the code to a separate method makes sense if said method is getting reused more than once or if the fragment is overly complex. Otherwise we are just scattering the code around without actually reducing complexity.
I'll add some comments given that this filtering fragment is a bit convoluted. If we find an easier approach I'll update it.

I think that there are other advantages in splitting the code in smaller methods, rather than just for code reuse:

Single responsibility principle: a class should be responsible for doing only one thing. The same can be applied to methods.

Testability: smaller methods can be unit tested easily since they are responsible for doing only one thing.

Legibility: if we have one main method that calls other methods, the code will be way more understandable and, in many cases, you don't even need to look at the smaller methods' code in order to understand what the code intends to do.

Reduce comments: if the smaller methods have descriptive names, you avoid using comments that do not add much and only pollute the code base.

I will give you an example:

Both methods that were extracted are only used once, but in your opinion:
Which one looks cleaner? Which one takes less time to understand? :)

Comments are not pollution in the code base. We strongly encourage adding comments wherever necessary, as long as they explain the "why" rather than the "what" in the code.

Which one looks cleaner?

This is subjective. Sure, the wrapper method is 3 lines, but then you have moved the same code into two methods, so you end up with N+3 lines and additional jumps that make debugging more difficult.

P.S: Thanks a lot for the recommendation, I enjoyed reading that book back in the day. Let me return the favor and recommend you another book.
If you are more curious about this and other pragmatic principles please let me know.

moved the same code into two methods

Not really, what happens is that both methods will call a single method where all the duplicate code is. However, that part is not shown in the screenshot.

makes debugging more difficult
How so?

Thanks for the book recommendation 👍🏻

JaimeSeqLabs · 2023-11-28T10:28:07Z

I linked all three issues related to workflow metadata to this PR; their implementations are similar to one another.

drpatelh · 2023-11-28T18:53:00Z

Thank you for this guys! 🙏🏽

With regard to issue #357:

Would we be able to add pipeline labels to the default JSON file created by tw -o json runs view too, please?

At the moment, resource labels are dumped but we need the pipeline labels to filter runs for our QuickLaunch implementations:

$ tw -o json runs view --workspace=community/showcase --id=1v9SlFbY4LR7wl       
{
  "general" : {
    "id" : "1v9SlFbY4LR7wl",
    "operationId" : "f2c38f50-a801-461c-9527-bb4be471359e",
    "runName" : "ridiculous_ritchie",
    "startingDate" : "2023-11-07T16:53:43Z",
    "commitId" : "3bec2331cac2b5ff88a1dc71a21fab6529b57a0f",
    "sessionId" : "e6b26621-e067-47a4-9523-ff46b98fb74f",
    "username" : "jonathanyoung305",
    "workdir" : "s3://nf-tower-bucket/scratch/1v9SlFbY4LR7wl",
    "container" : "",
    "executors" : "awsbatch",
    "computeEnv" : "AWS_Batch_Ireland_FusionV2_NVMe",
    "nextflowVersion" : "23.08.0-edge",
    "status" : "SUCCEEDED",
    "labels" : "owner=seqera,rnaseq,workspace=showcase"
  }
}

It just means we don't need to download the entire tar archive first to extract this info.

cc @ejseqera

drpatelh · 2023-11-28T19:17:11Z

Actually scratch the request in my former comment. Looks like you can do this already. I tested with a pipeline that didn't have pipeline labels 🤦🏽

$ tw -o json runs view --workspace=seqeralabs/showcase --id=1bGQVwwOs1zHt6
{
  "general" : {
    "id" : "1bGQVwwOs1zHt6",
    "operationId" : "10e64152-4799-46a3-bd3c-e2e45a6aac28",
    "runName" : "intergalactic_torvalds",
    "startingDate" : "2023-11-28T17:17:52Z",
    "commitId" : "94bd58250591c29e23d9afbb9deedf95800fe3fc",
    "sessionId" : "c1499562-9513-4d67-b3fe-0e86777603aa",
    "username" : "drpatelhh",
    "workdir" : "s3://seqeralabs-showcase/scratch/1bGQVwwOs1zHt6",
    "executors" : "awsbatch",
    "computeEnv" : "seqera_aws_ireland_fusionv2_nvme",
    "nextflowVersion" : "23.11.0-edge",
    "status" : "SUCCEEDED",
    "labels" : "gpu,owner=harshil,structure-prediction,workspace=showcase"
  }
}

Would be nice however, if we can disambiguate between pipeline labels and resource labels rather than having everything in one labels field. But that can come as a separate feature request.

JaimeSeqLabs · 2023-11-29T10:59:15Z

Would be nice however, if we can disambiguate between pipeline labels and resource labels rather than having everything in one labels field. But that can come as a separate feature request.

If we really need it, it can be split into a separate feature, sure.
But is it justified? Resource and non-resource labels can be distinguished by filtering for the labels that contain '='.
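
A minimal sketch of that filter, assuming the combined labels value is the comma-separated string shown in the runs view output above and that only resource labels use the key=value form (LabelSplitter and SplitLabels are hypothetical names, not part of tower-cli):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits the combined "labels" string into
// resource labels (key=value) and plain pipeline labels.
final class LabelSplitter {

    // Holder for the two groups of labels.
    record SplitLabels(List<String> resourceLabels, List<String> plainLabels) {}

    static SplitLabels split(String labels) {
        List<String> resource = new ArrayList<>();
        List<String> plain = new ArrayList<>();
        if (labels == null || labels.isBlank()) {
            return new SplitLabels(resource, plain);
        }
        for (String label : labels.split(",")) {
            String trimmed = label.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Assumption: a label containing '=' is a resource label.
            if (trimmed.contains("=")) {
                resource.add(trimmed);
            } else {
                plain.add(trimmed);
            }
        }
        return new SplitLabels(resource, plain);
    }

    public static void main(String[] args) {
        SplitLabels result = split("gpu,owner=harshil,structure-prediction,workspace=showcase");
        System.out.println("resource: " + result.resourceLabels());
        System.out.println("plain: " + result.plainLabels());
    }
}
```

The heuristic breaks if a plain label is ever allowed to contain '=', which is one argument for separate fields.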

JaimeSeqLabs · 2023-11-29T11:43:24Z

@ejseqera , @drpatelh I'm trying to refactor the method to obtain the launch user email because the current approach feels very hacky.

We are taking the userEmail from the workspace participant list by matching the userName from the workflow to the participant userName.

This approach doesn't work if we dump the runs in the user context (there is no workspace nor participants) so in that case there is no userMail field in the json file.
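
For context, the participant scan described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in DumpCmd.java: Participant and ParticipantEmailLookup are hypothetical stand-ins, and a null participant list stands for the user context where no workspace exists.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

final class ParticipantEmailLookup {

    // Hypothetical stand-in for a workspace participant entry.
    record Participant(String userName, String email) {}

    // Returns the email of the participant whose userName matches the
    // workflow's userName. Empty when there is no participant list
    // (user context), no match, or the matching entry has no email.
    static Optional<String> findEmail(List<Participant> participants, String workflowUserName) {
        if (participants == null || workflowUserName == null) {
            return Optional.empty();
        }
        return participants.stream()
                .filter(p -> workflowUserName.equals(p.userName()))
                .map(Participant::email)
                .filter(Objects::nonNull)
                .findFirst();
    }

    public static void main(String[] args) {
        List<Participant> participants = List.of(
                new Participant("alice", "alice@example.com"),
                new Participant("bob", "bob@example.com"));
        System.out.println(findEmail(participants, "bob"));
        System.out.println(findEmail(null, "bob"));
    }
}
```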

Are you extracting the userMail field with a different technique? Otherwise I'll need to make changes on the Seqera Platform side to include the email in the workflow response.

JaimeSeqLabs · 2023-12-04T16:47:02Z

Hi @ejseqera @drpatelh ,
After some discussions in the backend team we concluded that there is no safe way of exposing the launcher user email without extensive modifications or risking personal data (email) leaks.
The closest that we can get to the requested functionality is the workspace participant scan which doesn't work for user contexts.

Sadly we have to de-scope the email metadata.

drpatelh · 2023-12-05T13:01:42Z

Thanks a lot @JaimeSeqLabs. This all looks great from our side! We will test it out after the release and let you know if we have any more feedback.

JaimeSeqLabs added 2 commits November 21, 2023 17:03

extend workflow export format with virtual fields

1d81404

runs dump cmd, extend workflow export fields

fbbd0c8

JaimeSeqLabs linked an issue Nov 21, 2023 that may be closed by this pull request

Add keys to tw runs dump output to complete metadata for a workflow #361

Closed

4 tasks

JaimeSeqLabs requested review from jordeu and ejseqera November 21, 2023 17:03

JaimeSeqLabs self-assigned this Nov 21, 2023

JaimeSeqLabs added the enhancement New feature or request label Nov 21, 2023

JaimeSeqLabs added this to the v0.9.1 milestone Nov 21, 2023

JaimeSeqLabs requested review from pgeadas and endre-seqera November 21, 2023 17:04

use tower's object mapper for virtual extension formats

b23fa01

jordeu reviewed Nov 22, 2023

src/main/java/io/seqera/tower/cli/commands/runs/DumpCmd.java

src/main/java/io/seqera/tower/cli/commands/runs/DumpCmd.java

endre-seqera reviewed Nov 22, 2023

src/main/java/io/seqera/tower/cli/commands/runs/DumpCmd.java

src/main/java/io/seqera/tower/cli/shared/WorkflowDumpExportFormat.java

pgeadas reviewed Nov 22, 2023

src/main/java/io/seqera/tower/cli/shared/WorkflowDumpExportFormat.java

pgeadas reviewed Nov 22, 2023

src/main/java/io/seqera/tower/cli/shared/WorkflowDumpExportFormat.java

pgeadas reviewed Nov 22, 2023

src/test/java/io/seqera/tower/cli/shared/WorkflowDumpExportFormatTest.java

JaimeSeqLabs added 3 commits November 24, 2023 18:29

clean json nodes handling in export formats

1c47147

export workflow metadata to separate file

6690566

reflection files

bc889a2

JaimeSeqLabs commented Nov 27, 2023

src/main/java/io/seqera/tower/cli/commands/runs/DumpCmd.java

pgeadas reviewed Nov 27, 2023

JaimeSeqLabs added 2 commits November 27, 2023 12:37

comment for user info filtering

bb50149

add workspace name to metadata

aefb518

JaimeSeqLabs marked this pull request as draft November 27, 2023 15:59

JaimeSeqLabs added 3 commits November 27, 2023 18:03

add workflow run URL to metadata

2690e4d

add workflow labels to metadata

6b4350c

fix reflection files

12a6aa3

JaimeSeqLabs marked this pull request as ready for review November 28, 2023 10:26

remove user email from metadata

faaec0f

JaimeSeqLabs merged commit 3e14dca into master Dec 5, 2023
11 checks passed

JaimeSeqLabs linked an issue Dec 5, 2023 that may be closed by this pull request

Add Run URL in JSON from tw runs dump #356

Closed

JaimeSeqLabs mentioned this pull request Dec 5, 2023

Add Run URL in JSON from tw runs dump #356

Closed

JaimeSeqLabs linked an issue Dec 5, 2023 that may be closed by this pull request

Add Run labels in JSON from tw runs dump #357

Closed

This was referenced Dec 5, 2023

Add Run labels in JSON from tw runs dump #357

Closed

Add Workspace name in JSON from tw runs dump #355

Closed

adamrtalbot mentioned this pull request Dec 7, 2023

SD-50 Extract data via single step seqeralabs/showcase-automation#12

Closed

adamrtalbot mentioned this pull request Dec 19, 2023

Add organisation to tw runs info/dump #375

Closed

ejseqera mentioned this pull request Feb 6, 2024

Pipeline and resource labels missing from tw runs dump output #384

Closed
