
feat: download artifacts from current workflow run, improve API usage #9

Closed · wants to merge 6 commits

Conversation

@AlCalzone (Contributor) commented Sep 12, 2022

I've been playing around with this action, which currently seems to be the only sane option for using Turborepo caching on GH Actions. Unfortunately, because my workflow is split into separate phases, I ran into a few limitations, which this PR is supposed to solve:

Artifacts from previous jobs of the same workflow are not found
This is a limitation of GitHub Actions - apparently list and delete only work once the workflow has completed. As a result, caching did not work at all in subsequent jobs. I worked around this by downloading all previously uploaded artifacts from the same workflow before starting the server. When handling a request, these cache files are checked first before falling back to listing artifacts from other workflows.

Reduce API usage
In the middle of testing, I actually hit my GitHub Actions rate limit of 1000 requests/hour. I believe this can be avoided using the following tricks:

  • Check the file system first, then the API.
  • Cache the list request used for serving GET requests. The turbo server is usually short-lived, so it should be relatively unlikely that we miss an artifact this way. And if we do, it's not the end of the world.
  • Re-upload "remote" artifacts as workflow artifacts (sketched below). I'm not 100% sold on this, but in theory we should be able to avoid the list request on subsequent job runs this way, since these artifacts are automatically found in follow-up jobs.
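
Roughly what I have in mind for that third point - a sketch only, not the exact code in this PR (cacheDir and using the turbo hash as the artifact name are assumptions):

import * as path from 'path';
import { create } from '@actions/artifact';

// Sketch: after serving an artifact that came from another workflow run,
// re-upload it as an artifact of the current run, so follow-up jobs find it
// without another list request. `cacheDir` and `hash` are assumed inputs.
export async function reuploadAsWorkflowArtifact(cacheDir: string, hash: string) {
  const client = create();
  const file = path.join(cacheDir, hash);
  // uploadArtifact(name, files, rootDirectory) - @actions/artifact v1 API
  await client.uploadArtifact(hash, [file], cacheDir);
}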

Oh, and I implemented a stub for the statistics endpoint that newer versions of Turborepo are using. Otherwise you get a lot of unnecessary errors in the Turborepo logs.
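
The stub itself is just an acknowledging no-op, roughly like this (a sketch assuming an Express-style server; the actual handler may differ):

import express from 'express';

const app = express();

// Turborepo POSTs cache-usage analytics here; answering with an empty 200
// is enough to keep the turbo -vv logs free of errors.
app.post('/v8/artifacts/events', (_req, res) => {
  res.status(200).json({});
});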

@felixmosh (Owner) commented Sep 12, 2022

Hi @AlCalzone, thank you for this PR.
Can you provide a small example of a GH Actions workflow that represents your setup?
What do you mean by list & delete not being available?

@AlCalzone (Contributor, Author)

Here's a PR that uses the updated workflow - not sure if you can see the logs:
zwave-js/node-zwave-js#5050
Here's the corresponding workflow run: https://github.com/zwave-js/node-zwave-js/actions/runs/3036837290. The runs after this one are failing because I'm cleaning up other stuff.

What do you mean by list & delete not being available?

This comment explains it - essentially, the artifacts generated by the current workflow never show up in the artifact list request this action is using:
actions/upload-artifact#53 (comment)
The @actions/artifact download method uses different paths that are scoped to the current workflow run, so it does find them.

import { create } from '@actions/artifact';

export async function downloadSameWorkflowArtifacts() {
  const client = create();
  // Try to download all artifacts from the current workflow, but do not fail the build if this fails
  // (cacheDir is the module-level cache directory)
  const artifacts = await client.downloadAllArtifacts(cacheDir).catch(() => []);
}
@felixmosh (Owner) commented Sep 12, 2022

Where do you ensure that these artifacts belong to the current workflow?
It looks like you are downloading all the files, which may be hundreds of gigs in my case.

@AlCalzone (Contributor, Author)

That's handled by this method: https://github.com/actions/toolkit/blob/e6257f111756d2f3567917c8e27ab57de8c3e09c/packages/artifact/src/internal/artifact-client.ts#L221
which uses this method to list the artifacts: https://github.com/actions/toolkit/blob/e6257f111756d2f3567917c8e27ab57de8c3e09c/packages/artifact/src/internal/download-http-client.ts#L45
which in turn uses https://github.com/actions/toolkit/blob/e6257f111756d2f3567917c8e27ab57de8c3e09c/packages/artifact/src/internal/utils.ts#L222 as the download URL,
which references the current workflow run via the env variable here:
https://github.com/actions/toolkit/blob/e6257f111756d2f3567917c8e27ab57de8c3e09c/packages/artifact/src/internal/config-variables.ts#L50
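
Paraphrased from the linked files (not verbatim - helpers like getRuntimeUrl() live in the same sources), the chain boils down to the artifact URL being scoped to the current run:

// utils.ts: the list/download URL is built per workflow run
export function getArtifactUrl(): string {
  return `${getRuntimeUrl()}_apis/pipelines/workflows/${getWorkFlowRunId()}/artifacts?api-version=${getApiVersion()}`;
}

// config-variables.ts: the run id comes straight from the job's environment
export function getWorkFlowRunId(): string {
  const workFlowRunId = process.env['GITHUB_RUN_ID'];
  if (!workFlowRunId) {
    throw new Error('Unable to get GITHUB_RUN_ID env variable');
  }
  return workFlowRunId;
}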

It looks like you are downloading all the files, which may be hundreds of gigs in my case.

Good point. An alternative would be using @actions/artifact's internals, as I attempted here:
zwave-js/node-zwave-js@10121ae (#5050)
That way one could filter for files that look like a hash, but that might not be stable across releases of that package.

@AlCalzone (Contributor, Author)

If you don't like this approach: I had first attempted to download the files from the current workflow on demand and fall back to the ones from other workflows if that didn't work. However, that was pretty slow (20 seconds for a handful of small artifacts).

@felixmosh (Owner)

I think that you are misusing Turborepo for sharing assets between jobs.

As I understand it, GH jobs are meant to parallelize your tasks, meaning they must by definition not share anything - that is what allows them to run in parallel.
So how do you expect turbo to reuse artifacts of other jobs from the same workflow? Turborepo should reuse artifacts from previous workflows; that is its whole point.

@AlCalzone (Contributor, Author)

Some of the jobs depend on my TS code being compiled. As the project grows, that takes more and more time, so to avoid compiling the same code six times, I factored the compilation out into a separate job so it only has to be done once. Before migrating to Turborepo over the weekend, I shared the build output with the following jobs by uploading and downloading a single artifact.

Now if I don't share the turbo cache between the build job and the following ones, a cache miss in the build job will automatically be a cache miss in the following jobs that depend on a built project too.

I guess one way to do it would be a mix of the two approaches: Go back to downloading workflow artifacts on demand, but manually share the cache dir with the following jobs in the same workflow.

@felixmosh (Owner) commented Sep 12, 2022

You can use Turborepo for caching the TS build between workflows, and in order to share the TS build result between jobs, you will need to upload & download it like you did.

Hope this makes sense to you.
Anyway, thank you for the PR, and sorry for not approving it, but I don't think it is the responsibility of this action to share artifacts between jobs of the same workflow.

@AlCalzone (Contributor, Author) commented Sep 12, 2022

What about the part where the list request is cached and the local FS is checked for existence of the artifact first? Would you accept this? Oh and the /v8/artifacts/events endpoint?

@felixmosh (Owner)

Can you elaborate on both of the points?
Why would a new workflow contain a Turborepo cache on disk?

@felixmosh (Owner)

I've checked the turborepo code, and it looks like they added /v8/artifacts/events for analytics.
You can make a separate PR with this endpoint, thanks.

Funny that my workflows work without it... :|

@AlCalzone (Contributor, Author)

Regarding the list request:
Let's say your workflow has 20 Turborepo tasks. As it currently is, every GET request to the endpoint /v8/artifacts/:id lists the existing artifacts using the GitHub API. Unless another parallel workflow run finishes and adds an artifact in the meantime, each of these 20 list requests will return the same result - i.e. 19 of the API calls were unnecessary. And even if new artifacts do appear between request 1 and request 20, I think the chance of needing exactly one of those is relatively low.
By caching the result of the GitHub API request on the first GET, we can save some requests.
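
A minimal sketch of what I mean, assuming an authenticated Octokit instance (names are illustrative, not the exact code in this PR):

import * as github from '@actions/github';

const octokit = github.getOctokit(process.env.GITHUB_TOKEN!);

// Cache the promise of the first list call for the lifetime of the server,
// so 20 GET requests cost at most one GitHub API call.
let artifactList: ReturnType<typeof fetchArtifactList> | undefined;

function fetchArtifactList() {
  return octokit.rest.actions.listArtifactsForRepo({
    ...github.context.repo,
    per_page: 100,
  });
}

export function listArtifactsCached() {
  artifactList ??= fetchArtifactList();
  return artifactList;
}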

Regarding the local FS check:
It probably isn't terribly useful if there is only one job in a workflow, but it is useful if:

  1. you repeat a task for some reason and need the same artifact as earlier for it, or
  2. you have multiple jobs and persist the cache between jobs (so I'm going to need this).

In any case, checking whether the file exists isn't expensive and can prevent unnecessary downloads - see the sketch below.
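
A sketch, assuming the turbo hash doubles as the file name in the cache dir (that naming is an assumption):

import { existsSync } from 'fs';
import * as path from 'path';

// Before hitting the GitHub API, check whether an earlier task or a previous
// job already left the artifact for this hash in the local cache directory.
export function findLocalArtifact(cacheDir: string, hash: string): string | undefined {
  const file = path.join(cacheDir, hash);
  return existsSync(file) ? file : undefined;
}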

As for the endpoint:
It doesn't cause failures, but if you run turbo with -vv, every request logs an error with an HTML page in it. I can't find a corresponding log right now, but it's a bit annoying when trying to debug stuff.

@felixmosh (Owner)

Make PRs for both of the points you've mentioned (separately); I will review them, and we will see whether they make the code more complex or not.

@felixmosh closed this Sep 12, 2022
@AlCalzone (Contributor, Author)

Will do 👍️
