feat: download artifacts from current workflow run, improve API usage #9
Conversation
This prevents doing multiple list requests in a single job and unnecessary list requests in subsequent jobs.
Hi @AlCalzone, thank you for this PR.
Here's a PR that uses the updated workflow - not sure if you can see the logs:
This comment explains it - essentially, the artifacts generated by the current workflow never show up in the artifact list request this action is using.
```ts
export async function downloadSameWorkflowArtifacts() {
	const client = create();
	// Try to download all artifacts from the current workflow, but do not fail the build if this fails
	const artifacts = await client.downloadAllArtifacts(cacheDir).catch((e) => {
```
Where do you ensure that these artifacts belong to the current workflow?
Looks like you are downloading all the files, which could amount to hundreds of gigs in my case.
That's this method: https://github.com/actions/toolkit/blob/e6257f111756d2f3567917c8e27ab57de8c3e09c/packages/artifact/src/internal/artifact-client.ts#L221
It uses this method to list artifacts: https://github.com/actions/toolkit/blob/e6257f111756d2f3567917c8e27ab57de8c3e09c/packages/artifact/src/internal/download-http-client.ts#L45
which in turn uses https://github.com/actions/toolkit/blob/e6257f111756d2f3567917c8e27ab57de8c3e09c/packages/artifact/src/internal/utils.ts#L222 as the download URL,
which references the current workflow run via the env variable here:
https://github.com/actions/toolkit/blob/e6257f111756d2f3567917c8e27ab57de8c3e09c/packages/artifact/src/internal/config-variables.ts#L50
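For reference, here is a rough paraphrase of what those linked internals do - a sketch, not the exact toolkit source. The point is that the list URL is scoped to the current run through environment variables set by the Actions runner:

```ts
// Paraphrase of the @actions/artifact (v1) internals linked above, not the exact source.
// ACTIONS_RUNTIME_URL and GITHUB_RUN_ID are set by the Actions runner, so the
// resulting artifact list is implicitly scoped to the current workflow run.
function getArtifactUrl(): string {
	const runtimeUrl = process.env["ACTIONS_RUNTIME_URL"];
	const runId = process.env["GITHUB_RUN_ID"];
	return `${runtimeUrl}_apis/pipelines/workflows/${runId}/artifacts?api-version=6.0-preview`;
}
```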
> Looks like you are downloading all the files, which could amount to hundreds of gigs in my case.
Good point. An alternative would be using @actions/artifact's internals, as I attempted here:
zwave-js/node-zwave-js@10121ae
(#5050)
This way one could filter for files that look like a hash (see the sketch below), but relying on internals might not be stable across releases of that package.
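A minimal sketch of that filtering idea - the regex and names are illustrative assumptions, not code from this PR or from turborepo:

```ts
interface NamedArtifact {
	name: string;
}

// Assumption: turbo cache artifacts are named after the task hash, which looks
// like a hex digest. Keep only artifacts whose names match that shape.
const hashLike = /^[0-9a-f]{16,64}$/i;

function filterCacheArtifacts<T extends NamedArtifact>(artifacts: T[]): T[] {
	return artifacts.filter((artifact) => hashLike.test(artifact.name));
}
```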
In case you don't like this approach: I first attempted to download the files from the current workflow and fall back to the ones from other workflows if that didn't work. However, it was pretty slow (20 seconds for a handful of small artifacts).
I think that you are misusing turborepo for sharing assets between jobs. As I understand it, GH jobs are meant to parallelize your tasks, meaning they must by definition not share anything; that is what allows them to run in parallel.
Some of the jobs depend on my TS code being compiled. As the project grows, that takes more and more time, so to avoid compiling the same code 6 times, I factored the compilation out into a separate job, so it is only done once. Before migrating to Turborepo over the weekend, I shared the build output with the following jobs by uploading and downloading a single artifact.

Now, if I don't share the turbo cache between the build job and the following ones, a cache miss in the build job automatically becomes a cache miss in every following job that depends on a built project, too.

I guess one way to do it would be a mix of the two approaches: go back to downloading workflow artifacts on demand, but manually share the cache dir with the following jobs in the same workflow (see the sketch below).
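A minimal sketch of that manual sharing, assuming the @actions/artifact v1 client; the cache dir location and artifact name are illustrative, not from this PR's diff:

```ts
import { create } from "@actions/artifact";
import * as glob from "@actions/glob";

// Assumed location of the turbo cache; adjust to the action's actual cacheDir.
const cacheDir = "turbo-cache";

// Run at the end of the build job: upload the whole cache dir as one artifact.
export async function uploadCacheDir(): Promise<void> {
	const globber = await glob.create(`${cacheDir}/**/*`);
	const files = await globber.glob();
	await create().uploadArtifact("turbo-cache", files, cacheDir);
}

// Run at the start of each following job: restore the cache dir.
export async function downloadCacheDir(): Promise<void> {
	await create().downloadArtifact("turbo-cache", cacheDir);
}
```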
You can use turborepo for caching the TS artifact between workflows, and to share the TS build result between jobs, you will need to upload & download like you did. Hope this makes sense to you.
What about the part where the list request is cached and the local FS is checked for existence of the artifact first? Would you accept this? Oh, and the /v8/artifacts/events endpoint?
Can you elaborate on both of the points?
I've checked the turbo-repo code, and it looks like they added it. Funny that my workflows work without it... :|
Regarding the list request: the result is cached for the duration of the job, so instead of one list request per artifact lookup, each job performs at most one (a sketch follows below). This also avoids unnecessary list requests in subsequent jobs.

Regarding the local FS check: artifacts downloaded from the current workflow run are already on disk in the cache dir, so a lookup first checks whether the file exists locally and only falls back to a list request when it doesn't.
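A minimal sketch of the memoized list request; the names and types are illustrative, not the PR's exact code:

```ts
// Illustrative sketch: memoize the artifact list so a job performs at most one
// list request, no matter how many cache lookups it does.
interface ArtifactInfo {
	name: string;
	url: string;
}

let cachedList: Promise<ArtifactInfo[]> | undefined;

function listArtifactsOnce(
	fetchList: () => Promise<ArtifactInfo[]>,
): Promise<ArtifactInfo[]> {
	// First caller triggers the request; everyone else reuses the same promise.
	cachedList ??= fetchList();
	return cachedList;
}
```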
As for the endpoint: newer versions of turborepo report cache statistics to /v8/artifacts/events. Since this action's server doesn't implement that route, every report shows up as an error in the turborepo logs; a stub that simply acknowledges the request (sketched below) silences them.
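A minimal sketch of such a stub, assuming an express-style server - the framework and route handler shape are assumptions about this action's internals, not taken from the PR:

```ts
import express from "express";

const app = express();

// Assumption: the local cache server is express-based. turborepo POSTs cache
// usage events here; acknowledging with 200 is enough to silence the log noise.
app.post("/v8/artifacts/events", (_req, res) => {
	res.status(200).json({});
});
```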
Make PRs for both of the points you've mentioned (separately); I will review them, and we will see whether it makes the code more complex or not.
Will do 👍️
I've been playing around with this action, which currently seems to be the only sane option for using Turborepo caching on GH Actions. Unfortunately, because my workflow is split into separate phases, I ran into a few limitations, which this PR is supposed to solve:
Artifacts from previous jobs of the same workflow are not found
This is a limitation of GitHub Actions - apparently, listing and deleting artifacts only works once the workflow has completed. This resulted in caching not working at all in subsequent jobs. I worked around this by downloading all previously uploaded artifacts from the same workflow before starting the server. Then, when handling a request, the cache files on disk are checked before falling back to listing other artifacts (see the sketch below).
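A minimal sketch of that lookup order; `cacheDir`, `listOtherArtifacts`, and `downloadArtifact` are hypothetical stand-ins for the action's helpers, not its actual names:

```ts
import { existsSync } from "fs";
import { join } from "path";

// Illustrative sketch of the lookup order described above. `cacheDir` holds the
// artifacts pre-downloaded from the current workflow run.
async function resolveCacheEntry(
	cacheDir: string,
	hash: string,
	listOtherArtifacts: () => Promise<{ name: string }[]>,
	downloadArtifact: (name: string) => Promise<string>,
): Promise<string | undefined> {
	const localPath = join(cacheDir, hash);
	if (existsSync(localPath)) {
		return localPath; // hit: already downloaded from the current workflow run
	}
	// Fall back to artifacts from other (completed) workflow runs.
	const artifacts = await listOtherArtifacts();
	return artifacts.some((a) => a.name === hash)
		? downloadArtifact(hash)
		: undefined;
}
```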
Reduce API usage
In the middle of testing, I actually hit my GitHub Actions rate limit of 1000 requests/hour. I believe this can be avoided using the following tricks:
- cache the artifact list response, so each job performs at most one list request instead of one per lookup
- check the local FS for an already-downloaded artifact before making any API request
Oh, and I implemented a stub for the statistics endpoint (/v8/artifacts/events) that newer versions of turborepo are using. Otherwise you get a lot of unnecessary errors in the turborepo logs.