Add a separate action for removing old wheels #95

Open
wants to merge 33 commits into
base: main
Changes from 19 commits
Commits
d6264b8
Add a preliminary action for removing wheels
agriyakhetarpal Sep 27, 2024
9d4f667
Clarify some of the inputs, improve descriptions
agriyakhetarpal Sep 27, 2024
002d2e5
Rename original action to reflect wheel uploads
agriyakhetarpal Sep 27, 2024
40cd6bc
Convert remove wheels step to a bash script
agriyakhetarpal Sep 27, 2024
8a33f20
Fix up messages, improve comments
agriyakhetarpal Sep 27, 2024
b18b4a1
Rename org to user in sync with action's YAML
agriyakhetarpal Sep 27, 2024
deb3cba
Oops, don't let `pixi` `pip`-install `curl` and `jq`
agriyakhetarpal Sep 27, 2024
05770f3
Use `./remove-wheels` internally
agriyakhetarpal Sep 27, 2024
199b1c8
Don't mention "Anaconda Cloud" explicitly
agriyakhetarpal Sep 27, 2024
f8b96a0
Ensure consistency: use ANACONDA_USER
agriyakhetarpal Sep 27, 2024
70f8a3d
Mark TODO about macOS support
agriyakhetarpal Sep 27, 2024
2fac172
Clean up, add more TODOs and comments
agriyakhetarpal Sep 27, 2024
6a1b65e
Merge branch 'main' into feat/separate-action-for-artifact-removals
agriyakhetarpal Sep 30, 2024
8bb7bb6
Rename `anaconda_user` for consistency
agriyakhetarpal Sep 30, 2024
bae5ad6
Let ANACONDA_USER env var be empty
agriyakhetarpal Sep 30, 2024
f916adc
Merge branch 'main' into feat/separate-action-for-artifact-removals
agriyakhetarpal Sep 30, 2024
afedc16
Fix Anaconda org input
agriyakhetarpal Sep 30, 2024
35c9d59
Add some docs sections
agriyakhetarpal Sep 30, 2024
7a7dbbc
Merge branch 'main' into feat/separate-action-for-artifact-removals
bsipocz Sep 30, 2024
7830eae
Merge branch 'main' into feat/separate-action-for-artifact-removals
matthewfeickert Oct 1, 2024
2f62d85
Add some docs suggestions from code review
agriyakhetarpal Oct 1, 2024
1f04be6
Rename upload token to just token for clarity
agriyakhetarpal Oct 1, 2024
b6dbd44
Use single line for command
agriyakhetarpal Oct 1, 2024
88ce687
Fixes for `pixi` and shell script filename
agriyakhetarpal Oct 1, 2024
03979b0
Add `jq` from conda-forge as a dependency
agriyakhetarpal Oct 1, 2024
bf8e993
Revert change to "Nightly upload" section
agriyakhetarpal Oct 1, 2024
433e7c3
Revert `jq`'s addition to `pixi` global manifest file
agriyakhetarpal Oct 1, 2024
4a3497d
Add a new manifest for the `remove-wheels` environment
agriyakhetarpal Oct 1, 2024
71ede79
Generate `pixi.lock` file for `remove-wheels`
agriyakhetarpal Oct 1, 2024
2d2e62a
Add a note about how tokens for packages work
agriyakhetarpal Oct 1, 2024
556e9a6
Change to secondary-level heading
agriyakhetarpal Oct 3, 2024
e403c81
Move docs up, and workflow example below
agriyakhetarpal Oct 3, 2024
15b617e
Update pixi lockfile to be version 5 compliant
agriyakhetarpal Oct 3, 2024
80 changes: 8 additions & 72 deletions .github/workflows/remove-wheels.yml
@@ -3,7 +3,7 @@ name: Remove old wheels
on:
# Run daily at 1:23 UTC
schedule:
- cron: '23 1 * * *'
- cron: '23 1 * * *'
workflow_dispatch:

concurrency:
@@ -30,16 +30,7 @@ jobs:
name: remove-old-wheels

steps:
- name: Install micromamba and anaconda-client
uses: mamba-org/setup-micromamba@f8b8a1e23a26f60a44c853292711bacfd3eac822 # v1.9.0
with:
environment-name: remove-wheels
create-args: >-
python=3.12
anaconda-client=1.12.3
curl
jq

- uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- name: Show environment
run: env

@@ -49,64 +40,9 @@ jobs:
echo ""
anaconda remove --help

- name: Query package index for packages
run: |
curl https://raw.githubusercontent.com/scientific-python/upload-nightly-action/main/packages-ignore-from-cleanup.txt --output packages-ignore-from-cleanup.txt
anaconda show "${ANACONDA_USER}" &> >(grep "${ANACONDA_USER}/") | \
awk '{print $1}' | \
sed 's|.*/||g' | \
grep -vf packages-ignore-from-cleanup.txt > package-names.txt

- name: Remove old uploads to save space
run: |
# Remove all _but_ the last ${N_LATEST_UPLOADS} package versions and
# remove all package versions older than 30 days.

if [ -s package-names.txt ]; then
threshold_date="$(date +%F -d '30 days ago')"

# Remember can't quote subshell as need to split on (space seperated) token
for package_name in $(cat package-names.txt); do

echo -e "\n# package: ${package_name}"

curl --silent https://api.anaconda.org/package/"${ANACONDA_USER}/${package_name}" | \
jq -r '.releases[].version' > package-versions.txt
head --lines "-${N_LATEST_UPLOADS}" package-versions.txt > remove-package-versions.txt

for package_version in $(cat package-versions.txt); do
# c.f. https://github.com/Anaconda-Platform/anaconda-client/issues/682#issuecomment-1677283067
upload_date=$(curl --silent https://api.anaconda.org/release/"${ANACONDA_USER}/${package_name}/${package_version}" | \
jq -r '.distributions[].upload_time' | \
sort | \
tail --lines 1 | \
awk '{print $1}')

# check upload_date is YYYY-MM-DD formatted
# c.f. https://github.com/scientific-python/upload-nightly-action/issues/73
if [[ "${upload_date}" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
if [[ "${upload_date}" < "${threshold_date}" ]]; then
echo "# ${ANACONDA_USER}/${package_name}/${package_version} last uploaded on ${upload_date}"
echo "${package_version}" >> remove-package-versions.txt
fi
else
echo "# ERROR: ${ANACONDA_USER}/${package_name}/${package_version} upload date ${upload_date} is not YYYY-MM-DD."
fi

done

if [ -s remove-package-versions.txt ]; then
# Guard against duplicate entries from packages over
# count and time thresholds
sort --output remove-package-versions.txt --unique remove-package-versions.txt

for package_version in $(cat remove-package-versions.txt); do
echo "# Removing ${ANACONDA_USER}/${package_name}/${package_version}"
anaconda --token ${{ secrets.ANACONDA_TOKEN }} remove \
--force \
"${ANACONDA_USER}/${package_name}/${package_version}"
done
fi

done
fi
- name: Remove old wheels
uses: ./remove-wheels
with:
n_latest_uploads: ${{ env.N_LATEST_UPLOADS }}
anaconda_nightly_upload_organization: ${{ env.ANACONDA_USER }}
anaconda_token: ${{ secrets.ANACONDA_TOKEN }}
46 changes: 41 additions & 5 deletions README.md
@@ -1,7 +1,8 @@
# Nightly upload

This is a GitHub Action that uploads nightly builds to the [scientific-python nightly channel][],
as recommended in [SPEC4 — Using and Creating Nightly Wheels][].
This is a GitHub Action that uploads (and helps remove) nightly builds to
the [scientific-python nightly channel][], as recommended in
[SPEC4 — Using and Creating Nightly Wheels][].

In a GitHub Actions workflow (`.github/workflows/*.yaml`), use the
following snippet on a Linux or macOS runner to upload built wheels to the
@@ -18,11 +19,45 @@ jobs:
anaconda_nightly_upload_token: ${{secrets.UPLOAD_TOKEN}}
```

Note that we recommend pinning the action against a specific SHA
> [!IMPORTANT]
> Note that we recommend pinning the action against a specific SHA
(rather than a tag), to guard against the unlikely event of upstream
being compromised.

## Updating the action
## Removing old nightly builds

This repository also ships with an action to ease removals of older nightly wheels from a channel.

To use this functionality, add the following snippet to your workflow:

```yml
jobs:
  steps:
    ...
    - name: Remove old wheels
      uses: scientific-python/upload-nightly-action/remove-wheels@cantknowhashyet # 0.6.0
      with:
        n_latest_uploads: ${{ env.N_LATEST_UPLOADS }}
        anaconda_token: ${{ secrets.UPLOAD_TOKEN }}
```

This removes all but the `n_latest_uploads` most recent uploads from the channel, which
avoids hosting outdated development versions and frees up space.

The channel to remove wheels from is set to the ``scientific-python-nightly-wheels`` channel
by default. If you are using this channel, please note that this repository will automatically
clean up old artifacts for you.

If you do not wish to have this automated cleanup, please open an issue on this repository
to be added to the list of packages exempt from it. The current ones are named in
`packages-ignore-from-cleanup.txt`.

Please refer to the [artifact cleanup policy][] for more information.

To remove wheels from a different channel, set the ``anaconda_nightly_upload_organization``
variable to the desired organization.

## Updating the actions

You can [use Dependabot to keep the GitHub Action up to date][],
with a `.github/dependabot.yml` config file similar to:
@@ -45,7 +80,7 @@ then generate a token at `https://anaconda.org/<anaconda cloud user name>/settin
with permissions to _Allow write access to the API site_ and _Allow uploads to Standard Python repositories_,
and add the token as a secret to your GitHub repository.

## Using a different channel
## Using a channel other than ``scientific-python-nightly-wheels``

This GitHub Action can upload your nightly builds to a different channel. To do so,
define the `anaconda_nightly_upload_organization` variable. Furthermore,
@@ -112,3 +147,4 @@ dependencies:
[PyPI]: https://pypi.org/
[scientific-python nightly channel]: https://anaconda.org/scientific-python-nightly-wheels
[SPEC4 — Using and Creating Nightly Wheels]: https://scientific-python.org/specs/spec-0004/
[artifact cleanup policy]: #artifact-cleanup-policy-at-the-scientific-python-nightly-wheels-channel
2 changes: 1 addition & 1 deletion action.yml
@@ -1,4 +1,4 @@
name: Upload Nightly
name: Scientific Python / Upload Nightly Wheels
description: A GitHub Action to upload artifacts nightly
permissions:
actions: read
50 changes: 50 additions & 0 deletions remove-wheels/action.yml
@@ -0,0 +1,50 @@
name: Scientific Python / Remove Old Wheels
description: A GitHub Action to remove old wheels
permissions:
actions: read
contents: read
metadata: read
author: "Scientific-Python"
# TODO: have to think about versioning; whether to version separately, or
# for it to be in sync with the version for the upload action
version: "0.1.0"

inputs:
n_latest_uploads:
description: 'The number of previous wheel uploads to keep'
required: false
default: '5'
anaconda_nightly_upload_organization:
description: 'Anaconda Cloud organisation name to remove the wheels from'
required: false
default: scientific-python-nightly-wheels
anaconda_token:
description: 'Anaconda Cloud API token'
required: true

# TODO: Linux only for now, need to see how to add macOS support
runs:
using: "composite"
steps:
- name: Set up pixi
uses: prefix-dev/setup-pixi@ba3bb36eb2066252b2363392b7739741bb777659 # v0.8.1
with:
locked: true
cache: true
cache-write: ${{ github.event_name == 'push' && github.ref_name == 'main' }}
# Avoid post cleanup errors if action run multiple times
post-cleanup: false

- name: Install dependencies
shell: bash
run: |
sudo apt-get update && sudo apt-get install -y curl jq

- name: Remove old wheels
shell: bash
env:
INPUT_N_LATEST_UPLOADS: ${{ inputs.n_latest_uploads }}
INPUT_ANACONDA_USER: ${{ inputs.anaconda_nightly_upload_organization }}
INPUT_ANACONDA_TOKEN: ${{ inputs.anaconda_token }}
run: |
pixi run remove_wheels.sh
109 changes: 109 additions & 0 deletions remove_wheels.sh
@@ -0,0 +1,109 @@
#!/bin/bash

# Fail on undefined variables
set -u
# Prevent pipe errors from being silenced
set -o pipefail
# Exit if any command exits with a non-zero status
set -e
# Enable trace mode (print each command before running it)
set -x

# get the anaconda token from the github secrets
#
# this is to prevent accidental removals
echo "Getting anaconda token from github secrets..."

ANACONDA_USER="${INPUT_ANACONDA_USER}"
ANACONDA_TOKEN="${INPUT_ANACONDA_TOKEN}"
N_LATEST_UPLOADS="${INPUT_N_LATEST_UPLOADS}"

Comment on lines +17 to +20
Author

I am not sure if the Anaconda token that is authorised to maintainers of, say, a package X is restricted to only remove wheels for said package X, or whether it can remove all packages in the index. This is because while users of SPNW won't be affected since we handle the deletions, users with their own organisation with wheels for packages X, Y, and Z being uploaded to it would either:

  • run this action centrally to remove all but the latest N uploads of X, Y, and Z; or
  • run the deletions from separate repositories for X, Y, and Z, so that each package is cleaned up on its own deletion schedule.

Which situation would be more plausible? If it is the latter, we'd have to provide another input in the action for a comma-separated string of packages to delete uploads for (and maybe * for deleting all packages?). To replace the existence of packages-ignore-from-cleanup.txt (which would exist only in this repository – see below), it might also make sense to include a whitelist input, too. If it's the former, it gets easier to implement, but users won't have fine-grained control on automations for what packages are being deleted.

P.S. This is all valid only if I'm not missing something about how the index and its permissions are structured :) Happy to receive others' thoughts!

Author (@agriyakhetarpal, Oct 1, 2024)

Based on @tupui's suggestion in #95 (comment), users will need to run the action multiple times to remove old uploads for multiple packages. Hence, the second option of allowing per-package deletions is better, and an input for a list of packages to delete wheels for isn't required.

Author

However, if someone has an appropriately scoped token for their organisation, they can very well remove multiple wheels from the index at a time, and some users might want to do that. So, it might be necessary to mention in the documentation how the tokens work and pre-emptively warn about possible deletions in an admonition.

Member

To be clear, users have full control over their project, so they can add or remove any wheel they want on that project. The one thing they cannot do is add members to their project; for that they need to ask the admins.


# If ANACONDA_TOKEN is empty, exit with a non-zero status.
# This is to prevent accidental removals.
if [ -z "${ANACONDA_TOKEN}" ]; then
    echo "ANACONDA_TOKEN is empty, exiting..."
    exit 1
fi

# If N_LATEST_UPLOADS is empty, exit with a non-zero status,
# as this should be set by the user and it is better
# to fail on this to signal a problem, i.e.,
# explicit is better than implicit.
if [ -z "${N_LATEST_UPLOADS}" ]; then
    echo "N_LATEST_UPLOADS is empty, exiting..."
    exit 1
fi


# Query the package index for packages
#
# TODO: should be possible to alter this, since separating the workflow
# into two steps, one for uploading and one for cleanup, should make it
# possible for users to manually trigger the cleanup step before/after the
# upload step has completed in their own repos instead of us having to do it.
#
# TODO: raises questions on how to moderate cleanups among multiple users
# operating on the same channel, but that might be a different issue.
curl https://raw.githubusercontent.com/scientific-python/upload-nightly-action/main/packages-ignore-from-cleanup.txt --output packages-ignore-from-cleanup.txt
Comment on lines +41 to +48
Author

Please see above for context and my question on this. :)

Author

I've put some thought into it, and this can be included as an input in the action. There would be two cases:

  • An API token for package X that can delete older wheels for X
  • An API token for the organisation that can perform index-wide deletions of older wheels for packages

While the former would be recommended to limit deletion scopes, when the latter is used by the action users (as it is being done here for this repo), a whitelist_packages: input can be added to include a comma-separated list of packages ("openblas-libs" in our case).

Another reasonable message could be to add a warning: "Wheels for package X requested for deletion, but X is whitelisted. Please either remove it from the whitelist or try to delete a different package as needed.".
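
A minimal sketch of how such a whitelist check could look in bash, assuming a hypothetical `whitelist_packages` input that arrives as a comma-separated string. Neither that input nor the helper `is_whitelisted` exists in this PR; both names are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch only: `whitelist_packages` is a proposed input and
# `is_whitelisted` is a hypothetical helper, not part of this PR.

# Return 0 if package name $1 appears in the comma-separated list $2.
is_whitelisted() {
    local package_name="$1"
    local whitelist="$2"
    local entry
    local -a entries
    IFS=',' read -r -a entries <<< "${whitelist}"
    for entry in "${entries[@]}"; do
        # Drop whitespace around entries such as "a, b"
        entry="${entry//[[:space:]]/}"
        if [[ "${entry}" == "${package_name}" ]]; then
            return 0
        fi
    done
    return 1
}

# Example: skip whitelisted packages and print the proposed warning.
for package_name in openblas-libs numpy; do
    if is_whitelisted "${package_name}" "openblas-libs"; then
        echo "# WARNING: wheels for ${package_name} requested for deletion, but ${package_name} is whitelisted."
        continue
    fi
    echo "# would remove old wheels for ${package_name}"
done
```

The comparison is an exact match on the whole name, so a whitelist entry such as `openblas-libs` does not shield a package that merely shares a prefix with it.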

anaconda show "${ANACONDA_USER}" &> >(grep "${ANACONDA_USER}/") | \
awk '{print $1}' | \
sed 's|.*/||g' | \
grep -vf packages-ignore-from-cleanup.txt > package-names.txt

# Remove old uploads to save space
# Remove all _but_ the last ${N_LATEST_UPLOADS} package versions and
# remove all package versions older than 30 days.
if [ -s package-names.txt ]; then
threshold_date="$(date +%F -d '30 days ago')"

# Remember can't quote subshell as need to split on (space separated) token
for package_name in $(cat package-names.txt); do
# TODO: this outer loop can be removed when ready since there will be
# just one package to remove when the action is triggered manually from
# a user's (different) repo.

echo -e "\n# package: ${package_name}"
Comment on lines +60 to +66
Author

This TODO item would also be resolved if we have a way forward with the questions above, since, if multiple packages are removed at once, this loop can stay; if we were to restrict the action to removing just one wheel at a time (which I don't think we should), then we could remove this. I think an action of the form:

- name: Remove old wheels
  uses: scientific-python/upload-nightly-action/remove-wheels@cantknowhashyet # 0.6.0
  with:
    n_latest_uploads: ${{ env.N_LATEST_UPLOADS }}
    anaconda_nightly_upload_organization: "your-organization"
    anaconda_nightly_token: ${{secrets.ANACONDA_TOKEN}}
    packages_to_remove: "mypackage1,mypackage2,mypackage3" # or just "*"
    # I could have suggested "all" here, but that breaks in the
    # case where "all" is also the name of a package that has
    # been uploaded (possible, albeit quite unlikely)

is better than specifying the step and authenticating multiple times.

Member

packages: ["a", "b", "c"] hopefully!

Author

Yes, a list of strings sounds elegant! :)
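
Since GitHub Actions passes inputs to an action as strings, a list of packages would still reach the script as text. A rough sketch of how the script could resolve such a selector follows. The input name `packages_to_remove`, the `*` wildcard, and the helper `resolve_packages` are all assumptions taken from the proposal in this thread, not existing behaviour.

```shell
#!/usr/bin/env bash
# Sketch only: `resolve_packages` is a hypothetical helper.

# Print one package name per line.
#   $1: comma-separated package names, or "*" for every package
#   $2: file listing all package names on the channel, one per line
#       (the same shape as package-names.txt in remove_wheels.sh)
resolve_packages() {
    local selector="$1"
    local all_packages_file="$2"
    local name
    local -a requested

    if [[ "${selector}" == "*" ]]; then
        cat "${all_packages_file}"
        return
    fi

    IFS=',' read -r -a requested <<< "${selector}"
    for name in "${requested[@]}"; do
        # Drop whitespace around entries such as "a, b"
        name="${name//[[:space:]]/}"
        # Keep only names that actually exist on the channel
        if [[ -n "${name}" ]] && grep -qxF -- "${name}" "${all_packages_file}"; then
            echo "${name}"
        fi
    done
}
```

The existing outer loop could then iterate over the output of `resolve_packages` instead of over every line of `package-names.txt`, which keeps a single authentication step for several packages.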


curl --silent https://api.anaconda.org/package/"${ANACONDA_USER}/${package_name}" | \
jq -r '.releases[].version' > package-versions.txt
head --lines "-${N_LATEST_UPLOADS}" package-versions.txt > remove-package-versions.txt

for package_version in $(cat package-versions.txt); do
# c.f. https://github.com/Anaconda-Platform/anaconda-client/issues/682#issuecomment-1677283067
upload_date=$(curl --silent https://api.anaconda.org/release/"${ANACONDA_USER}/${package_name}/${package_version}" | \
jq -r '.distributions[].upload_time' | \
sort | \
tail --lines 1 | \
awk '{print $1}')

# check upload_date is YYYY-MM-DD formatted
# c.f. https://github.com/scientific-python/upload-nightly-action/issues/73
if [[ "${upload_date}" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
if [[ "${upload_date}" < "${threshold_date}" ]]; then
echo "# ${ANACONDA_USER}/${package_name}/${package_version} last uploaded on ${upload_date}"
echo "${package_version}" >> remove-package-versions.txt
fi
else
echo "# ERROR: ${ANACONDA_USER}/${package_name}/${package_version} upload date ${upload_date} is not YYYY-MM-DD."
fi

done

if [ -s remove-package-versions.txt ]; then
# Guard against duplicate entries from packages over
# count and time thresholds
sort --output remove-package-versions.txt --unique remove-package-versions.txt

for package_version in $(cat remove-package-versions.txt); do
echo "# Removing ${ANACONDA_USER}/${package_name}/${package_version}"
anaconda --token "${ANACONDA_TOKEN}" remove \
--force \
"${ANACONDA_USER}/${package_name}/${package_version}"
done
fi

done
fi

echo "Finished removing old wheels except the last ${N_LATEST_UPLOADS} uploads from the ${ANACONDA_USER} channel."
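
The selection logic in the script combines two rules: a count rule (keep only the last `N_LATEST_UPLOADS` versions) and an age rule (drop versions last uploaded before the threshold date). The sketch below isolates that logic so it can be exercised without network access. It assumes a hypothetical input file with one `version upload_date` pair per line, oldest first, and GNU `head`, which the script already relies on; `select_versions_to_remove` is an illustrative name, not a function in this PR.

```shell
#!/usr/bin/env bash
# Sketch only: `select_versions_to_remove` is a hypothetical helper.

# Print the versions that would be removed, one per line, de-duplicated.
#   $1: file with "version upload_date" per line, oldest first (assumed format)
#   $2: number of latest uploads to keep
#   $3: threshold date, formatted YYYY-MM-DD
select_versions_to_remove() {
    local versions_file="$1"
    local n_latest="$2"
    local threshold_date="$3"
    local version upload_date
    {
        # Count rule: everything except the last N lines (GNU head)
        head -n "-${n_latest}" "${versions_file}" | awk '{print $1}'

        # Age rule: versions whose upload date is before the threshold.
        # Dates that are not YYYY-MM-DD are skipped, as in the script.
        while read -r version upload_date; do
            if [[ "${upload_date}" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]] &&
               [[ "${upload_date}" < "${threshold_date}" ]]; then
                echo "${version}"
            fi
        done < "${versions_file}"
    } | sort -u
}
```

One consequence worth noting is that the age rule applies to every version, so a version among the latest N is still selected for removal if its last upload is older than the threshold.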