Evaluate job restrictions plugin to improve build/test/(trestle?) machine isolation #3379

Closed
Tracked by #3380
sxa opened this issue Feb 12, 2024 · 7 comments

@sxa
Member

sxa commented Feb 12, 2024

Part of SSDF phase 3 PO 5.2

We installed the job restrictions plugin last year. This issue will cover testing it and seeing whether we can prevent jobs run by the test-triage team from being scheduled on production build machines, which should reduce the risk of security issues on those machines. While we have improved isolation on Linux through the use of containerised systems, there is still a risk elsewhere, or if test jobs are scheduled on systems used for hosting build jobs.

sxa mentioned this issue Feb 12, 2024
sxa changed the title from "Evaluation job restrictions plugin to improve build/test/(trestle?) machine isolation" to "Evaluate job restrictions plugin to improve build/test/(trestle?) machine isolation" Feb 12, 2024
sxa added the security label May 3, 2024
sxa self-assigned this May 3, 2024
sxa added this to the 2024-05 (May) milestone May 3, 2024
@sxa
Member Author

sxa commented Jun 11, 2024

Initial tests have not been fruitful. There may be an issue with the expressions, but it is not clear what the problem is.

jiekang moved this from Todo to In Progress in 2024 2Q Adoptium Plan Jun 26, 2024
sxa modified the milestones: 2024-05 (May), 2024-08 (August) Jul 31, 2024
smlambert moved this from Todo to In Progress in Adoptium Backlog Oct 2, 2024
@sxa
Member Author

sxa commented Oct 4, 2024

Tried this again. I have tested it on one of the machines, and it successfully restricts execution on that machine to build jobs only. I will look at adding this criterion to the AIX, ppc64le and Windows x64 dockerhost_* and build_* machines for further testing. (Doing it on Windows/x64 will also enhance our testing of ephemeral machines, since there will be fewer real ones available!) This should give us good coverage for spotting any problems. There may be jobs that can run on "any" machine matching a generic label such as build&&linux, which this set of restrictions will not have accounted for. I've allowed centos7_docker_image_updater on the ppc64le machines for now, but there may well be others, and these tests should show them up.
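To illustrate the kind of decision a node-level restriction makes, here is a minimal Python model, assuming the restriction is a regular expression on the job's full name combined (as a logical OR) with an explicit allow-list of extra jobs. The pattern `build-scripts/.*` and the job name `some-test-job` are hypothetical, `centos7_docker_image_updater` is assumed to be a top-level job, and this is not the plugin's own code or the expression actually configured on the machines.

```python
import re
from dataclasses import dataclass, field


@dataclass
class NodeRestriction:
    """Hypothetical model: a job may run on the node if its full name matches
    the build-job pattern OR is listed in the explicit allow-list."""
    build_job_pattern: str
    allowed_jobs: set = field(default_factory=set)

    def can_take(self, job_full_name: str) -> bool:
        if job_full_name in self.allowed_jobs:
            return True
        # fullmatch: the pattern must describe the entire job name
        return re.fullmatch(self.build_job_pattern, job_full_name) is not None


ppc64le_dockerhost = NodeRestriction(
    build_job_pattern=r"build-scripts/.*",          # hypothetical pattern
    allowed_jobs={"centos7_docker_image_updater"},  # exception mentioned above
)

for job in ["build-scripts/openjdk11-pipeline",
            "centos7_docker_image_updater",
            "some-test-job"]:  # hypothetical test-triage job
    print(job, "->", "allowed" if ppc64le_dockerhost.can_take(job) else "blocked")
```

In this sketch, using `re.fullmatch` rather than `re.search` means a job named, say, `my-build-scripts/foo` is not accepted merely because it contains the pattern; whether the plugin anchors its expressions the same way should be checked against its documentation.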


This has been applied to the following

I've kicked off https://ci.adoptium.net/job/build-scripts/job/openjdk11-pipeline/2723 to see how it goes and where the jobs end up. Individual build pipelines are as follows:

If a build fails to get scheduled, you'll see something like this:

```
15:44:59  LABEL: docker-osuosl-ubuntu2004-ppc64le-1
[Pipeline] stage
[Pipeline] { (Queue)
[Pipeline] nodesByLabel
15:44:59  Found a total of 1 nodes with the 'docker-osuosl-ubuntu2004-ppc64le-1' label
[Pipeline] echo
15:44:59  dynamicAgents: [fyre]
[Pipeline] node
15:45:14  Still waiting to schedule task
[...]
15:45:14  ‘build-marist-rhel8-s390x-1’ doesn’t have label ‘docker-osuosl-ubuntu2004-ppc64le-1’
[...]
```

After that, no further progress is made.
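A rough way to spot this symptom in saved console output is sketched below in Python. It assumes the log format shown above (optional `HH:MM:SS` timestamps, `[...]` marking elided output) and treats a log whose last meaningful output is the wait message, or `doesn’t have label` lines following it, as stuck. The helper name `looks_stuck` is hypothetical.

```python
import re

# Assumed log conventions, taken from the excerpt above: an optional
# "HH:MM:SS" timestamp prefix and "[...]" marking elided output.
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}\s+")
WAITING = "Still waiting to schedule task"
# The excerpt uses a curly apostrophe; accept a plain one as well.
NO_LABEL = re.compile(r"doesn[’']t have label")


def looks_stuck(console_lines):
    """Hypothetical helper: True if the log ends while the node step is still
    waiting to be scheduled, i.e. nothing but label-mismatch explanations has
    been printed since the wait message."""
    waiting = False
    for raw in console_lines:
        text = TIMESTAMP.sub("", raw.strip())
        if not text or text == "[...]":
            continue  # blank or elided output carries no information
        if text.startswith(WAITING):
            waiting = True
        elif waiting and not NO_LABEL.search(text):
            waiting = False  # other output followed, so the job progressed
    return waiting
```

This only looks at the text that has been printed so far, so a job that is simply slow to be scheduled will also be reported as stuck until something else appears in its log.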

@sxa
Member Author

sxa commented Oct 4, 2024

@sxa
Member Author

sxa commented Nov 5, 2024

The October release is complete, so this can now be rolled out to other machines. The expression should be:

@sxa
Member Author

sxa commented Nov 5, 2024

If the plugin rejects your job, you will typically see something like this in the log (similar to what you get if you specify an incorrect label that cannot be matched):

```
16:51:24  ‘build-azure-solaris10-x64-1’ doesn’t have label ‘build-azure-win2022-x64-1’
16:51:24  ‘build-azure-win2022-x64-2’ doesn’t have label ‘build-azure-win2022-x64-1’
16:51:24  ‘build-azure-win2022-x64-3’ doesn’t have label ‘build-azure-win2022-x64-1’
16:51:24  ‘build-digitalocean-centos69-x64-2’ doesn’t have label ‘build-azure-win2022-x64-1’
16:51:24  ‘build-equinix-ubuntu1604-armv7l-2’ doesn’t have label ‘build-azure-win2022-x64-1’
16:51:24  ‘build-ibmcloud-win2022-x64-1’ doesn’t have label ‘build-azure-win2022-x64-1’
16:51:24  ‘build-linux-x64-dd9180’ doesn’t have label ‘build-azure-win2022-x64-1’
[...]
```

This is repeated for every agent on Jenkins.
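Because these lines look the same as an unmatched label, it can help to extract them and check whether the agent you expected is in the list. The Python sketch below parses lines in the format shown above into a mapping from requested label to the agents reported as lacking it. The helper names are hypothetical, and the idea that an expected agent missing from the list had a matching label but was refused for another reason (such as a job restriction) is our assumption about how the queue reports blockages, not confirmed plugin behaviour.

```python
import re

# Matches lines such as:
#   16:51:24  ‘build-azure-solaris10-x64-1’ doesn’t have label ‘build-azure-win2022-x64-1’
REJECTION = re.compile(r"‘(?P<agent>[^’]+)’ doesn[’']t have label ‘(?P<label>[^’]+)’")


def agents_lacking_label(console_lines):
    """Map each requested label to the set of agents reported as not having it."""
    lacking = {}
    for line in console_lines:
        match = REJECTION.search(line)
        if match:
            lacking.setdefault(match.group("label"), set()).add(match.group("agent"))
    return lacking


def expected_agent_listed(console_lines, label, expected_agent):
    """True if `expected_agent` was reported as lacking `label`. If it was,
    the label itself is probably wrong; if it was not (assumption), the label
    matched and something else kept the job off that agent."""
    return expected_agent in agents_lacking_label(console_lines).get(label, set())
```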

@sxa
Member Author

sxa commented Nov 13, 2024

Closing this as it has now been implemented and appears to be working. Any problems can be raised as separate issues.

sxa closed this as completed Nov 13, 2024
github-project-automation bot moved this from In Progress to Done in 2024 4Q Adoptium Plan Nov 13, 2024
@sxa
Member Author

sxa commented Nov 20, 2024

ci.role.test has been removed from all active build machines in order to avoid any confusion with scheduling. The one exception is build-osuosl-aix72-ppc64-3, the single AIX machine that we use for JDK23+, as we have no others at present.
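To keep this state from drifting, a small audit could flag build machines that still carry ci.role.test. The Python sketch below assumes you already have a mapping from agent name to its set of labels (how that inventory is collected is not shown) and treats agents whose names start with `build-` as build machines; the helper name and the example label sets are hypothetical.

```python
def build_agents_with_test_role(agent_labels, exceptions=frozenset(), prefix="build-"):
    """Hypothetical audit helper: return, sorted, the build agents that still
    carry ci.role.test, skipping agents deliberately kept as exceptions.
    `agent_labels` maps agent name -> set of label strings."""
    return sorted(
        agent
        for agent, labels in agent_labels.items()
        if agent.startswith(prefix)
        and "ci.role.test" in labels
        and agent not in exceptions
    )


if __name__ == "__main__":
    # Illustrative inventory only; these label sets are made up.
    inventory = {
        "build-osuosl-aix72-ppc64-3": {"ci.role.build", "ci.role.test"},
        "build-azure-win2022-x64-1": {"ci.role.build"},
    }
    print(build_agents_with_test_role(
        inventory, exceptions={"build-osuosl-aix72-ppc64-3"}))
```

Keeping the exception list explicit means the AIX machine shows up in the audit again as soon as it is dropped from that list, for example once another AIX machine is available for JDK23+.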
