Skip to content
This repository has been archived by the owner on Jun 28, 2024. It is now read-only.

RFE | Re-run the CI before officially merging a PR #3974

Open
fidencio opened this issue Sep 16, 2021 · 10 comments
Open

RFE | Re-run the CI before officially merging a PR #3974

fidencio opened this issue Sep 16, 2021 · 10 comments
Labels
area/ci enhancement Improvement to an existing feature

Comments

@fidencio
Copy link
Member

We've hit issues in the past where a PR "A" is opened against one state of our tree, all tests pass, but due to another PR "B" being merged earlier, tests break after "A" getting merged.

A way to solve this would be to ensure that before merging a PR to our tree we run the tests again with this PR "merged" atop of the current HEAD.

Examples of such issues are:

This is something that will take advantage and also provide more safeguards to the green ci effort.

I'm not sure whether GitHub provides something we can easily use, but that's open for discussion.

cc @Jakob-Naucke @bergwolf @chavafg @GabyCT @ariel-adam @wainersm

Ah, by the way, CRI-O does something like that already.

@fidencio fidencio added enhancement Improvement to an existing feature needs-review Needs to be assessed by the team. labels Sep 16, 2021
@fidencio fidencio changed the title Re-run the CI before officially merging a PR RFE | Re-run the CI before officially merging a PR Sep 16, 2021
@Jakob-Naucke
Copy link
Member

Jakob-Naucke commented Sep 16, 2021

Ah, by the way, CRI-O does something like that already.

By CRI-O, you mean our CRI-O Jenkins CIs?

This sounds sweet at first glance, but I suppose we have to die one of two deaths:

  • You click "merge", the actual merge happens 30 minutes later because tests have to be run (during which all other merges would have to be locked? because now we're in possible conflict again)
  • For every update to the base branch, tests are re-run (CI load, costs, and energy consumption go brrr)

@fidencio
Copy link
Member Author

Ah, by the way, CRI-O does something like that already.

By CRI-O, you mean our CRI-O Jenkins CIs?

No, I mean CRI-O does that with the PRs going to their project.

This sounds sweet at first glance, but I suppose we have to die one of two deaths:

  • You click "merge", the actual merge happens 30 minutes later because tests have to be run (during which all other merges would have to be locked? because now we're in possible conflict again)
  • For every update to the base branch, tests are re-run (CI load, costs, and energy consumption go brrr)

Yep, but I think it's still worth the cost when we consider the time of n developers looking for a regression at some given point.

@GabyCT
Copy link
Contributor

GabyCT commented Sep 17, 2021

I am not sure about how much that will increase our costs and how limited we are in terms of budget
^ @chavafg
As everytime that we are running the CI we are spending and this and the daily main CI will impact

@Jakob-Naucke
Copy link
Member

@GabyCT Well, I mean, if we follow the approach of

  • You click "merge", the actual merge happens 30 minutes later because tests have to be run (during which all other merges would have to be locked? because now we're in possible conflict again)

I guess it won't be all that bad (how many PR tests are we running? how many PRs are we merging? this is the difference we'd be looking at). The other approach... is going to be more. But we don't have to follow it.

@GabyCT
Copy link
Contributor

GabyCT commented Sep 17, 2021

Maybe something that we can do first is to create the baseline jobs for the CI and then @chavafg can analyze how much impact in the cost we have and then we can see how much will this have...everytime that a CI job is trigger is less money that we have

@fidencio
Copy link
Member Author

Maybe something that we can do first is to create the baseline jobs for the CI and then @chavafg can analyze how much impact in the cost we have and then we can see how much will this have...everytime that a CI job is trigger is less money that we have

Totally, I don't think we should do this with the current state of our CI, but I think this will help to keep baseline green once we have this effort running.

@wainersm
Copy link
Contributor

We've hit issues in the past where a PR "A" is opened against one state of our tree, all tests pass, but due to another PR "B" being merged earlier, tests break after "A" getting merged.

Thanks for bringing it up, this is a legit problem. In your experience, @fidencio, does it occurs very often on our project; or the two examples are just unfortunate ones (maybe #2655 just show how unlucky @Jakob-Naucke is :)

I'm not sure whether GitHub provides something we can easily use, but that's open for discussion.

I don't know if GitHub provides such as feature either. I never worked on a project that run CI tests at the moment someone hits the "merge" button. What seems more common is to run jobs after the merge actually.

In the "green CI" proposal we have the "maintainer" role. If we run the baseline jobs after each merge then the maintainers will immediately notice jobs that became red. They can immediately action on it (e.g. revert patches); it doesn't prevent but rather mitigate the problem. On another hand there is this concern of consuming our CI re$ources. That's why I started asking you @fidencio , if this problem happens often enough to justify the extra costs ...

All in all, I tend to work with "baby-steps" in mind. So I would implement the initial "green CI" proposal as is and observe over time the areas that could (and must!) be improved.

Notes: regarding the worries about our CI costs... my recent initiates to ease the execution of the jobs on developer's local machine can help us to save some good money. After this #3965 I think the Vagrantfile will be in good shape for broaden use; and I will continue to work on #3751 . After merging those two PR I should be probably advocating for our developers to stop running the CI jobs for the sake of development.

@fidencio
Copy link
Member Author

We've hit issues in the past where a PR "A" is opened against one state of our tree, all tests pass, but due to another PR "B" being merged earlier, tests break after "A" getting merged.

Thanks for bringing it up, this is a legit problem. In your experience, @fidencio, does it occurs very often on our project; or the two examples are just unfortunate ones (maybe #2655 just show how unlucky @Jakob-Naucke is :)

This doesn't happen often, but I we haven't had many refactors happening often either.

I'm not sure whether GitHub provides something we can easily use, but that's open for discussion.

I don't know if GitHub provides such as feature either. I never worked on a project that run CI tests at the moment someone hits the "merge" button. What seems more common is to run jobs after the merge actually.

You've been working with OpenShift CI and I'm pretty sure you can have a better answer for that than I do.
What I see on CRI-O is that merges are not done by maintainers, but by a bot. So, there they enforce the need of an /approve and a /lgtm, once those are there and all the tests are green, their bot will merge the code, and at this time the CI is triggered again.

Do you think we could leverage the infra they use in our favour? :-)
Mainly considering that kata-containers is now (even indirectly) part of OpenShift?

In the "green CI" proposal we have the "maintainer" role. If we run the baseline jobs after each merge then the maintainers will immediately notice jobs that became red. They can immediately action on it (e.g. revert patches); it doesn't prevent but rather mitigate the problem. On another hand there is this concern of consuming our CI re$ources. That's why I started asking you @fidencio , if this problem happens often enough to justify the extra costs ...

I'd be interested to know the costs and then judge whether it's worth it or not. :-)

All in all, I tend to work with "baby-steps" in mind. So I would implement the iitial "green CI" proposal as is and observe over time the areas that could (and must!) be improved.

Yep, and in this case I think having the green-ci may be a pre-req for this one.

@c3d c3d removed the needs-review Needs to be assessed by the team. label Sep 28, 2021
@c3d
Copy link
Member

c3d commented Sep 28, 2021

Do we have a project for the "green CI" yet? If so, this could be linked to it.

@c3d c3d added the area/ci label Sep 28, 2021
@Jakob-Naucke
Copy link
Member

Thanks for bringing it up, this is a legit problem. In your experience, @fidencio, does it occurs very often on our project; or the two examples are just unfortunate ones (maybe #2655 just show how unlucky @Jakob-Naucke is :)

@wainersm maybe I'm just extra unlucky but we saw this again in kata-containers/kata-containers#2736.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/ci enhancement Improvement to an existing feature
Projects
None yet
Development

No branches or pull requests

5 participants