Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] 1.x changes can break builds on main #2350

Open
andrross opened this issue Mar 4, 2022 · 8 comments
Open

[BUG] 1.x changes can break builds on main #2350

andrross opened this issue Mar 4, 2022 · 8 comments
Labels
bug Something isn't working discuss Issues intended to help drive brainstorming and decision making

Comments

@andrross
Copy link
Member

andrross commented Mar 4, 2022

The OpenSearch build process (i.e. ./gradlew check) runs backward compatibility tests against the previous major version. For example, the main branch is currently building version 2.0, so the bwc tests check out the latest commit on the 1.x branch to build the opensearch engine artifacts to run tests that create a cluster with mixed versions and ensure compatibility. (Details on the branching strategy can be found here). There have been two recent examples (#2143 #2334) where a change committed to 1.x caused tests to break on main. I think the basic problem is that 1.x must maintain forward compatibility with main, but there are not automated tests that verify such compatibility as a part of the PR process. This means breaking changes can get committed and result in a bit of scramble to track down the root cause. How can we catch these failures as a part of the 1.x PR process before merging?

@andrross andrross added bug Something isn't working untriaged labels Mar 4, 2022
@dblock
Copy link
Member

dblock commented Mar 7, 2022

Is there a set of tests that can be run on branches (everywhere except main) that can catch these on PR?

@dblock dblock added discuss Issues intended to help drive brainstorming and decision making and removed untriaged labels Mar 7, 2022
@andrross
Copy link
Member Author

Using terms from the branching doc, I think the next minor version needs to run the backward compatibility tests of the next major version. That would mean today that 1.x PRs would run the bwc tests from main.

However, a common cause of this problem is that a new field is added to a transport protocol message on main, with the guard that it only expects to see that field when talking to another main version server. That feature is then backported to 1.x, which means that the guard in the main code needs to change the version check since the 1.x version now uses the field. Essentially the backport requires 2 commits to 2 separate branches, so the automated checks would likely always fail in this case.

@peternied
Copy link
Member

Tests that make sure we are backward compatible are useful and should be enforced, these inherently rely on different branches of OpenSearch. Broadly speaking, this is a moving target.

Contributors who make a change are only responsible for the state of the world when they created their PR. There are two cases that come to mind that encroach on this 1) a version bump happens or 2) another PR is merged with a BWC incompatible change. If one of these cases occur the contributor needs to diagnose the source of the failure, wait for the fix to be merged, then merge from main.


It looks like we might have some of the hooks to enable this behavior or something close to it. Maybe the gradle check workflow can be updated to mitigate this case, @andrross can you help find someone to investigate this more deeply?

 * We use a time based approach to make the bwc versions built deterministic and compatible with the current hash.
 * Most of the time we want to test against latest, but when running delayed exhaustive tests or wanting
 * reproducible builds we want this to be deterministic by using a hash that was the latest when the current
 * commit was made.

private String maybeAlignedRefSpec(Logger logger, String defaultRefSpec) {
if (providerFactory.systemProperty("bwc.checkout.align").isPresent() == false) {

@andrross
Copy link
Member Author

@peternied Thanks! This looks promising. I did some basic testing and created an issue to discuss specifically enabling this: opensearch-project/opensearch-build-libraries#420

@rishabh6788
Copy link
Contributor

The feature to add bwc.checkout.align=true parameter to gradle-check jobs triggered by pull_requests have been added and running in production.

@peternied
Copy link
Member

@rishabh6788 Have we been able to confirm that PRs haven't needed to rebase when a version update is rolled out in another branch?

@rishabh6788
Copy link
Contributor

I believe we will be able to confirm this once 2.16 branch is cut from 2.x release and bumped to 2.17. The 2.16 branch cut sholud happen by the end of this week.

@peternied
Copy link
Member

Cool, once its confirmed, lets close this issue - thanks @rishabh6788

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working discuss Issues intended to help drive brainstorming and decision making
Projects
None yet
Development

No branches or pull requests

4 participants