Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use original commit version as output version after a merge #253

Closed
rliebz opened this issue Apr 10, 2019 · 6 comments · Fixed by #262
Closed

Use original commit version as output version after a merge #253

rliebz opened this issue Apr 10, 2019 · 6 comments · Fixed by #262

Comments

@rliebz
Copy link
Contributor

rliebz commented Apr 10, 2019

Currently, if the merge parameter is set for out, the version ref used will be the result of the merge commit. I believe the version ref should instead be the original commit.

Consider my poorly drawn diagram of a commit history:

*  m - Merge commit of x and o
|\
* |  x - Commit we're racing against
| *  o - Good commit that we want
|/
* p - Parent commit

And my poorly drawn pipeline:

git-repo --> [ run-tests ] --> git-repo --> [ bump-version-and-tag ] --> git-repo-bumped --> [ deploy ]

In this case, I've made some changes in commit o as part of my pipeline, and I'd like to tag and push commit o. To prevent pipeline failures if some commit x ends up in my target branch before the pipeline is finished, I've set merge: true, which merges the latest target branch here, combining x and o into merge commit m:

git-resource/assets/out

Lines 177 to 192 in 9129dd9

while true; do
echo "rebasing..."
git fetch push-target "refs/notes/*:refs/notes/*"
git pull --rebase=preserve push-target $branch
result="0"
push_with_result_check result
if [ "$result" = "0" ]; then
break
elif [ "$result" = "1" ]; then
exit 1
fi
echo "rebasing and trying again..."
done

And then ultimately uses the current commit, which would be commit m. This means the rest of my pipeline will now include commit x, despite it not being part of the relevant changeset:

git-resource/assets/out

Lines 199 to 202 in 9129dd9

jq -n "{
version: {ref: $(git rev-parse HEAD | jq -R .)},
metadata: $(git_metadata)
}" >&3

Because merging creates a non-linear history where commit o will still be intact, I think commit o should be used as the version ref. This would mean future parts of the pipeline can operate on the original change set without picking up racing commits in the target branch, while still being able to output the change into the target branch without races.

Edit: If this is something you're interested in, I'd be happy to contribute the change.

@vito
Copy link
Member

vito commented May 14, 2019

🤔 I see your point but I have an alternative suggestion.

I think it should emit the merge commit, because there are times where that becomes the artifact you care about for the rest of the pipeline, and if it doesn't return that commit there's really no way to propagate it along. To me, conceptually the put has created a new commit, so it should return it (or, with concourse/rfcs#24, return all the commits that it pushed).

It seems like you have a use case for wanting to push + merge but then continue along with the original commit for the rest of the pipeline. And this isn't possible right now because the passed constraint for downstream jobs is selecting the merged version rather than the original version.

We're working on a new algorithm (concourse/concourse#3602) for v6.0, and along with this new algorithm we have new passed constraint semantics. In v6.0, the x in get: x, passed: [y] will refer to a named output from y, instead of just referring to a resource.

This means you could do something like this:

jobs:
- name: merger
  plan:
  - get: my-repo
  - put: my-repo-merged # give it a different name
    resource: my-repo   # ...but use the same resource
    params: {merge: true}

- name: downstream-job-using-original-commit
  plan:
  - get: my-repo
    passed: [merger]

- name: downstream-job-using-merged-commit
  plan:
  - get: my-repo-merged
    passed: [merger]

In the meantime, you could also simulate this sort of thing today by having a duplicate resource definition that you push your merge commits to:

resources:
- name: my-repo
  type: git
  source: &my-repo-source
    uri: ...
    private_key: ...

- name: my-repo-merged
  type: git
  source: *my-repo-source

jobs:
- name: merger
  plan:
  - get: my-repo
  - put: my-repo-merged
    params: {merge: true}

- name: downstream-job-using-original-commit
  plan:
  - get: my-repo
    passed: [merger]

- name: downstream-job-using-merged-commit
  plan:
  - get: my-repo-merged
    passed: [merger]

Does that solve your problem?

@rliebz
Copy link
Contributor Author

rliebz commented May 15, 2019

Referring to named outputs that are not resources would solve the problem, probably in a more intuitive way overall, so 👍 for that in v6.0.

Unfortunately, I cannot get the desired behavior with the current system, because the changes that we're putting are coming from a task output rather than from the original resource:

jobs:
- name: pipeline
  plan:
  - get: my-repo
  - task: make-changes
    # outputs: [repo-changed]
    # ...
  - task: run-tests-and-stuff
    input_mapping: {repo: repo-changed}
    # ...
  - put: repo-merged
    params:
      repository: repo-changed

So in this case, there can be 3 different refs:

  • my-repo: The original commit we pulled
  • repo-changed: Whatever modifications we make during the pipeline
  • repo-merged: The merge commit, which may include our changes and other changes

I don't believe it is possible to refer to repo-changed in later jobs, which is the one we're interested in.

@vito
Copy link
Member

vito commented May 15, 2019

FWIW, you also would not be able to do that in v6.0 - it won't be able to fetch task outputs, only things that were a get or a put - the only thing v6.0 enables is specifying the name of the get/put to fetch in situations where there are multiple occurrences of the same resource in one build plan.

Wouldn't you need to push repo-changed to a branch or something in order to propagate it along on its own?

@rliebz
Copy link
Contributor Author

rliebz commented May 15, 2019

When a merge occurs, the ref of the pre-merge commit is also pushed as part of that, since a merge preserves the full history of commits. So that ref will be available on the remote git server, even though it won't be the most recent commit in that branch's history. That is to say, repo-merged must by definition contain repo-changed as part of its history. The same thing would not be possible with a rebase, though, since the original commit would be re-created with a new SHA.

The purpose of the merge for my use case is not to combine histories and leverage the combined history later on in the pipeline, but to ensure that the changes that were made as part of the pipeline make it to the upstream branch, even in the case of races.

Using the pre-merge commit lets a pipeline refer to the same version that we were interested in before pushing, because in the case of a merge commit, the version that ends up in the upstream branch is not the same version of the code that we started with.

So in an operation like this, where we pull down a repo and put it somewhere else:

jobs:
- name: get-and-put
  plan:
  - get: repo-one
  - put: repo-two
    params: 
      repository: repo-one
      merge: true

- name: get-merged
  plan:
  - get: repo-two
    passed: [get-and-put]

Getting repo-two would actually give us a merged combination of repo-one and repo-two's history. If this is all we do, we could just pull down repo-one instead, but if we wanted to do any work in between those two steps, there's no way to reference that intermediate step.

@vito
Copy link
Member

vito commented May 15, 2019

Ah, got it, that makes sense. I can see why you would want to propagate a newly made commit along. Maybe this could just be accomplished via a param?

put: repo-two
params:
  repository: repo-one
  merge: true
  returning: (merged | unmerged)

This does seem like it would be useful for situations where the merged commit isn't really the interesting part, because - as you said - it would potentially include commits that have nothing to do with the commit under test. For example if you were to tag the vetted code with a version number or something you would probably want to tag the unmerged commit, not the merge, so it doesn't include code changes that aren't actually in the version.

@rliebz
Copy link
Contributor Author

rliebz commented May 15, 2019

Having a param would solve for my use case and let us avoid making a backward incompatible change by preserving the original behavior when unspecified.

👍

rliebz added a commit to rliebz/git-resource that referenced this issue May 21, 2019
Setting the `returning` param to `unmerged` allows using the original
commit as the resource output when setting the `merge` param to `true`.

Resolves concourse#253
rliebz added a commit to rliebz/git-resource that referenced this issue May 21, 2019
Setting the `returning` param to `unmerged` allows using the original
commit as the resource output when setting the `merge` param to `true`.

Resolves concourse#253

Signed-off-by: Rob Liebowitz <[email protected]>
rliebz added a commit to rliebz/git-resource that referenced this issue May 29, 2019
Setting the `returning` param to `unmerged` allows using the original
commit as the resource output when setting the `merge` param to `true`.

Resolves concourse#253

Signed-off-by: Rob Liebowitz <[email protected]>
rliebz added a commit to rliebz/git-resource that referenced this issue May 29, 2019
Setting the `returning` param to `unmerged` allows using the original
commit as the resource output when setting the `merge` param to `true`.

Resolves concourse#253

Signed-off-by: Rob Liebowitz <[email protected]>
@vito vito closed this as completed in #262 May 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants