Skip to content
This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

Validate reconciler works correctly #13648

Closed
eseliger opened this issue Sep 4, 2020 · 8 comments
Closed

Validate reconciler works correctly #13648

eseliger opened this issue Sep 4, 2020 · 8 comments
Assignees
Labels
batch-changes Issues related to Batch Changes bug An error, flaw or fault that produces an incorrect or unexpected result, or behavior. estimate/1d planned/3.20
Milestone

Comments

@eseliger
Copy link
Member

eseliger commented Sep 4, 2020

I've just ran a test campaign that creates changesets on GitLab, GitHub and Bitbucket Server, and the behavior seems odd to me.

  1. The BBS PR has been created successfully.
  2. The GitHub PR switched to "failed" correctly, because I reused a branchname.
  3. The GitLab PR stayed queued for quite a long time (few minutes)
  4. The GitHub PR keeps getting retried infinitely (according to the logs)
  5. The GitLab PR failed pushing the commit, now has an error message, but is still "processing" and continuously retries as well.

Screenshot:
image

Logs:
18:13:21                      repo-updater | INFO Dequeued record for processing, name: , id: 39
18:13:21                      repo-updater | INFO Reconciler processing changeset, changeset: 39, action: publish
18:13:21                      repo-updater | INFO Dequeued record for processing, name: , id: 38
18:13:21                      repo-updater | INFO Dequeued record for processing, name: , id: 37
18:13:21                      repo-updater | INFO Reconciler processing changeset, changeset: 38, action: publish
18:13:21                      repo-updater | INFO Reconciler processing changeset, changeset: 37, action: publish
18:13:21                      repo-updater | WARN Marked record as errored, name: , id: 38, err: cannot create changeset on the same branch in multiple campaigns
18:13:21                      repo-updater | INFO Handled record, name: , id: 38
18:13:21                         gitserver | INFO command ran successfully, prefix: 1 gitlab.sgdev.org/sourcegraph/automation-testing , command: git init, duration: 13.465183ms, output: Initialized empty Git repository in /Users/erik/.sourcegraph/repos/.tmp/patch-repo-696411267/.git/
18:13:21                         gitserver | INFO command ran successfully, prefix: 2 bitbucket.sgdev.org/SOUR/automation-testing , command: git init, duration: 10.256229ms, output: Initialized empty Git repository in /Users/erik/.sourcegraph/repos/.tmp/patch-repo-153787398/.git/
18:13:21                         gitserver | INFO command ran successfully, prefix: 1 gitlab.sgdev.org/sourcegraph/automation-testing , command: git reset -q 3b79a5d479d2af9cfe91e0aad4e9dddca7278150, duration: 11.265111ms, output: 
18:13:21                         gitserver | INFO command ran successfully, prefix: 2 bitbucket.sgdev.org/SOUR/automation-testing , command: git reset -q 1e256a405ec07c904f0a4e681c8136cc9fca3b87, duration: 9.065984ms, output: 
18:13:21                         gitserver | INFO command ran successfully, prefix: 1 gitlab.sgdev.org/sourcegraph/automation-testing , command: git apply --cached -p0, duration: 8.577269ms, output: 
18:13:21                         gitserver | INFO command ran successfully, prefix: 2 bitbucket.sgdev.org/SOUR/automation-testing , command: git apply --cached -p0, duration: 6.84489ms, output: 
18:13:21                         gitserver | INFO command ran successfully, prefix: 2 bitbucket.sgdev.org/SOUR/automation-testing , command: git commit -m Append Hello World to all README.md files, duration: 15.782106ms, output: [master 945ea51] Append Hello World to all README.md files
18:13:21                         gitserver |  1 file changed, 1 insertion(+)
18:13:21                         gitserver | INFO command ran successfully, prefix: 1 gitlab.sgdev.org/sourcegraph/automation-testing , command: git commit -m Append Hello World to all README.md files, duration: 18.903035ms, output: [master 8b15bcc] Append Hello World to all README.md files
18:13:21                         gitserver |  1 file changed, 1 insertion(+)
18:13:23                         gitserver | INFO command ran successfully, prefix: 2 bitbucket.sgdev.org/SOUR/automation-testing , command: git push --force https://milton:[email protected]/scm/sour/automation-testing.git 945ea515c806f5665998b95a7dc09f23061e1773:refs/heads/hello-world, duration: 2.428522252s, output: remote: 
18:13:23                         gitserver | remote: Create pull request for hello-world:        
18:13:23                         gitserver | remote:   https://bitbucket.sgdev.org/projects/SOUR/repos/automation-testing/compare/commits?sourceBranch=refs/heads/hello-world        
18:13:23                         gitserver | remote: 
18:13:23                         gitserver | To https://bitbucket.sgdev.org/scm/sour/automation-testing.git
18:13:23                         gitserver |  * [new branch]      945ea515c806f5665998b95a7dc09f23061e1773 -> hello-world
18:13:23                         gitserver | INFO command ran successfully, prefix: 2 bitbucket.sgdev.org/SOUR/automation-testing , command: git update-ref -- refs/heads/hello-world 945ea515c806f5665998b95a7dc09f23061e1773, duration: 7.567189ms, output: 
18:13:26                      repo-updater | INFO Dequeued record for processing, name: , id: 38
18:13:26                      repo-updater | INFO Reconciler processing changeset, changeset: 38, action: publish
18:13:26                      repo-updater | WARN Marked record as errored, name: , id: 38, err: cannot create changeset on the same branch in multiple campaigns
18:13:26                      repo-updater | INFO Handled record, name: , id: 38
18:13:26                      repo-updater | INFO Marked record as complete, name: , id: 37
18:13:26                      repo-updater | INFO Handled record, name: , id: 37
18:13:33                    syntect_server | 2020/09/04 16:13:33 request / http://127.0.0.1:33043
18:13:34                    syntect_server | 2020/09/04 16:13:34 request / http://127.0.0.1:33043
18:13:36                      repo-updater | INFO Dequeued record for processing, name: , id: 38
18:13:36                      repo-updater | INFO Reconciler processing changeset, changeset: 38, action: publish
18:13:36                      repo-updater | WARN Marked record as errored, name: , id: 38, err: cannot create changeset on the same branch in multiple campaigns
18:13:36                      repo-updater | INFO Handled record, name: , id: 38
18:13:38                    syntect_server | 2020/09/04 16:13:38 request / http://127.0.0.1:33043
18:13:38                    syntect_server | 2020/09/04 16:13:38 request / http://127.0.0.1:33043
18:13:42                    syntect_server | 2020/09/04 16:13:42 request / http://127.0.0.1:42419
18:13:42                    syntect_server | 2020/09/04 16:13:42 request / http://127.0.0.1:42419
18:13:46                      repo-updater | INFO Dequeued record for processing, name: , id: 38
18:13:46                      repo-updater | INFO Reconciler processing changeset, changeset: 38, action: publish
18:13:46                      repo-updater | WARN Marked record as errored, name: , id: 38, err: cannot create changeset on the same branch in multiple campaigns
18:13:46                      repo-updater | INFO Handled record, name: , id: 38
18:13:56                      repo-updater | INFO Dequeued record for processing, name: , id: 38
18:13:56                      repo-updater | INFO Reconciler processing changeset, changeset: 38, action: publish
18:13:56                      repo-updater | WARN Marked record as errored, name: , id: 38, err: cannot create changeset on the same branch in multiple campaigns
18:13:56                      repo-updater | INFO Handled record, name: , id: 38
18:14:01                      repo-updater | INFO Dequeued record for processing, name: , id: 38
18:14:01                      repo-updater | INFO Reconciler processing changeset, changeset: 38, action: publish
18:14:01                      repo-updater | WARN Marked record as errored, name: , id: 38, err: cannot create changeset on the same branch in multiple campaigns
18:14:01                      repo-updater | INFO Handled record, name: , id: 38
18:14:06                      repo-updater | INFO Dequeued record for processing, name: , id: 38
18:14:06                      repo-updater | INFO Reconciler processing changeset, changeset: 38, action: publish
18:14:06                      repo-updater | WARN Marked record as errored, name: , id: 38, err: cannot create changeset on the same branch in multiple campaigns
18:14:06                      repo-updater | INFO Handled record, name: , id: 38
18:14:11                      repo-updater | INFO Dequeued record for processing, name: , id: 38
18:14:11                      repo-updater | INFO Reconciler processing changeset, changeset: 38, action: publish
18:14:11                      repo-updater | WARN Marked record as errored, name: , id: 38, err: cannot create changeset on the same branch in multiple campaigns
18:14:11                      repo-updater | INFO Handled record, name: , id: 38
18:14:21                      repo-updater | INFO Dequeued record for processing, name: , id: 38
18:14:21                      repo-updater | INFO Reconciler processing changeset, changeset: 38, action: publish
18:14:21                      repo-updater | WARN Marked record as errored, name: , id: 38, err: cannot create changeset on the same branch in multiple campaigns
18:14:21                      repo-updater | INFO Handled record, name: , id: 38
18:14:31                      repo-updater | INFO Dequeued record for processing, name: , id: 38
18:14:31                      repo-updater | INFO Reconciler processing changeset, changeset: 38, action: publish
18:14:31                      repo-updater | WARN Marked record as errored, name: , id: 38, err: cannot create changeset on the same branch in multiple campaigns
18:14:31                      repo-updater | INFO Handled record, name: , id: 38
18:14:41                      repo-updater | INFO Dequeued record for processing, name: , id: 38
18:14:41                      repo-updater | INFO Reconciler processing changeset, changeset: 38, action: publish
18:14:41                      repo-updater | WARN Marked record as errored, name: , id: 38, err: cannot create changeset on the same branch in multiple campaigns
18:14:41                      repo-updater | INFO Handled record, name: , id: 38
18:14:51                      repo-updater | INFO Dequeued record for processing, name: , id: 38
18:14:51                      repo-updater | INFO Reconciler processing changeset, changeset: 38, action: publish
18:14:51                      repo-updater | WARN Marked record as errored, name: , id: 38, err: cannot create changeset on the same branch in multiple campaigns
18:14:51                      repo-updater | INFO Handled record, name: , id: 38
18:14:56                      repo-updater | INFO Dequeued record for processing, name: , id: 38
18:14:56                      repo-updater | INFO Reconciler processing changeset, changeset: 38, action: publish
18:14:56                      repo-updater | WARN Marked record as errored, name: , id: 38, err: cannot create changeset on the same branch in multiple campaigns
18:14:56                      repo-updater | INFO Handled record, name: , id: 38
18:15:04                         gitserver | INFO command failed, prefix: 1 gitlab.sgdev.org/sourcegraph/automation-testing , command: git push --force https://git:[email protected]/sourcegraph/automation-testing.git 8b15bcc5ef306ce93ffb6ee15e75a6e59b998505:refs/heads/hello-world, duration: 1m42.896111205s, error: exit status 1, output: error: RPC failed; HTTP 524 curl 22 The requested URL returned error: 524
18:15:04                         gitserver | fatal: the remote end hung up unexpectedly
18:15:04                         gitserver | fatal: the remote end hung up unexpectedly
18:15:04                         gitserver | Everything up-to-date
18:15:04                         gitserver | ERROR Failed to push, ref: refs/heads/hello-world, commit: 8b15bcc5ef306ce93ffb6ee15e75a6e59b998505, output: error: RPC failed; HTTP 524 curl 22 The requested URL returned error: 524
18:15:04                         gitserver | fatal: the remote end hung up unexpectedly
18:15:04                         gitserver | fatal: the remote end hung up unexpectedly
18:15:04                         gitserver | Everything up-to-date
18:15:04                      repo-updater | WARN Marked record as errored, name: , id: 39, err: creating commit from patch for repository "gitlab.sgdev.org/sourcegraph/automation-testing": gitserver: pushing ref: exit status 1
18:15:04                      repo-updater | ```
18:15:04                      repo-updater | $ git push --force https://git:<redacted>@gitlab.sgdev.org/sourcegraph/automation-testing.git 8b15bcc5ef306ce93ffb6ee15e75a6e59b998505:refs/heads/hello-world
18:15:04                      repo-updater | error: RPC failed; HTTP 524 curl 22 The requested URL returned error: 524
18:15:04                      repo-updater | fatal: the remote end hung up unexpectedly
18:15:04                      repo-updater | fatal: the remote end hung up unexpectedly
18:15:04                      repo-updater | Everything up-to-date
18:15:04                      repo-updater | ```
18:15:04                      repo-updater | INFO Handled record, name: , id: 39
18:15:06                      repo-updater | INFO Dequeued record for processing, name: , id: 38
18:15:06                      repo-updater | INFO Reconciler processing changeset, changeset: 38, action: publish
18:15:06                      repo-updater | WARN Marked record as errored, name: , id: 38, err: cannot create changeset on the same branch in multiple campaigns
18:15:06                      repo-updater | INFO Handled record, name: , id: 38
18:15:11                               web | ℹ 「wdm」: Compiling...
18:15:11                      repo-updater | INFO Dequeued record for processing, name: , id: 39
18:15:11                      repo-updater | INFO Reconciler processing changeset, changeset: 39, action: publish
18:15:11                      repo-updater | INFO Dequeued record for processing, name: , id: 38
18:15:11                      repo-updater | INFO Reconciler processing changeset, changeset: 38, action: publish
18:15:11                      repo-updater | WARN Marked record as errored, name: , id: 38, err: cannot create changeset on the same branch in multiple campaigns
18:15:11                      repo-updater | INFO Handled record, name: , id: 38
18:15:11                         gitserver | INFO command ran successfully, prefix: 3 gitlab.sgdev.org/sourcegraph/automation-testing , command: git init, duration: 12.603589ms, output: Initialized empty Git repository in /Users/erik/.sourcegraph/repos/.tmp/patch-repo-142971176/.git/
18:15:11                         gitserver | INFO command ran successfully, prefix: 3 gitlab.sgdev.org/sourcegraph/automation-testing , command: git reset -q 3b79a5d479d2af9cfe91e0aad4e9dddca7278150, duration: 9.087691ms, output: 
18:15:11                         gitserver | INFO command ran successfully, prefix: 3 gitlab.sgdev.org/sourcegraph/automation-testing , command: git apply --cached -p0, duration: 8.225385ms, output: 
18:15:11                         gitserver | INFO command ran successfully, prefix: 3 gitlab.sgdev.org/sourcegraph/automation-testing , command: git commit -m Append Hello World to all README.md files, duration: 17.488747ms, output: [master 8b15bcc] Append Hello World to all README.md files
18:15:11                         gitserver |  1 file changed, 1 insertion(+)
@eseliger eseliger added bug An error, flaw or fault that produces an incorrect or unexpected result, or behavior. batch-changes Issues related to Batch Changes labels Sep 4, 2020
@eseliger eseliger added this to the 3.20 milestone Sep 4, 2020
@LawnGnome LawnGnome self-assigned this Sep 9, 2020
@LawnGnome
Copy link
Contributor

LawnGnome commented Sep 11, 2020

I've been able to reproduce at least part of this:

  1. Publish a campaign that has a GitLab MR.
  2. Change the campaign spec to match a branch that already exists on the GitLab repo.
  3. Re-apply the campaign.

💥

I managed to get this crash from the backend:

18:17:00                          frontend |     id
18:17:00                          frontend |     namespaceName
18:17:00                          frontend |     url
18:17:00                          frontend | }
18:17:00                          frontend | VARIABLES
18:17:00                          frontend | ---------
18:17:00                          frontend | map[campaignSpec:Q2FtcGFpZ25TcGVjOiJBNUprS1hhZFZYVCI=]
18:17:00                      repo-updater | INFO Reconciler processing changeset, changeset: 42, action: close
18:17:00                      repo-updater | INFO Reconciler processing changeset, changeset: 41, action: close
18:17:00                      repo-updater | INFO Reconciler processing changeset, changeset: 40, action: close
18:17:00                      repo-updater | panic: runtime error: invalid memory address or nil pointer dereference
18:17:00                      repo-updater | [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x10abc99]
18:17:00                      repo-updater | goroutine 9627 [running]:
18:17:00                      repo-updater | github.com/sourcegraph/sourcegraph/cmd/repo-updater/repos.(*GitLabSource).CloseChangeset(0xc00061af40, 0x20db620, 0xc002155a10, 0xc002006780, 0x0, 0x0)
18:17:00                      repo-updater | 	/home/adam/trees/sourcegraph/sourcegraph/cmd/repo-updater/repos/gitlab.go:347 +0x179
18:17:00                      repo-updater | github.com/sourcegraph/sourcegraph/enterprise/internal/campaigns.(*reconciler).closeChangeset(0xc0014014a0, 0x20db620, 0xc002155a10, 0xc001730950, 0xc002456000, 0x0, 0x0)
18:17:00                      repo-updater | 	/home/adam/trees/sourcegraph/sourcegraph/enterprise/internal/campaigns/reconciler.go:285 +0x3c2
18:17:00                      repo-updater | github.com/sourcegraph/sourcegraph/enterprise/internal/campaigns.(*reconciler).process(0xc0014014a0, 0x20db620, 0xc002155a10, 0xc001730950, 0xc002456000, 0x0, 0x0)
18:17:00                      repo-updater | 	/home/adam/trees/sourcegraph/sourcegraph/enterprise/internal/campaigns/reconciler.go:81 +0x670
18:17:00                      repo-updater | github.com/sourcegraph/sourcegraph/enterprise/internal/campaigns.(*reconciler).HandlerFunc.func1(0x20db620, 0xc002155a10, 0x20e4d00, 0xc001442780, 0x20b54a0, 0xc002456000, 0x0, 0x0)
18:17:00                      repo-updater | 	/home/adam/trees/sourcegraph/sourcegraph/enterprise/internal/campaigns/reconciler.go:44 +0x105
18:17:00                      repo-updater | github.com/sourcegraph/sourcegraph/internal/workerutil/dbworker.HandlerFunc.Handle(0xc00051b970, 0x20db620, 0xc002155a10, 0x20e4d00, 0xc001442780, 0x20b54a0, 0xc002456000, 0x0, 0x0)
18:17:00                      repo-updater | 	/home/adam/trees/sourcegraph/sourcegraph/internal/workerutil/dbworker/handler.go:31 +0x7a
18:17:00                      repo-updater | github.com/sourcegraph/sourcegraph/internal/workerutil/dbworker.(*handlerShim).Handle(0xc00051bb70, 0x20db620, 0xc002155a10, 0x20dbb20, 0xc001fbadc0, 0x20b54a0, 0xc002456000, 0x0, 0x0)
18:17:00                      repo-updater | 	/home/adam/trees/sourcegraph/sourcegraph/internal/workerutil/dbworker/handler_shim.go:25 +0xbc
18:17:00                      repo-updater | github.com/sourcegraph/sourcegraph/internal/workerutil.(*Worker).handle(0xc0012c0870, 0x20dbb20, 0xc001fbadc0, 0x20b54a0, 0xc002456000, 0xe, 0x0)
18:17:00                      repo-updater | 	/home/adam/trees/sourcegraph/sourcegraph/internal/workerutil/worker.go:190 +0x2ab
18:17:00                      repo-updater | github.com/sourcegraph/sourcegraph/internal/workerutil.(*Worker).dequeueAndHandle.func2(0xc0012c0870, 0x20b54a0, 0xc002456000, 0x20dbb20, 0xc001fbadc0)
18:17:00                      repo-updater | 	/home/adam/trees/sourcegraph/sourcegraph/internal/workerutil/worker.go:164 +0xd4
18:17:00                      repo-updater | created by github.com/sourcegraph/sourcegraph/internal/workerutil.(*Worker).dequeueAndHandle
18:17:00                      repo-updater | 	/home/adam/trees/sourcegraph/sourcegraph/internal/workerutil/worker.go:154 +0x83e

After restarting, the reconciler doesn't crash, but does leave the GitLab MR in the processing state. Manually kicking it back to queued results in the MR being reconciled as expected. This would imply that the crash isn't rolling back the transaction.

More to come tomorrow.

@mrnugget
Copy link
Contributor

but does leave the GitLab MR in the processing state.

I think that's when the reset-stalled-job/MaxStalledAge mechanism of workerutils should kick in.

This would imply that the crash isn't rolling back the transaction.

I'm sure you've found this by now, but the dequeue-and-set-to-processing happens outside the transaction:

https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/internal/workerutil/dbworker/store/store.go#L259-275

So if a worker crashes while it's processing a record the transaction that modified the record is rolled back (well, it's not committed), but the status is still set to processing. The new worker process that you boot up should reset the stalled jobs after MaxStalledAge

@mrnugget
Copy link
Contributor

btw. continually retrying is a feature, not a bug, see https://github.com/sourcegraph/sourcegraph/issues/12700#issuecomment-685419544

mrnugget referenced this issue Sep 14, 2020
This fixes the crash in this stacktrace https://github.com/sourcegraph/sourcegraph/issues/13648#issuecomment-690817025

It was only a problem in `CloseChangesets`, but I added the check to the
FakeChangesetSource in every call.
mrnugget referenced this issue Sep 14, 2020
…13795)

This fixes the crash in this stacktrace https://github.com/sourcegraph/sourcegraph/issues/13648#issuecomment-690817025

It was only a problem in `CloseChangesets`, but I added the check to the
FakeChangesetSource in every call.
@davejrt
Copy link
Contributor

davejrt commented Sep 14, 2020

Dear all,

This is your release captain speaking. 🚂🚂🚂

Branch cut for the 3.20 release is scheduled for tomorrow.

Is this issue / PR going to make it in time? Please change the milestone accordingly.
When in doubt, reach out!

Thank you

@LawnGnome
Copy link
Contributor

To be clear, this didn't restart after StalledMaxAge, so I'm going to investigate this further in 3.21, although the immediate issue was fixed (or at least obscured) by #13790.

@LawnGnome
Copy link
Contributor

This hasn't been observed again, so deprioritising for now.

@eseliger
Copy link
Member Author

Given that we had a couple of weeks to find this "by accident" and we didn't, I guess we can close this ticket? @sourcegraph/campaigns

@chrispine
Copy link
Contributor

Yes, that's fine with me.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
batch-changes Issues related to Batch Changes bug An error, flaw or fault that produces an incorrect or unexpected result, or behavior. estimate/1d planned/3.20
Projects
None yet
Development

No branches or pull requests

5 participants