Sync Gateway bucket shadowing crashes with error 'panic: parent id "xxx-xxxx" is missing' #959

Closed
yonahforst opened this issue Jul 2, 2015 · 24 comments

@yonahforst

Sync Gateway crashes with error 'panic: parent id "xxx-xxxx" is missing'

To trigger this error I have the revs_limit set to 3 and I'm updating a document once a second.

Full crash log here: https://gist.github.com/joshblour/59058ac4e8f9a5dd900d

You can reproduce it with my modified version of helloCBL here: https://github.com/joshblour/couchbase-lite-tutorial-ios/tree/sync_rapid_updates
I've added continuous push & pull replications and a timer to update the document once a second.

I'm running sync_gateway version 1.1.0-26 on my local machine with the following config:

{
    "interface": ":5984",
    "adminInterface": ":5985",
    "log": ["REST"],
    "verbose": "true",
    "databases": {
        "sync_gateway": {
            "server": "http://localhost:8091",
            "revs_limit": 3,
            "users": { "GUEST": { "disabled": false, "admin_channels": ["*"] } },
            "bucket": "sync_gateway",
            "shadow": {
                "server": "http://localhost:8091",
                "bucket": "default"
            },
            "sync": `function(doc) {
                if (doc.deleted) {
                    channel("deleted")
                } else if (doc.isExpired) {
                    channel("expired")
                } else {
                    channel("active")
                }
            }`
        }
    }
}

@adamcfraser
Collaborator

Thanks for the detailed report @joshblour - we'll look into this.

@tleyden
Contributor

tleyden commented Jul 2, 2015

Here's the code where that panic is coming from:

// Records a revision in a RevTree.
func (tree RevTree) addRevision(info RevInfo) {
    revid := info.ID
    if revid == "" {
        panic("empty revid is illegal")
    }
    if tree.contains(revid) {
        panic(fmt.Sprintf("already contains rev %q", revid))
    }
    parent := info.Parent
    if parent != "" && !tree.contains(parent) {
        panic(fmt.Sprintf("parent id %q is missing", parent))  // <----------
    }
    tree[revid] = &info
}

@tleyden
Contributor

tleyden commented Jul 2, 2015

Possible related issue: #807

@tleyden tleyden self-assigned this Jul 2, 2015
@tleyden
Contributor

tleyden commented Jul 2, 2015

@joshblour how long did you need to run it before reproducing the error?

@tleyden
Contributor

tleyden commented Jul 2, 2015

@joshblour never mind, from the logs I can see that you had it running for about 5 minutes. Thanks for posting the logs!

@tleyden
Contributor

tleyden commented Jul 2, 2015

I was able to reproduce with the following changes:

  • Set GOMAXPROCS to 4 (on my machine, it was using 8 by default)
  • Update HCAppDelegate.m to [NSTimer scheduledTimerWithTimeInterval:0.1 (change timer to 0.1 seconds from 1 second)

Crash logs

@tleyden
Contributor

tleyden commented Jul 2, 2015

After initially reproducing it, now it's crashing on Sync Gateway startup:

Crash Logs SG startup

@tleyden tleyden changed the title Sync Gateway crashes with error 'panic: parent id "xxx-xxxx" is missing' Sync Gateway bucket shadowing crashes with error 'panic: parent id "xxx-xxxx" is missing' Jul 2, 2015
@tleyden
Contributor

tleyden commented Jul 2, 2015

With additional logging:

2015-07-02T13:35:04.374-07:00 Shadow+: Pulling "-hgX0zVED0QOmjtmCr05ucN", CAS=a9b727647adb ... have UpstreamRev="272-068bb6cc18baa2b153d48b013f869351", UpstreamCAS=c2080ecbe0
2015-07-02T13:35:04.374-07:00 CRUD: addRevision called with {ID:273-c596128f3a7e3be4133992c78ced9070 Parent:272-068bb6cc18baa2b153d48b013f869351 Deleted:false Body:[] Channels:{}}.  
    tree: map[282-e6c633def7f8c4d763dc871df4487c1a:0xc208204820 
            283-771a5552d761e65d2066d3a6a869f657:0xc208204870 
            284-ca387c135cf92aa13c4e6e79320975d2:0xc2082048c0]
panic: parent id "272-068bb6cc18baa2b153d48b013f869351" is missing

@tleyden
Contributor

tleyden commented Jul 2, 2015

Here is the raw document in couchbase server.

@tleyden
Contributor

tleyden commented Jul 3, 2015

Here is another crash log with more debugging added. Details to follow.

@tleyden
Contributor

tleyden commented Jul 6, 2015

Since the Shadower runs asynchronously with the rest of the mutations, the following can happen:

@tleyden
Contributor

tleyden commented Jul 6, 2015

High level description of the problem:

Bucket shadowing needs the rev history to be maintained until pending revisions come over the tap feed. If you set the rev pruning value (revs_limit) too low and the revision update frequency is too high, the bucket shadower will look for revisions in the history that have already been pruned away, and the logic falls apart at that point.
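
The failure mode described above can be reproduced in a toy model. The sketch below is not Sync Gateway code: revInfo, revTree, prune, and simulate are hypothetical stand-ins, pruning is assumed to keep only the newest revs_limit generations of a linear history, and the parent check returns an error instead of panicking. The numbers are taken from the log earlier in this thread (tree at generations 282 to 284, shadower still at generation 272).

```go
package main

import "fmt"

// revInfo and revTree are simplified, hypothetical stand-ins for illustration.
type revInfo struct {
	ID     string
	Parent string
	Gen    int
}

type revTree map[string]*revInfo

// addRevision applies the same parent check as the quoted code,
// but reports a missing parent as an error rather than panicking.
func (t revTree) addRevision(info revInfo) error {
	if info.Parent != "" {
		if _, ok := t[info.Parent]; !ok {
			return fmt.Errorf("parent id %q is missing", info.Parent)
		}
	}
	t[info.ID] = &info
	return nil
}

// prune keeps only the newest `limit` generations (assumed pruning rule).
func (t revTree) prune(limit int) {
	maxGen := 0
	for _, r := range t {
		if r.Gen > maxGen {
			maxGen = r.Gen
		}
	}
	for id, r := range t {
		if r.Gen <= maxGen-limit {
			delete(t, id)
		}
	}
}

// simulate writes `updates` client revisions, pruning after each one, and then
// lets a lagging shadower add a child of generation shadowSeenGen.
func simulate(updates, revsLimit, shadowSeenGen int) error {
	tree := revTree{}
	parent := ""
	for gen := 1; gen <= updates; gen++ {
		id := fmt.Sprintf("%d-client", gen)
		if err := tree.addRevision(revInfo{ID: id, Parent: parent, Gen: gen}); err != nil {
			return err
		}
		tree.prune(revsLimit)
		parent = id
	}
	// The shadower still treats shadowSeenGen as the current upstream revision.
	upstream := fmt.Sprintf("%d-client", shadowSeenGen)
	return tree.addRevision(revInfo{
		ID:     fmt.Sprintf("%d-shadow", shadowSeenGen+1),
		Parent: upstream,
		Gen:    shadowSeenGen + 1,
	})
}

func main() {
	fmt.Println(simulate(284, 3, 272))  // parent id "272-client" is missing
	fmt.Println(simulate(284, 20, 272)) // <nil>
}
```

In this model the shadower lags 12 generations behind, so a limit of 3 has already discarded its parent while a limit of 20 has not. The real pruning and shadowing logic is more involved, so this only illustrates the shape of the race.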

There is no easy fix for this -- it would take some design discussions to figure out the best path forward.

@joshblour I would suggest that you estimate your maximum doc update frequency, assume that it can take up to 10 seconds for docs to show up at the bucket shadower, and calculate your rev tree size based on that (and give it some slack).

So if you are updating a doc once per second, set your max rev tree size (revs_limit) to 20, which should be enough to avoid this case. If not, try setting it higher until you are unable to reproduce.
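
The rule of thumb above can be written out as arithmetic. This is a rough sketch, not an official formula: recommendedRevsLimit is a hypothetical helper, and the 2x slack factor is an assumption inferred from the "1 update per second, 10 seconds, set it to 20" example.

```go
package main

import (
	"fmt"
	"math"
)

// recommendedRevsLimit estimates a revs_limit for a bucket-shadowing setup.
//   updatesPerSec: peak update rate for a single document
//   shadowLagSec:  assumed worst-case delay before a revision reaches the shadower
//   slack:         safety multiplier (assumed, not from any official guidance)
func recommendedRevsLimit(updatesPerSec, shadowLagSec, slack float64) int {
	return int(math.Ceil(updatesPerSec * shadowLagSec * slack))
}

func main() {
	// One update per second, 10 s lag, 2x slack.
	fmt.Println(recommendedRevsLimit(1, 10, 2)) // 20
	// The 0.1 s timer used in the repro above means 10 updates per second.
	fmt.Println(recommendedRevsLimit(10, 10, 2)) // 200
}
```

The result is only a starting point. If the shadower lags by more than the assumed 10 seconds, the panic can still occur, so raise the value until the crash no longer reproduces.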

@tleyden
Contributor

tleyden commented Jul 6, 2015

@zgramana -- let's review this in the next sprint planning.

@tleyden tleyden added review and removed in progress labels Jul 6, 2015
@yonahforst
Author

@tleyden will do. Thank you for looking into this.

@vfernandezg

I'm having this issue too. Is there any temporary workaround to start the sync again?

@yonahforst
Author

@vfernandezg, in order to start the sync again, I had to flush the sync_gateway database from the Couchbase Server admin portal.

@zgramana
Contributor

We now have the repro code on issue #994.

@tleyden
Contributor

tleyden commented Jul 31, 2015

Action item:

  • Generate the ticket that describes the refactoring (probably an epic)

@tleyden
Contributor

tleyden commented Jul 31, 2015

Action item:

  • Separate ticket for Jeff with exact updates we want for config docs for revs limit. If you are using bucket shadowing and doc update rate is greater than X, you may encounter issues

@JeffThomasWriter

I am working on the config parameters section now. Is this too much to add? Suggestions? Thanks.
revsLimit (integer): Maximum depth to which a document's revision tree can grow.

Note: If you are using bucket shadowing, setting revsLimit to a value that is too small relative to the frequency of document revisions can have negative consequences. Bucket shadowing needs the revision history to be maintained until pending revisions are reconciled. We recommend that you estimate the maximum update frequency in documents per second (for example, one per second), assume that 10 seconds are needed for documents to be available to the bucket shadower, and then add some slack (for example, set the revsLimit to 20).

@tleyden
Contributor

tleyden commented Aug 10, 2015

@JeffThomasWriter this part is good:

Note: If you are using bucket shadowing, setting revsLimit to a value that is too small relative to the frequency of document revisions can have negative consequences. Bucket shadowing needs the revision history to be maintained until pending revisions are reconciled.

but I think this part will be very confusing to users:

We recommend that you estimate the maximum update frequency in documents per second (for example, one per second), assume that 10 seconds are needed for documents to be available to the bucket shadower, and then add some slack (for example, set the revsLimit to 20).

Not sure the best approach here...

@tleyden
Contributor

tleyden commented Aug 28, 2015

@JeffThomasWriter - any update on this? Did these docs get committed anywhere?

@JeffThomasWriter

Documented with this note for revsLimit: Note: If you use bucket shadowing, setting revsLimit to a value that is too small relative to the frequency of document revisions can have negative consequences. Bucket shadowing needs the revision history to be maintained until pending revisions are reconciled.

@vfernandezg

I have 40 revisions as a limit and I'm still having this issue. Is there another way to prevent this error?
