Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First-time sync optimized backfill mode #1503

Closed
zgramana opened this issue Jan 22, 2016 · 9 comments
Closed

First-time sync optimized backfill mode #1503

zgramana opened this issue Jan 22, 2016 · 9 comments
Assignees
Milestone

Comments

@zgramana
Copy link
Contributor

Proposal

Add a new option to the style parameter of the _changes endpoint: all_active. This would be a similar to the existing all_docs behavior, but would exclude _removed entries as well as tombstoned documents.

Motivations

During a first-time replication, where Couchbase Lite is backfilling up to the current last_sequence, current all_docs requests will include a number of items unlikely to be of interest, such as deleted documents and _removed revisions. For a long running database, this can result in more data being transmitted to Couchbase Lite and is desired. This proposal allows a more streamlined, concise first-time backfill.

@zgramana zgramana changed the title First-time sync optimizations First-time sync optimized backfill mode Jan 22, 2016
@zgramana zgramana added this to the 1.2.0 milestone Jan 22, 2016
@adamcfraser
Copy link
Collaborator

In addition to filtering deleted/removed out of the set of active revisions, we should also remove them from the set of non-winning leaf revisions returned.

@zgramana
Copy link
Contributor Author

Proposed change to Couchbase Lite needed to support this would look something like this (using CBL.NET as the example)

if (includeConflicts)
{
       if (lastSequenceID == null) 
       {
            path.Append("&style=all_active");
       }
       else
       {
            path.Append("&style=all_docs");
       }
}

@snej
Copy link
Contributor

snej commented Jan 22, 2016

Looks good to me!

@snej
Copy link
Contributor

snej commented Jan 22, 2016

CBL could also add &include_docs=true, and get the doc bodies in the same stream, which would eliminate the need to send the bulk_get. Does SG implement that option?

@adamcfraser
Copy link
Collaborator

Yes - include_docs is implemented by SG, and would work with the proposed style=all_active.

@snej
Copy link
Contributor

snej commented Jan 23, 2016

I noticed a problem with the proposal: If we send ?style=all_active to a server that doesn't understand it (SG 1.1, CouchDB, ...), the server will probably ignore the parameter and use the default style value. That means we won't get the all_docs behavior, so we won't get info about conflicts.

Unfortunately we can't do a version check of the server before sending the feed, because (due to recent pull optimizations) _changes is the first request we send the server on a first-time pull.

As an alternative, I suggest we add a new parameter, something like ?include_deleted=false. That way if the server doesn't recognize it, we still have style=all_docs in the URL and will get the conflicts.

@adamcfraser
Copy link
Collaborator

Good point Jens. I agree that we need to have a backwards-compatible approach, and an additional parameter makes that possible without intrusive client changes.

I'd suggest that we use the parameter name ?active_only=true, in order to better describe the fact that both deleted and removed revisions will be excluded.

snej added a commit to couchbase/couchbase-lite-ios that referenced this issue Jan 23, 2016
Send "?active_only=true" as a _changes feed parameter when there's no
"since" parameter, to tell SG 1.2+ to skip sending deleted (or removed)
docs. This can speed up the initial pull quite a lot.

Fixes #1052 (see also couchbase/sync_gateway#1503 )
adamcfraser added a commit that referenced this issue Jan 24, 2016
Fixes #1503.

If the active_only parameter is set to true on a changes request:
 - deleted revisions will not be returned
 - documents removed from all channels visible to the user will not be returned
 - when using style=all_docs, deleted conflicting revisions will not be included in the set of leaf revisions returned.
adamcfraser added a commit that referenced this issue Jan 25, 2016
Fixes #1503.

If the active_only parameter is set to true on a changes request:
 - deleted revisions will not be returned
 - documents removed from all channels visible to the user will not be returned
 - when using style=all_docs, deleted conflicting revisions will not be included in the set of leaf revisions returned.
@ethanfrey
Copy link

@adamcfraser

I compiled this commit from the current master 1444ebc locally and unfortunately, it didn't seem to fix the problem. Basically, http://localhost:4985/default/_changes?style=all_docs&active_only=true returns the exact same results as http://localhost:4985/default/_changes I have some "resolved" conflicts, where all but one leaf nodes were deleted and an unresolved conflict, where there are two leaf nodes, not deleted.

The first case works well, all deleted nodes are not shown in the changes feed. In the second case, I would expect both exisiting (conflicting) leaf revisions to appear in http://localhost:4985/default/_changes?style=all_docs&active_only=true Here is an example:

http://localhost:4985/default/_changes?style=all_docs&active_only=true:

{
seq: 67,
id: "actions:tab:32383a4d-89d3-455e-807e-a2dfa73d9108",
changes: [
{
rev: "30-3c34652c89532fae4a987ee36b98864c"
}
]
},...

http://localhost:4985/default/_changes?style=all_docs:

...
{
seq: 67,
id: "actions:tab:32383a4d-89d3-455e-807e-a2dfa73d9108",
changes: [
{
rev: "30-3c34652c89532fae4a987ee36b98864c"
},
{
rev: "8-601096f71316c503b57aadaa4cc976bb"
}
]
},...

And now for the two revisions:
http://localhost:4985/default/actions:tab:32383a4d-89d3-455e-807e-a2dfa73d9108?rev=30-3c34652c89532fae4a987ee36b98864c:

{
_id: "actions:tab:32383a4d-89d3-455e-807e-a2dfa73d9108",
_rev: "30-3c34652c89532fae4a987ee36b98864c",
... doc data here...
}

http://localhost:4985/default/actions:tab:32383a4d-89d3-455e-807e-a2dfa73d9108?rev=8-601096f71316c503b57aadaa4cc976bb:

{
_id: "actions:tab:32383a4d-89d3-455e-807e-a2dfa73d9108",
_rev: "8-601096f71316c503b57aadaa4cc976bb",
... doc data here ...
}

Neither of these two leaf node has a _deleted: true element

@ethanfrey
Copy link

I took a look at the fix and think I found the issue. At least it now provides correct results on my local data set. Simply a small error in the conditional logic. Making a pull request now, please review it.

ajres pushed a commit that referenced this issue Jan 26, 2016
Fixes #1503.

If the active_only parameter is set to true on a changes request:
 - deleted revisions will not be returned
 - documents removed from all channels visible to the user will not be returned
 - when using style=all_docs, deleted conflicting revisions will not be included in the set of leaf revisions returned.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants