Offline sync for Android SDK [working draft]

From https://github.com/strongloop/loopback-sdk-android/issues/71

Goal: Synchronize objects between an Android client application and a LoopBack backend.

To do this, leverage the server-side synchronization API that is already available and implement a custom Android client that invokes it. The Android guide "Transferring Data Using Sync Adapters" suggests that Android's ContentProvider/SyncAdapter may be the right abstraction to use.

The LoopBack replication API does not rely on timestamps (like if-modified-since), but rather uses the concept of checkpoints. A checkpoint is a number that is increased whenever a change replication occurs. The client can then query for changes made after the last checkpoint (see the since argument of PersistedModel.changes()).
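To make the checkpoint idea concrete, here is a toy in-memory model. It is only an illustration and not LoopBack's implementation: the ChangeLog class and its method names are invented for this sketch, and it assumes that each change is tagged with the checkpoint that is current when the change is recorded.

```javascript
// Toy in-memory model of checkpoint-based change tracking.
// Illustration only; ChangeLog is a hypothetical class, not a LoopBack API.
class ChangeLog {
  constructor() {
    this.current = 1;   // current checkpoint number
    this.entries = [];  // one entry per recorded change
  }

  // Record a change and tag it with the current checkpoint.
  record(modelId) {
    this.entries.push({modelId, checkpoint: this.current});
  }

  // Create a new checkpoint; later changes are tagged with the new number.
  checkpoint() {
    this.current += 1;
    return this.current;
  }

  // Return the changes tagged with checkpoint `since` or later.
  changes(since) {
    return this.entries.filter((e) => e.checkpoint >= since);
  }
}

const log = new ChangeLog();
log.record('a');
const cp = log.checkpoint(); // the client remembers this number
log.record('b');
console.log(log.changes(cp).map((e) => e.modelId)); // only 'b' is reported
```

No clocks are compared anywhere, which is the point of using checkpoints instead of timestamps.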

If you are looking for one-way synchronisation only (Android client fetching updates from the server), then implement a SyncAdapter using the current change-replication API.

However, I was just curious about whether I could simply enable Sync on the models in my LoopBack server to automatically get extra endpoints that provide diffs, revisions, etc.

Yes, the flag is called trackChanges. See Enable change tracking. The other changes described in that section may not be needed if all you want is to fetch changes from the server.

Our docs also contain a list of methods related to change replication (link); see also the source code of the method implementing change replication: lib/persisted-model.js#L1120-L1308.

A simplified algorithm for pulling the changes from the server would be:

1. Create a new server checkpoint by calling POST /api/mymodels/checkpoint.
2. Get a list of changes made since the last checkpoint we pulled (CP) by calling GET /api/mymodels/changes?since=CP.
3. Skip the diff step, because there are no local changes to diff against.
4. Get a list of updates to perform locally via POST /api/mymodels/createUpdates, sending the list of changes from step 2 in the request body. (I am not entirely sure about the format; the request data may require additional manipulation.)
5. Apply the updates returned by the server (update/create/delete affected model instances); see the implementation of bulkUpdate to learn more.
6. Save the checkpoint number returned in step 1 as the CP value to use in the next run.
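
The steps above can be sketched roughly as follows. This is an untested outline, not a definitive client: request(method, path, body) is a hypothetical HTTP helper that resolves with the parsed JSON response, applyUpdate is a hypothetical callback that writes one update to local storage, and the code assumes that the checkpoint endpoint responds with an object carrying a seq number and that the change list can be posted to createUpdates unchanged.

```javascript
// Sketch of one pull cycle. `request` and `applyUpdate` are hypothetical
// helpers supplied by the caller; they are not part of LoopBack.
async function pullChanges(request, lastCheckpoint, applyUpdate) {
  // Step 1: create a new server checkpoint.
  // Assumption: the response looks like {seq: <number>}.
  const cp = await request('POST', '/api/mymodels/checkpoint');

  // Step 2: list the changes made since the checkpoint we pulled last time.
  const changes = await request(
    'GET', '/api/mymodels/changes?since=' + encodeURIComponent(lastCheckpoint));

  // Step 3 (diff) is skipped: there are no local changes to diff against.

  // Step 4: ask the server for the updates to apply locally.
  // Assumption: the change list can be sent as-is; it may need reshaping.
  const updates = await request(
    'POST', '/api/mymodels/createUpdates', changes);

  // Step 5: apply each update to the local store, in order.
  for (const update of updates) {
    await applyUpdate(update);
  }

  // Step 6: the caller persists this value and passes it in on the next run.
  return cp.seq;
}
```

Passing the HTTP helper in as an argument keeps the outline independent of any particular HTTP library and makes it easy to exercise against a fake server.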

I have enabled trackChanges on one of my models (and also changed the id property to an auto-generated GUID as per the documentation). I then found it incredibly useful to be able to use the StrongLoop API Explorer to see the new endpoints this introduced and experiment with them. I'm ever so impressed by how all this works.

One concern I have is that, for a client to see if there are new changes at the server using the algorithm you propose, the client would be bumping up the checkpoint number every time it polls for changes. Does this risk creating any unnecessary bloat in the database over time? A slight variation to this strategy could be for the client to simply do GET /api/mymodels/changes?since=CP in the first instance (without creating a checkpoint first); and only if there were new changes would the client then get the 'head' checkpoint (and create one if one doesn't exist at the 'head').

The other concern I have is how much the change history in the database will grow over time, which might be an unnecessary overhead if I don't need the change history (when all I really want to achieve is "is the server data newer than my local data?"). Is it possible to remove older change history?

Finally there's this 'known issue' in the documentation:

LoopBack does not fully support fine-grained access control to a selected subset of model instances, therefore it is not possible to replicate models where the user can access only a subset of instances (for example only the instances the user has created)

Does this mean that if the myModels collection contains objects created by different users, and a particular client only wants diffs for models owned by that particular user, I would have an issue using the Sync API?

I now see that the checkpoint just exists as one record. And, if I ever wanted to, it seems I can safely trim the contents of myModels-Change.

One concern I have is that, for a client to see if there are new changes at the server using the algorithm you propose, the client would be bumping up the checkpoint number every time it polls for changes. Does this risk creating any unnecessary bloat in the database over time? A slight variation to this strategy could be for the client to simply do GET /api/mymodels/changes?since=CP in the first instance (without creating a checkpoint first); and only if there were new changes would the client then get the 'head' checkpoint (and create one if one doesn't exist at the 'head').

I guess if you have many clients that are checking for changes often, then the checkpoint number can eventually overflow the int32/int64 limit. The variation you proposed looks sensible to me, as long as the cost of the extra request is not significant.
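
A minimal sketch of that variation, under the same kind of assumptions as any client code here: request(method, path, body) is a hypothetical HTTP helper resolving with parsed JSON, and the checkpoint endpoint is assumed to respond with an object carrying a seq number.

```javascript
// Sketch: look for changes first and create a checkpoint only when needed.
// `request` is a hypothetical HTTP helper, not part of LoopBack.
async function pollForChanges(request, lastCheckpoint) {
  const changes = await request(
    'GET', '/api/mymodels/changes?since=' + encodeURIComponent(lastCheckpoint));

  if (changes.length === 0) {
    // Nothing new: no checkpoint is created, so the counter does not grow.
    return {changes, checkpoint: lastCheckpoint};
  }

  // Assumption: the response looks like {seq: <number>}.
  const cp = await request('POST', '/api/mymodels/checkpoint');
  return {changes, checkpoint: cp.seq};
}
```

One caveat with this ordering: a change written on the server between the two requests may be recorded under the old checkpoint and then be skipped by the next poll. If that matters, fetching the change list again after creating the checkpoint is one way to close the gap, at the cost of a further request.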

Does this mean that if the myModels collection contains objects created by different users, and a particular client only wants diffs for models owned by that particular user, I would have an issue using the Sync API?

Yes. Right now, the built-in sync API does not provide any way to filter the model instances. That is, your change list will include instances created by other users, and the "bulkUpdate" endpoint will allow the clients to update any instances, including those created by other users.

BTW this is true for the built-in find method too - it cannot filter the results based on the currently logged-in user, and if there are ACLs using the $owner role, then find returns 401 unauthorized IIRC.

I think there may be a solution though:

Disable bulkUpdate and any other unused replication endpoints - see https://loopback.io/doc/en/lb3/Exposing-models-over-REST.html
Provide a custom implementation of the changes method that fills in the filter argument in such a way that only the records of the currently logged-in user are shown.
Provide a custom implementation of createUpdates that restricts the accessed models to those allowed for the current user. Alternatively, modify your client to fetch all changes individually by making one request per changed model, instead of calling createUpdates. This may actually be a better option, see below.

// common/models/my-model.js
module.exports = function(MyModel) {
  // the code here is not tested, may not work out-of-the-box

  // LoopBack 3.x API; in 2.x the equivalent is disableRemoteMethod(name, true)
  MyModel.disableRemoteMethodByName('changes');
  MyModel.disableRemoteMethodByName('createUpdates');
  MyModel.disableRemoteMethodByName('bulkUpdate');
  // etc.
  
  MyModel.myChanges = function(since, options, cb) {
    var currentUserId = options && options.accessToken && options.accessToken.userId;
    if (!currentUserId) {
      var err = new Error('Not Authorized');
      err.statusCode = 401;
      return cb(err);
    }

    // assuming "owner" is the FK mapping to users
    var filter = {where: {owner: currentUserId}};
    this.changes(since, filter, cb);
  };

  MyModel.remoteMethod('myChanges', {
    accepts: [
      {arg: 'since', type: 'number', description:
        'Only return changes since this checkpoint'},
      {arg: 'options', type: 'object', http: 'optionsFromRequest'},
    ],
    returns: {arg: 'changes', type: 'array', root: true},
    http: {verb: 'get', path: '/my-changes'},
  });
};

(The code relies on https://github.com/strongloop/loopback/issues/1495, which has not been published to npmjs.org yet.)
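
On the client side, calling such an endpoint comes down to building a URL like the one below. This is a small sketch with assumed names: myChangesUrl is a hypothetical helper, the /api/mymodels prefix follows the paths used earlier in this thread, and the token is passed through the access_token query parameter (sending it in the Authorization header is the other common option).

```javascript
// Hypothetical helper: build the URL for the custom my-changes endpoint.
function myChangesUrl(baseUrl, since, accessToken) {
  const url = new URL('/api/mymodels/my-changes', baseUrl);
  url.searchParams.set('since', String(since));       // last checkpoint pulled
  url.searchParams.set('access_token', accessToken);  // identifies the user
  return url.toString();
}

console.log(myChangesUrl('http://localhost:3000', 12, 'TOKEN'));
// http://localhost:3000/api/mymodels/my-changes?since=12&access_token=TOKEN
```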

Loosely related:

Once strongloop/loopback#2959 has landed, filtering changes by model properties should be more performant.
strongloop/loopback#2961 is fixing the issue where createUpdates produces a response that's too large to receive in a reasonable time. This may affect your client too, and therefore it may be better to avoid using createUpdates at all.
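
A rough sketch of the createUpdates-free approach, fetching each changed instance with its own request. The names are assumptions made for illustration: request(method, path) is a hypothetical HTTP helper that resolves with parsed JSON and rejects with an error carrying statusCode, and each change entry is assumed to expose the affected instance id as modelId.

```javascript
// Sketch: one request per changed instance instead of calling createUpdates.
// `request` is a hypothetical HTTP helper, not part of LoopBack.
async function fetchChangedInstances(request, changes) {
  const results = [];
  for (const change of changes) {
    const id = change.modelId; // assumed field name on each change entry
    try {
      const data = await request(
        'GET', '/api/mymodels/' + encodeURIComponent(id));
      results.push({type: 'upsert', id, data});
    } catch (err) {
      // Anything other than "not found" is a real failure.
      if (err.statusCode !== 404) throw err;
      // The instance no longer exists on the server: treat it as a deletion.
      results.push({type: 'delete', id});
    }
  }
  return results;
}
```

This trades one large response for many small ones, so it suits clients that expect only a handful of changes per poll.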

The proposed myChanges method will be extremely useful. I have already implemented a new remote method called 'mine' to get all objects that only belong to that owner, accessed at /myObjects/mine. So the endpoint you propose above will sit alongside that quite nicely. I believe that the if (!currentUserId) check you're doing can be eliminated -- I assume that currentUserId simply has to exist, because my ACL restricts the endpoint to $authenticated.

Another idea is to expand my MyUser model to provide each user with a private checkpoint of sorts.

I will definitely try to split a stand-alone Android demonstration of this from my project if I can.

A few questions/concerns that still exist are:

Using the Sync feature requires me to enable automatic assignment of GUIDs, but personally I'd like to continue using the MongoDB ObjectID format. Actually I doubt this is an issue because my client already self-generates and assigns the ObjectIDs to the objects before they are submitted to the API.
The Sync feature requires strict validation, which conflicts with my need to have a lot of arbitrary properties. I've avoided turning on strict validation and the library hasn't issued a warning. EDIT: With strict validation turned off, at least the /changes endpoint still works exactly as it says on the tin, which is actually going to be sufficient for my needs.

Using the Sync feature requires me to enable automatic assignment of GUIDs, but personally I'd like to continue using the MongoDB ObjectID format. Actually I doubt this is an issue because my client already self-generates and assigns the ObjectIDs to the objects before they are submitted to the API.

Agreed - having the client generate globally unique ids (like ObjectIDs), and having the server let MongoDB generate these ids, should be equivalent to using LoopBack's GUID generator.

The Sync feature requires strict validation, which conflicts with my need to have a lot of arbitrary properties. I've avoided turning on strict validation and the library hasn't issued a warning. EDIT: With strict validation turned off, at least the /changes endpoint still works exactly as it says on the tin, which is actually going to be sufficient for my needs.

Strict validation is required to reliably apply changes through the LoopBack API.

Consider the following use case:

Original model: { handle: 'bajtos', facebook: 'bajtos' }
Updated data - user is no longer using Facebook: { handle: 'bajtos' }.

bulkUpdate uses the updateAll method under the hood, and this method cannot delete properties. (The combination of strict validation + persisting undefined as nulls solves this problem.)
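
The effect can be reproduced with plain objects. The snippet below is a toy illustration, not LoopBack code: Object.assign stands in for a merge-style update, and KNOWN_PROPERTIES is a hypothetical stand-in for the fixed property list that a strict model would have.

```javascript
// Toy illustration: a merge-style update cannot remove a property.
const stored = {handle: 'bajtos', facebook: 'bajtos'};
const update = {handle: 'bajtos'}; // "facebook" was removed on the client

const merged = Object.assign({}, stored, update);
console.log(merged.facebook); // still 'bajtos': the removal was lost

// With a fixed (strict) property list, a missing value can be sent as null.
const KNOWN_PROPERTIES = ['handle', 'facebook']; // hypothetical model schema

function normalize(data) {
  const out = {};
  for (const name of KNOWN_PROPERTIES) {
    out[name] = data[name] === undefined ? null : data[name];
  }
  return out;
}

const fixed = Object.assign({}, stored, normalize(update));
console.log(fixed.facebook); // null: the removal is now persisted
```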

Also:

Original model: { handle: 'bajtos' }
Server side data (change was made at the same time while we are applying client changes): { handle: 'bajtos', facebook: 'mbajtos' }
Client data (update to apply): { handle: 'bajtos', facebook: 'bajtos' }

bulkUpdate verifies that the data in the database matches the data assumed by the client before making the change, and reports a conflict if another party changed the record "under our hands". We don't have any mechanism for detecting that extra properties were added; we can only detect that the value of a known property was changed.
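
A toy version of such a check shows where the blind spot comes from. The hasConflict function is invented for this illustration and is not how LoopBack implements the comparison; it only compares the properties present in the data the client assumed.

```javascript
// Toy conflict check: compare only the properties the client knows about.
// Illustration of the limitation; hasConflict is not a LoopBack function.
function hasConflict(assumed, current) {
  return Object.keys(assumed).some((key) => assumed[key] !== current[key]);
}

const assumedByClient = {handle: 'bajtos'};                // original model
const onServer = {handle: 'bajtos', facebook: 'mbajtos'};  // concurrent edit

// The property added on the server is not noticed.
console.log(hasConflict(assumedByClient, onServer)); // false

// A change to a known property is detected.
console.log(hasConflict({handle: 'bajtos'}, {handle: 'miroslav'})); // true
```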

I think you should be able to code your application in a way that avoids these two problems.
