-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimal Persistence Strategy for a Chat Client #310
Comments
hey @fredguest That was quite a read! I would have gone down the first route too, only difference is I would have used
That way you get told which individual things got added rather than having to replace them all. It seems that you are doing a diff between the updates anyways so thats fine. I'm quite curious where the lag is there, as it's the approach we almost all the time as well! In regards to your second approach:
If you use paths
We actually have pretty strict data guarantees in place. While the callback on a record set is considered safest it's also the slowest since it depends on the data actually being set in the database rather than just in the cache. Deepstream goes with a "everything is good until I tell you it isn't" approach which means if a database error occurs you will be told, but it only requires a write to the cache to actually send out the data. We also have version numbers on records and lists. What this means if two clients happen to update the exact same record at the exact same time then they will both try to update the record version to the same number as well. Since all records with the same name process their updates in sync this means the second one in the queue will be rejected and inform the client that there was a merge conflict to resolve. https://deepstream.io/tutorials/core/handling-data-conflicts/ On a single deepstream node due to the nature of the algorithm I haven't been able to see this break. Obviously as your cluster scales this may become an issue but is something that your cache setup would mitigate depending on the consistency levels you set vs speed. I hope that explains a few of your questions, feel free to let me know what I didn't or any more you may have! |
Thanks for getting through the long read @yasserf, I hope you’ll stay with me! 😬 I refactored the code based on your comments, and I also removed Redux just so that all of the state management for each strategy can be seen in a single file. Just as a quick aside, I don’t see the Taking advantage of import React, { Component } from 'react';
import $ from '../../styles/demo/AnimalsView';
import { css } from 'aphrodite/no-important';
import { client } from '../../utils/client';
import Button from '../shared/Button';
import List from '../shared/List';
class AnimalsView extends Component {
state = { animals: [] }
componentDidMount() {
this.list = client.record.getList('animals');
this.list.whenReady(() =>
this.list.getEntries().forEach((name) =>
client.record.snapshot(name, (error, data) =>
this.setState(({ animals }) => ({ animals: [ ...animals, data ] }))
)
)
);
this.list.on('entry-added', (name) =>
client.record.snapshot(name, (error, data) =>
this.setState(({ animals }) => ({ animals: [ ...animals, data ] }))
)
);
}
componentWillUnmount() {
this.list.discard();
}
render() {
return (
<div className={css($.AnimalsView)}>
<Button label="Add Animal" onClick={this.onClick.bind(this)} />
<List items={this.state.animals} />
</div>
);
}
onClick() {
const recordName = `animals/${client.getUid()}`;
const record = client.record.getRecord(recordName);
record.set({ name: recordName, foo: 'bar' }, () =>
this.list.addEntry(recordName)
);
}
}
export default AnimalsView; As you can see it’s no longer necessary to use
I was also able to refactor the second strategy quite a bit, but I wasn’t able to get import React, { Component } from 'react';
import $ from '../../styles/demo/VegetablesView';
import { css } from 'aphrodite/no-important';
import { client } from '../../utils/client';
import Button from '../shared/Button';
import List from '../shared/List';
class VegetablesView extends Component {
state = { vegetables: {}, names: [] }
componentDidMount() {
this.record = client.record.getRecord('myApp');
this.record.whenReady((record) => {
const vegetables = record.get('vegetables') || {};
const names = Object.values(vegetables).sort((a, b) => a.createdAt - b.createdAt).map((vegetable) => vegetable.name);
this.setState({ vegetables, names });
});
this.record.subscribe('vegetables', (vegetables) => {
const names = Object.values(vegetables).sort((a, b) => a.createdAt - b.createdAt).map((vegetable) => vegetable.name);
this.setState({ vegetables, names });
});
}
componentWillUnmount() {
this.record.discard();
}
render() {
return (
<div className={css($.VegetablesView)}>
<Button label="Add Vegetable" onClick={this.onClick.bind(this)} />
<List items={this.state.names.map((name) => this.state.vegetables[name])} />
</div>
);
}
onClick() {
const itemName = `vegetables/${client.getUid()}`;
this.record.set(`vegetables[${itemName}]`, { name: itemName, createdAt: Date.now() });
}
}
export default VegetablesView; I actually prefer this strategy, not only because it is faster, but because it is a truer expression of the requirements. As discussed initially, individual observable records are not required in this case, simply an observable collection, and this strategy avoids creating a new observable record for each item in the list. The only problem is that in the this.record.subscribe('vegetables', (vegetables) => {
const names = Object.values(vegetables).sort((a, b) => a.createdAt - b.createdAt).map((vegetable) => vegetable.name);
this.setState({ vegetables, names });
}); …which sends a lot of data over the wire and must be resorted every time which is silly. If I was able to only receive the individual new items as they were added to that path, then I would be able to do this: this.record.subscribe('vegetables', (newVegetable) => {
this.setState(({ vegetables, names }) => ({
vegetables: { ...vegetables, [newVegetable.name]: newVegetable },
names: [ ...names, newVegetable.name ]
}));
}); …which is infinitely nicer and more efficient, because you only need to sort the names once on the initial If you want to confirm the performance difference, feel free to clone the client https://github.com/choppur/www and run it against your own Deepstream server and DB. I'm using a Deepstream server with a RethinkDB instance, each running on their own EC2, and no standalone caching layer. |
Hey Fred, Apologies for not getting back, we are going live with deepstreamHub in a week and combined with the length of your explanation it unfortunately slipped through the cracks! If you have any questions regarding API usage please let me know, but if you can keep them slightly more bit size for now I'll definitely be able to get back quicker :$ Thanks! |
No problem! The idea was to give clear, code examples of the issues, but I can see that also raises the level of investment in responding, no worries. Good luck with the launch of deepstreamHub! I wish you guys lots of success so your open source projects can continue to thrive! :) |
Thank you! |
Hey @WolframHempel and @yasserf very cool ecosystem of libraries, thanks for building and OSing them!
I’m coming from traditional REST APIs so Deepstream
records
andlists
are new concepts for me. I built a little demo app with React and Redux to explore the best implementation ofrecords
andlists
, and the requirements in this case are for an observable collection, but that each item in the collection does not need to be observable, much in the way the collection of messages might work in a chat client. Think Slack for example, the individual messages themselves do not need to be observable, but the collection of messages does so that new messages can be rendered when necessary. I landed on two distinct strategies, each with pros and cons, and I was hoping you might weigh in.You can find the full codebase for the demo app here: https://github.com/choppur/www
In the first strategy, I used an observable
list
in order to know when newrecords
are added, and then fetched snapshots of the referencedrecords
, as follows:What I like about this strategy is that it’s fairly efficient in terms of the amount of data sent over the wire, because even though the entire
list
is being sent to the client on every update, the nature of alist
is lightweight, and the data of eachrecord
only ever needs to be fetched once. What I don’t love about this strategy is that it feels like there are more callbacks than I would ideally like to be required to synchronize the data safely. Also, as you will see in the video below, it’s a bit laggy.In the second strategy, I used an observable
record
without alist
, updated a path within therecord
to persist the array of data, and subscribed to changes in that path in order to know when new items are added, as follows:What I like about this strategy is that it lends itself nicely to atomic state (reduxjs/redux#1385), as you could easily create a single
record
to store the entire state of your application, and then subscribe to isolated paths within thatrecord
from the relevant nodes in your component tree. I also like the fact that it’s less code, and simpler code, than the first strategy. What I don’t love about this strategy is that it’s less efficient in terms of the amount of data sent over the wire because it fetches all of the data for all of the items in the array any time the array changes, instead of just sending thelist
as in the first strategy. Furthermore, this line:...feels unsafe to me. If two different clients disagree about
this.props.vegetables
, some vegetables could be lost 😝. I could refetch that path within therecord
immediately before that line of code and use the returned data instead ofthis.props.vegetables
, but that seems wildly inefficient, and I’m not sure it would even guarantee safety.In considering the second strategy, the
record
API feels incomplete to me. Instead of simply usingset()
on a path, which replaces the entire path, what I would ideally like to do is update a path rather than replace it, based on it’s existing state in Deepstream, maybe something like this:…which would add a new item to the array at that path based on the existing state of the array at the moment Deepstream received the new item, without having to send all of that back to the client first. And in the case of an object, maybe something like this:
…which would merge a new key and it’s value into the object at that path in a similar manner.
Finally, comparing the apparent speed of these two strategies, with the caveat of testing a very small data set, the second strategy is quite noticeably faster:
I would love to hear any thoughts, suggestions, or code modifications you may have to share, and thanks again for all of the great work you’ve open sourced here!
The text was updated successfully, but these errors were encountered: