Use initial schema during bulk load. #3333
Use the initial schema during bulk load; otherwise, data might not be loaded correctly for those predicates. In particular, I observed that when loading the 21million dataset with dgraph.type triples added to it, queries would not respond and would eventually time out. I figured this was because the index for that predicate was not built during the bulk load. I reloaded the dataset with my changes included and queries now respond immediately.
Reviewed 2 of 2 files at r1.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @manishrjain)
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @martinmr)
schema/schema.go, line 423 at r1 (raw file):
// them later than miss some of them. An example of such situation is during bulk
// loading.
func CompleteInitialSchema() []*pb.SchemaUpdate {
So, does it mean that bulk loader would always spit out the schemas for acls, even when the user only wants the open source version?
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @manishrjain)
schema/schema.go, line 423 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
So, does it mean that bulk loader would always spit out the schemas for acls, even when the user only wants the open source version?
Initial schema will consider the worker options when deciding if a predicate should be created at start up. Right now there are two types of optional predicates (_predicate_ and ACL predicates), but there might be more in the future, so I thought it would be safer to add all of them during the bulk process rather than risk missing some of them and not creating the proper indices during bulk load.
In any case, the ACL predicates are considered reserved even if ACL is off, so users should not be able to create another predicate with the same name.
Reviewed 1 of 2 files at r1.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @manishrjain)
Fixes #3329