Use initial schema during bulk load. #3333

martinmr · 2019-04-26T23:39:09Z

Use the initial schema during bulk load. Otherwise, data for the
predicates it defines might not be loaded correctly.

In particular, I observed that when loading the 21million dataset with
dgraph.type triples added to it, queries would not respond and would
eventually time out. I figured this was because the index for that
predicate was not built during the bulk load. I reloaded the dataset with
my changes included and queries now respond immediately.
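
A rough sketch of the failure mode described above, not the actual bulk loader code: `SchemaEntry`, `buildSchema` and `indexedPredicates` are hypothetical names, and treating dgraph.type as an indexed built-in entry is an assumption made for illustration. The point is only that a loader which builds indices from its schema map will skip any predicate that never made it into that map.

```go
package main

import "sort"

// SchemaEntry is a simplified, hypothetical stand-in for pb.SchemaUpdate.
type SchemaEntry struct {
	Predicate string
	Indexed   bool
}

// initialSchema returns the built-in entries. That dgraph.type carries an
// index is an assumption of this sketch.
func initialSchema() []SchemaEntry {
	return []SchemaEntry{{Predicate: "dgraph.type", Indexed: true}}
}

// buildSchema merges the user's schema with the built-in entries.
// Built-in entries win on conflict because those predicates are reserved.
func buildSchema(user []SchemaEntry, useInitial bool) map[string]SchemaEntry {
	schema := make(map[string]SchemaEntry)
	for _, e := range user {
		schema[e.Predicate] = e
	}
	if useInitial {
		for _, e := range initialSchema() {
			schema[e.Predicate] = e
		}
	}
	return schema
}

// indexedPredicates returns, in sorted order, the distinct data predicates
// for which the loader would build an index: only those that have a schema
// entry marked as indexed.
func indexedPredicates(schema map[string]SchemaEntry, dataPreds []string) []string {
	var out []string
	for _, p := range dataPreds {
		if e, ok := schema[p]; ok && e.Indexed {
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}
```

With a user schema that only mentions `name` and data that also contains dgraph.type triples, `buildSchema(user, false)` yields an index for `name` only, while `buildSchema(user, true)` yields one for both predicates.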

Fixes #3329

codexnull

Reviewed 2 of 2 files at r1.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @manishrjain)

manishrjain

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @martinmr)

schema/schema.go, line 423 at r1 (raw file):

// them later than miss some of them. An example of such situation is during bulk
// loading.
func CompleteInitialSchema() []*pb.SchemaUpdate {

So, does it mean that bulk loader would always spit out the schemas for acls, even when the user only wants the open source version?

martinmr

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @manishrjain)

schema/schema.go, line 423 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

So, does it mean that bulk loader would always spit out the schemas for acls, even when the user only wants the open source version?

Initial schema will consider the worker options when deciding if a predicate should be created at start up. Right now there are two types of optional predicates (_predicate_ and ACL predicates), but there might be more in the future. So I thought it would be safer to add all of them during the bulk process rather than risk missing some of them and not creating the proper indices during bulk load.

In any case, the ACL predicates are considered reserved even if ACL is off, so users should not be able to create another predicate with the same name.
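
To make the distinction concrete, here is a minimal sketch of the idea, not the code in schema/schema.go. `Options`, `startupSchema`, `completeSchema` and `isReserved` are hypothetical names, the option fields are assumed, and the ACL predicate list is only an illustrative subset.

```go
package main

// Options is a hypothetical stand-in for the worker options.
type Options struct {
	ExpandEdge bool // assumed flag: maintain _predicate_
	AclEnabled bool // assumed flag: ACL feature turned on
}

// aclPredicates is an illustrative subset, not the full list.
var aclPredicates = []string{"dgraph.xid", "dgraph.password"}

// startupSchema returns only the predicates that the given options call for.
func startupSchema(opts Options) []string {
	preds := []string{"dgraph.type"}
	if opts.ExpandEdge {
		preds = append(preds, "_predicate_")
	}
	if opts.AclEnabled {
		preds = append(preds, aclPredicates...)
	}
	return preds
}

// completeSchema ignores the options and returns every predicate that could
// be created, which is the safer choice when missing one means a missing index.
func completeSchema() []string {
	return startupSchema(Options{ExpandEdge: true, AclEnabled: true})
}

// isReserved reports whether a name belongs to the complete set, regardless
// of which optional features are currently enabled.
func isReserved(name string) bool {
	for _, p := range completeSchema() {
		if p == name {
			return true
		}
	}
	return false
}
```

In this sketch `startupSchema(Options{})` contains only dgraph.type, `completeSchema()` always contains all four names, and `isReserved("dgraph.password")` is true even with ACL off, which mirrors the argument that emitting the ACL entries cannot collide with a user-defined predicate.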

manishrjain

Reviewed 1 of 2 files at r1.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @manishrjain)

martinmr requested a review from manishrjain as a code owner April 26, 2019 23:39

martinmr requested a review from a team April 26, 2019 23:39

codexnull approved these changes Apr 27, 2019

manishrjain suggested changes Apr 27, 2019

martinmr commented Apr 29, 2019

manishrjain approved these changes Apr 29, 2019

martinmr merged commit 85c608f into master Apr 29, 2019

martinmr deleted the martinmr/bulk-load-initial-schema branch April 29, 2019 22:58
