Remove _predicate_ from Dgraph. #3262

martinmr · 2019-04-05T22:30:26Z

From now on, expand queries rely solely on the type of the node.
This change removes all mentions of the internal _predicate_ predicate,
which was previously used to serve this kind of query.

There is also another small change to allow the value of a list variable
to be used as a list of predicates in expand queries.
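
As a rough illustration of what relying solely on the node's type means, the Go sketch below derives the predicates to expand from a node's types and keeps each predicate only once. The `typeFields` map, the `expandPredicates` function, and the type and predicate names are all hypothetical and are not taken from the Dgraph code base; the actual implementation may differ.

```go
package main

import "fmt"

// typeFields is a hypothetical stand-in for the type definitions in the
// schema: it maps a type name to the predicates declared for that type.
type typeFields map[string][]string

// expandPredicates returns the predicates an expand query would visit for a
// node, derived only from the node's types. A predicate shared by several
// types is returned once, in first-seen order.
func expandPredicates(defs typeFields, nodeTypes []string) []string {
	seen := make(map[string]bool)
	var preds []string
	for _, t := range nodeTypes {
		for _, p := range defs[t] {
			if !seen[p] {
				seen[p] = true
				preds = append(preds, p)
			}
		}
	}
	return preds
}

func main() {
	// Hypothetical type definitions, for illustration only.
	defs := typeFields{
		"Person": {"name", "age", "friend"},
		"Actor":  {"name", "film"},
	}
	fmt.Println(expandPredicates(defs, []string{"Person", "Actor"}))
	// In this sketch, a node with no type has nothing to expand.
	fmt.Println(len(expandPredicates(defs, nil)))
}
```

In this sketch the only input is the node's list of types, so no per-node list of predicates has to be stored.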

Also, some of the expand tests in the query package were simplified so that the
output of the test queries is not as long.

Most of the changes are due to removing or replacing tests, for one of the
following reasons:

1. Tests that were checking the state of _predicate_ are no longer valid.
2. Tests that were doing checks that should be done inside the query
   package. They were removed if an existing query already did the same
   check; otherwise a new test was added.
3. Some of the expand tests inside the query package were removed
   because there are tests verifying the expand functionality using types.
   Since types are now the only way to make expand queries work, these
   tests became obsolete.



martinmr · 2019-04-08T20:41:39Z

Ran some benchmarks comparing the performance of the live and bulk loaders when _predicate_ is enabled/disabled.

For the bulk loader (using the 21 million dataset), I observed an improvement, but it was not drastic. The number of edges went down from 120 million to 99 million. We do not observe a 50% reduction in the number of edges because there are nodes that have more than one edge for a given predicate.

During the mapping phase, the cost of adding a _predicate_ edge seems to be incurred only when the first edge for a predicate is added; repeated edges are closer to a no-op. Having fewer edges speeds up the reduce phase, but only in proportion to the reduction in total edges.

On average, the version without _predicate_ took around 1m45s to complete the bulk load, while the current version took around 2m. The reduction in edges is slightly bigger than the reduction in total time.

For the live loader I used the smaller 1 million dataset. I could not observe any performance differences between the two versions. My hypothesis is that the process is throttled by transactions (which are not involved when the bulk loader is used) so the reduction in edges does not result in a reduction in the total time.

martinmr

Reviewable status: 0 of 31 files reviewed, 2 unresolved discussions (waiting on @golangcibot)

dgraph/cmd/bulk/schema.go, line 40 at r1 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

File is not goimports-ed (from goimports)

Done.

dgraph/cmd/zero/oracle.go, line 336 at r1 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

File is not goimports-ed (from goimports)

Done.

martinmr · 2019-04-09T21:42:49Z

Redid the bulk loader benchmark, this time without any indices in the schema. This time I observe the 50% reduction in edges; the time savings are not 50% but are nonetheless noticeable (1m5s vs 42s).

martinmr · 2019-04-09T22:04:26Z

I ran the live loader benchmarks. Without _predicate_, the running time (with the 21 million dataset) is 3m40s. With _predicate_, it is 4m45s. I see a decrease in the running time but not by 50%, just like in the bulk loader.
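
For reference, the timings from the two follow-up benchmarks can be converted into relative savings with a few lines of Go. The `saved` helper and the run labels are hypothetical names used only in this sketch; the durations are the ones reported in the comments.

```go
package main

import (
	"fmt"
	"time"
)

// saved reports how much shorter the run without _predicate_ was, as a
// percentage of the run with it. Both arguments use Go's duration syntax.
func saved(with, without string) (float64, error) {
	w, err := time.ParseDuration(with)
	if err != nil {
		return 0, err
	}
	wo, err := time.ParseDuration(without)
	if err != nil {
		return 0, err
	}
	return float64(w-wo) / float64(w) * 100, nil
}

func main() {
	runs := []struct{ name, with, without string }{
		{"bulk loader, no indices", "1m5s", "42s"},
		{"live loader, 21 million dataset", "4m45s", "3m40s"},
	}
	for _, r := range runs {
		pct, err := saved(r.with, r.without)
		if err != nil {
			fmt.Println("bad duration:", err)
			continue
		}
		fmt.Printf("%s: %.0f%% shorter\n", r.name, pct)
	}
}
```

The savings come out at roughly 35% for the bulk loader and 23% for the live loader, both below the 50% reduction in edges.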

codexnull

Reviewed 27 of 30 files at r1, 4 of 4 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @golangcibot)

codexnull

It's a pretty big changeset, but I think I looked at everything.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @golangcibot)

manishrjain

Wao! This PR is a dream come true.

This PR also has a real potential of breaking transactions and queries. So, before you merge this, please do these things:

1. Run Jepsen bank tests overnight. That means, 20 times or more, each test running for half an hour.
2. Try out as many expand all queries as you can, manually with the 21 million dataset. Do this irrespective of whether you think unit tests cover all the cases or not.

Make the outcome of this verification process part of your PR commit message (when you squash and merge).

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @golangcibot and @manishrjain)

Verified that the Jepsen tests work and that the 21 million dataset works as expected after manually adding types to certain nodes. Types and type definitions will be added to the entire dataset later.

martinmr requested a review from a team April 5, 2019 22:30

golangcibot reviewed Apr 5, 2019


go fmt

208c9b9

martinmr commented Apr 9, 2019


codexnull approved these changes Apr 10, 2019


martinmr requested a review from manishrjain April 11, 2019 17:56

manishrjain approved these changes Apr 18, 2019


martinmr added 9 commits April 19, 2019 17:21

Merge remote-tracking branch 'origin/master' into martinmr/remove-und…

3fdc069

…erscore-predicate

Merge remote-tracking branch 'origin/master' into martinmr/remove-und…

fe3aa25

…erscore-predicate

Ensure list of unique predicates during expand queries.

fddfcf3

Merge remote-tracking branch 'origin/master' into martinmr/remove-und…

50fc302

…erscore-predicate

Update test to not include _predicate_.

32a3557

Merge remote-tracking branch 'origin/master' into martinmr/remove-und…

a9beadd

…erscore-predicate

Fix test-bulk-schema.sh test

3fa1d48

Remove mentions of _predicate_ from gql tests.

71aa60a

Merge remote-tracking branch 'origin/master' into martinmr/remove-und…

fdc49a7

…erscore-predicate

martinmr merged commit d7f7d5f into master May 9, 2019

martinmr deleted the martinmr/remove-underscore-predicate branch May 9, 2019 19:15
