feat(GraphService): Add Dgraph implementation of GraphService #3261

EnricoMi · 2021-09-18T09:08:26Z

This implements the GraphService API for Dgraph (https://dgraph.io). This implementation passes all GraphService tests added in #3011. Integration of Dgraph into GMS factories and the Docker setup is happening in a follow-up PR.

Checklist

The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
Links to related issues (if applicable)
Tests for the changes have been added/updated (if applicable)
Docs related to the changes have been added/updated (if applicable)

Fixes #3084.

EnricoMi · 2021-09-20T11:40:05Z

Tests in master are not flaky: https://github.com/linkedin/datahub/actions/runs/1253184869.

camelliazhang · 2021-10-06T17:11:33Z

build.gradle

@@ -41,6 +41,7 @@ project.ext.externalDependency = [
    'commonsLang': 'commons-lang:commons-lang:2.6',
    'commonsCollections': 'commons-collections:commons-collections:3.2.2',
    'data' : 'com.linkedin.pegasus:data:' + pegasusVersion,
+    'dgraph4j' : 'io.dgraph:dgraph4j:21.03.1',


We have different DAOs in datahub-gma, for example, gmaNeo4jDao. How about adding DgraphDAO?

What would the purpose be? Who would be using it other than the DgraphGraphService?

Agreed, I don't see a need for a DAO at the moment. If we need to create a shared DAO later down the line, we can always refactor DgraphGraphService

metadata-io/src/main/java/com/linkedin/metadata/graph/DgraphExecutor.java

gabe-lyons · 2021-10-11T18:08:40Z

metadata-io/src/main/java/com/linkedin/metadata/graph/DgraphExecutor.java

+                )) {
+                    try {
+                        // wait 0.01s, 0.02s, 0.04s, 0.08s, ..., 10.24s
+                        long time = (long) Math.pow(2, Math.min(retry, 10)) * 10;


maybe declare this as a named const?

Done: the values are now the named constants Duration INITIAL_DURATION and double BACKOFF_MULTIPLIER.
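Below is a minimal sketch of the backoff written with named constants. Only the names INITIAL_DURATION and BACKOFF_MULTIPLIER come from the reply above. The values (10 ms and 2.0) and the MAX_EXPONENT cap are assumptions, chosen to reproduce the 0.01s, 0.02s, ..., 10.24s schedule from the original code comment. BackoffSketch and delayFor are hypothetical names, not the PR's code.

```java
import java.time.Duration;

// Hypothetical sketch: constant names from the review thread, values assumed.
final class BackoffSketch {
    static final Duration INITIAL_DURATION = Duration.ofMillis(10);
    static final double BACKOFF_MULTIPLIER = 2.0;
    // Assumed cap: the delay stops growing after 10 retries (10 ms * 2^10 = 10.24 s).
    static final int MAX_EXPONENT = 10;

    private BackoffSketch() {}

    /** Delay before the given retry (0-based), capped at 2^MAX_EXPONENT times the initial delay. */
    static Duration delayFor(int retry) {
        int exponent = Math.min(Math.max(retry, 0), MAX_EXPONENT);
        double millis = INITIAL_DURATION.toMillis() * Math.pow(BACKOFF_MULTIPLIER, exponent);
        return Duration.ofMillis(Math.round(millis));
    }
}
```

Inside a retry loop, this would be used as Thread.sleep(BackoffSketch.delayFor(retry).toMillis()).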

gabe-lyons · 2021-10-11T18:11:17Z

metadata-io/src/main/java/com/linkedin/metadata/graph/DgraphGraphService.java

+                    synchronized (System.out) {
+                        System.out.println(System.currentTimeMillis() + ": schema not available yet, waiting 10s");
+                    }
+                    TimeUnit.SECONDS.sleep(10);


10s seems like an aggressive wait time; how long does it usually take for this schema to become available? Will this block GMS from starting up?

This only happens rarely and only when you start a new Dgraph instance. The service is up but the schema is not yet initialized. For a production service or cluster the schema is always there.

Maybe all this is not really needed. There are two possible situations in which the response contains no schema information (meaning it does not even say that the schema is empty):

1) there is no schema on the Dgraph cluster

2) there is a schema on the Dgraph cluster

In case 1) we want to create unseen types and relationship types. In case 2) we don't want to re-create what is already there. However, creating those types in the Dgraph schema should be idempotent anyway, so creating existing types again does not leave us in an inconsistent state.

I will rework this and assume an empty schema when no schema information is returned.
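Here is a minimal sketch of the "assume an empty schema" fallback. It assumes the schema query response has already been parsed into a Map containing a "schema" list of {"predicate": ...} entries and a "types" list of {"name": ...} entries. SchemaSketch, fromResponse and missingTypes are hypothetical names, not the PR's code. When information is missing, the result is an empty schema, so every wanted type gets (re-)created. That is safe only because schema creation is idempotent.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: the response shape and key names are assumptions.
final class SchemaSketch {
    final Set<String> fields;
    final Set<String> types;

    SchemaSketch(Set<String> fields, Set<String> types) {
        this.fields = fields;
        this.types = types;
    }

    static SchemaSketch empty() {
        return new SchemaSketch(Collections.emptySet(), Collections.emptySet());
    }

    /** Returns an empty schema (no waiting, no retry) if the response lacks schema or type info. */
    static SchemaSketch fromResponse(Map<String, Object> response) {
        if (response == null
                || !(response.get("schema") instanceof List)
                || !(response.get("types") instanceof List)) {
            return empty();
        }
        return new SchemaSketch(
                names((List<?>) response.get("schema"), "predicate"),
                names((List<?>) response.get("types"), "name"));
    }

    // Collects the string value stored under `key` in each map entry of the list.
    private static Set<String> names(List<?> entries, String key) {
        Set<String> result = new HashSet<>();
        for (Object entry : entries) {
            if (entry instanceof Map) {
                Object value = ((Map<?, ?>) entry).get(key);
                if (value instanceof String) {
                    result.add((String) value);
                }
            }
        }
        return result;
    }

    /** Wanted types not known to this schema; with the empty fallback this is all of them. */
    Set<String> missingTypes(Collection<String> wanted) {
        Set<String> missing = new HashSet<>(wanted);
        missing.removeAll(types);
        return missing;
    }
}
```

With this fallback, cases 1) and 2) collapse into "create everything not listed". In case 2) that re-creates existing types, which the idempotency argument above covers.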

gabe-lyons · 2021-10-11T18:18:02Z

metadata-io/src/main/java/com/linkedin/metadata/graph/DgraphGraphService.java

+                        )) {
+                    try {
+                        // wait 0.01s, 0.02s, 0.04s, 0.08s, ..., 10.24s
+                        long time = (long) Math.pow(2, Math.min(retry, 10)) * 10;


should this backoff method be shared with Dgraph executor? Seems like there are some opportunities to factor out some shared code here.

This method was meant to be moved into DgraphExecutor; it looks like I forgot to delete it here. It is no longer referenced, so I am removing it.

gabe-lyons · 2021-10-11T18:19:54Z

metadata-io/src/test/java/com/linkedin/metadata/ElasticSearchTestUtils.java

+    // request options for all requests
+    private static final RequestOptions OPTIONS = RequestOptions.DEFAULT;
+
+    private interface ThrowingSupplier<T, E extends Exception> {
+        T get() throws E;
+    }
+
+    // We are retrying requests, otherwise concurrency tests will see exceptions like these:
+    //   java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-1 [ACTIVE]
+    private static <T> T retry(ThrowingSupplier<T, Exception> func) {
+        int attempts = 3;
+        Exception exception = null;
+
+        while (attempts > 0) {
+            try {
+                attempts--;
+                return func.get();
+            } catch (Exception e) {
+                exception = e;
+                if (e instanceof SocketTimeoutException) {
+                    continue;
+                }
+                break;
+            }
+        }
+
+        throw new RuntimeException(exception);
+    }
+


Should this change come in a separate PR? Is it a blocker for the Dgraph changes? Separating it out may make it easier to debug in case something goes wrong.

right, will move that out

Moved into #3377, which uses the Resilience4j library for retry logic. The exponential backoff retry logic in DgraphExecutor now uses Resilience4j as well.
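For illustration, here is a dependency-free sketch of retry with exponential backoff that only retries selected exception types. The PR itself uses Resilience4j for this. RetrySketch and its parameters are hypothetical and are not meant to mirror Resilience4j's API.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical sketch of "retry with exponential backoff on selected exceptions".
final class RetrySketch {
    private RetrySketch() {}

    static <T> T retry(Callable<T> call, int maxAttempts, Duration initialDelay,
                       double multiplier, Predicate<Exception> isRetryable) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMillis = initialDelay.toMillis();
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                // Rethrow non-retryable exceptions, or the last one once attempts are exhausted.
                if (!isRetryable.test(e) || attempt >= maxAttempts) {
                    throw e;
                }
            }
            // Wait, then grow the delay for the next attempt.
            Thread.sleep(delayMillis);
            delayMillis = Math.round(delayMillis * multiplier);
        }
    }
}
```

Set isRetryable to e -> e instanceof SocketTimeoutException and use a small initial delay. The sketch then covers what the retry(...) helper in ElasticSearchTestUtils did, and also waits longer between successive attempts.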


EnricoMi · 2021-11-04T11:46:29Z

@gabe-lyons any more comments? is this good to go?

gabe-lyons · 2021-11-04T15:46:15Z

Hey @EnricoMi - this is good to go from my end!

EnricoMi · 2021-11-04T18:01:34Z

Good to hear @gabe-lyons, this is ready to go into master from my side as well.

I think the failing smoke test is spurious as I cannot reproduce it locally. Maybe someone can rerun that workflow?

gabe-lyons · 2021-11-04T18:41:51Z

I agree, that looks unrelated. I've seen this test be flaky before as well. cc @jjoyce0510

EnricoMi · 2021-11-04T21:10:40Z

All green, yay. Btw, these new test stats look awesome ;-)

shirshanka

LGTM!

EnricoMi marked this pull request as draft September 20, 2021 07:37

EnricoMi force-pushed the branch-test-dgraph-graph-service-thoroughly branch 4 times, most recently from 1ee76b8 to 585ad18 on September 27, 2021 08:03

EnricoMi force-pushed the branch-test-dgraph-graph-service-thoroughly branch 6 times, most recently from bbf53e8 to 891a451 on October 6, 2021 10:46

camelliazhang reviewed Oct 6, 2021


EnricoMi force-pushed the branch-test-dgraph-graph-service-thoroughly branch from 3735b48 to 12cbfc1 on October 6, 2021 20:12

EnricoMi marked this pull request as ready for review October 7, 2021 19:40

EnricoMi commented Oct 7, 2021


metadata-io/src/main/java/com/linkedin/metadata/graph/DgraphExecutor.java (outdated, resolved)

EnricoMi force-pushed the branch-test-dgraph-graph-service-thoroughly branch from e88e118 to 739b16f on October 7, 2021 20:15

EnricoMi mentioned this pull request Oct 11, 2021

test(metadata-io): Run metadata-io tests in parallel #3358

Merged

gabe-lyons reviewed Oct 11, 2021


EnricoMi force-pushed the branch-test-dgraph-graph-service-thoroughly branch from 23a8aeb to 801d272 on October 12, 2021 19:38

EnricoMi force-pushed the branch-test-dgraph-graph-service-thoroughly branch 2 times, most recently from 9e9e3c7 to 8cd23d1 on October 21, 2021 19:19

EnricoMi mentioned this pull request Oct 21, 2021

feat(quickstart): Simplify docker generate and compare script #3434

Merged

EnricoMi force-pushed the branch-test-dgraph-graph-service-thoroughly branch 4 times, most recently from 043b4fe to 6c71d79 on October 26, 2021 16:58

EnricoMi added 22 commits November 4, 2021 10:22

Add Dgraph test container logs to SLF4J logging

2986d5d

Log test container through Console, not SLF4J

a1f2984

Log all dgraph client calls

11dc1b6

Set gRPC deadline on every call sufficiently large

f08f0e4

Wait until Dgraph leader is elected

d3d031d

Add some debug messages to STDOUT for testing

33d5ef7

getSchema returns null on insufficient JSON, retrying getSchema

d711da8

Retry deadline exceeded exceptions

be4ac54

Make schema changes atomic and thread-safe

d3f3b79

Null channel and service after disconnect

3abb383

Move startup timeout out of constructor, set attempts to 3

c0f284e

Give concurrent op tests 5 minutes to complete

cdf53f7

Increased grpc deadline to 30 seconds.

19d1b84

Retry another exception

876a11a

Improving exception logging

85821b5

Run Dgraph container with tmpfs mount for working dir

02b2e40

Reduce concurrency pressure

9631d54

Revert "Run tests 30 times"

d7752fa

This reverts commit ccdb28e.

Rework retry logic

ab6ac04

- Use Resilience4j for retry logic
- Do not retry schema retrieval, assume empty schema if it misses information
- Move all retry code into DgraphExecutor
- Clean up DgraphExecutor

Remove printouts or replace with log messages

16265d6

Fix imports

a0ee5ce

Fixes after merging master

451be33

EnricoMi force-pushed the branch-test-dgraph-graph-service-thoroughly branch from 5d85c93 to 451be33 on November 4, 2021 09:22

shirshanka approved these changes Nov 15, 2021


shirshanka merged commit 031e0b9 into datahub-project:master Nov 15, 2021
