-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(GraphService): Add Dgraph implementation of GraphService #3261
feat(GraphService): Add Dgraph implementation of GraphService #3261
Conversation
Tests in master are not flaky: https://github.com/linkedin/datahub/actions/runs/1253184869. |
1ee76b8
to
585ad18
Compare
bbf53e8
to
891a451
Compare
@@ -41,6 +41,7 @@ project.ext.externalDependency = [ | |||
'commonsLang': 'commons-lang:commons-lang:2.6', | |||
'commonsCollections': 'commons-collections:commons-collections:3.2.2', | |||
'data' : 'com.linkedin.pegasus:data:' + pegasusVersion, | |||
'dgraph4j' : 'io.dgraph:dgraph4j:21.03.1', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have different DAOs in datahub-gma, for example, gmaNeo4jDao. How about adding DgraphDAO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would the purpose be? Who would be using it other than the DgraphGraphService
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I don't see a need for a DAO at the moment. If we need to create a shared DAO later down the line, we can always refactor DgraphGraphService
3735b48
to
12cbfc1
Compare
metadata-io/src/main/java/com/linkedin/metadata/graph/DgraphExecutor.java
Outdated
Show resolved
Hide resolved
e88e118
to
739b16f
Compare
)) { | ||
try { | ||
// wait 0.01s, 0.02s, 0.04s, 0.08s, ..., 10.24s | ||
long time = (long) Math.pow(2, Math.min(retry, 10)) * 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe declare this as a named const?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, called Duration INITIAL_DURATION
and double BACKOFF_MULTIPLIER
.
synchronized (System.out) { | ||
System.out.println(System.currentTimeMillis() + ": schema not available yet, waiting 10s"); | ||
} | ||
TimeUnit.SECONDS.sleep(10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10s seems like an aggressive wait time- how long does it usually take for this schema to become available? Will this block gms from starting up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only happens rarely and only when you start a new Dgraph instance. The service is up but the schema is not yet initialized. For a production service or cluster the schema is always there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe all this is not really needed. There are two possible situations when no schema information are contained in the response (no schema info meaning not even saying the schema is empty):
- there is no schema on the Dgraph cluster
- there is a schema on the Dgraph cluster
In 1) we want to create un-seen types and relationship types, in 2) we don't want to create what is already there. But since creating those types in the Dgraph schema should be idempotent anyway, we do not get into an inconsistent state when we create those existing types again.
I will rework this and assume an empty schema when no schema information are returned.
)) { | ||
try { | ||
// wait 0.01s, 0.02s, 0.04s, 0.08s, ..., 10.24s | ||
long time = (long) Math.pow(2, Math.min(retry, 10)) * 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this backoff method be shared with Dgraph executor? Seems like there are some opportunities to factor out some shared code here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method was meant to be moved into DgraphExecutor, looks like I forgot to delete it here. This method is not referenced any more, removing.
// request options for all requests | ||
private static final RequestOptions OPTIONS = RequestOptions.DEFAULT; | ||
|
||
private interface ThrowingSupplier<T, E extends Exception> { | ||
T get() throws E; | ||
} | ||
|
||
// We are retrying requests, otherwise concurrency tests will see exceptions like these: | ||
// java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-1 [ACTIVE] | ||
private static <T> T retry(ThrowingSupplier<T, Exception> func) { | ||
int attempts = 3; | ||
Exception exception = null; | ||
|
||
while (attempts > 0) { | ||
try { | ||
attempts--; | ||
return func.get(); | ||
} catch (Exception e) { | ||
exception = e; | ||
if (e instanceof SocketTimeoutException) { | ||
continue; | ||
} | ||
break; | ||
} | ||
} | ||
|
||
throw new RuntimeException(exception); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this change come in a separate PR? Is this a blocker for the dgraph changes? seems like it may be easier to debug in case something goes wrong by separating this change out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, will move that out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved into #3377, which uses Resilience4j library for retry logic. The exponential backoff retry logic in DgraphExecutor
also moved to use Resilience4j.
23a8aeb
to
801d272
Compare
9e9e3c7
to
8cd23d1
Compare
043b4fe
to
6c71d79
Compare
This reverts commit ccdb28e.
- Use Resilience4j for retry logic - Do not retry schema retrieval, assume empty schema if it misses information - Move all retry code into DgraphExecutor - Clean up DgraphExecutor
5d85c93
to
451be33
Compare
@gabe-lyons any more comments? is this good to go? |
Hey @EnricoMi - this is good to go from my end! |
Good to hear @gabe-lyons, this is ready to go into master from my side as well. I think the failing smoke test is spurious as I cannot reproduce it locally. Maybe someone can rerun that workflow? |
I agree- that looks unrelated. I've seen this test to be flakey as well cc @jjoyce0510 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This implements the GraphService API for Dgraph (https://dgraph.io). This implementation passes all GraphService tests added in #3011. Integration of Dgraph into GMS factories and the Docker setup is happening in a follow-up PR.
Checklist
Fixes #3084.