-
Notifications
You must be signed in to change notification settings - Fork 171
[DCOS-41388] Fix bug where v1 mesos client instantiation defaults to v0 #2655
Conversation
3fee424
to
8bc7631
Compare
@@ -166,7 +167,7 @@ dependencies { | |||
|
|||
testCompile 'org.hamcrest:hamcrest-all:1.3' // note: must be above junit | |||
testCompile 'junit:junit:4.12' | |||
testCompile 'org.mockito:mockito-all:1.10.19' | |||
testCompile 'org.mockito:mockito-core:2.10.0' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mockito does not produce the mockito-all artifact anymore ; this one was primarily aimed at ant users, and contained other dependencies. We felt it was time to move on and remove such artifacts as they cause problems in dependency management system like maven or gradle.
classpath "com.github.jengelman.gradle.plugins:shadow:2.0.1" | ||
} | ||
plugins { | ||
id "com.github.johnrengelman.shadow" version "2.0.4" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove usage of Gradle internal AbstractFileCollection.
https://github.com/johnrengelman/shadow/releases/tag/2.0.4
@@ -0,0 +1 @@ | |||
mock-maker-inline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8bc7631
to
5f2071d
Compare
sdk/scheduler/src/main/java/com/mesosphere/sdk/framework/SchedulerDriverFactory.java
Outdated
Show resolved
Hide resolved
final Capabilities capabilities, | ||
final FrameworkInfo frameworkInfo, | ||
final String masterUrl, | ||
@Nullable final Credential credential, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about switching this to an Optional<Credential>
? Both here and at L139
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that but I feel there is very less value in adding that considering I still have to do credential.map(EvolverDevolver.evolve(_)).getOrElse(null)
as null
is what gets passed in to V0Mesos
and V1Mesos
constructors. I can use constructors which do not have Credential
in them but it would end up duplicating code.
Either way, I thought @Nullable ...
is less costlier than Optional<?> ..
and also ti would add very less value in this case by complicating a simple ternary operation.
sdk/scheduler/src/main/java/com/mesosphere/sdk/framework/SchedulerDriverFactory.java
Outdated
Show resolved
Hide resolved
data.middle | ||
) | ||
.apply(mock(MesosToSchedulerDriverAdapter.class)) | ||
.getClass() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be done as assertTrue([...].getClass() instanceof data.right)
? Or is that the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am getting a compile time error if I do a instanceof
as data.right
is of type ? extends Class<? extends Mesos>
sdk/scheduler/src/test/java/com/mesosphere/sdk/framework/SchedulerDriverFactoryTest.java
Show resolved
Hide resolved
- Performed the upgrade using ./gradlew wrapper --gradle-version=4.10 --distribution-type=bin - Did a `./gradlew clean build --warning-mode all` and fixed the warnings printed - Also removed the auto-generated gradlew.bat file
e54b300
to
73fcd53
Compare
testing/sdk_tasks.py
Outdated
@@ -423,14 +423,20 @@ def _check_tasks_updated(): | |||
_check_tasks_updated() | |||
|
|||
|
|||
def check_tasks_not_updated(service_name, prefix, old_task_ids): | |||
def check_tasks_not_updated( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some digging around and I think I know what's happening:
- I'd guess that the V1 API is faster to mark the framework as inactive, or maybe it's instead slower to mark the framework active again after the scheduler has re-registered?
sdk_tasks.py:L204
filters out frameworks with!active
when fetching the list of all frameworks.- Given 1+2,
get_task_ids()
would return an empty list if the scheduler hasn't completed re-registering.
Given this, I think a better solution here would be to explicitly check for the active framework after restarting the scheduler. This would make the expected behavior a bit more explicit:
- Implement a function (
sdk_tasks.wait_for_active_framework()
?) that waits for at least one framework with the specified service name to be active. This could be built using logic similar to the/mesos/frameworks
fetch insdk_tasks._get_service_tasks()
, where it just waits forfwk["name"] == service_name and fwk["active"]
on any framework entry. - Update the hdfs test to invoke that new function before calling
check_tasks_not_updated()
. check_tasks_not_updated()
could be reverted to its previous state without the retry.- Couldn't hurt to also add a
wait_for_active_framework()
+check_tasks_not_updated()
check to hello-worldtest_zzzrecovery.kill_scheduler
too, because it isn't currently checking for this. Would explain why the failure was only showing up in HDFS to begin with.
This is a sample timeline of the events from one of the CI runs : Schedulers view :
pytest's view :
|
The diff for changes since last approval : 2f383a0...76ecf45 |
The scheduler is trying to instantiate a V1 mesos client here https://github.com/mesosphere/dcos-commons/blob/9f1bc016c2ad2a7f5d9632f3a8fa0e4c412ec570/sdk/scheduler/src/main/java/com/mesosphere/sdk/framework/SchedulerDriverFactory.java#L112
but by calling
super.startInternal()
libmesos adapter is creating a v0 client https://github.com/mesosphere/mesos-http-adapter/blob/5965d467c86ce7a1092bda5624dd2da9b0776334/src/main/java/com/mesosphere/mesos/http-adapter/MesosToSchedulerDriverAdapter.java#L316-L318We removed the environment variables from the config.json and marathon mustache template through #2566. We should fix this without reintroducing the variable in config.json (and thus intentionally prohibiting the user to specify V0 or V1 easily but still retaining the behavior of creating V0 client if needed).
The ideal way to test this would have been calling upstream to see if client is speaking V0 or V1 (but should upstream know this?). I have written unit test and this is the behavior :
I have upgraded
mockito
as I wanted to mock final classes. While I am here, I have also upgraded the gradle version that we use. This is what I have done to upgrade gradle version :Performed the upgrade using
./gradlew wrapper --gradle-version=4.10 --distribution-type=bin
Did a
./gradlew clean build --warning-mode all
and fixed the warnings printed (Except forbecause of GradleUp/shadow#336
build.gradle
filesUpgrading gradle was not really necessary for DCOS-41388 but I did that because this would be useful for the upcoming SDK refactoring. Gradle implementation and api dependencies