
[Spark] Fix check to enumerate job IDs #3111

Merged: 1 commit merged into master from db/spark_fix on Jan 27, 2017
Conversation

@derekwbrown (Contributor) commented Jan 6, 2017:

What does this PR do?

The Spark integration, at least against Spark 1.5, was using the wrong value of 'id'. It appears there are two values: the generic application id from the master's 8080/json endpoint:

"activeapps" : [ {
"starttime" : 1483727736675,
"id" : "app-20170106103536-0024",
"name" : "PythonPi",
"user" : "db",
"memoryperslave" : 1024,
"submitdate" : "Fri Jan 06 10:35:36 PST 2017",
"state" : "RUNNING",
"duration" : 5357
} ],

However, when you chase down the individual job URL, in this case 4040/api/v1/applications/, you get:
[ {
  "id" : "PythonPi",
  "name" : "PythonPi",
  "attempts" : [ {
    "startTime" : "2017-01-06T18:35:34.758GMT",
    "endTime" : "1969-12-31T23:59:59.999GMT",
    "sparkUser" : "",
    "completed" : false
  } ]
} ]

When creating the URLs to retrieve the individual job statistics, the API appears to want the id from the second payload, not the first. The code was using the id from the first, and failing.
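A minimal sketch of the lookup the fix relies on (the endpoint and field names come from the payloads above; the function name is illustrative, not the check's actual code):

```python
import requests

def resolve_rest_api_ids(app_ui_base_url):
    """Return the ids the Spark application REST API expects in per-app URLs.

    Illustrative only: on pre-2.0 standalone clusters the master's 8080/json
    payload reports ids like 'app-20170106103536-0024', but the application
    API at <app_ui_base_url>/api/v1/applications keys its URLs on the id it
    returns itself (here the app name, 'PythonPi').
    """
    resp = requests.get(app_ui_base_url + '/api/v1/applications')
    resp.raise_for_status()
    # Use these ids when building follow-up URLs such as
    # <app_ui_base_url>/api/v1/applications/<id>/jobs
    return [app['id'] for app in resp.json()]
```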

Motivation

Customer report.

Testing Guidelines

Additional Notes

@derekwbrown (Contributor, Author):

I have tested on Spark 2.0. This change is not necessary for Spark 2.0, but it does still work (just one additional HTTP request to the Spark API). So, we could:
a) leave it as is
b) fall back to this method when we get the 404 (sketched below)
c) add a configuration option for pre-2.0 Spark
The fix appears to be necessary only for older (pre-2.0) versions of Spark.
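For reference, option (b) could look roughly like this (purely illustrative; the helper name and structure are not the code in this PR):

```python
import requests

def jobs_url_for_app(app_ui_base_url, master_app_id):
    """Option (b) sketched: try the id from the master JSON first and fall
    back to the REST API's own id when that yields a 404."""
    url = '%s/api/v1/applications/%s/jobs' % (app_ui_base_url, master_app_id)
    if requests.get(url).status_code != 404:
        return url
    # Pre-2.0: resolve the id the REST API actually expects (one extra request).
    rest_apps = requests.get(app_ui_base_url + '/api/v1/applications').json()
    rest_id = rest_apps[0]['id'] if rest_apps else master_app_id
    return '%s/api/v1/applications/%s/jobs' % (app_ui_base_url, rest_id)
```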

@derekwbrown (Contributor, Author):

Added the conditional, so that the new code is only used when configured (on older versions of Spark). Did the following tests:

  1. Tried without the config on Spark 1.5.2; verified the bug reproduced.
  2. Tried with the config on Spark 1.5.2; verified the bug was fixed.
  3. Tried without the config on Spark 2.0; verified it works as expected.
  4. Tried with the config on Spark 2.0; verified it still works as expected (albeit with an extra HTTP request).

@@ -198,6 +201,7 @@ def __init__(self, name, init_config, agentConfig, instances=None):
AgentCheck.__init__(self, name, init_config, agentConfig, instances)
self.previous_jobs = {}
self.previous_stages = {}
self._spark_pre_20_mode = False
Review comment (Member):
I don't think we need this as an attribute; it will change from instance to instance, and it's not too expensive to look up in the instance dict (something we'd have to do anyway if you made self._spark_pre_20_mode into a dict). We can probably just look up the value in check() for the relevant code paths.

@@ -263,6 +267,11 @@ def _get_running_apps(self, instance, tags):
cluster_mode = SPARK_YARN_MODE

if cluster_mode == SPARK_STANDALONE_MODE:
# check for PRE-20
pre20 = instance.get(SPARK_PRE_20_MODE)
Review comment (Member):
I'd use _is_affirmative() here (you'll have to import it)...

pre20 = _is_affirmative(instance.get(SPARK_PRE_20_MODE, False))

By doing this we achieve two things: if the key doesn't exist (i.e. it hasn't been set in the YAML) we'll default to false, and we can also handle values like "yes", "no", etc. if it is set. You won't need to do the validation you do in the next couple of lines.

pre20 = instance.get(SPARK_PRE_20_MODE)
if pre20 is not None:
    if pre20 is True:
        self._spark_pre_20_mode = True
return self._standalone_init(master_address)
Review comment (Member):
We can maybe pass the pre20 argument to _standalone_init() as an optional argument. Prototype would change to:

def _standalone_init(self, spark_master_address, pre20=False):
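Taken together, the suggestions above would make the call site in _get_running_apps() look roughly like this (a sketch; the _is_affirmative import path is an assumption about the agent's config module):

```python
from config import _is_affirmative  # dd-agent helper; import path assumed

if cluster_mode == SPARK_STANDALONE_MODE:
    # Defaults to False when the key is missing from the YAML and accepts
    # values like "yes"/"no", so no manual validation or attribute is needed.
    pre20 = _is_affirmative(instance.get(SPARK_PRE_20_MODE, False))
    return self._standalone_init(master_address, pre20)
```

_standalone_init() would then only make the extra /api/v1/applications request when pre20 is true.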

@@ -624,6 +648,7 @@ def _rest_request(self, address, object_path, service_name, *args, **kwargs):
url = urljoin(url, '?' + query)

try:
self.log.debug('Spark check URL: %s' % url)
Review comment (Member):
Do we really need this line? I can see it being of value when debugging support cases, but it'll be quite verbose, no? Your call.

Reply (Contributor, Author):
It will be quite verbose, but that's why it's debug. It's the first thing we added (and then sent the customer a patch) when the card came in, so it seems worthwhile, since the default log level (info) won't emit it.

@truthbk (Member) commented Jan 19, 2017:

Also, since you have the payload formats, etc., it could be worth adding a test for this use case here: https://github.com/DataDog/dd-agent/blob/master/tests/checks/mock/test_spark.py
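A hedged sketch of what such a test could look like, reusing the pre-2.0 mock added in this PR (the test-harness import, metric name, and instance keys are modeled on the existing standalone test and should be treated as illustrative):

```python
import mock

from tests.checks.common import AgentCheckTest  # import path assumed


class TestSparkStandalonePre20(AgentCheckTest):
    CHECK_NAME = 'spark'

    # standalone_requests_pre20_get_mock is the mock added by this PR,
    # assumed to live in the same test module as this class.
    @mock.patch('requests.get', side_effect=standalone_requests_pre20_get_mock)
    def test_standalone_pre20(self, mock_requests):
        # Instance keys assumed; the point is that spark_pre_20_mode flips
        # the check onto the extra /api/v1/applications lookup.
        config = {'instances': [{
            'spark_url': 'http://localhost:8080',
            'cluster_name': 'SparkCluster',
            'spark_cluster_mode': 'spark_standalone_mode',
            'spark_pre_20_mode': True,
        }]}
        self.run_check(config)
        # Metric name used as an example of what the check reports.
        self.assertMetric('spark.job.count')
```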

@truthbk (Member) previously approved these changes Jan 26, 2017:
Looks good :shipit:

@truthbk dismissed their stale review on January 26, 2017 at 17:46:

Didn't realize the tests hadn't been added.

@truthbk (Member) left a comment:
Let's add a quick test to test the change before shipping. Thanks @derekwbrown

Add a conditional for the new code path (as it's for older versions of
Spark, and causes an additional HTTP request that's not necessary on
newer versions)

Added review feedback

Add unit test
@truthbk (Member) left a comment:
This looks good; I was just wondering if we could remove some of the duplicated code in the mocks? Up to you; I just think it might be easier to maintain moving forward.

@@ -206,6 +212,69 @@ def raise_for_status(self):
body = f.read()
return MockStandaloneResponse(body, 200)

def standalone_requests_pre20_get_mock(*args, **kwargs):
Review comment (Member):
Ideally we could refactor some code out of standalone_requests_pre20_get_mock() and standalone_requests_get_mock() so we'd have less duplicated code? Maybe something like:

class MockStandaloneResponse:
    text = ''

    def __init__(self, json_data, status_code):
        self.json_data = json_data
        self.status_code = status_code
        self.text = json_data

    def json(self):
        return json.loads(self.json_data)

    def raise_for_status(self):
        return True

def standalone_request_get_mock(*args, **kwargs):
    if args[0] == STANDALONE_APP_URL:
        with open(Fixtures.file('spark_standalone_apps'), 'r') as f:
            body = f.read()
            return MockStandaloneResponse(body, 200)
    # REMAINING logic

def standalone_request_pre20_get_mock(*args, **kwargs):
    if args[0] == STANDALONE_SPARK_APP_URL:
        with open(Fixtures.file('spark_apps_pre20'), 'r') as f:
            body = f.read()
            return MockStandaloneResponse(body, 200)
    else:
        return standalone_request_get_mock(*args, **kwargs)

@derekwbrown merged commit 9e49c43 into master on Jan 27, 2017
@gmmeyer added this to the 5.12.0 milestone on Mar 8, 2017
@derekwbrown deleted the db/spark_fix branch on April 13, 2017 at 20:21