[SPARK-7735] [pyspark] Raise Exception on non-zero exit from pipe commands #6262

megatron-me-uk · 2015-05-19T11:03:04Z

This will allow problems with piped commands to be detected.
This will also allow tasks to be retried where errors are rare (such as network problems in piped commands).

megatron-me-uk · 2015-05-19T11:06:38Z

A simple test of this:

a = sc.parallelize([1, 2, 3])
b = a.pipe('cc') # a clearly incorrect pipe command
b.collect()

The old behaviour is to return an empty list ([]) with no errors and fairly quiet logs if running on a distributed cluster.
The new behaviour is for the Job to fail with the following text in the logs:

: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 1 times, most recent failure: Lost task 2.0 in stage 0.0 (TID 2, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/worker.py", line 101, in main
process()
File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/worker.py", line 96, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/rdd.py", line 270, in func
return f(iterator)
File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/rdd.py", line 666, in func
raise Exception("Pipe function `%s' exited with error code %d" %(command, pipe.returncode) )
Exception: Pipe function `cc' exited with error code 1

srowen · 2015-05-19T11:16:36Z

@megatron-me-uk have a look at https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark first please -- you missed a few steps here. The change may well be fine.

megatron-me-uk · 2015-05-19T11:19:45Z

Ah, I hadn't seen that! Will take a look.

megatron-me-uk · 2015-05-19T12:07:46Z

I believe this change brings PySpark more in line with the behaviour of the Scala implementation.
See:

spark/core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala

Line 161 in 4ce2782

throw new Exception("Subprocess exited with status " + exitStatus)

In Scala, the output of an external command is read from stdout until EOF, and then the exit status is checked. If exitStatus is non-zero, an Exception is thrown.
Here, I have implemented the same behaviour in Python.
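
The behaviour described above (read stdout to EOF, then check the exit status) can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this patch: `run_piped`, `ECHO`, and `FAIL` are hypothetical names, and the two commands are stand-ins built from the current Python interpreter.

```python
import subprocess
import sys


def run_piped(command, lines):
    """Feed `lines` to `command` on stdin, read stdout to EOF, then check the exit status."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    payload = "".join(line + "\n" for line in lines).encode("utf-8")
    # communicate() writes stdin, reads stdout until EOF, and waits for the exit.
    out, _ = proc.communicate(payload)
    if proc.returncode != 0:
        raise Exception("Subprocess exited with status %d" % proc.returncode)
    return out.decode("utf-8").splitlines()


# Hypothetical commands: one echoes its input, the other exits with status 3.
ECHO = [sys.executable, "-c",
        "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
FAIL = [sys.executable, "-c", "import sys; sys.exit(3)"]
```

Calling `run_piped(FAIL, ["1"])` raises instead of silently returning an empty list, which is the difference this change is after.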

srowen · 2015-05-19T12:09:07Z

I think it makes sense; really I was getting at making a JIRA to track this.

megatron-me-uk · 2015-05-19T12:13:17Z

OK, seems I have to create an account etc. I will put it on my to-do list. Thanks for the help!

lucamartinetti · 2015-05-27T23:56:41Z

Ran into the same issue. Thanks for the patch!

JoshRosen · 2015-05-31T21:07:45Z

Jenkins, this is ok to test.

JoshRosen · 2015-05-31T21:07:54Z

Can you add a regression test for this in python/pyspark/tests.py?
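
A regression test along these lines might look like the following sketch. It does not use Spark: `run_checked` is a hypothetical stand-in for the code under test, and the commands are built from the current Python interpreter so that the example runs anywhere.

```python
import subprocess
import sys
import unittest


def run_checked(command):
    """Hypothetical stand-in for the code under test: raise on a non-zero exit."""
    returncode = subprocess.call(command)
    if returncode != 0:
        raise Exception("Pipe function `%s' exited with error code %d"
                        % (command, returncode))
    return returncode


class PipeExitCodeTest(unittest.TestCase):
    def test_zero_exit_is_accepted(self):
        ok = [sys.executable, "-c", "pass"]
        self.assertEqual(run_checked(ok), 0)

    def test_non_zero_exit_raises(self):
        bad = [sys.executable, "-c", "import sys; sys.exit(1)"]
        # Pass the callable and its arguments; calling run_checked(bad) inline
        # would raise before assertRaises gets a chance to catch it.
        self.assertRaises(Exception, run_checked, bad)
```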

SparkQA · 2015-05-31T21:17:04Z

Test build #33865 has finished for PR 6262 at commit f552d49.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-06-03T14:46:09Z

Test build #34085 has finished for PR 6262 at commit 5745d85.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-06-04T08:59:54Z

Test build #34175 has finished for PR 6262 at commit 0974f98.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-06-04T10:39:26Z

Test build #34178 has finished for PR 6262 at commit 1b3dc4e.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

megatron-me-uk · 2015-06-04T16:55:14Z

The test failed on "org.apache.spark.network.netty.NettyBlockTransferSecuritySuite.security mismatch auth off on client" which I don't think is related to this pull request :(

SparkQA · 2015-06-15T15:55:04Z

Test build #34945 has finished for PR 6262 at commit 8db4073.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-06-15T16:16:42Z

Test build #34948 has finished for PR 6262 at commit cc1a73d.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-06-18T14:01:02Z

Test build #35126 has finished for PR 6262 at commit 3ab8c7a.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-06-18T16:09:56Z

Test build #35136 has finished for PR 6262 at commit 3344a21.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-06-18T17:43:38Z

Test build #35142 has finished for PR 6262 at commit 491d3fc.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

davies · 2015-06-23T15:04:12Z

The Jenkins script looks for JUnit results, but we skip all Scala tests if only Python or R is modified. I think we should fix the Jenkins script.

Davies Liu

On Tuesday, June 23, 2015, at 8:02 AM, Josh Rosen wrote:

Yep, it's a spurious failure in the test script infra; SparkQA says it's okay though, which is what we want.

On Jun 23, 2015, at 2:48 AM, Scott Taylor <[email protected]> wrote:

Recording test results
ERROR: Publisher 'Publish JUnit test result report' failed: No test report files were found. Configuration error?
Finished: FAILURE
I think that error is unrelated...

davies · 2015-06-24T00:08:29Z

python/pyspark/rdd.py

+                    raise Exception("Pipe function `%s' exited "
+                                    "with error code %d" % (command, pipe.returncode))
+                else:
+                    return None


I think we should make check_return_code a generator, for example:

def check_return_code():
    # check return code
    for x in range(0):
        yield x

Yes, good point. I will change it.
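
To illustrate why a generator helps here: when the return-code check is a generator chained after the stdout iterator, it only runs once the output has been consumed, so the check stays lazy. The following is a rough sketch under assumptions, not the patch itself: `pipe_lines`, `ECHO`, and `FAIL` are hypothetical names, and stdin is fed from a separate thread so that reading stdout cannot block on a full pipe.

```python
import subprocess
import sys
from itertools import chain
from threading import Thread


def pipe_lines(command, lines):
    """Lazily yield stdout lines of `command`; raise after EOF on a non-zero exit."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def feed():
        # Write stdin from its own thread so reading stdout cannot deadlock.
        try:
            for line in lines:
                proc.stdin.write((line + "\n").encode("utf-8"))
            proc.stdin.close()
        except OSError:
            pass  # the child exited before consuming all of its input

    Thread(target=feed).start()

    def check_return_code():
        # Runs only after the stdout iterator ahead of it in the chain is exhausted.
        proc.wait()
        if proc.returncode:
            raise Exception("Pipe function `%s' exited with error code %d"
                            % (command, proc.returncode))
        for x in range(0):  # yields nothing; only makes this function a generator
            yield x

    stdout_lines = iter(proc.stdout.readline, b"")
    return (raw.rstrip(b"\n").decode("utf-8")
            for raw in chain(stdout_lines, check_return_code()))


# Hypothetical commands: one echoes its input, the other exits with status 2.
ECHO = [sys.executable, "-c",
        "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
FAIL = [sys.executable, "-c", "import sys; sys.exit(2)"]
```

With this shape, `pipe_lines(FAIL, ["a"])` returns immediately; the exception surfaces only when the result is iterated, for example via `list(...)`.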

SparkQA · 2015-06-24T12:49:31Z

Test build #35676 has finished for PR 6262 at commit 8a9ef9c.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-06-24T15:57:17Z

Test build #35693 has finished for PR 6262 at commit a0c0161.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-06-30T23:22:24Z

Test build #36208 has finished for PR 6262 at commit b0ac3a4.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-06-30T23:59:30Z

Test build #36213 has finished for PR 6262 at commit eb4801c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

megatron-me-uk · 2015-07-01T11:02:30Z

OK so I have implemented an optional argument 'mode' which by default ('permissive') maintains the current behaviour. I have added two other modes:
'strict' - raises an error if it encounters a non-zero returncode
'grep' - raises an error for returncodes that are not 1 or 0

When piping through grep I would recommend the 'grep' mode: grep returns 1 when it finds no matches, so it is probably better to raise an exception only when grep fails for some other reason.
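
The three modes boil down to a small decision on the return code. A minimal sketch of that decision, assuming a hypothetical helper named `should_raise` (the name and the `ValueError` for unknown modes are illustrative and not taken from the patch):

```python
def should_raise(mode, returncode):
    """Return True when `returncode` should be treated as a failure under `mode`."""
    if mode == "permissive":
        return False                      # never check the return code
    if mode == "strict":
        return returncode != 0            # only 0 is acceptable
    if mode == "grep":
        return returncode not in (0, 1)   # 1 just means "no matches"
    raise ValueError("unknown mode: %r" % (mode,))
```

For example, a return code of 1 is a failure in 'strict' mode but not in 'grep' mode, while a return code of 2 is a failure in both.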

I wanted to update the docstring with this information but haven't yet found an example where optional arguments are documented which I can use as a style guide.

davies · 2015-07-01T15:15:18Z

python/pyspark/rdd.py

@@ -687,13 +687,25 @@ def groupBy(self, f, numPartitions=None):
        return self.map(lambda x: (f(x), x)).groupByKey(numPartitions)

    @ignore_unicode_prefix
-    def pipe(self, command, env={}):
+    def pipe(self, command, env={}, mode='permissive'):
        """
        Return an RDD created by piping elements to a forked external process.



Could you add doc for mode?

davies · 2015-07-01T15:19:17Z

Could we have only two modes: fail on a non-zero exit code, or not? That would be easier to understand. We could call the parameter checkCode, which is False by default; grep would work with checkCode=False. cc @JoshRosen

megatron-me-uk · 2015-07-03T09:34:03Z

I guess the benefit of having a third mode is that grep can return 1 for no results without raising an exception, but if grep encounters an error of some unknown kind it will return 2 and an exception will be raised.

SparkQA · 2015-07-07T08:48:56Z

Test build #36666 has finished for PR 6262 at commit 574b564.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-07-07T09:22:56Z

Test build #36667 has finished for PR 6262 at commit 98fa101.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

megatron-me-uk · 2015-07-08T11:55:34Z

I have changed the implementation of the optional parameter to a boolean checkCode and updated the docstring to reflect that.

SparkQA · 2015-07-08T13:34:06Z

Test build #36790 has finished for PR 6262 at commit 04ae1d5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

megatron-me-uk · 2015-07-10T08:33:08Z

I have simplified the optional parameter to be a boolean and added this to the docstring.

…mands

This will allow problems with piped commands to be detected. This will also allow tasks to be retried where errors are rare (such as network problems in piped commands).

Author: Scott Taylor <[email protected]>

Closes #6262 from megatron-me-uk/patch-2 and squashes the following commits:

04ae1d5 [Scott Taylor] Remove spurious empty line
98fa101 [Scott Taylor] fix blank line style error
574b564 [Scott Taylor] Merge pull request #2 from megatron-me-uk/patch-4
0c1e762 [Scott Taylor] Update rdd pipe method for checkCode
ab9a2e1 [Scott Taylor] Update rdd pipe tests for checkCode
eb4801c [Scott Taylor] fix fail_condition
b0ac3a4 [Scott Taylor] Merge pull request #1 from megatron-me-uk/megatron-me-uk-patch-1
a307d13 [Scott Taylor] update rdd tests to test pipe modes
34fcdc3 [Scott Taylor] add optional argument 'mode' for rdd.pipe
a0c0161 [Scott Taylor] fix generator issue
8a9ef9c [Scott Taylor] make check_return_code an iterator
0486ae3 [Scott Taylor] style fixes
8ed89a6 [Scott Taylor] Chain generators to prevent potential deadlock
4153b02 [Scott Taylor] fix list.sort returns None
491d3fc [Scott Taylor] Pass a function handle to assertRaises
3344a21 [Scott Taylor] wrap assertRaises with QuietTest
3ab8c7a [Scott Taylor] remove whitespace for style
cc1a73d [Scott Taylor] fix style issues in pipe test
8db4073 [Scott Taylor] Add a test for rdd pipe functions
1b3dc4e [Scott Taylor] fix missing space around operator style
0974f98 [Scott Taylor] add space between words in multiline string
45f4977 [Scott Taylor] fix line too long style error
5745d85 [Scott Taylor] Remove space to fix style
f552d49 [Scott Taylor] Catch non-zero exit from pipe commands

AmplabJenkins · 2015-07-13T21:51:27Z

Can one of the admins verify this patch?

davies · 2015-07-13T21:57:51Z

@megatron-me-uk This is merged into master, could you close this PR?

Catch non-zero exit from pipe commands

f552d49

This will allow problems with piped commands to be detected. This will also allow tasks to be retried where errors are rare (such as network problems in piped commands).

megatron-me-uk changed the title ~~Raise Exception on non-zero exit from pipe commands~~ [SPARK-7735] [pyspark] Raise Exception on non-zero exit from pipe commands May 19, 2015

Remove space to fix style

5745d85

megatron-me-uk added 2 commits June 4, 2015 09:53

fix line too long style error

45f4977

add space between words in multiline string

0974f98

fix missing space around operator style

1b3dc4e

This is an error in PEP8 but not in pylint.

Add a test for rdd pipe functions

8db4073

fix style issues in pipe test

cc1a73d

remove whitespace for style

3ab8c7a

wrap assertRaises with QuietTest

3344a21

Pass a function handle to assertRaises

491d3fc

Also be more specific about the Exception we expect to see

davies reviewed Jun 24, 2015

make check_return_code an iterator

8a9ef9c

fix generator issue

a0c0161

megatron-me-uk added 3 commits June 30, 2015 23:46

add optional argument 'mode' for rdd.pipe

34fcdc3

permissive - do not check returncode
strict - only allow returncode 0
grep - allow returncode 0 or 1

update rdd tests to test pipe modes

a307d13

Merge pull request #1 from megatron-me-uk/megatron-me-uk-patch-1

b0ac3a4

add optional argument 'mode' for rdd.pipe

fix fail_condition

eb4801c

return True when an exception should be raised

davies reviewed Jul 1, 2015

megatron-me-uk added 3 commits July 7, 2015 09:41

Update rdd pipe tests for checkCode

ab9a2e1

(rather than mode)

Update rdd pipe method for checkCode

0c1e762

use boolean checkCode rather than more complicated mode optional argument. Also add param to docstring

Merge pull request #2 from megatron-me-uk/patch-4

574b564

Use boolean checkCode optional parameter

fix blank line style error

98fa101

Remove spurious empty line

04ae1d5

megatron-me-uk closed this Jul 15, 2015
