This repository has been archived by the owner on Dec 4, 2024. It is now read-only.

[DCOS-38138] Update Spark CLI for shell-escape fix #388

Merged — 43 commits merged into master from DCOS-38138-shell-escape on Aug 31, 2018

Conversation

samvantran
Contributor

@samvantran samvantran commented Aug 2, 2018

Resolves DCOS-38138 and DCOS-38462

This PR builds off mesosphere/spark#33 and updates the Spark CLI to make it possible to pass multiple --conf args in a single string, e.g.

--conf spark.driver.extraJavaOptions='-Dspecial.char.param=\$PATH -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dparam1=\"This setting has a space\"'

Previously, a typical --conf arg=val1 accepted only one value and would fail if the value contained spaces.

An example spark run command:

dcos spark run --submit-args=" \
	--conf spark.driver.cores=1.0 \
	--conf spark.driver.memory=1g \
	--conf spark.executor.extraJavaOptions='-XX:+PrintGCDetails -XX:+PrintGCTimeStamps' \
	--conf spark.mesos.containerizer=mesos  \
	--class org.apache.spark.examples.SparkPi \
	https://s3-us-west-2.amazonaws.com/infinity-artifacts/spark/spark-examples_2.11-2.2.1.jar 100"

manifest.json Outdated
"hadoop_version": "2.6",
"uri": "https://downloads.mesosphere.com/spark/assets/spark-2.2.1-2-bin-2.6.tgz"
"hadoop_version": "2.7",
"uri": "svt-dev.s3.amazonaws.com/spark/spark-2.2.1-bin-2.7.3-dcos-38138.tgz"
Contributor Author

Once PR is in good shape and approved, I'll update the downloads.mesosphere uri to include the updated tgz

app_args="",
expected_output="spark.executor.extraJavaOptions,-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dparam3=val3",
service_name=service_name,
args=["--conf spark.driver.extraJavaOptions='-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dparam3=val3'",
Contributor Author

This is giving me trouble. The same command works via the Spark CLI but not in the pytests.

Contributor

Looking at the TeamCity output, the error seems to be thrown by the Parse function called here: https://github.com/mesosphere/spark-build/blob/DCOS-38138-shell-escape/cli/dcos-spark/submit_builder.go#L411. It might be a good idea to check whether that is behaving as expected.

Also, it would be good to confirm that the following command works from the CLI:

dcos spark --name=spark run --verbose --submit-args="--conf spark.driver.extraJavaOptions='-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dparam3=val3' --conf spark.mesos.containerizer=mesos --class MultiConfs --conf spark.mesos.role=* --conf spark.driver.memory=2g http://infinity-artifacts-ci.s3.amazonaws.com/autodelete7d/spark/test-20180802-062841-eHc7mdB3GD1WgGtj/dcos-spark-scala-tests-assembly-0.2-SNAPSHOT.jar "

exactly as it is generated by the tests.

Contributor Author

I can confirm that the exact same command works when submitted via command line and the stdout returns the expected args:
from CLI:

Translated spark-submit arguments: '--conf=spark.driver.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dparam3=val3, --conf=spark.mesos.containerizer=mesos, --class=MultiConfs, --conf=spark.mesos.role=*, --conf=spark.driver.memory=2g, http://infinity-artifacts-ci.s3.amazonaws.com/autodelete7d/spark/test-20180802-062841-eHc7mdB3GD1WgGtj/dcos-spark-scala-tests-assembly-0.2-SNAPSHOT.jar'
...
Run job succeeded. Submission id: driver-20180802152929-0002

stdout

1.448: [GC (Allocation Failure) [PSYoungGen: 64512K->8448K(75264K)] 64512K->8456K(247296K), 0.0073157 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
Running MultiConfs app. Printing out all config values:
(spark.driver.extraJavaOptions,-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dparam3=val3)
(spark.jars,file:/mnt/mesos/sandbox/dcos-spark-scala-tests-assembly-0.2-SNAPSHOT.jar)
...

The issue seems to be that going through CI requires another layer of Python string parsing, which causes the Go parsing to fail (my guess is that additional quotes are added).

Contributor

@elezar elezar left a comment

Thanks for the work on this @samvantran. The CI failure is a little disconcerting, but I do think that it's something we could narrow down in unit tests here.

The error comes from the call to _, err := submit.Parse(submitArgs) after the call to cleanUpSubmitArgs.

I noticed that the unit tests drop the quotes around the multi-args, but in the CI output they are still present, which seems unexpected.

if boolVal.flagName == current[2:] {
sparkArgs = append(sparkArgs, current)
i++
continue LOOP
Contributor

What if we pass --supervise false or something similar? The code doesn't seem to handle this case explicitly (it seems as if it will be joined in the default case, though).

Contributor Author

You're right, false will get joined with --supervise and the job will fail. I just tried it and we get a parse error:

Error when parsing --submit-args: unknown long flag '--supervise false'

In this case, it might make sense to check for boolean flags and, if one is found, check the next argument. If it's a boolean value like true/false, we could just move the pointer forward and skip it (i += 2).

More generally though, there are a lot of ways to mess up confs. Does it make sense to have some kind of blacklist that guards against 'user error'?
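A rough, self-contained sketch of that lookahead idea (hypothetical helper and flag set, not the PR's actual code):

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// skipBooleanValues sketches the suggestion above: when a known boolean flag
// (e.g. --supervise) is followed by a literal true/false, drop the value so it
// does not get glued onto the flag later. The helper name and flag set are made up.
func skipBooleanValues(args []string, boolFlags map[string]bool) []string {
	out := make([]string, 0, len(args))
	for i := 0; i < len(args); i++ {
		current := strings.TrimSpace(args[i])
		out = append(out, current)
		if boolFlags[strings.TrimPrefix(current, "--")] && i+1 < len(args) {
			if _, err := strconv.ParseBool(args[i+1]); err == nil {
				i++ // move the pointer past the accompanying true/false
			}
		}
	}
	return out
}

func main() {
	args := []string{"--supervise", "false", "--conf", "spark.mesos.containerizer=mesos"}
	fmt.Println(skipBooleanValues(args, map[string]bool{"supervise": true}))
	// prints: [--supervise --conf spark.mesos.containerizer=mesos]
}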

func (suite *CliTestSuite) TestCleanUpSubmitArgsWithMulitpleValues() {
_, args := sparkSubmitArgSetup()
inputArgs := "--conf spark.driver.extraJavaOptions='-XX:+PrintGC -Dparam1=val1 -Dparam2=val2' main.py 100"
expected := "--conf=spark.driver.extraJavaOptions=-XX:+PrintGC -Dparam1=val1 -Dparam2=val2"
Contributor

Here is the problem. The quoting is stripped from the argument. In the CI, it is most likely the second -XX that is failing, but I would expect the -D to fail here as well.

Contributor

To confirm, we may want to include tests that call buildSubmitJson with the relevant arguments too.

Contributor Author

@samvantran samvantran Aug 2, 2018

Yeah, I've tested it on CI, where even just setting extraJavaOptions='val1' results in an error like "Cannot read file val1".

I think this is Python doing its own kind of parsing (wrapping everything in single quotes?) before it gets to the Go parsing, and that causes the issue.

In commit 7c294c7 I added that same config passed to buildSubmitJson and it still works.

Contributor

BTW, I might have missed it in the cleanupSubmitArgs logic: when do the single quotes get stripped off?

Contributor Author

Not in our logic; it happens in shellwords.Parse():

args, err := shellwords.Parse(argsStr)
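For illustration, a tiny standalone program against the same go-shellwords library (github.com/mattn/go-shellwords); the exact token split is whatever Parse produces, as discussed above:

package main

import (
	"fmt"

	shellwords "github.com/mattn/go-shellwords"
)

func main() {
	// Per the thread above, the single quotes are consumed here rather than in
	// the CLI's own logic; depending on how the quoted span is tokenized, the
	// multi-value conf may also come back split across several tokens, which
	// the default case later glues back together.
	args, err := shellwords.Parse("--conf spark.driver.extraJavaOptions='-Dparam1=val1 -Dparam2=val2'")
	if err != nil {
		panic(err)
	}
	for _, a := range args {
		fmt.Printf("%q\n", a)
	}
}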

@samvantran
Contributor Author

@elezar Thanks for taking a look! Yes, the quotes are dropped from the multi-args, which works when it gets to the Kingpin parsing.

The problem is that the CI tests use Python, and there are a bunch of places where "{}".format() is called, which adds another layer of string parsing. This makes it tricky to get parsing correct across two different languages. My concern is trying to 'make it work' for the Python CI when most cases will involve users submitting via the DC/OS Spark CLI, where Python isn't involved.

@samvantran
Contributor Author

samvantran commented Aug 2, 2018

Hm, surprisingly, when I run test.sh locally (connected to a personal CCM cluster), the test passes! Maybe it isn't a Python issue... Make/TeamCity? 😞

*edit: see below

run test:

╭─  ~/Dev/mesosphere/spark-build                                                                                                                                                           07:42    DCOS-38138-shell-escape    8   14:26:45
╰ CLUSTER_URL=$(dcos config show core.dcos_url) \
STUB_UNIVERSE_URL=https://universe-converter.mesosphere.com/transform?url=https://infinity-artifacts-ci.s3.amazonaws.com/autodelete7d/spark/20180801-170442-aNDU6gcD0WLsTAVY/stub-universe-spark.json,https://universe-converter.mesosphere.com/transform?url=https://infinity-artifacts-ci.s3.amazonaws.com/autodelete7d/spark-history/20180801-015849-bYYlp28dxraqW7Bc/stub-universe-spark-history.json \
 TEST_SH_DCOS_SPARK_TEST_JAR_URL=http://infinity-artifacts-ci.s3.amazonaws.com/autodelete7d/spark/test-20180802-172502-iseM14lZPoG40Htm/dcos-spark-scala-tests-assembly-0.2-SNAPSHOT.jar \
 TEST_SH_MESOS_SPARK_TEST_JAR_URL=http://infinity-artifacts-ci.s3.amazonaws.com/autodelete7d/spark/test-20180802-202846-cKhPCqE64FKeTtMr/mesos-spark-integration-tests-assembly-0.1.0.jar \
 PYTEST_ARGS='-k test_spark_with_multi_configs' \
 ./test.sh 

test output:

[2018-08-02 21:08:13,242|sdk_cmd|INFO]: STDOUT:
1.375: [GC (Allocation Failure) [PSYoungGen: 64512K->7368K(75264K)] 64512K->7376K(247296K), 0.0061293 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
Running MultiConfs app. Printing out all config values:
(spark.driver.extraJavaOptions,-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dparam3=val3)
(spark.jars,file:/mnt/mesos/sandbox/dcos-spark-scala-tests-assembly-0.2-SNAPSHOT.jar)
...
PASSED
==========
======= END: test_spark_py::test_spark_with_multi_configs
==========

Contributor

@susanxhuynh susanxhuynh left a comment

Thanks for fixing this, @samvantran ! The code is more readable now, too.

@@ -280,75 +284,61 @@ func parseApplicationFile(args *sparkArgs) error {
return nil
}

// we use Kingpin to parse CLI commands and options
// spark-submit by convention uses '--arg val' while kingpin only supports --arg=val
// cleanUpSubmitArgs transforms the former to the latter
func cleanUpSubmitArgs(argsStr string, boolVals []*sparkVal) ([]string, []string) {
Contributor

Maybe this function is misnamed - "cleanup" sounds like simply removing extra spaces or special characters.

Contributor Author

Hm, how about renaming it to transformSubmitArgs?

Contributor

I'm not convinced that this is even needed. I just tested the quota command -- also defined using kingpin -- and the output is as follows:

➜  dcos-spark git:(DCOS-38138-shell-escape) ✗ dcos --log-level debug spark quota create --cpus 10 foo
Using CLI config file: /Users/elezar/.dcos/clusters/c173c4f2-c398-43c6-ab72-d46b99cb326a/dcos.toml
HTTP Query (96 byte payload): POST https://mvdrf110s.scaletesting.mesosphe.re/mesos/quota
{"role":"foo","guarantee":[{"name":"cpus","type":"SCALAR","scalar":{"value":10}}],"force":false}
^CUser interrupted command with Ctrl-C
➜  dcos-spark git:(DCOS-38138-shell-escape) ✗ dcos --log-level debug spark quota create --cpus=10 foo
Using CLI config file: /Users/elezar/.dcos/clusters/c173c4f2-c398-43c6-ab72-d46b99cb326a/dcos.toml
HTTP Query (96 byte payload): POST https://mvdrf110s.scaletesting.mesosphe.re/mesos/quota
{"role":"foo","guarantee":[{"name":"cpus","type":"SCALAR","scalar":{"value":10}}],"force":false}
^CUser interrupted command with Ctrl-C

Note that kingpin handles both the --arg=value and --arg value forms the same way. We could just drop this function entirely.
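A standalone way to sanity-check that claim (a minimal sketch with made-up flag definitions, not the CLI's actual ones):

package main

import (
	"fmt"

	"gopkg.in/alecthomas/kingpin.v2"
)

func main() {
	// Hypothetical flag/arg definitions, just to confirm kingpin accepts both
	// the --arg=value and the --arg value forms.
	for _, argv := range [][]string{
		{"--cpus=10", "foo"},
		{"--cpus", "10", "foo"},
	} {
		app := kingpin.New("quota", "demo")
		cpus := app.Flag("cpus", "CPU guarantee").Float64()
		role := app.Arg("role", "Mesos role").String()
		if _, err := app.Parse(argv); err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("cpus=%v role=%v\n", *cpus, *role)
	}
	// Both iterations print: cpus=10 role=foo
}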

Contributor Author

@samvantran samvantran Aug 14, 2018

If this is true, it will be a major surprise.

*edit: thinking it over, maybe its original intent was simply to separate the spark-submit args from the application args, but it may have morphed since then.

It might be worth deleting the cleanUpSubmitArgs function and seeing if all the tests still pass (or maybe adding an --app-args flag, since that's needed for the application itself... but that would be a breaking API change). I also have a feeling that it does not handle long strings of args.

current := strings.TrimSpace(args[i])
switch {
// if current is a spark jar/app, we've processed all flags; add jar to sparkArgs and append the rest to appArgs
case strings.HasSuffix(current, ".jar") || strings.HasSuffix(current, ".r") || strings.HasSuffix(current, ".py"):
Contributor

Don't know if it matters, but previously the check for an R app used a capital R: https://github.com/mesosphere/spark-build/blob/3026e88dae1df008705905ac38cfde164b660220/cli/dcos-spark/submit_builder.go#L251; here it's lowercase.

Contributor Author

Good catch! It probably does matter, so I'll change it back.

Contributor

Ping.

Contributor

In fact, how about having an isAppArtifact function or something? (With unit tests to ensure that we don't subtly change the behaviour.)

Contributor Author

done 6b32e5

i += 1
default:
// if not a flag or jar, current is a continuation of the last arg and should not have been split
// eg extraJavaOptions="-Dparam1 -Dparam2" was parsed as [extraJavaOptions, -Dparam1, -Dparam2]
Contributor

Oh, this isn't handled by shellwords? (It sounded like it might based on the README.)

Contributor Author

No; unfortunately the library splits the string on non-escaped spaces, so this default case glues the string back together.
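For readers following along, a simplified sketch of that glue-back behaviour (hypothetical helper, not the actual transformSubmitArgs implementation):

package main

import (
	"fmt"
	"strings"
)

// rejoinSplitArgs: a token starting with "--" begins a new argument, the first
// token after a flag becomes its "=" value, and anything else is treated as a
// continuation that shellwords split off and is glued back with a space.
func rejoinSplitArgs(tokens []string) []string {
	var out []string
	for _, current := range tokens {
		current = strings.TrimSpace(current)
		switch {
		case strings.HasPrefix(current, "--") || len(out) == 0:
			out = append(out, current)
		case !strings.Contains(out[len(out)-1], "="):
			out[len(out)-1] += "=" + current // --conf val -> --conf=val
		default:
			out[len(out)-1] += " " + current // continuation of the previous value
		}
	}
	return out
}

func main() {
	split := []string{"--conf", "spark.driver.extraJavaOptions=-Dparam1", "-Dparam2"}
	fmt.Println(rejoinSplitArgs(split))
	// prints: [--conf=spark.driver.extraJavaOptions=-Dparam1 -Dparam2]
}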

app_args="",
expected_output="spark.executor.extraJavaOptions,-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dparam3=val3",
service_name=service_name,
args=["--conf spark.driver.extraJavaOptions='-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dparam3=val3'",
Contributor

Could you add a few special characters here as well? Maybe a -Dparam3=\"val3 val4\" like in the unit test?

println("Running MultiConfs app. Printing out all config values:")
val APPNAME = "MultiConfs App"
val conf = new SparkConf().setAppName(APPNAME)
conf.getAll.foreach(println)
Contributor

One additional check that tests the whole thing end-to-end: pass in -Dparam1=\"valA valB\" as one of the extraJavaOptions, and then inside this app, you can check the value of System.getProperty("param1") to make sure it was set correctly.

Contributor Author

Good idea, I'll update.

@samvantran
Contributor Author

samvantran commented Aug 3, 2018

I think this may be the culprit: sdk_cmd.py#L138

Here's what Translated spark-submit arguments is supposed to look like:

Translated spark-submit arguments: '--conf=spark.driver.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dparam3=val3, --conf=spark.mesos.containerizer=mesos, --class=MultiConfs, --conf=spark.mesos.role=*, --conf=spark.driver.memory=2g, http://infinity-artifacts-ci.s3.amazonaws.com/autodelete7d/spark/test-20180802-202846-cKhPCqE64FKeTtMr/dcos-spark-scala-tests-assembly-0.2-SNAPSHOT.jar'

a comma-separated list of strings (due to this line)

Here's what the arguments look like in the CI job:

[Step 7/8] 2018/08/02 21:10:51 Translated spark-submit arguments: '[--conf=spark.driver.extraJavaOptions='-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dparam3=val3' ....jar]'

It's one string encased inside brackets [].

Interestingly, two lines up in the CI build log, you see that the dcos command about to be run looks normal:

[Step 7/8] sdk_cmd.py  137 INFO     (CLI) dcos spark --name=spark run --verbose --submit-args="--conf spark.driver.extraJavaOptions='-XX:+PrintGCDetails... 

which means somewhere in this call, something goes wrong.

I did some digging and it looks like subprocess.run(..., shell=True) is strongly discouraged (see here). Besides being dangerous, it also means that whichever /bin/sh exists on the machine is the shell used to run the command, and we are at its mercy. That might be why running test.sh on my machine worked but not in CI.

Recommended approaches are instead to call subprocess.run() with an array of strings and shell=False, or to escape strings using shlex.quote() (link).

But really, what that means is that making this Spark CLI fix work in CI requires passing a shell-escaped string through a whole slew of languages (Python -> Shell -> Go), and that is very cumbersome 😢

Maybe calling dcos spark via dcos task exec would be easier. @elezar @susanxhuynh, what do you think?

Since Python's subprocess.run uses the underlying /bin/sh, we want to
put the submit-args in a file so that the shell doesn't do its own escaping
before they reach our Go CLI.
No need for an additional level of indirection. It's better if it's all
self-contained within the actual test since there are slight changes
that are required in different areas of the utils.
@samvantran samvantran force-pushed the DCOS-38138-shell-escape branch from 9ae7045 to 9f86664 Compare August 16, 2018 21:13
What an adventure... for about a week I tried testing a dcos
spark run command via subprocess.run with cat'ing the cmd, the entire file, sleeps,
syncs, and chmods, all in an attempt to satisfy CI, and then finally found
out CI was misconfigured and those stubs never made it to the test cluster.
Once the PR to fix all the master branch failures goes in this will
likely be all that's needed to run the test.
@samvantran samvantran force-pushed the DCOS-38138-shell-escape branch from f86b73f to a3e5a9a Compare August 21, 2018 21:22
- add code for checking booleans with test
- rename cleanupSubmitArgs -> transformSubmitArgs
@samvantran samvantran force-pushed the DCOS-38138-shell-escape branch from a3e5a9a to 36faae7 Compare August 24, 2018 20:04
samvantran and others added 3 commits August 24, 2018 13:04
Due to the recent discovery of CI misconfiguration, it looks like this
test may not have been actually working/testing what we suspected. In CI
it does not look like the --auth-token is being included in the
CreateSubmissionRequest for Spark and thus is not triggered in the
mesos-integration-tests. This needs to be fixed but should not be in
this PR.
@samvantran
Contributor Author

@elezar @susanxhuynh hallelujah! CI finally passes. Please review, hopefully one last time 🙏

Quick recap:

  1. CI was misconfigured for the longest time. Over the last few months, only the latest Spark package in Universe was being tested, not any actual stubs. This explains why the 20 commits I made trying different strategies to make CI happy always failed.
  2. The upstream shell-escape PR actually needed fixing, which I did in this PR. It works for our fork but will eventually need an upstream fix to apache/spark.
  3. I did have to comment out one test in commit 7865b6c, for reasons I explained in this ticket.

test.sh Outdated
@@ -10,12 +10,13 @@
# Exit immediately on errors
set -e

timestamp="$(date +%y%m%d-%H%M%S)"
Contributor Author

PR #398 covers this file change

Contributor

@elezar elezar left a comment

Thanks for keeping up with this @samvantran.

I have made some comments with regard to the boolean flags, and I also think it's important to create a ticket to better understand the need for the translation between the kingpin and spark-submit argument forms.

// collapse two or more spaces to one.
argsCompacted := collapseSpacesPattern.ReplaceAllString(argsStr, " ")
// we use Kingpin to parse CLI commands and options
// spark-submit by convention uses '--arg val' while kingpin only supports --arg=val
Contributor

Could we create a ticket to address this? Kingpin supports both --arg val and --arg=val.

Contributor Author

Contributor

Great! Thanks.

isBool = true
next := args[idx+1]
if _, err := strconv.ParseBool(next); err == nil {
skipNext = true
Contributor

This is not correct. What if someone calls the function with --supervise false? That should behave differently from --supervise true, but this is not captured here.

Also, as far as I know named return values have fallen out of favour in idiomatic Go.

Contributor Author

@samvantran samvantran Aug 29, 2018

Ah... that's true. It may make sense to remove it from the list in the false case since that is default Spark behavior.

As for point 2, I didn't know. I was just operating off the Go reference and thought it was a nifty feature.

I did some googling and found this as the latest reference but it was in favor of named return values: https://blog.minio.io/golang-internals-part-2-nice-benefits-of-named-return-values-1e95305c8687

Can you point me to resources that argue the opposite? *edit: looks like there's a bit of discussion here: golang/go#20859

Anyway, as I mentioned below, maybe we shouldn't cover the true/false case anyhow, since that would be a user not following the spark-cli instructions.

Contributor

@elezar elezar Aug 30, 2018

I may have just been recalling a recording of a gophercon talk that I watched. In this case, I don't think it's the named return values so much as the naked return that is the issue. I stand corrected.

Here is a PR review that makes the changes to the Go standard library and style guide:

For clarity here:

Named returned values should only be used on public funcs and methods
when it contributes to the documentation.

Named return values should not be used if they're only saving the
programmer a few lines of code inside the body of the function,
especially if that means there's stutter in the documentation or it
was only there so the programmer could use a naked return
statement. (Naked returns should not be used except in very small
functions)

In this case, I would say that we are not stuttering, so the named values themselves are OK, but we should consider dropping the naked return -- although we could consider this a very small function.
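For reference, the two forms being discussed, in a throwaway example (not the PR's actual function):

package main

import "fmt"

// checkNaked uses named return values and a naked return (the form questioned above).
func checkNaked(args []string, idx int) (isBool, skipNext bool) {
	isBool = true
	skipNext = idx+1 < len(args)
	return // returns the current values of isBool and skipNext
}

// checkExplicit has the same behaviour with an explicit return, the form favoured for readability.
func checkExplicit(args []string, idx int) (bool, bool) {
	isBool := true
	skipNext := idx+1 < len(args)
	return isBool, skipNext
}

func main() {
	fmt.Println(checkNaked([]string{"--supervise", "true"}, 0))
	fmt.Println(checkExplicit([]string{"--supervise", "true"}, 0))
	// both print: true true
}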

With regards to:

Anyway, as I mentioned below maybe we shouldn't cover true/false case anyhow since that would be a user not following spark-cli instructions

Yes, that would be ideal. With that said, this is exactly the reason why we should use a library to parse the arguments instead of doing it ourselves. Since you've already created a ticket to address this, we should instead focus on getting this PR done before tackling that.

Contributor Author

Thanks for sharing. That all sounds good to me. When I first started learning Go and saw naked returns (in other repos), I was confused. It wasn't until I read Effective Go that I learned their purpose.

I'm glad the community values readability, so I'm +1 with you on this.

assert.True(suite.T(), isBool, "isBool should be true since --supervise is a Boolean flag")
assert.True(suite.T(), skipNext, "skipNext should be true since 'true' is a Boolean value")

inputArgs = []string{"--supervise", "false", "--conf", "spark.mesos.containerizer=mesos"}
Contributor

What about --supervise with NO args?

I also don't see a test for the equivalence of:

  • --supervise and --supervise true
  • (missing --supervise) and --supervise false

Contributor Author

@samvantran samvantran Aug 29, 2018

The test above is for --supervise with no args: https://github.com/mesosphere/spark-build/pull/388/files/0ca555a0a3b9b95302d9635191732b1c68948407#diff-1ed61d6b454a1d6565ff2def51ff2487R100

As for equivalence, this is trickier. I can test for true because we set it on this line https://github.com/mesosphere/spark-build/pull/388/files#diff-4b81b743bcae14a20623a9e94b14a92bR455, but false is Spark's default config, so if there is no supervise flag, Spark takes care of it when the driver starts up, which is outside the scope of CLI parsing.

This kind of goes back to an earlier question: do we even need to handle --supervise false? This is not a typical use case, and submitting that flag value in a vanilla Spark job (without DC/OS even being involved) would be strange.

Contributor

I'm all for not supporting --bool-flag BOOL_VAL options, but then we need to change the handling of the bool args. The thing to confirm would then be that we correctly handle the different cases:

  • --supervise should pass
  • --supervise true should error
  • --supervise false should error
  • --supervise https://some.jar should pass

Are these all the test cases that we need to cover?

Contributor Author

Yes, and the CLI already provides such errors:

$ dcos spark run --name=spark-test --submit-args="\
--supervise false \
--class org.apache.spark.examples.SparkPi \
https://s3-us-west-2.amazonaws.com/infinity-artifacts/spark/spark-examples_2.11-2.2.1.jar 100"
2018/08/30 16:17:20 Error when parsing --submit-args: unknown long flag '--supervise false'

$ dcos spark run --name=spark-test --submit-args="\
--supervise true \
--class org.apache.spark.examples.SparkPi \
https://s3-us-west-2.amazonaws.com/infinity-artifacts/spark/spark-examples_2.11-2.2.1.jar 100"
2018/08/30 16:17:26 Error when parsing --submit-args: unknown long flag '--supervise true'

$ dcos spark run --name=spark-tes --submit-args="\
--supervise \
--class org.apache.spark.examples.SparkPi \
https://s3-us-west-2.amazonaws.com/infinity-artifacts/spark/spark-examples_2.11-2.2.1.jar 100"
Using image 'mesosphere/spark-dev:7b2bb6c6a4e7d0a79d4e44e15076904d2adc6ab1' for the driver and the executors (from dispatcher: container.docker.image).
To disable this image on executors, set spark.mesos.executor.docker.forcePullImage=false
Run job succeeded. Submission id: driver-20180830231733-0005

@@ -2,7 +2,7 @@
"spark_version": "2.2.1",
"default_spark_dist": {
"hadoop_version": "2.6",
"uri": "https://downloads.mesosphere.com/spark/assets/spark-2.2.1-2-bin-2.6.tgz"
"uri": "svt-dev.s3.amazonaws.com/spark/spark-2.2.1-bin-2.6.5-DCOS-38138.tgz"
Contributor

I guess this will be updated once the next release is cut?

Contributor Author

Yup, or do you think I should first merge the mesosphere/spark PR (and revert the --proxy-user commit) off custom-branch-2.2.1-X? I could then create a new tgz and upload it to
downloads.mesosphere.com/spark/assets/spark-2.2.1-3-bin-2.6.tgz (note the -3) and change this PR. That way I wouldn't have to do it twice?

Contributor

Hm, I don't think we should keep overwriting the "2.2.1-3" tgz with different versions. Until we actually make a release, how about something like "2.2.1-3-SNAPSHOT" to signal that it's not an officially released version, but a dev one that will keep changing up until the release?

Contributor Author

SNAPSHOT sounds best

If a user submits a Spark job with --supervise true/false, they weren't
actually following instructions and we shouldn't be covering for them.
func checkIfBooleanFlag(boolVals []*sparkVal, args []string, idx int) (isBool, skipNext bool) {
// check if current is a boolean flag (eg --supervise) but skip any accompanying boolean values (eg --supervise true)
current := args[idx]
for _, boolVal := range boolVals {
Contributor

Looking at this again this morning. We should break out of this loop as soon as we have found the matching flag -- either by returning, or by calling break.

Contributor Author

here you go: d0aae3c
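For anyone reading along, a minimal self-contained sketch of the early-exit shape (the sparkVal type is reduced to the single field visible in the snippets above; the real change is in the referenced commit):

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sparkVal is reduced here to the single field shown in the snippets above.
type sparkVal struct{ flagName string }

// checkIfBooleanFlag, sketched with an early return so the loop stops as soon
// as the matching flag is found.
func checkIfBooleanFlag(boolVals []*sparkVal, args []string, idx int) (bool, bool) {
	current := strings.TrimPrefix(args[idx], "--")
	for _, boolVal := range boolVals {
		if boolVal.flagName != current {
			continue
		}
		skipNext := false
		if idx+1 < len(args) {
			if _, err := strconv.ParseBool(args[idx+1]); err == nil {
				skipNext = true
			}
		}
		return true, skipNext // found the matching flag; no need to keep looping
	}
	return false, false
}

func main() {
	boolVals := []*sparkVal{{flagName: "supervise"}}
	fmt.Println(checkIfBooleanFlag(boolVals, []string{"--supervise", "true"}, 0)) // true true
	fmt.Println(checkIfBooleanFlag(boolVals, []string{"--conf", "x=y"}, 0))       // false false
}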

)

var keyWhitespaceValPattern = regexp.MustCompile("(.+)\\s+(.+)")
var backslashNewlinePattern = regexp.MustCompile("\\s*\\\\s*\\n\\s+")
var collapseSpacesPattern = regexp.MustCompile(`[\s\p{Zs}]{2,}`)

// AcceptedSparkAppExtensions lists the accepted extensions for Spark application artifacts
var AcceptedSparkAppExtensions = []string{
Contributor

Does this have to be public? Also, we could make the list local to the only function where it is used.

Contributor Author

@samvantran samvantran Aug 30, 2018

Sorry, had a Scala slip of the brain there; forgot that capitalized names make Go code public. I was aiming to denote it as a constant 🙃
Fixed in 7b2bb6c.

Contributor

@elezar elezar left a comment

Thanks @samvantran! Glad we can put this to bed.

@samvantran samvantran force-pushed the DCOS-38138-shell-escape branch from 7c9f1a9 to 7b2bb6c Compare August 31, 2018 20:17
@samvantran samvantran merged commit bd707ac into master Aug 31, 2018
@samvantran samvantran deleted the DCOS-38138-shell-escape branch August 31, 2018 22:00
samvantran added a commit that referenced this pull request Aug 31, 2018
This commit makes it possible to pass multi-argument configuration strings in the Spark CLI.
samvantran added a commit that referenced this pull request Sep 18, 2018
Backport of #388 && #405

This commit makes it possible to pass multi-argument configuration strings in the spark-cli.