Tuesday, July 17, 2018

Spark CLI nohup


You can package a JAR and use spark-submit to submit your code to Spark. But sometimes you want to hack interactively, and sometimes that hack turns into a long-running query. How do you keep spark-shell running after you have gone home?

It took me some fiddling, but the following works (with a bit of help from StackExchange).

In Unix shell #1, do:

mkfifo my_pipe
nohup spark-shell YOUR_CONFIG < my_pipe > YOUR_OUTPUT_FILE
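
For example (the master, memory setting, and log file name below are just placeholders; spark-shell accepts the usual spark-submit style options such as --master and --driver-memory):

# placeholder values: adjust the master, memory, and log file for your cluster
nohup spark-shell --master yarn --driver-memory 4g < my_pipe > spark_shell.log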

Now in Unix shell #2, do:

nohup cat YOUR_SPARK_SCALA > my_pipe 2>/dev/null &

You should now see the Spark shell jump into life.
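
To give a rough idea of what YOUR_SPARK_SCALA might contain, here is a hypothetical script you could substitute for it (the file name, paths, and column are made up; spark is the SparkSession that spark-shell defines for you):

# hypothetical contents for YOUR_SPARK_SCALA
cat > my_long_query.scala <<'EOF'
val events = spark.read.parquet("/data/events")
events.groupBy("event_type").count().write.mode("overwrite").parquet("/data/event_counts")
EOF

Note that once the whole script has been consumed, spark-shell sees end-of-file on the pipe and exits, so the script itself should write its results somewhere durable.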

Now, back in shell #1, press CTRL+Z and type:

jobs

Identify your application's job ID and type:

bg JOB_ID
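
For example, if jobs reports the suspended shell as job 1 (the job number will vary):

[1]+  Stopped    nohup spark-shell --master yarn --driver-memory 4g < my_pipe > spark_shell.log

then bg %1 (or plain bg, if it is your only job) resumes it in the background.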

Alternatively, following the advice in this StackOverflow answer, you can press CTRL+Z, find the JOB_ID from jobs, and bg as above before calling:

disown -h %JOB_ID
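
Continuing the example above, with the shell running as job 1, that is:

disown -h %1

The -h flag keeps the job in bash's job table but marks it so that SIGHUP is not sent to it if the shell receives a SIGHUP (for example, when you log out).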

You may now log off and go home.
