
I am trying to run spark-submit but it is failing with a weird message.

 Error: Could not find or load main class org.apache.spark.launcher.Main
 /opt/spark/bin/spark-class: line 96: CMD: bad array subscript

This is the first time I am seeing this kind of error. I checked the code in the spark-class file but am unable to decipher what is causing the issue.

# Turn off posix mode since it does not allow process substitution
set +o posix
CMD=()
DELIM=$'\n'
CMD_START_FLAG="false"
while IFS= read -d "$DELIM" -r ARG; do
  if [ "$CMD_START_FLAG" == "true" ]; then
    CMD+=("$ARG")
  else
    if [ "$ARG" == $'\0' ]; then
      # After NULL character is consumed, change the delimiter and consume command string.
      DELIM=''
      CMD_START_FLAG="true"
    elif [ "$ARG" != "" ]; then
      echo "$ARG"
    fi
  fi
done < <(build_command "$@")

COUNT=${#CMD[@]}
LAST=$((COUNT - 1))
LAUNCHER_EXIT_CODE=${CMD[$LAST]}

The line mentioned in the error message is:

LAUNCHER_EXIT_CODE=${CMD[$LAST]}
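
From what I can tell (this is my guess, not something confirmed by the error output), this line can only fail like this if the CMD array ends up empty, i.e. build_command printed nothing because org.apache.spark.launcher.Main itself could not be loaded. A minimal sketch of that failure in plain bash:

 # Sketch: with an empty array, the index becomes -1 and bash complains
 CMD=()
 COUNT=${#CMD[@]}                    # 0
 LAST=$((COUNT - 1))                 # -1
 LAUNCHER_EXIT_CODE=${CMD[$LAST]}    # bash: CMD: bad array subscript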

Any pointer or idea about why this issue occurs would help me a lot.

Thanks

Jonas V
Ashit_Kumar
    @hatefAlipoor yeah I was able to solve the issue by providing an entry point for the code to start – Ashit_Kumar Sep 10 '20 at 15:37
  • Would you mind clarifying what you mean by "providing an entry point for the code to start"? I'm seeing the same problem. This is what I'm seeing in my terminal: "$ pyspark set PYSPARK_SUBMIT_ARGS="--name" "PySparkShell" "pyspark-shell" && python C:\Users\oefel\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pyspark/bin/spark-class: line 96: CMD: bad array subscript" Is this what you were seeing? – Erin Sep 14 '20 at 21:53
  • @Erin can you provide a brief description of what you are trying to do? It looks like you are simply submitting a spark-submit job on a Windows machine. Can you ensure the required Spark paths are set? My error was in a different context: it came from a pod in which I was trying to execute my job. – Ashit_Kumar Sep 15 '20 at 08:28
  • I'm looking to correctly set up the SPARK_HOME env variable so as to use pyspark within a Jupyter Notebook. I've set env variables by Max' comment in this post: https://stackoverflow.com/questions/38798816/pyspark-command-not-recognised . I'm now seeing: "/c/spark/spark-2.4.7-bin-hadoop2.7/spark-2.4.7-bin-hadoop2.7/bin/pyspark: line 24: C:spark\spark-2.4.7-bin-hadoop2.7/bin/load-spark-env.sh: No such file or directory /c/spark/spark-2.4.7-bin-hadoop2.7/spark-2.4.7-bin-hadoop2.7/bin/pyspark: line 77: C:spark\spark-2.4.7-bin-hadoop2.7/bin/spark-submit: No such file or directory" – Erin Sep 15 '20 at 13:46
  • I am getting this simply by trying to run the pyspark shell in the terminal. Any thoughts? It looks like I'm trying to do something similar to what Erin is doing. – jkix May 18 '21 at 17:55
  • I am encountering the same issue while trying to run pyspark in a cygwin enviroment. My environment variables are: export SPARK_PYTHON=python export PYSPARK_PYTHON=python export SPARK_HOME='c:/spark/spark-3.1.2' export PATH="$SPARK_HOME/bin:$PATH" export JAVA_HOME='c:/Program Files/Java/jre1.8.0_301' I get the following output: set PYSPARK_SUBMIT_ARGS="--name" "PySparkShell" "pyspark-shell" && python c:/spark/spark-3.1.2/bin/spark-class: line 98: CMD: bad array subscript – Amit Gupta Aug 26 '21 at 20:01

1 Answer


When I faced exactly the same problem, I looked in the bin directory of my Spark setup, which is in your Windows PATH variable; when you run spark-submit <arguments>, CMD searches for this command in Spark's bin. This directory can be found by executing echo %SPARK_HOME%\bin in CMD. There I saw two copies of every executable (bin directory screenshot). This makes sense because we need different executable scripts for Windows and for Linux. So finally I just typed spark-submit.cmd instead of spark-submit and everything worked as expected.
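
For illustration, the only change is the .cmd suffix when calling the launcher from a Windows command prompt; the class name, master and jar path below are placeholders, not taken from the question:

 REM Run the Windows launcher script explicitly (placeholder arguments)
 spark-submit.cmd --class org.example.Main --master local[*] C:\path\to\app.jar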

LinFelix