QUESTION
Given the below context, how do I resolve this exception I receive when I run this code?
SETUP
I am new to Python and Spark/PySpark and am working on a legacy code base that uses:
- Python 3.6
- Spark 2.4.1
- PySpark 2.4.3
I am running the code locally on macOS Monterey (12.1) with an M1 chip.
The code imports SparkSession and creates a new session:
from pyspark.sql import SparkSession

class SomeClass():
    spark = SparkSession.builder.getOrCreate()
    # ...some additional code...
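For context, `getOrCreate()` on the driver side launches a JVM by executing `spark-submit` from whatever directory the process resolves as the Spark home, which is why the exception path is built from that root. A minimal check I can run before building the session (standard library only; `SPARK_HOME` here is whatever this Python process actually sees):

```python
import os

# Where does this Python process think Spark lives?
spark_home = os.environ.get("SPARK_HOME", "(not set)")
submit = os.path.join(spark_home, "bin", "spark-submit")
print("SPARK_HOME:", spark_home)
print("spark-submit present:", os.path.exists(submit))
print("spark-submit executable:", os.access(submit, os.X_OK))
```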
EXCEPTION
At runtime, the import of SparkSession succeeds, but Python throws the following exception when initializing the SparkSession:
[Errno 2] No such file or directory: '/home/[user]/Spark/spark-2.4.1-bin-without-hadoop/./bin/spark-submit': '/home/[user]/Spark/spark-2.4.1-bin-without-hadoop/./bin/spark-submit'
[stack trace...]
This code runs without exception for other users.
CURRENT DEBUGGING
I've verified the file exists:
|--[userHome]
|--Spark
|--spark-2.4.1-bin-without-hadoop
|--hadoop-3.0.0
|--bin
| |--beeline
| |--beeline.cmd
| |--find-spark-home
| |--find-spark-home.cmd
| |--load-spark-env.cmd
| |--pyspark
| |--pyspark.cmd
| |--pyspark2.cmd
| |--run-example
| |--run-example.cmd
| |--spark-class
| |--spark-class.cmd
| |--spark-class2.cmd
| |--spark-shell
--> | |--spark-submit
--> | |--spark-submit.cmd
--> | |--spark-submit2.cmd
| |--sparkR
| |--sparkR.cmd
| |--sparkR2.cmd
|
|--<Other Directories>
Environment variables are as follows:
SPARK_VERSION=2.4.1
SPARK_HOME=/home/$USER/Spark/spark-${SPARK_VERSION}-bin-without-hadoop/libexec
HADOOP_USER_NAME=$USER
HADOOP_VERSION='3.0.0'
HADOOP_HOME=${SPARK_HOME}/hadoop-${HADOOP_VERSION}
HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
SPARK_DIST_CLASSPATH="${HADOOP_HOME}/etc/hadoop/*:${HADOOP_HOME}/share/hadoop/common/lib/*:${HADOOP_HOME}/share/hadoop/common/*:${HADOOP_HOME}/share/hadoop/hdfs/*:${HADOOP_HOME}/share/hadoop/hdfs/lib/*:${HADOOP_HOME}/share/hadoop/hdfs/*:${HADOOP_HOME}/share/hadoop/yarn/lib/*:${HADOOP_HOME}/share/hadoop/yarn/*:${HADOOP_HOME}/share/hadoop/mapreduce/lib/*:${HADOOP_HOME}/share/hadoop/mapreduce/*:${HADOOP_HOME}/share/hadoop/tools/lib/*"
SPARK_LOCAL_IP=127.0.0.1
PYSPARK_PYTHON=python3.6
PYSPARK_DRIVER_PYTHON=python3.6
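One thing I notice is that SPARK_HOME as exported ends in /libexec, while the path in the exception does not. To check what each exported path actually resolves to on disk (rather than eyeballing the tree), I can expand them in Python. A small sketch, with placeholder values mirroring the exports above; on the real machine these would come from os.environ:

```python
import os

# Placeholder values mirroring the exports above (the real ones would be
# read from os.environ); note that SPARK_HOME ends in /libexec here.
user = "user"  # hypothetical user name
spark_home = f"/home/{user}/Spark/spark-2.4.1-bin-without-hadoop/libexec"
hadoop_home = f"{spark_home}/hadoop-3.0.0"

for name, path in [("SPARK_HOME", spark_home), ("HADOOP_HOME", hadoop_home)]:
    print(f"{name} -> {path} (is a directory: {os.path.isdir(path)})")

# The launcher PySpark would try to exec under this SPARK_HOME:
print("expected launcher:", os.path.join(spark_home, "bin", "spark-submit"))
```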
When I attempt to run Spark from the terminal, permission is denied:
~/Spark/spark-2.4.1-bin-without-hadoop > ./bin
==> zsh: permission denied: ./bin
and when trying to run the spark-submit script or PySpark from the terminal, I receive errors noting that the files don't exist:
~/Spark/spark-2.4.1-bin-without-hadoop > ./bin/spark-submit
==>./bin/spark-submit: line 27: /home/[user]/Spark/spark-2.4.1-bin-without-hadoop/bin/spark-class: No such file or directory
==>./bin/spark-submit: line 27: exec: /home/[user]/Spark/spark-2.4.1-bin-without-hadoop/bin/spark-class: cannot execute: No such file or directory
~/Spark/spark-2.4.1-bin-without-hadoop > ./bin/pyspark
==> ./bin/pyspark: line 24: /home/[user]/Spark/spark-2.4.1-bin-without-hadoop/bin/load-spark-env.sh: No such file or directory
==>./bin/pyspark: line 77: /home/[user]/Spark/spark-2.4.1-bin-without-hadoop/bin/spark-submit: No such file or directory
==>./bin/pyspark: line 77: exec: /home/[user]/Spark/spark-2.4.1-bin-without-hadoop/bin/spark-submit: cannot execute: No such file or directory
I have verified that I have read, write, and execute permissions for ./bin and ./bin/pyspark:
drwxr-xr-x 30 [user] [group] 960 [Date/Time] bin
-rwxr-xr-x 1 [user] [group] 2987 [Date/Time] pyspark
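Since ls shows the files but executing them reports "No such file or directory", I also want to rule out a broken symlink or a script whose shebang interpreter is missing, both of which produce exactly that error at exec time. A small diagnostic sketch (the path I would pass is the real spark-submit path; /bin/sh below is just a sanity check of the helper):

```python
import os

def diagnose(path):
    """Report common reasons exec fails with 'No such file or directory'."""
    report = {
        "entry_present": os.path.lexists(path),   # directory entry exists (even a broken symlink)
        "target_resolves": os.path.exists(path),  # symlink chain / file actually reachable
        "is_symlink": os.path.islink(path),
    }
    if report["is_symlink"]:
        report["link_target"] = os.readlink(path)
    if report["target_resolves"] and os.path.isfile(path):
        with open(path, "rb") as f:
            # A shebang pointing at a missing interpreter also yields ENOENT.
            report["first_line"] = f.readline().decode(errors="replace").strip()
    return report

print(diagnose("/bin/sh"))  # sanity-check the helper on a known path
```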
PREVIOUS REFERENCES
I have referenced the following articles but have not been able to solve the issue:
Permission denied error when setting up local Spark instance and running pyspark
Why I take "spark-shell: Permission denied" error in Spark Setup?