I am passing a jar file when launching my PySpark shell. After the shell has started, I want to import a Scala class that is present inside the jar into my PySpark shell. This works easily when I try it in the spark shell (which uses Scala) but does not work when I try it from the PySpark shell. I want to pass this Scala class as a data format to the PySpark read command so I can read a file like this:
my_df = spark.read.format("path_to_my_scala_class").................load("myfile")
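For concreteness, this is roughly the call I am aiming for; the class name and the option are placeholders for my real ones:

my_df = (spark.read
    .format("com.example.MyCustomSource")   # placeholder for the fully qualified Scala class inside my jar
    .option("someOption", "someValue")      # hypothetical option
    .load("myfile"))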
So, in order to achieve this, I first tried importing the class into my PySpark shell.
Situation 1: Successfully importing/loading the class in spark-shell
I started my spark shell with this command:
spark-shell --jars [myjar]
Once the spark shell started, I ran this command to import the Scala class:
scala> import [my_scala_classname]
This works perfectly
Situation 2: Not able to import/load the Scala class from the jar into my PySpark shell
However, this does not work in PySpark. I started my PySpark shell by passing the jar in the same way (pyspark --jars [myjar]) and ran the same import:
>>> import [my_scala_classname]
I get this error when I try it: ModuleNotFoundError: No module named [my_scala_classname]
I googled this and found this question: How to use a Scala class inside Pyspark
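For reference, the approach from that question looks roughly like this; com.example.MyCustomSource is again a placeholder for my real class, and the jar is still passed with --jars when starting pyspark:

# py4j gateway view of the JVM that the PySpark driver talks to
jvm = spark.sparkContext._jvm
# reference the Scala class through the gateway instead of a Python import
MyScalaClass = jvm.com.example.MyCustomSource
instance = MyScalaClass()   # invokes the class constructor via py4j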
This is not working. How can I import my Scala class from my jar file into my PySpark shell?