
I am passing a jar file to my pyspark command. Now, after the shell has started, I want to import a Scala class that is present inside the jar into my PySpark shell. This works easily when I try it in the spark shell (which uses Scala), but it does not work when I try it from the PySpark shell. I want to pass this Scala class as a data format to the PySpark read command, to read a file like this:

my_df = spark.read.format("path_to_my_scala_class")...load("myfile")

So, in order to achieve this, I tried importing the class into my PySpark shell first.

Situation 1: Successfully importing/loading the class in spark-shell

I started my spark shell by writing this command:

spark-shell --jars [myjar]

Once the spark shell started, I wrote this command to import the Scala class:

scala> import [my_scala_classname]

This works perfectly.

Situation 2: Not able to import/load the Scala class from the jar into my PySpark shell

However, this does not work in PySpark. I started my PySpark shell by passing the jar in the same way, roughly like this:
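pyspark --jars [myjar]

Then I wrote the same import command: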

>>> import [my_scala_classname]

I get an error when trying this: ModuleNotFoundError: No module named [myscalaclassname]

I googled this and found this question: How to use a Scala class inside Pyspark
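As far as I can tell, that answer suggests reaching the class through the JVM gateway instead of a Python import, roughly along these lines (the package and class name here are only placeholders for my actual class):

# the class is referenced by its fully qualified name through the py4j gateway
my_instance = spark._jvm.com.mypackage.MyScalaClass()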

This is not working. So how can I import my Scala class from my jar file into my PySpark shell?
