
I was wondering if anyone might have any ideas on the following as I am stumped.

I'm new to PySpark and this is what I have done so far:

from pyspark.sql import SparkSession

spark_ex = SparkSession.builder.getOrCreate()
print(spark_ex)

spark_ex.catalog.listTables()

import pandas as pd

df1=pd.read_csv('train.csv')

df2=(df1.iloc[:, 0:4],[4])

Now I'm trying to do the following: pass the df2 frame into a variable called 'spark_df' using spark_ex and .createDataFrame

I have tried many different options but nothing seems to work. Here is one option I thought would work, but it doesn't either:

spark_df = spark.createDataFrame(spark_ex).toDF(df2)

If anyone has any ideas I would really appreciate it!

vnk