I was wondering if anyone has any ideas on the following, as I am stumped.
I'm a beginner with PySpark, and this is what I have done so far:
from pyspark.sql import SparkSession
spark_ex = SparkSession.builder.getOrCreate()
print(spark_ex)
spark_ex.catalog.listTables()
import pandas as pd
df1 = pd.read_csv('train.csv')
df2 = (df1.iloc[:, 0:4], [4])
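One thing to check in that last line: wrapping two comma-separated expressions in parentheses builds a Python tuple, not a DataFrame, so df2 is not a pandas frame at this point. The sketch below uses a small made-up DataFrame in place of train.csv (the column names a to e are hypothetical) and assumes the intent was to keep columns by position.

```python
import pandas as pd

# Hypothetical stand-in for train.csv: five columns, three rows.
df1 = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "c": [7, 8, 9],
    "d": [10, 11, 12],
    "e": [13, 14, 15],
})

# Parentheses around two comma-separated expressions build a tuple,
# which is what the original df2 line produces.
as_tuple = (df1.iloc[:, 0:4], [4])
print(type(as_tuple))  # <class 'tuple'>

# Selecting the first four columns by position returns a DataFrame.
first_four = df1.iloc[:, 0:4]

# If the column at position 4 should be included too, list the positions explicitly.
first_five = df1.iloc[:, [0, 1, 2, 3, 4]]
```

Which columns to keep depends on what the exercise intends; the point is that df1.iloc[...] on its own, without the surrounding tuple, is what yields a DataFrame.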
Now I'm trying to do the following: "Pass the df2 frame into a variable called 'spark_df' using your spark_ex and .createDataFrame".
I have tried many different options, but none of them work. Here is one I thought would work, but it doesn't either:
spark_df = spark.createDataFrame(spark_ex).toDF(df2)
If anyone has any ideas, I would really appreciate it!