
In Spark SQL, is there any way to automatically infer the distinct column values in the pivot operator?

data_list = [("maaruti", 1000, "hyderabad"),
             ("tata", 2000, "mumbai"),
             ("hyundai", 1500, "delhi"),
             ("mahindra", 1200, "chennai"),
             ("maaruti", 1200, "mumbai"),
             ("tata", 1000, "delhi"),
             ("hyundai", 2000, "chennai"),
             ("mahindra", 1500, "hyderabad"),
             ("tata", 1100, "delhi"),
             ("mahindra", 1200, "chennai")]

df = spark.createDataFrame(data_list).toDF("company", "sales", "city")
df.show()

DataFrame approach

df.groupby("company", "city").sum("sales") \
  .groupby("company").pivot("city").sum("sum(sales)") \
  .show()

Here the distinct values in the city column are automatically inferred.

Spark SQL approach

df.createOrReplaceTempView("tab")
spark.sql("""
select * from tab pivot(sum(sales) as assum
for city in ('delhi', 'mumbai', 'hyderabad', 'chennai'))
""").show()

The above snippet gives the desired output; however, the column list needs to be specified manually with the distinct city values. Is there any way to do this automatically?

thebluephantom
    This needs dynamic SQL, i.e. stored functions. Does this help: https://stackoverflow.com/questions/30244910/how-to-pivot-spark-dataframe – Mar 27 '22 at 08:18
