In Spark SQL, is there any way to automatically infer the distinct column values in the PIVOT operator?
data_list = [("maaruti",1000,"hyderabad"),
("tata",2000,"mumbai"),
("hyundai",1500,"delhi"),
("mahindra",1200,"chennai"),
("maaruti",1200,"mumbai"),
("tata",1000,"delhi"),
("hyundai",2000,"chennai"),
("mahindra",1500,"hyderabad"),
("tata",1100,"delhi"),
("mahindra",1200,"chennai")
]
df = spark.createDataFrame(data_list).toDF("company", "sales", "city")
df.show()
DataFrame approach:
df.groupby("company","city").sum("sales").groupby("company").pivot("city").sum('sum(sales)').show()
Here the distinct values in the city column are automatically inferred.
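As a side note, the same pivot can be written in a single pass, and the DataFrame API's pivot() also takes an optional list of values if you want to skip the extra job Spark runs to infer them (the explicit list below is just an illustration of that optional argument):

# equivalent single-pass pivot; Spark infers the distinct cities itself
df.groupBy("company").pivot("city").sum("sales").show()

# optionally pass the values up front to avoid the inference job
df.groupBy("company").pivot("city", ["delhi", "mumbai", "hyderabad", "chennai"]).sum("sales").show()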
Spark SQL approach:
df.createOrReplaceTempView("tab")
spark.sql("""
select * from tab pivot(sum(sales) as assum
for city in ('delhi','mumbai','hyderabad','chennai'))
""").show()
The above snippet gives the desired output, but the list of distinct city values has to be specified manually. Is there any way to do this automatically in the SQL, or is building the IN list from the data the only option? Something like the sketch below (a workaround, not a built-in feature) is what I am trying to avoid:
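# possible workaround (not a built-in option): collect the distinct cities
# and splice them into the SQL string before running the pivot
cities = [row["city"] for row in spark.sql("select distinct city from tab").collect()]
in_list = ", ".join("'{}'".format(c) for c in cities)

spark.sql("""
select * from tab pivot(sum(sales) as assum
for city in ({}))
""".format(in_list)).show()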