
In Spark SQL, is there any way to automatically infer the distinct column values in the pivot operator?

data_list = [("maaruti", 1000, "hyderabad"),
             ("tata", 2000, "mumbai"),
             ("hyundai", 1500, "delhi"),
             ("mahindra", 1200, "chennai"),
             ("maaruti", 1200, "mumbai"),
             ("tata", 1000, "delhi"),
             ("hyundai", 2000, "chennai"),
             ("mahindra", 1500, "hyderabad"),
             ("tata", 1100, "delhi"),
             ("mahindra", 1200, "chennai")]

df = spark.createDataFrame(data_list).toDF("company", "sales", "city")
df.show()

DataFrame approach

df.groupby("company", "city").sum("sales") \
  .groupby("company").pivot("city").sum("sum(sales)") \
  .show()

Here the distinct values in the city column are automatically inferred.

Spark SQL approach

df.createOrReplaceTempView("tab")
spark.sql("""
select * from tab pivot(sum(sales) as assum
for city in ('delhi', 'mumbai', 'hyderabad', 'chennai'))
""").show()

The above snippet gives the desired output; however, the column list needs to be specified manually with the distinct city values. Is there any way to do this automatically?

thebluephantom
    This needs dynamic SQL, i.e. stored functions. Does this help: https://stackoverflow.com/questions/30244910/how-to-pivot-spark-dataframe – Mar 27 '22 at 08:18
