I have a DataFrame with the following schema:

 |-- id: string (nullable = true)
 |-- ddd: struct (nullable = true)
 |    |-- aaa: string (nullable = true)
 |    |-- bbb: long (nullable = true)
 |    |-- ccc: string (nullable = true)
 |    |-- eee: long (nullable = true)

The data currently looks like this:

 id     |  ddd
--------------------------
   1    | [hi,1,this,2]
   2    | [hello,6,good,3]
   1    | [hru,2,where,7]
   3    | [in,4,you,1]
   2    | [how,4,to,3]

I want to group the rows by id so that the output looks like this:

   id   |  ddd
  --------------------
   1    | [hi,1,this,2],[hru,2,where,7]
   2    | [hello,6,good,3],[how,4,to,3]
   3    | [in,4,you,1]

Please help

zero323
gayathri

1 Answer

You can group by id and aggregate the structs with collect_list, as follows:

import org.apache.spark.sql.functions._
df.groupBy("id").agg(collect_list("ddd").as("ddd"))

collect_set works as well, but note that it drops duplicate structs within each group:

df.groupBy("id").agg(collect_set("ddd").as("ddd"))
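
To try this end to end, here is a minimal sketch that rebuilds the sample rows from the question and applies the collect_list aggregation. The object name CollectStructsExample, the helper names groupStructs and sampleData, the app name, and the local[*] master are assumptions made for illustration; the question does not show how the DataFrame was created, so the struct is assembled here from flat columns.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, struct}

// Hypothetical wrapper object; the names here are illustrative, not from the question.
object CollectStructsExample {

  // Groups rows by id and gathers each group's ddd structs into one array column.
  def groupStructs(df: DataFrame): DataFrame =
    df.groupBy("id").agg(collect_list("ddd").as("ddd"))

  // Rebuilds the sample rows from the question. The flat columns are packed
  // into a struct named ddd so the schema matches the one shown above.
  def sampleData(spark: SparkSession): DataFrame =
    spark.createDataFrame(Seq(
      ("1", "hi", 1L, "this", 2L),
      ("2", "hello", 6L, "good", 3L),
      ("1", "hru", 2L, "where", 7L),
      ("3", "in", 4L, "you", 1L),
      ("2", "how", 4L, "to", 3L)
    )).toDF("id", "aaa", "bbb", "ccc", "eee")
      .select(col("id"), struct("aaa", "bbb", "ccc", "eee").as("ddd"))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed setup).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-structs")
      .getOrCreate()

    val grouped = groupStructs(sampleData(spark))
    grouped.printSchema()              // ddd is now array<struct<...>>
    grouped.orderBy("id").show(false)  // false disables column truncation

    spark.stop()
  }
}
```

After the aggregation, ddd is an array of structs with one entry per original row for that id. Spark does not guarantee the order of elements inside each collected array, so sort the array explicitly if the order matters.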
Ramesh Maharjan