
How to transpose dynamic columns with rows of different size in PySpark?

I'm parsing JSON into a Spark DataFrame with the data structure below:

  • Static names: data, label, unit (can be hard-coded)
  • Dynamic names: data_1_a, data_1_b, name_1_a, name_1_b, name_10000_xyz (cannot be hard-coded, as there can be up to 10,000 names / data subcategories)

I tried the solutions in the useful links section below; however, they deal with transformations where all columns have rows of the same size.

Input df1:

root
 |-- id: long (nullable = true)
 |-- data_1_a: struct (nullable = true)
 |    |-- name_1_a: struct (nullable = true)
 |    |    |-- label: string (nullable = true)
 |    |    |-- unit: string (nullable = true)
 |-- data_1_b: struct (nullable = true)
 |    |-- name_10000_xyz: struct (nullable = true)
 |    |    |-- label: string (nullable = true)
 |    |-- name_1_b: struct (nullable = true)
 |    |    |-- label: string (nullable = true)
    
    
+---+-----------------------+--------------------------+
|id |data_1_a               |data_1_b                  |
+---+-----------------------+--------------------------+
|1  |{{label_1_a, unit_1_a}}|{{10000_xyz}, {label_1_b}}|
+---+-----------------------+--------------------------+

Required output df:

+---+--------+--------------+---------+--------+
|id |data    |name          |label    |unit    |
+---+--------+--------------+---------+--------+
|1  |data_1_a|name_1_a      |label_1_a|unit_1_a|
|1  |data_1_b|name_1_b      |label_1_b|null    |
|1  |data_1_b|name_10000_xyz|10000_xyz|null    |
+---+--------+--------------+---------+--------+

To reproduce the input df1 (assumes a pyspark shell where spark and sc are already defined):

json1 = """{"id":1,"data":{"data_1_a":{"name_1_a":{"label":"label_1_a","unit":"unit_1_a"}},"data_1_b":{"name_1_b":{"label":"label_1_b"},"name_10000_xyz":{"label":"10000_xyz"}}}}"""
df = spark.read.json(sc.parallelize([json1]))
df1 = df.select('id', 'data.*')  # flatten the top-level data struct
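
What I have so far: since the dynamic names do exist in df1's schema at runtime, a schema-driven sketch (rough and untested, assuming every inner struct has a label and at most an optional unit) could walk the struct fields and union one projection per name:

from functools import reduce
from pyspark.sql import functions as F

selects = []
for data_field in df1.schema.fields:                  # data_1_a, data_1_b, ...
    if data_field.name == 'id':
        continue
    for name_field in data_field.dataType.fields:     # name_1_a, name_10000_xyz, ...
        path = f'{data_field.name}.{name_field.name}'
        # pad a missing unit field with null so the union lines up
        unit = (F.col(f'{path}.unit')
                if 'unit' in name_field.dataType.fieldNames()
                else F.lit(None).cast('string'))
        selects.append(df1.select(
            'id',
            F.lit(data_field.name).alias('data'),
            F.lit(name_field.name).alias('name'),
            F.col(f'{path}.label').alias('label'),
            unit.alias('unit'),
        ))

result = reduce(lambda a, b: a.unionByName(b), selects)
result.show(truncate=False)

My concern is that with up to 10,000 substructures this builds a very long union chain, which may be slow to plan. Building a single stack() expression from the same schema walk might scale better, but I haven't found how to make it handle the differing row sizes cleanly.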

Useful links:
