
I have a dataframe with a single row and multiple columns, and I would like to convert it into multiple rows. I found a similar question here on Stack Overflow.

The question shows how it can be done in Scala, but I want to do this in PySpark. I tried to replicate the code in PySpark but wasn't able to.

I am not able to convert the Scala code below to Python:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit, map}

// Flatten each column into a (name, value) pair of Column expressions
val columnsAndValues: Array[Column] = df1.columns.flatMap { c => Array(lit(c), col(c)) }
// Build a single map column from the interleaved keys and values
val df2 = df1.withColumn("myMap", map(columnsAndValues: _*))

1 Answer


In PySpark you can use the create_map function to create a map column, and a list comprehension with itertools.chain to get the equivalent of Scala's flatMap:

import itertools
from pyspark.sql import functions as F

# Interleave each column name (as a literal) with its value column,
# mimicking Scala's flatMap over the columns
columns_and_values = itertools.chain(*[(F.lit(c), F.col(c)) for c in df1.columns])
# create_map expects alternating key and value expressions
df2 = df1.withColumn("myMap", F.create_map(*columns_and_values))
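For reference, here is a minimal end-to-end sketch (the sample data and column names are made up for illustration). Once the map column is built, exploding it turns the single row into one row per original column, which is the row-wise shape the question asks for:

import itertools
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical single-row DataFrame
df1 = spark.createDataFrame([(25, 30, 45)], ["col1", "col2", "col3"])

columns_and_values = itertools.chain(*[(F.lit(c), F.col(c)) for c in df1.columns])
df2 = df1.withColumn("myMap", F.create_map(*columns_and_values))

# Exploding a map column yields one row per (key, value) pair
df2.select(F.explode("myMap").alias("column_name", "value")).show()
# +-----------+-----+
# |column_name|value|
# +-----------+-----+
# |       col1|   25|
# |       col2|   30|
# |       col3|   45|
# +-----------+-----+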