Pyspark: pass multiple columns in pandas_udf

Asked May 20 '20 at 13:55


My problem is similar to this one, but instead of udf I need to use pandas_udf.

I have a Spark DataFrame with many columns (the number of columns varies), and I need to apply a custom function to them (for example, a sum). I know I can hard-code the column names, but that does not work when the number of columns varies.

Please see examples:
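To illustrate the situation, here is a minimal sketch. It is not the asker's original code. It assumes a scalar pandas_udf is acceptable and that every selected column is numeric. The names row_sum, with_row_sum and the output column "total" are hypothetical. The idea is to declare the pandas function with *args and unpack a list of columns at the call site, so that no column names are hard-coded.

```python
import pandas as pd


def row_sum(*cols: pd.Series) -> pd.Series:
    """Element-wise sum of any number of equally long pandas Series."""
    # Put the Series side by side and add up each row.
    # Note: DataFrame.sum skips NaN by default, so nulls count as 0 here.
    return pd.concat(cols, axis=1).sum(axis=1).astype("float64")


def with_row_sum(df, col_names, out_col="total"):
    """Return df with an extra column holding row_sum over col_names.

    Hypothetical helper: df is a Spark DataFrame, col_names a list of
    numeric column names whose length is only known at run time.
    """
    # Imported lazily so row_sum stays usable without a Spark installation.
    from pyspark.sql.functions import col, pandas_udf
    from pyspark.sql.types import DoubleType

    row_sum_udf = pandas_udf(row_sum, returnType=DoubleType())
    # Unpacking the list passes however many columns there are.
    return df.withColumn(out_col, row_sum_udf(*[col(c) for c in col_names]))
```

Under these assumptions, with_row_sum(df, df.columns) would apply the function to every column of df, whatever their number. Any other function that accepts a variable number of pandas Series could be swapped in for row_sum.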

edited May 20 '20 at 14:56

Alexandre B.


asked May 20 '20 at 13:55

Grzegorz


0 Answers