I am trying this in Databricks. Please let me know which PySpark libraries need to be imported, and the code needed to get the output below in Azure Databricks (PySpark).

Example input dataframe:

|     column1     |    column2    | column3  |  column4  |
| a               | bbbbb         | cc       | >dddddddd |
| >aaaaaaaaaaaaaa | bb            | c        | dddd      |
| aa              | >bbbbbbbbbbbb | >ccccccc | ddddd     |
| aaaaa           | bbbb          | ccc      | d         |

Output dataframe:

| column  | maxLength |
| column1 |        14 |
| column2 |        12 |
| column3 |         7 |
| column4 |         8 |
VIGNESH R

1 Answer

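>>> from pyspark.sql import functions as sf
>>> # Small sample dataframe (sc is the SparkContext predefined in Databricks and the pyspark shell)
>>> df = sc.parallelize([['a','bbbbb','ccc','ddd'],['aaaa','bbb','ccccccc', 'dddd']]).toDF(["column1", "column2", "column3", "column4"])
>>> # Replace every value with its string length, keeping the column names
>>> df1 = df.select([sf.length(col).alias(col) for col in df.columns])
>>> # Aggregate over the whole dataframe (no grouping keys) to get the max length per column
>>> df1.groupby().max().show()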
+------------+------------+------------+------------+
|max(column1)|max(column2)|max(column3)|max(column4)|
+------------+------------+------------+------------+
|           4|           5|           7|           4|
+------------+------------+------------+------------+

Then melt the previous dataframe (turn its single wide row into one row per column) to get the `column` / `maxLength` layout from the question.
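In case it helps, here is a minimal sketch of the melt step. It is not the recipe from the linked post; it swaps in Spark SQL's `stack` generator instead. The helper names `agg_exprs`, `stack_expr` and `max_lengths` are made up for illustration, and the sketch assumes every column is a string (or something `length` can be applied to) and that column names contain no backticks or quotes.

```python
def agg_exprs(columns):
    # One "max(length(`c`)) as `c`" SQL expression per column, so the
    # aggregated row keeps the original column names.
    return [f"max(length(`{c}`)) as `{c}`" for c in columns]


def stack_expr(columns):
    # stack(n, 'name1', `name1`, ...) emits one (column, maxLength) row
    # for each input column of the wide single-row dataframe.
    pairs = ", ".join(f"'{c}', `{c}`" for c in columns)
    return f"stack({len(columns)}, {pairs}) as (`column`, `maxLength`)"


def max_lengths(df):
    # df is assumed to be an existing PySpark DataFrame.
    cols = df.columns
    wide = df.selectExpr(*agg_exprs(cols))    # single row of max lengths
    return wide.selectExpr(stack_expr(cols))  # melted: one row per column
```

With the `df` from above, `max_lengths(df).show()` should print one row per column. On Spark 3.4 or later, `DataFrame.unpivot` may be another option for the melt.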

E.ZY.
  • Thanks, this worked, and it also takes less time to execute :) :) – VIGNESH R Nov 04 '20 at 12:34
  • @VIGNESHR Glad to know that your issue has been resolved. You can accept it as an answer (click on the check mark beside the answer to toggle it from greyed out to filled in). This can be beneficial to other community members. Thank you. – CHEEKATLAPRADEEP-MSFT Nov 09 '20 at 06:43