
I am trying to replace values in many columns at once using PySpark. The code below works, but it iterates over the column names one at a time with `withColumn`, and when I have hundreds of columns it takes too much time.

from pyspark.sql.functions import col, when
df = sc.parallelize([(1,"foo","val","0","0","can","hello1","buz","oof"), 
                     (2,"bar","check","baz","test","0","pet","stu","got"), 
                     (3,"try","0","pun","0","you","omg","0","baz")]).toDF(["col1","col2","col3","col4","col5","col6","col7","col8","col9"])
df.show()

columns_for_replacement = ['col1','col3','col4','col5','col7','col8','col9']
replace_from = "0"
replace_to = "1"
for i in columns_for_replacement:
    df = df.withColumn(i, when(col(i) == replace_from, replace_to).otherwise(col(i)))
df.show()

Can anyone suggest how to replace the values in all selected columns at once?

K Soumya
  • A similar question was already [asked](https://stackoverflow.com/questions/55643713/pyspark-replace-value-in-several-column-at-once). Probably you can find your solution there. – Francesco May 30 '22 at 13:59

0 Answers