I am trying to replace many columns at a time using pyspark, I am able to do it using below code, but it is iterating for each column name and when I have 100s of columns it is taking too much time.
from pyspark.sql.functions import col, when
df = sc.parallelize([(1,"foo","val","0","0","can","hello1","buz","oof"),
(2,"bar","check","baz","test","0","pet","stu","got"),
(3,"try","0","pun","0","you","omg","0","baz")]).toDF(["col1","col2","col3","col4","col5","col6","col7","col8","col9"])
df.show()
columns_for_replacement = ['col1','col3','col4','col5','col7','col8','col9']
replace_form = "0"
replace_to = "1"
for i in columns_for_replacement:
df = df.withColumn(i,when((col(i)==replace_form),replace_to).otherwise(col(i)))
df.show()
Can any one suggest how to replace all selected columns at once?