PySpark: Select all columns except particular columns

Question

I have a large number of columns in a PySpark DataFrame, say 200. I want to select all of them except 3 or 4. How do I select these columns without manually typing the names of every column I want to keep?

`df.select([c for c in df.columns if c not in {'GpuName','GPU1_TwoPartHwID'}])` — vvg, Jun 13 '18 at 14:18
Possible duplicate of [How to exclude multiple columns in Spark dataframe in Python](https://stackoverflow.com/questions/35674490/how-to-exclude-multiple-columns-in-spark-dataframe-in-python) — vvg, Jun 13 '18 at 14:18

Tshilidzi Mudau · Accepted Answer · 2018-12-06T04:37:00.640

45

In the end, I settled on the following:

Drop:

df.drop('column_1', 'column_2', 'column_3')

Select:

df.select([c for c in df.columns if c not in {'column_1', 'column_2', 'column_3'}])
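
One caveat with both forms: as far as I know, `drop` with string names is a no-op for names that are not in the schema, and the list comprehension likewise ignores a misspelled name, so a typo leaves the column in place without any error. Below is a minimal sketch of a helper that builds the list of columns to keep and fails loudly on unknown names. The helper name `columns_except` and the sample column names are hypothetical, and a plain list stands in for `df.columns` so the snippet runs without a Spark session.

```python
def columns_except(all_columns, excluded):
    """Return all_columns minus excluded, preserving the original order."""
    excluded = set(excluded)
    unknown = excluded - set(all_columns)
    if unknown:
        # Assumed safety check (not part of PySpark): catch misspelled names.
        raise ValueError(f"Unknown columns: {sorted(unknown)}")
    return [c for c in all_columns if c not in excluded]


# Stand-in for df.columns (hypothetical names).
columns = ["id", "column_1", "column_2", "column_3", "value"]
kept = columns_except(columns, ["column_1", "column_2", "column_3"])
print(kept)  # ['id', 'value']
```

With a real DataFrame this would be used as `df.select(columns_except(df.columns, ['column_1', 'column_2', 'column_3']))`.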

edited Dec 06 '18 at 04:37

answered Sep 04 '18 at 07:05

Tshilidzi Mudau


score 0 · Answer 2 · answered Sep 13 '21 at 17:04


df.drop(*cols_to_drop)  # cols_to_drop is a list of the column names to drop

Useful if the list of columns to drop is large, or if it can be derived programmatically.
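
As a rough illustration of deriving the list programmatically, the sketch below picks the columns to drop by matching their names against a regular expression. The helper `columns_matching`, the prefixes `tmp_` and `debug_`, and the sample column names are all made up for the example, and a plain list stands in for `df.columns`.

```python
import re


def columns_matching(all_columns, pattern):
    """Return the column names that match the regular expression `pattern`."""
    regex = re.compile(pattern)
    return [c for c in all_columns if regex.search(c)]


# Stand-in for df.columns (hypothetical names).
columns = ["id", "tmp_a", "tmp_b", "value", "debug_flag"]
cols_to_drop = columns_matching(columns, r"^(tmp_|debug_)")
print(cols_to_drop)  # ['tmp_a', 'tmp_b', 'debug_flag']
```

With a real DataFrame the result would be unpacked into `drop`, for example `df.drop(*columns_matching(df.columns, r"^(tmp_|debug_)"))`.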


martand

