I am trying to join two DataFrames that have the same column names and compute some new values. After that, I need to drop all columns of the second table. The number of columns is huge. How can I do this more easily? I tried `.drop("table2.*")`, but this doesn't work.
- Even `.drop("table2.specificColumnName")` doesn't work, let alone `.drop("table2.*")`. – Bikash Gyawali Jul 02 '19 at 16:17
- Could someone explain why `drop("foo.column")` doesn't work? – wrschneider Aug 18 '21 at 13:53
2 Answers
You can use select with aliases:
df1.alias("df1")
.join(df2.alias("df2"), Seq("someJoinColumn"))
.select($"df1.*", $"someComputedColumn", ...)
Or reference the parent DataFrame directly:
df1.join(df2, Seq("someJoinColumn")).select(df1("*"), $"someComputedColumn", ...)
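For illustration, here is a minimal end-to-end sketch of the alias approach. The local `SparkSession`, the sample data, and the names `id`, `value`, and `ratio` are assumptions made up for this example and do not come from the question.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only (assumed setup).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("keep-left-columns")
  .getOrCreate()
import spark.implicits._

// Hypothetical tables that share the column names "id" and "value".
val df1 = Seq((1, 10.0), (2, 20.0)).toDF("id", "value")
val df2 = Seq((1, 1.5), (2, 2.5)).toDF("id", "value")

val result = df1.alias("df1")
  .join(df2.alias("df2"), Seq("id"))
  .select(
    // Everything from the left side, expanded through its alias.
    $"df1.*",
    // The computed column can still read df2 here, inside the same select.
    ($"df1.value" / $"df2.value").as("ratio")
  )

result.show()
```

Because the computed column is built in the same `select` that keeps `df1.*`, the columns of `df2` never need to be dropped explicitly; they are simply not selected.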
zero323
Instead of dropping, you can select only the columns you want to keep for further operations, something like this:
val newDataFrame = joinedDataFrame.select($"col1", $"col4", $"col6")
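When there are too many columns to type out, the select list can probably be built from the first table's schema instead. The following is a sketch under assumptions: the session setup, the sample data, the aliases `t1` and `t2`, and the computed column `valueDiff` are all hypothetical.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

// Local session for illustration only (assumed setup).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("select-many-columns")
  .getOrCreate()
import spark.implicits._

// Hypothetical tables with identical column names.
val df1 = Seq((1, "a", 10.0)).toDF("id", "name", "value")
val df2 = Seq((1, "b", 4.0)).toDF("id", "name", "value")

val joinedDataFrame = df1.alias("t1").join(df2.alias("t2"), Seq("id"))

// One qualified column per name in df1's schema; the backticks guard
// against column names that themselves contain dots.
val leftCols: Seq[Column] = df1.columns.toSeq.map(c => col(s"t1.`$c`"))

// Hypothetical computed column that needs both sides.
val valueDiff: Column = (col("t1.value") - col("t2.value")).as("valueDiff")

val newDataFrame = joinedDataFrame.select(leftCols :+ valueDiff: _*)
newDataFrame.show()
```

The select list grows with `df1.columns`, so the same code works for 3 columns or 50.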
Prasad Khode
- That isn't practical if I have something like 50 columns plus another 50 in the second table. Can I select "table1.*" plus the names of the new columns? – Mike Feb 21 '17 at 10:13