
I am trying to join two DataFrames that have the same column names and compute some new values. After that, I need to drop all columns of the second table. The number of columns is huge. How can I do this in an easier way? I tried .drop("table2.*"), but this doesn't work.

Mike

2 Answers


You can use select with aliases:

df1.alias("df1")
  .join(df2.alias("df2"), Seq("someJoinColumn"))
  .select($"df1.*", $"someComputedColumn", ...)

or reference the parent DataFrame:

df1.join(df2, Seq("someJoinColumn")).select(df1("*"), $"someComputedColumn", ...)
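For illustration, here is a minimal end-to-end sketch of the alias approach. The column names (id, amount, total), the sample rows, and the helper joinKeepingLeft are made up for this example and do not come from the question; it assumes an inner join on a single shared key and a local Spark session.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object KeepLeftColumns {
  // Hypothetical helper: joins on "id", keeps every column of the left
  // DataFrame, and adds one computed column built from both sides.
  def joinKeepingLeft(left: DataFrame, right: DataFrame): DataFrame =
    left.alias("df1")
      .join(right.alias("df2"), Seq("id")) // inner join on the shared key
      .select(
        col("df1.*"),                                        // all columns of the left table
        (col("df1.amount") + col("df2.amount")).as("total")  // new computed column
      )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("keep-left-columns")
      .getOrCreate()
    import spark.implicits._

    // Two small tables with identical column names (illustrative data).
    val df1 = Seq((1, 10), (2, 20)).toDF("id", "amount")
    val df2 = Seq((1, 5), (2, 7)).toDF("id", "amount")

    joinKeepingLeft(df1, df2).show()
    spark.stop()
  }
}
```

Because the right-hand columns are never listed in the select, there is nothing to drop afterwards, no matter how many columns df2 has.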
zero323

Instead of dropping, you can select only the columns you want to keep for further operations, something like this:

val newDataFrame = joinedDataFrame.select($"col1", $"col4", $"col6")
Prasad Khode
  • That doesn't work in my case: I have something like 50 columns plus another 50 in the second table. Can I select "table1.*" plus the names of the new columns? – Mike Feb 21 '17 at 10:13