
Is there an equivalent in PySpark that allows me to do a similar operation to the following in Pandas?

pd.concat([df1, df2], axis=1)

I have tried several methods so far, but none of them seems to work: the concatenation they do is vertical, and I need to concatenate multiple Spark dataframes horizontally into one whole dataframe.

If I use union or unionAll, the dataframes get stacked vertically into a single set of columns, which is not useful for my use case. I have also tried this example (it did not work either):

from functools import reduce
from pyspark.sql import DataFrame

def unionAll(*dfs):
    # this only appends rows (vertical stacking), not columns
    return reduce(DataFrame.unionAll, dfs)
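
For reference, union/unionAll only ever appends rows, so it cannot reproduce the axis=1 behaviour. One workaround that is often suggested is to attach a synthetic row index to each dataframe and join on it. A rough sketch of that idea (df1/df2 and their columns here are just placeholders, and the window has no partitioning, so it pulls the data onto a single partition and only suits smaller dataframes):

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# toy dataframes standing in for the real ones
df1 = spark.createDataFrame([(1,), (2,), (3,)], ["a"])
df2 = spark.createDataFrame([("x",), ("y",), ("z",)], ["b"])

def with_row_index(df, name="_row_idx"):
    # monotonically_increasing_id() is not consecutive, so rank it with
    # row_number() to get a dense 1..n index to join on
    w = Window.orderBy(F.monotonically_increasing_id())
    return df.withColumn(name, F.row_number().over(w))

result = (
    with_row_index(df1)
    .join(with_row_index(df2), on="_row_idx", how="inner")
    .drop("_row_idx")
)
result.show()  # columns of df1 and df2 side by side, row by row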

Any help will be greatly appreciated.

  • Does this answer your question? https://stackoverflow.com/questions/49763009/stack-spark-dataframes-horizontally-equivalent-to-pandas-concat-or-r-cbind – 过过招 Feb 11 '22 at 01:55
  • Does this answer your question? [Stack Spark dataframes horizontally - equivalent to pandas concat or r cbind](https://stackoverflow.com/questions/49763009/stack-spark-dataframes-horizontally-equivalent-to-pandas-concat-or-r-cbind) – blackbishop Feb 11 '22 at 09:40
  • Thank you, I see now there isn't a simplified way to do it, as Pandas will handle that piece. Besides, converting between pandas and pyspark just crashes everything. Again, thank you both; the post was really helpful on this matter. – Wendy Velasquez Feb 11 '22 at 14:58
  • I did find a way to join multiple Spark dataframes, though, using the crossJoin function (rough sketch below). – Wendy Velasquez Feb 13 '22 at 21:19
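
For future readers, a minimal sketch of the crossJoin route mentioned in the last comment; the dataframe and column names are made up for illustration, and crossJoin only behaves like a horizontal concat when at least one side is a single-row dataframe:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# crossJoin takes the Cartesian product of the two dataframes, so it
# only amounts to a column-wise concat when one side has a single row
# (e.g. dataframes holding aggregated summary values)
metrics1 = spark.createDataFrame([(10.5,)], ["avg_price"])
metrics2 = spark.createDataFrame([(3,)], ["store_count"])

combined = metrics1.crossJoin(metrics2)  # one row, columns from both
combined.show()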

0 Answers