
I want to implement the logic below in Azure Databricks using PySpark. I have a file with multiple sheets, stored on ADLS Gen2. I want to read the data from all sheets into a single output file and write that file back to a location on ADLS Gen2.

Note: all sheets have the same schema (Id, Name).

My final output file should contain the data from all the sheets. I also need to create an additional column that stores the sheet name.


Alex Ott
amikm

1 Answer


You can use the following logic:

  • Use Pandas to read all worksheets of the workbook (e.g. `pd.read_excel` with `sheet_name=None`).
  • Add a column holding the sheet name to each DataFrame, then concatenate them into a single Pandas DataFrame with `pd.concat`.
  • Convert the Pandas DataFrame into a PySpark DataFrame.
  • Apply the business logic you want to implement and write the result back to ADLS Gen2.
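The steps above can be sketched as follows. The combining step is pure Pandas; the file paths, mount point, and output location are placeholders you would adapt to your workspace, and reading `.xlsx` files requires the `openpyxl` package on the cluster:

```python
import pandas as pd

def combine_sheets(sheets: dict) -> pd.DataFrame:
    """Concatenate per-sheet DataFrames (the dict returned by
    pd.read_excel(..., sheet_name=None)) into one DataFrame,
    adding a 'sheetName' column with each sheet's name."""
    frames = []
    for name, df in sheets.items():
        df = df.copy()
        df["sheetName"] = name   # record which sheet each row came from
        frames.append(df)
    return pd.concat(frames, ignore_index=True)

# In a Databricks notebook (hypothetical mount paths, adjust as needed):
# sheets = pd.read_excel("/dbfs/mnt/raw/input.xlsx", sheet_name=None)
# combined = combine_sheets(sheets)
# sdf = spark.createDataFrame(combined)               # Pandas -> PySpark
# sdf.write.mode("overwrite").parquet("/mnt/curated/output")
```

Passing `sheet_name=None` to `pd.read_excel` returns an ordered dict mapping each sheet name to its DataFrame, which is why a single call covers all sheets regardless of how many the workbook contains.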