I am trying to read a set of XML files nested in many folders into sequence files in Spark. I can list the files using the function recursiveListFiles from "How do I list all files in a subdirectory in scala?".

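import java.io.File

// Recursively list every file and directory under f.
def recursiveListFiles(f: File): Array[File] = {
  // listFiles returns null if f is not a directory or cannot be read
  val these = Option(f.listFiles).getOrElse(Array.empty[File])
  these ++ these.filter(_.isDirectory).flatMap(recursiveListFiles)
}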

But how do I read each file's content into a separate column?

Nicktar
VSr

1 Answer

What about using Spark's wholeTextFiles method and then parsing the XML yourself?
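
A minimal sketch of that idea, assuming the files live on a local or shared filesystem that `java.io.File` can walk and that no path contains a comma. The names `XmlFilesToColumns`, `xmlFiles`, and the column names `path` and `content` are made up for illustration. It relies on `wholeTextFiles` accepting a comma-separated list of paths and returning one (path, content) pair per file:

```scala
import java.io.File
import org.apache.spark.sql.SparkSession

object XmlFilesToColumns {
  // Hypothetical helper: collect every *.xml file under root, at any depth.
  def xmlFiles(root: File): Seq[File] = {
    // listFiles returns null for non-directories, so guard against that
    val entries = Option(root.listFiles).map(_.toSeq).getOrElse(Seq.empty)
    entries.filter(f => f.isFile && f.getName.endsWith(".xml")) ++
      entries.filter(_.isDirectory).flatMap(xmlFiles)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("xml-files-to-columns")
      .master("local[*]") // assumption: a local run, adjust for a cluster
      .getOrCreate()
    import spark.implicits._

    // "mainpath" is the placeholder root directory from the question
    val root = new File(args.headOption.getOrElse("mainpath"))
    val paths = xmlFiles(root).map(_.getAbsolutePath)

    if (paths.nonEmpty) {
      // One row per file: the file path and its full text content
      val df = spark.sparkContext
        .wholeTextFiles(paths.mkString(","))
        .toDF("path", "content")
      df.show(false)
    }
    spark.stop()
  }
}
```

The `content` column still holds raw XML, so parsing it is a separate step. If the nesting depth is fixed, a Hadoop-style glob such as `mainpath/*/*.xml` passed directly to `wholeTextFiles` may be enough and avoids building the path list on the driver.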

Georg Heiler
  • I tried the wholeTextFiles method, but I can't get a `.xml` pattern to select only the XML files in the nested folders, with something like `sc.wholeTextFiles("mainpath/*.xml")` – VSr Dec 06 '19 at 08:59