// read a single partition directory as a tab-delimited DataFrame
val df = spark.read
             .option("delimiter", "\t")
             .option("header", "false")
             .csv("/mnt/adls/myDb/myTb/s_year_month=201806/s_day=10")

This reads the data for a single partition (20180610). Is there a way to read all partitions under the myTb folder into one DataFrame, so that it can later be queried like this:

SELECT * FROM myDb.myTb WHERE CONCAT(s_year_month, s_day) = '20180610'

If I just did a wildcard read, the partition columns (s_year_month and s_day) would be lost.
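A minimal sketch of what might work, assuming the directory layout shown above (s_year_month=.../s_day=... under the table root): pointing spark.read at the table root should trigger Spark's partition discovery, which turns the directory names into columns. The basePath option and the view name used below are illustrative.

// Point the reader at the table root; Spark's partition discovery
// turns the s_year_month=... and s_day=... directory names into columns.
val allPartitions = spark.read
                         .option("delimiter", "\t")
                         .option("header", "false")
                         .csv("/mnt/adls/myDb/myTb")

// If a wildcard path is used instead, the "basePath" option keeps the
// partition columns from being dropped:
//   spark.read.option("basePath", "/mnt/adls/myDb/myTb")
//        .csv("/mnt/adls/myDb/myTb/s_year_month=*/s_day=*")

// Registering a temporary view (name is illustrative, and unqualified,
// unlike the myDb.myTb table above) makes the SQL query runnable:
allPartitions.createOrReplaceTempView("myTb")
spark.sql("SELECT * FROM myTb WHERE CONCAT(s_year_month, s_day) = '20180610'").show()

One caveat: by default Spark type-infers partition values, so s_day=10 becomes the integer 10 (and s_day=05 would become 5); setting spark.sql.sources.partitionColumnTypeInference.enabled to false should keep them as strings, which matters for the CONCAT comparison.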

