
How to parse and flatten a nested JSON stored in a Hive/HBase column using Spark Scala?

Example:

A Hive table has a column "c1" containing the following JSON:

{
    "fruit": "Apple",
    "size": "Large",
    "color": "Red",
    "Lines": [{
            "LineNumber": 1,
            "Text": "ABC"
        },
        {
            "LineNumber": 2,
            "Text": "123"
        }
     ]
}

I want to parse this JSON and create a DataFrame with columns and values like this:
+------+------+-------+------------+------+
|fruit | size | color | LineNumber | Text |
+------+------+-------+------------+------+
|Apple | Large| Red   | 1          | ABC  |
|Apple | Large| Red   | 2          | 123  |
+------+------+-------+------------+------+

Appreciate any thoughts. Thanks!

Lux

3 Answers


Convert your JSON to a String (e.g. using mkString) and then use the following code:

    // Build an RDD of JSON strings and let Spark infer the schema
    val otherFruitRDD = spark.sparkContext.makeRDD(
      """{"Fruitname":"Jack","fruitDetails":{"fruit":"Apple","size":"Large"}}""" :: Nil)

    val otherFruit = spark.read.json(otherFruitRDD)
    otherFruit.show()
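
Applied to the question's table, a minimal sketch of the same idea (assuming Spark 2.2+, a Hive table named my_table, and that c1 holds the JSON as a string) would read the column as a Dataset[String], let Spark infer the schema, and explode the Lines array:

    import org.apache.spark.sql.functions.explode
    import spark.implicits._

    // Pull the JSON strings out of the Hive column and let Spark infer the schema
    val jsonStrings = spark.table("my_table").select($"c1").as[String]
    val parsed = spark.read.json(jsonStrings)

    // Explode the Lines array so each element becomes its own row,
    // repeating fruit/size/color for every line
    val flat = parsed
      .select($"fruit", $"size", $"color", explode($"Lines").as("line"))
      .select($"fruit", $"size", $"color", $"line.LineNumber", $"line.Text")

    flat.show()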

Vivek
 val df = spark.read.json("example.json")

You can find detailed examples at the following link

vaquar khan

I think you need a method like this:

df.select(from_json($"c1", schema))

schema will be a StructType that describes the structure of the JSON; in your case it will contain fruit, size, and color (plus the nested Lines array).
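
A minimal sketch of that schema, plus an explode on the nested Lines array to produce the flattened rows from the question (assuming Spark 2.1+ for from_json and that df is the DataFrame with the string column c1):

    import org.apache.spark.sql.functions.{explode, from_json}
    import org.apache.spark.sql.types._
    import spark.implicits._

    // Explicit schema for the JSON in the question, including the nested Lines array
    val schema = StructType(Seq(
      StructField("fruit", StringType),
      StructField("size", StringType),
      StructField("color", StringType),
      StructField("Lines", ArrayType(StructType(Seq(
        StructField("LineNumber", LongType),
        StructField("Text", StringType)
      ))))
    ))

    // Parse the column, then explode Lines so each element becomes its own row
    val flat = df
      .select(from_json($"c1", schema).as("j"))
      .select($"j.fruit", $"j.size", $"j.color", explode($"j.Lines").as("line"))
      .select($"fruit", $"size", $"color", $"line.LineNumber", $"line.Text")

    flat.show()

Defining the schema up front avoids a second pass for schema inference, at the cost of keeping it in sync with the JSON.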

Subhasish Guha
  • I tried from_json, but it's a nested JSON, and from_json increases the complexity when we want to parse each element and create a schema for each element in the JSON. – Lux Apr 17 '19 at 13:08
  • You can try a Jackson JSON parser: pass the JSON column as a String into a UDF and flatten it there. – Subhasish Guha Apr 18 '19 at 07:41
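
A rough sketch of that UDF approach (assuming jackson-module-scala is on the classpath, df is the DataFrame with the string column c1, and the case class names are made up for illustration):

    import com.fasterxml.jackson.databind.ObjectMapper
    import com.fasterxml.jackson.module.scala.DefaultScalaModule
    import org.apache.spark.sql.functions.{explode, udf}
    import spark.implicits._

    // Hypothetical case classes mirroring the JSON in the question
    case class Line(LineNumber: Long, Text: String)
    case class FruitRecord(fruit: String, size: String, color: String, Lines: Seq[Line])

    // Parse each JSON string with Jackson inside a UDF; the returned case class
    // becomes a struct column that can then be flattened with explode
    val parseFruit = udf { json: String =>
      val mapper = new ObjectMapper().registerModule(DefaultScalaModule)
      mapper.readValue(json, classOf[FruitRecord])
    }

    val flat = df
      .select(parseFruit($"c1").as("r"))
      .select($"r.fruit", $"r.size", $"r.color", explode($"r.Lines").as("line"))
      .select($"fruit", $"size", $"color", $"line.LineNumber", $"line.Text")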