I am trying to read this JSON (that I get from an API) with PySpark:
[{'DBName': 'db1', 'NameEvent': 'event1', 'status': 'NEVER', 'Date': None},
 {'DBName': 'db2', 'NameEvent': 'event2', 'status': 'ON TIME', 'Date': '2022-05-13T15:09:58.798'}]
To do so, here is my code:
import requests

# 'spark' and 'sc' come from an already-running SparkSession / SparkContext
r = requests.get('https://api_to_file.com')
rdd = sc.parallelize(r.json())   # parallelize the parsed API response
table = spark.read.json(rdd)
The problem is, I get a "_corrupt_record" for every record with a None value for the 'Date' key.
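For what it's worth, r.json() returns a list of Python dicts, so the RDD ends up holding dicts rather than JSON strings. Here is a minimal reproduction that skips the API call and uses the sample data from above:

data = [{'DBName': 'db1', 'NameEvent': 'event1', 'status': 'NEVER', 'Date': None},
        {'DBName': 'db2', 'NameEvent': 'event2', 'status': 'ON TIME', 'Date': '2022-05-13T15:09:58.798'}]

rdd = sc.parallelize(data)    # RDD of Python dicts
table = spark.read.json(rdd)
table.show(truncate=False)    # the record with 'Date': None shows up as _corrupt_record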
Do you know how I can deal with this?
I already tried adding the options that I found here: https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrameReader.json.html, and it still doesn't work.
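For example, I tried combinations along these lines (illustrative of the kinds of settings from that page, not an exhaustive list):

table = spark.read.json(rdd, mode='PERMISSIVE', columnNameOfCorruptRecord='_corrupt_record')
table = spark.read.json(rdd, allowSingleQuotes=True, dropFieldIfAllNull=False)
# records with 'Date': None still end up in _corrupt_record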
PySpark version: 2.4+