I'm struggling to create a schema for a comma-delimited file that I need to load into Hive. The content looks something like this. The first few columns have clean, simple values:
2021-09-13,11111111,111111,2244,2186,xxxxx,xxxxx,2000106,xxx,2018-06-25 10:54:54,2018-06-25 07:24:00,2021-09-13 01:28:00,0,CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:N,,,false,
Then there is a column holding a large chunk of text that contains end-of-line characters, commas, and so on. It is wrapped in double quotes, and any quotes inside it are doubled:
"1. Navigate to the following URL:
https://sample.com/home.html
2. Review the server HTTP response headers:
HTTP/1.1 200 OK
Server: xxxxxxxxxxxxxxx
Pragma: No-cache
Cache-Control: no-cache, public
Expires: xxxxxxxxxxxxxxxxxxx
Content-Length: 2911
X-Cnection: close
Content-Type: text/html;charset=UTF-8
Vary: Accept-Encoding
Date: xxxxxxxxxxxxxxxxx
Connection: close
Set-Cookie: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx; xxxxxxxxxxxxxxxxxxxxxx
Set-Cookie: bm_sv=xxxxxxxxxxxxxxxxxxxxxxxx=; Domain=.xxxxxx.com; Path=/; Max-Age=4737; HttpOnly
3. Note that neither the ""X-Frame-Options"" nor ""frame-ancestors"" headers appear to be present","xxxxxxxxxxxxxxxxxxxxxxxxx.",xxxxxxxxxxxxxxxxxx,"Missing ""X-Frame-Options""",443,6,http,"xxxm.
xxxxxxxxxxxxxxxx.",Information,false,"To remediate this issue, (re)configure the web application to use xxxxxx ""self"" :
Content-Security-Policy: frame-ancestors 'self' xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxx."
And then the rest of the record:
,12.0,2021-09-13 13:03:49,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
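To illustrate the quoting, here is a minimal sketch in Python (outside Hive). The record in it is a shortened, made-up one in the same shape as my file, and `parse_records` is just a helper name I picked. As far as I can tell, a standard CSV reader treats the multi-line quoted column as a single value, so the file itself seems to be well-formed CSV:

```python
import csv
import io

# Shortened, made-up record in the same shape as the real file:
# plain columns, then one quoted column containing newlines, commas
# and doubled quotes, then more plain columns.
sample = (
    '2021-09-13,11111111,false,"1. Navigate to the following URL:\n'
    'https://sample.com/home.html\n'
    'Cache-Control: no-cache, public\n'
    '3. Note that the ""X-Frame-Options"" header is missing",443,http\n'
)

def parse_records(text):
    """Parse CSV text whose quoted fields may span several lines."""
    # newline="" leaves line breaks untouched so the csv module can
    # handle the ones that sit inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline="")))

rows = parse_records(sample)
print(len(rows))     # number of logical records
print(len(rows[0]))  # number of columns in the first record
print(rows[0][3])    # the multi-line column, with quotes un-doubled
```

What I have not been able to confirm is whether any of the Hive SerDes reads a record that spans several physical lines the same way.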
Could anyone please advise how to write the DDL for this file, and which SerDe to use (LazySimpleSerDe, OpenCSVSerde, RegexSerDe)?
Thanks in advance, Gal