I'm sinking data from Kinesis to S3 through Firehose, and the data appears in S3, but reading it back isn't easy.
I'm posting multiline JSON messages to Kinesis, and the problem arises when I try to read the data back from the S3 file.
If I post 2 messages:
{
"foo": 1
}
and then
{
"foo": 2
}
I will get a file in S3 with the below contents:
{
"foo": 1
}{
"foo": 2
}
This isn't valid JSON - to be valid, the objects would need to be separated by commas and the whole thing wrapped in [ ... ]. So I can't just read it as a JSON file.
I expect the transformation required to end up with valid JSON like that would be difficult.
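For what it's worth, the concatenated objects can still be parsed without any pipeline changes by scanning through the file with json.JSONDecoder.raw_decode, which parses one object and tells you where it ends. A minimal sketch (read_concatenated_json is my own helper name, not a library function):

```python
import json

def read_concatenated_json(text):
    """Yield each JSON object from a string of back-to-back
    objects with no separators between them."""
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # Skip any whitespace between objects
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        # raw_decode parses one object starting at idx and
        # returns it along with the index just past its end
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

records = list(read_concatenated_json('{"foo": 1}{"foo": 2}'))
# records == [{'foo': 1}, {'foo': 2}]
```

This works regardless of how the records are formatted, but it does mean every consumer needs custom parsing, which is why fixing the format at write time is usually preferable.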
But can/should I have a transformation in Firehose so that each JSON message is compacted to one line? I'm very conscious that, because my messages don't have a trailing \n, the records above end and start on the same line. With that solution, each line would be valid JSON and I could use line breaks to separate the objects. So I'd get something like:
{"foo": 1}
{"foo": 2}
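One way I can imagine producing that shape is a Firehose data-transformation Lambda that decodes each record, re-serializes the JSON compactly, and appends a newline. A sketch of what I have in mind, assuming each record's payload is a single (possibly multiline) JSON object; the event/response shape is the standard Firehose transformation contract:

```python
import base64
import json

def lambda_handler(event, context):
    """Firehose data-transformation handler: compact each record's
    JSON to a single line and append a trailing newline, so the
    S3 object becomes newline-delimited JSON."""
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")
        # Re-serialize without embedded newlines, then add the delimiter
        compact = json.dumps(json.loads(payload)) + "\n"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(compact.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

But I don't know whether this is the idiomatic fix or whether Firehose has a built-in option that makes the Lambda unnecessary.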
So, the question is: how do I do this kind of transformation, or is there a best practice I'm missing for sinking JSON messages from Kinesis to S3 such that reading the data back out is easier?
For context, I have a few instances of this architecture, built variously with custom CDK, the AWS Solutions Constructs, and direct configuration through the Console.