0

In R, I am trying to parse a column of JSON code that is embedded in a .csv file. The JSON code has variable length, depending on which key was pressed in a mobile app. The length can be NULL ("[]") to more than 10 variables. I need to keep the JSON information with the other variables on the same line data. The JSON does not include any unique identifying information (like an ID) that would let me pull the JSON information out of the table, process it, and stick it back into the table. Is there a way to parse the JSON data into new variables as I'm reading in each data line while also filling the table with the (character & double) variables? Below is a few lines of the data. JSON variable = metadata. There are over 100 different JSON arrays over 2.25 million lines of data.

I have tried a number of things.

First, trying to convert each line JSON format (toJSON) then convert back (fromJSON).

        content <- readLines(AIMS_in_filename, n = 30)
        metadataJSON <- apply(content, to JSON)
        res <- lappy(metadataJSON, fromJSON)

I tried various ways to try to pull out just the JSON info (metadata variable) but I couldn't figure out how to reintegrate that back into the table and make sure the data points were matched correctly.

When I tried to convert from JSON for just the metadata variable, I got an error:

        res <- lapply(metadataJSON$metadata, fromJSON)

$ operator is invalid for atomic vectors

I've looked through the other answers, and I cannot extend the solutions to work with the one JSON array stored among the other variables in the .csv file.

Here was a solution that I thought would work, but I couldn't adapt to my specific case. I'm new to this. If you can point me in a direction, I would be deeply in your debt! 4 days, and I haven't made much progress. thank you!

from How to convert a json file without the same length of json objects into csv

file-without-the-same-length-of-json-objects-into-csv

        library(jsonlite)
        library(data.table)

# tell fromJSON we want a list back

        json_data <- fromJSON("in.json", simplifyDataFrame=FALSE)

# iterate over the list we have so we can "flatten" it then
# covert it back to a data.frame-like object

        dat <- rbindlist(lapply(json_data, function(x) {
           as.list(unlist(x))
           }), fill=TRUE)

example lines of the data, including column names on first line

id,invite_code,subject_id,metadata,session_id,platform,event,timestamp
1524715,6a82077651b30363,public,[],6a82077651b30363,android,app_opening,"1970-01-01 00:00:42.000"
1524717,6a82077651b30363,public,"{""time"":6,""screen_name"":""Home""}",6a82077651b30363,android,Home,"1970-01-01 00:00:51.000"
1524721,6a82077651b30363,public,"{""time"":3,""actions"":[{""name"":""Deep Breathing""}],""screen_name"":""Home""}",6a82077651b30363,android,Home,"1970-01-01 00:00:59.000"
1524722,6a82077651b30363,public,"{""time"":2,""parent_name"":""Home"",""screen_name"":""Deep Breathing Intro""}",6a82077651b30363,android,"Deep Breathing Intro","1970-01-01 00:01:02.000"
2223763,8354dd850b128cf2,public,"{""screen_name"":""Home"",""time"":-619314987}",8354dd850b128cf2,android,Home,"2001-10-29 12:43:11.000"
273381,b4f0baf6ecf1aa9e,public,"{""time"":5,""actions"":[{""name"":""Deep Breathing""}],""screen_name"":""Home""}",b4f0baf6ecf1aa9e,android,Home,"2010-01-01 02:55:10.000"
neuron
  • 1,775
  • 1
  • 11
  • 24
Nic
  • 1
  • your sample data does not really look like a valid csv? There are no linebreaks for example. – Wimpel Nov 16 '21 at 07:47

0 Answers0