Get Nested JSON into Pandas Dataframe and CSV

Question

I'm having problems getting the data I need out of this JSON response and into their own columns. The code below produces a df and csv, I cleaned it up a bit to isolate the data I'm after. In the csv image below the data I want is in the third column[trade.stats] On each row in that column there are lists with 6 items in them. I need those to be added to their own columns, in addition some rows have multiple lists in that column I would need that as well. Im really confused on how this would work. Somehow I need to get it so that if a row has multiple lists in that column (like row 4 in the image), those six items are added to their own row. The end product should be a df and csv with rows of of all the data extracted from the third column. Any help here would be appreciated Im confused just trying to write out the problem.

import json, time
from websocket import create_connection
import pandas as pd
   
# start with empty dataframe
df = pd.DataFrame()   

ws = create_connection("wss://ws.kraken.com/")

ws.send(json.dumps({
    "event": "subscribe",
    "pair": ["BTC/USD"],
    "subscription": {"name": "trade"}
}))

timeout = time.time() + 60*1
while time.time() < timeout:
    js = json.loads(ws.recv())
    if isinstance(js, dict):
        df = pd.concat([df, pd.json_normalize(js)])
        #df = df.append([df, pd.json_normalize(js)])
    elif isinstance(js, list):
        #df = df.append([df, pd.json_normalize({"event":"trade",
        df = pd.concat([df, pd.json_normalize({"event":"trade",
                                               "trade":{                                                                                                     
                                                   "s0":js[1][0][0],
                                                   "s1":js[1][0][1], 
                                                   "s2":js[1][0][2],
                                                   "s3":js[1][0][3],                                                
                                                   "s4":js[1][0][4],
                                                   "s5":js[1][0][5],  
                                                   "pair":js[3]}
                                              })
                       ])
    
    else:
         f"unknown socket data {js}"
   #data filters
    df = df[df['event'] != 'systemStatus'] 
    df = df[df['event'] != 'subscriptionStatus']
    df = df[df['event'] != 'heartbeat'] 
    
    
    print(df)
    #time.sleep(1)

#column drop for csv
cols = [0,2,3,4,5,6,7] 
df.drop(df.columns[cols],axis=1,inplace=True)
df.columns =['event','price','volume', 'time', 'side', 'orderType', 'misc', 'pair']
csv_file = "kraken-test4.csv"
df.to_csv(csv_file, index=False, encoding='utf-8')  
ws.close()

csv data:

here is some of the json response, from the beginning:

{'connectionID': 18155540471802097628, 'event': 'systemStatus', 'status': 'online', 'version': '1.7.2'}
{'channelID': 321, 'channelName': 'trade', 'event': 'subscriptionStatus', 'pair': 'XBT/USD', 'status': 'subscribed', 'subscription': {'name': 'trade'}}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['43081.00000', '0.02836870', '1612805543.405350', 'b', 'm', ''], ['43087.20000', '0.12163130', '1612805543.407889', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
[321, [['43087.00000', '0.50000000', '1612805544.764522', 'b', 'm', '']], 'trade', 'XBT/USD']
[321, [['43087.20000', '0.00230000', '1612805545.350573', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['43087.20000', '0.06226156', '1612805547.390753', 'b', 'l', '']], 'trade', 'XBT/USD']
[321, [['43081.60000', '0.00300000', '1612805548.090721', 's', 'l', '']], 'trade', 'XBT/USD']

Also, avoid calling `pd.concat` or `DataFrame.append` in a `for-loop` since it leads to [quadratic copying](https://stackoverflow.com/a/36489724/1422451). — Parfait, Feb 08 '21 at 17:22
how would I add to the dataframe then on each loop? I will post that, but I did just revise my code it now is closer to what I want, the only thing now is the mutiple lists issue I mentioned.. Please look at revised code. — robothead, Feb 08 '21 at 17:29
your code seems to have gone in a strange direction.... why concat to then filter? why not navigate to what you want through list slicing? — Rob Raymond, Feb 08 '21 at 17:45
learning on the fly, Im trying whatever I can find to try to get this to work, also that was more or less to make the csv clearer so I could figure out the next part, it wasnt meant to be the final code, I would very much like to learn how to do it properly, this was just effective for the time being. BTW, Thanks for all your help, Im a bit out of my element here. — robothead, Feb 08 '21 at 17:46

Get Nested JSON into Pandas Dataframe and CSV

0 Answers0