I read the file and group all lines by column count into different dataframes:
import csv
import pandas as pd

dict_of_cols = {}
dict_of_dfs = {}

with open(filename, 'r', encoding="utf-8") as f:
    reader = csv.reader(f, delimiter=data_file_delimiter)
    data = list(reader)  # first full read of the file; `data` is not used below

with open(filename, 'r', encoding="utf-8") as f:
    # second full read, line by line; stops at the first empty line
    while (line := f.readline().strip()):
        cls = line.split(data_file_delimiter)
        if len(cls) in dict_of_cols:
            dict_of_cols[len(cls)].append(cls)
        else:
            dict_of_cols[len(cls)] = [cls]

for colsKey, colsValue in dict_of_cols.items():
    print(colsValue)  # prints every row in the group
    dict_of_dfs['dt_' + str(colsKey)] = pd.DataFrame(colsValue)
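A single-pass sketch of the same grouping, assuming the whole file fits in memory and that blank lines should be skipped rather than ending the read, would look like this:

from collections import defaultdict

import pandas as pd

groups = defaultdict(list)  # column count -> list of split rows
with open(filename, 'r', encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of stopping at the first one
        cls = line.split(data_file_delimiter)
        groups[len(cls)].append(cls)

dict_of_dfs = {'dt_' + str(n): pd.DataFrame(rows) for n, rows in groups.items()}

This avoids the second read of the file and the per-group print, but I'm not sure it is the right direction.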
As a result I can use any of the dataframes:

dict_of_dfs['dt_12'].head()  # the dataframe built from lines with 12 columns
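To sanity-check what was built without dumping whole frames, a one-line-per-group summary is enough (a quick sketch, not part of the pipeline above):

for key, df in dict_of_dfs.items():
    print(key, df.shape)  # one short line per dataframe: name plus (rows, columns)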
I get the following error when I call the previous command:
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.
Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
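As the message says, the limit itself can be raised; my understanding is that this can be done persistently in the (Python) notebook config file, e.g.:

# in jupyter_notebook_config.py (created by `jupyter notebook --generate-config`)
c.NotebookApp.iopub_data_rate_limit = 10_000_000  # bytes/sec; default is 1000000.0 per the message above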
But rather than just raising the limit: how can I optimize this (the loops)?