I read the file and group all lines by column count into different dataframes:
import csv
import pandas as pd

dict_of_cols = {}
dict_of_dfs = {}

with open(filename, 'r', encoding="utf-8") as f:
    reader = csv.reader(f, delimiter=data_file_delimiter)
    data = list(reader)  # first full read of the file; `data` is not used below

with open(filename, 'r', encoding="utf-8") as f:
    # second full read, line by line; stops at the first empty line
    while (line := f.readline().strip()):
        cls = line.split(data_file_delimiter)
        if len(cls) in dict_of_cols:
            dict_of_cols[len(cls)].append(cls)
        else:
            dict_of_cols[len(cls)] = [cls]

for colsKey, colsValue in dict_of_cols.items():
    print(colsValue)  # prints every row in the group
    dict_of_dfs['dt_' + str(colsKey)] = pd.DataFrame(colsValue)
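A single-pass sketch of the same grouping, assuming the whole file fits in memory and that blank lines should be skipped rather than ending the read, would look like this:

from collections import defaultdict

import pandas as pd

groups = defaultdict(list)  # column count -> list of split rows
with open(filename, 'r', encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of stopping at the first one
        cls = line.split(data_file_delimiter)
        groups[len(cls)].append(cls)

dict_of_dfs = {'dt_' + str(n): pd.DataFrame(rows) for n, rows in groups.items()}

This avoids the second read of the file and the per-group print, but I'm not sure it is the right direction.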
As a result I can use any of the dataframes:

dict_of_dfs['dt_12'].head()  # the dataframe built from lines with 12 columns
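To sanity-check what was built without dumping whole frames, a one-line-per-group summary is enough (a quick sketch, not part of the pipeline above):

for key, df in dict_of_dfs.items():
    print(key, df.shape)  # one short line per dataframe: name plus (rows, columns)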
I get the following error when I call the previous command:
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.
Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
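As the message says, the limit itself can be raised; my understanding is that this can be done persistently in the (Python) notebook config file, e.g.:

# in jupyter_notebook_config.py (created by `jupyter notebook --generate-config`)
c.NotebookApp.iopub_data_rate_limit = 10_000_000  # bytes/sec; default is 1000000.0 per the message above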
But rather than just raising the limit: how can I optimize this (the loops)?