I have a bunch of GeoJSON files of 200 MB+ that I would like to read with progress tracking. My current solution (see below) is inspired by these answers.
import json
from tqdm import tqdm

def hook(obj):
    # json calls object_hook for every decoded JSON object (dict)
    value = obj.get("features")
    if value:
        pbar = tqdm(value, desc='PROGRESS')
        for item in pbar:
            pass
    return obj

with open(file_path, mode='r') as data_file:
    data = json.load(data_file, object_hook=hook)
However, I observe the following behaviour:
- Progress of open() is not tracked.
- hook actually slows down the process dramatically, as it iterates over all the features one more time just for the sake of getting the total number of items.
- Performance of json.load() becomes very slow due to the hook.
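The observed behaviour is consistent with how object_hook works: json calls it once for every decoded JSON object, innermost first, so the top-level dict that holds "features" only reaches the hook after the whole document has already been parsed (and the per-object Python call adds overhead on large files). The sketch below illustrates the call order on a made-up two-feature collection; hook_call_order is a hypothetical helper, not part of the original code.

```python
import json


def hook_call_order(text):
    """Return the "type" of every object passed to object_hook, in call order."""
    seen = []

    def hook(obj):
        # None for objects without a "type" key, e.g. the empty "properties" dicts
        seen.append(obj.get("type"))
        return obj

    json.loads(text, object_hook=hook)
    return seen


# Hypothetical sample data, shaped like a tiny GeoJSON FeatureCollection
sample = (
    '{"type": "FeatureCollection", "features": ['
    '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [0, 0]}, "properties": {}},'
    '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [1, 1]}, "properties": {}}'
    "]}"
)
print(hook_call_order(sample))
# ['Point', None, 'Feature', 'Point', None, 'Feature', 'FeatureCollection']
```

The FeatureCollection is the last call, so a tqdm loop started from inside the hook can only run once parsing is finished.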
Ideally, I would like to see:
- Progress of open() (I assume there should be a way to get the total for tqdm directly from the file, to avoid reading it twice).
- Progress of json.load(), with or without the total number of items.
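One possible direction, sketched under assumptions rather than tested on real 200 MB files: the byte total can come from os.path.getsize without a second pass, reading can be done in chunks so each chunk reports progress, and parse progress can be reported by a cheap hook that only counts objects whose "type" is "Feature" (those close one by one during parsing, so the count grows while json is still working, but without a known total). The function name load_geojson_with_progress and its on_bytes/on_feature callbacks are hypothetical. Note that joining the chunks briefly holds two copies of the raw bytes in memory.

```python
import json
import os


def load_geojson_with_progress(path, on_bytes, on_feature, chunk_size=1 << 20):
    """Hypothetical helper: report read progress in bytes, then parse progress per Feature.

    on_bytes(n) is called after each chunk of n bytes is read; a byte-based bar
    can take its total from os.path.getsize(path) without reading the file twice.
    on_feature() is called each time the decoder finishes a "Feature" object.
    """
    chunks = []
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            chunks.append(chunk)
            on_bytes(len(chunk))

    def hook(obj):
        # Called once per decoded JSON object, innermost first; keep it cheap.
        if obj.get("type") == "Feature":
            on_feature()
        return obj

    # json.loads accepts bytes and detects UTF-8/16/32 encoding itself.
    return json.loads(b"".join(chunks), object_hook=hook)
```

With tqdm, one could pass read_bar.update as on_bytes, where read_bar = tqdm(total=os.path.getsize(file_path), unit="B", unit_scale=True), and lambda: parse_bar.update(1) as on_feature for a second tqdm bar created without a total.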
Any ideas? Thanks!