
I have a bunch of GeoJSON files (200 MB+ each) that I would like to read with progress tracking. My current solution (see below) is inspired by this answer.

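import json
from tqdm import tqdm

def hook(obj):
    # object_hook is called for every decoded JSON object, not only the top-level one
    value = obj.get("features")
    if value:
        # The features are already parsed at this point; the loop only drives the bar
        pbar = tqdm(value, desc='PROGRESS')
        for item in pbar:
            pass
    return obj

# file_path is the path to one of the GeoJSON files
with open(file_path, mode='r') as data_file:
    data = json.load(data_file, object_hook=hook)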

However I observe the following behaviour:

  1. Progress of open is not tracked.
  2. hook slows the process down dramatically: its loop walks over all the features one more time just so that tqdm can get the total number of items.
  3. json.load itself becomes very slow, since the hook is called for every object in the file.
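
To illustrate the third point, here is a minimal sketch showing how often object_hook fires on a tiny GeoJSON-like string. The sample string and the counting_hook name are made up for the illustration; nested objects are reported first and the top-level collection last, so by the time "features" is visible to the hook everything has already been parsed.

```python
import json

calls = []  # records the keys of every object passed to the hook


def counting_hook(obj):
    # Called once per decoded JSON object, innermost objects first
    calls.append(sorted(obj.keys()))
    return obj


# Hypothetical two-feature collection, just for the demonstration
sample = (
    '{"type": "FeatureCollection", "features": ['
    '{"type": "Feature", "properties": {"id": 1}}, '
    '{"type": "Feature", "properties": {"id": 2}}]}'
)

data = json.loads(sample, object_hook=counting_hook)

# Two "properties" objects, two features and the collection itself
print(len(calls))   # 5
print(calls[-1])    # ['features', 'type']
```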

Ideally I would like to see:

  1. Progress of open (I assume there should be a way to get total for tqdm directly from it to avoid reading the file twice)
  2. Progress of json.load with/without total number of items.
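
For the first wish, one possible sketch (not a definitive solution) is to read the file in chunks and report the bytes read, taking the total from the file size rather than from a second pass over the data. The names load_json_with_progress, load_with_tqdm and on_bytes are made up here. As far as I can tell json.load reads the whole file with a single read() call, so the reading is done by hand and only json.loads is used for parsing. This tracks reading only, not parsing, and it holds both the raw bytes and the decoded text in memory for a moment.

```python
import json
import os


def load_json_with_progress(path, on_bytes, chunk_size=1024 * 1024):
    """Read `path` in chunks, call on_bytes(n) per chunk, then parse the JSON."""
    buf = bytearray()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf += chunk
            on_bytes(len(chunk))  # report how many bytes were just read
    # json.loads accepts bytes/bytearray and detects UTF-8/16/32 itself
    return json.loads(buf)


def load_with_tqdm(path):
    # tqdm is a third-party package; imported here so the helper above
    # stays usable without it
    from tqdm import tqdm

    # The total comes from the file size, so nothing is read twice
    with tqdm(total=os.path.getsize(path), unit="B", unit_scale=True,
              desc="READ") as pbar:
        return load_json_with_progress(path, pbar.update)
```

With this, data = load_with_tqdm(file_path) would show a byte-based bar while the file is read. Progress of the parsing step itself would probably need an incremental parser instead of json.load.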

Any ideas? Thanks!

bbs