
In the following script, is there a way to find out how many "chunks" there are in total?

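import pandas as pd

# With chunksize set, read_csv returns an iterator that yields DataFrames
# of up to 50,000 rows each instead of loading the whole file at once.
data = pd.read_csv('data.txt', delimiter=',', chunksize=50000)

for chunk in data:
    print(chunk)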

Using len(chunk) only tells me how many rows each chunk has.

Is there a way to get the total without counting the iterations manually?

Michael Currie
Leb



Because CSV is a row-based format, a process cannot know how many lines a file contains until it has scanned the whole file.

Only minimal scanning is needed, though, assuming the CSV file is well formed:

with open('data.txt', 'r') as f:
    num_lines = sum(1 for line in f)

This can be useful if you need to calculate in advance how many chunks there are. Note that the count includes the header line, which read_csv does not return as a data row. A full CSV reader is overkill for this: the code above has very low memory requirements and does minimal parsing.
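As a rough sketch of the "calculate in advance" step, the line count can be turned into a chunk count with a ceiling division: every chunk holds chunksize rows except possibly the last. This assumes the first line is a header row (pandas' default behaviour), that every record sits on exactly one physical line (no quoted fields with embedded newlines), and that there are no blank lines, which read_csv skips by default. The helper name count_chunks and its has_header parameter are hypothetical, not part of pandas.

```python
import math
import os
import tempfile


def count_chunks(path, chunksize, has_header=True):
    """Estimate how many chunks pd.read_csv(path, chunksize=chunksize) yields.

    Hypothetical helper. Assumes a well-formed CSV: one record per physical
    line, no quoted fields containing newlines, and no blank lines.
    """
    with open(path, 'r') as f:
        num_lines = sum(1 for _ in f)
    # The header line is consumed by read_csv and is not part of any chunk.
    num_rows = num_lines - 1 if has_header else num_lines
    # Every chunk holds `chunksize` rows except possibly the last one.
    return math.ceil(max(num_rows, 0) / chunksize)


if __name__ == '__main__':
    # Build a small throwaway file: one header line plus 120,001 data rows.
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        tmp.write('a,b\n')
        for i in range(120001):
            tmp.write(f'{i},{i * 2}\n')
        path = tmp.name
    try:
        print(count_chunks(path, 50000))  # 3: two full chunks plus one 1-row chunk
    finally:
        os.remove(path)
```

For the file in the question this would be count_chunks('data.txt', 50000). If the file may contain quoted newlines, counting raw lines overestimates the number of records, and a real CSV parser is needed after all.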

Ami Tavory