
I am trying to open a file and count the number of lines in it.

The code I am using for this is:

```python
def line_Count(x):
    with open(x,'r') as iFile:  # Open passed in file
        lines = iFile.readlines() # Read each line in the file
        line_Count = len(lines) # Count the number of lines
        return line_Count       # Return the line count
```

This works fine for small amounts of data (10k lines in 0.073 seconds).

However, for large files (1M lines), it takes more than 15 minutes to complete.
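To compare approaches on the same file, timings like these can be reproduced with a small wrapper around `time.perf_counter()`. This is only a sketch: `time_count` is a hypothetical helper, and it times a single run, so results on a cold vs. warm OS file cache may differ.

```python
import time
from typing import Callable, Tuple

def time_count(count_fn: Callable[[str], int], path: str) -> Tuple[int, float]:
    """Call count_fn(path) once and return (line_count, elapsed_seconds)."""
    start = time.perf_counter()          # high-resolution monotonic clock
    result = count_fn(path)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

Usage (the file name is hypothetical): `count, seconds = time_count(line_Count, "big_file.txt")`.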

Is there a faster way of completing the task?

The existing question on this topic (linked in the comments below) is more than five years old, and some of its solutions have since been deprecated.

Clauric
  • Does this answer your question? [How to get line count of a large file cheaply in Python?](https://stackoverflow.com/questions/845058/how-to-get-line-count-of-a-large-file-cheaply-in-python) – CDJB Nov 18 '19 at 12:37
  • The crucial difference between your attempt and the linked question is that you *allocate memory for each line in the file*, thus keeping the entire file in memory. If you iterate over `iFile` directly (a file object is a lazy iterator), you only allocate one line at a time. – Botje Nov 18 '19 at 12:39
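Applying that suggestion to the function in the question, a minimal sketch might look like the following. `line_count` is just a renamed version of `line_Count`, and opening in binary mode (`"rb"`) is an assumption added here to skip text decoding and avoid `UnicodeDecodeError` on odd bytes.

```python
def line_count(path):
    """Count lines by iterating over the file lazily, one line at a time."""
    count = 0
    with open(path, "rb") as i_file:  # binary mode: no decoding overhead
        for _ in i_file:              # the file object yields lines on demand
            count += 1
    return count
```

Unlike `readlines()`, this never builds a list of all lines, so memory use stays roughly constant regardless of file size.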
  • @CDJB the linked example crashes the machine after about 30 seconds. Some error about memory allocation. – Clauric Nov 18 '19 at 12:54
  • @CDJB So which is the best solution in that link? I read all the solutions, but they range from 2009 to recent years. – PIG Nov 18 '19 at 13:26

1 Answer


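In Python 2, using `xreadlines()` might give you a boost with big files, since it reads lines lazily instead of loading the whole file into memory (note that `xreadlines()` no longer exists in Python 3):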

```python
count = 0
with open(file_path) as f:
    for line in f.xreadlines():  # Python 2 only: yields one line at a time
        count += 1
```

Or, since you're using Python 3, iterate over the file object with a generator expression, which is much less memory intensive because only one line is held at a time:

```python
with open(file_path, 'rb') as f:  # 'with' ensures the file is closed
    count = sum(1 for _ in f)     # count lines without storing them
```

Or read the file in fixed-size blocks and count the newline characters in each block:

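```python
def blocks(f, size=65536):
    """Yield successive chunks of up to `size` bytes from an open file."""
    while True:
        b = f.read(size)
        if not b:  # empty bytes object means end of file
            break
        yield b

# Binary mode skips decoding; each b"\n" byte marks the end of one line.
with open(file_path, "rb") as f:
    print(sum(bl.count(b"\n") for bl in blocks(f)))
```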
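The block-counting approach counts newline characters, so a final line with no trailing newline is not included in the total. A minimal sketch of a variant that also counts that last line (the helper name `count_lines` is hypothetical, not part of this answer):

```python
def count_lines(path, size=65536):
    """Count newline bytes in blocks, plus one for an unterminated last line."""
    count = 0
    last = b""
    with open(path, "rb") as f:
        while True:
            b = f.read(size)
            if not b:
                break
            count += b.count(b"\n")
            last = b  # remember the most recent non-empty chunk
    # A non-empty file that does not end in b"\n" has one more, uncounted line.
    if last and not last.endswith(b"\n"):
        count += 1
    return count
```

Whether the extra line should count depends on your definition of a "line"; `wc -l` does not count it.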

Lastly, on Unix-like systems you could "cheat" and shell out to `wc -l` with `subprocess`:

```python
import subprocess
count = int(subprocess.check_output(["wc", "-l", file_path]).split()[0])
```
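Since `wc` is usually not available on Windows, one option is to check for it with `shutil.which` and fall back to counting newline bytes in Python. This is a sketch only; `wc_line_count` is a hypothetical helper name, and both branches count newline characters so they agree with each other.

```python
import shutil
import subprocess

def wc_line_count(path):
    """Count newline characters with `wc -l` if installed, else in Python."""
    if shutil.which("wc"):
        # Arguments as a list: no shell needed. wc prints "<count> <path>".
        out = subprocess.check_output(["wc", "-l", path])
        return int(out.split()[0])
    # Fallback: read 64 KiB chunks until read() returns b"" and count b"\n".
    with open(path, "rb") as f:
        return sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(65536), b""))
```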
alexisdevarennes