7

We have a couple of huge files (larger than the available RAM) on disk. I want to read them line by line in Python and print the results to the terminal. I have gone through [1] and [2], but I am looking for methods that do not wait until the entire file is read into memory.

I will be using both of these commands:

cat fileName | python myScript1.py
python myScript2.py fileName

[1] How do you read from stdin in Python?
[2] How do I write a unix filter in python?

– BiGYaN

3 Answers

9

This is the standard behavior of file objects in Python:

with open("myfile.txt", "r") as myfile:
    for line in myfile:
        # do something with the current line

or

import sys

for line in sys.stdin:
    # do something with the current line
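If you need a single script that works with both of your command lines (piped stdin and a file name argument), a minimal sketch along these lines would work; it is only an illustration and assumes the file name, when given, is the first command-line argument:

import sys

def lines():
    # If a filename was passed (python myScript2.py fileName), read that file;
    # otherwise fall back to stdin (cat fileName | python myScript1.py).
    if len(sys.argv) > 1:
        with open(sys.argv[1], "r") as f:
            for line in f:
                yield line
    else:
        for line in sys.stdin:
            yield line

for line in lines():
    # do something with the current line
    print(line, end="")

Either way the lines are consumed one at a time, so the whole file is never held in memory.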
– Tim Pietzcker
4

Just iterate over the file:

with open('huge.file') as hf:
    for line in hf:
        if 'important' in line:
            print(line)

This will require O(1) memory.

To read from stdin, simply iterate over sys.stdin instead of hf:

import sys

for line in sys.stdin:
    if 'important' in line:
        print(line)
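Alternatively, the standard-library fileinput module covers both of your invocations in one script. A minimal sketch of the same filter (the 'important' check is just the example from above):

import fileinput

# fileinput.input() iterates over the lines of the files named in
# sys.argv[1:], or over sys.stdin if no file names were given.
for line in fileinput.input():
    if 'important' in line:
        print(line, end='')

This also processes one line at a time, so memory use stays constant regardless of file size.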
– phihag
  • I am a Python newbie; can you please explain "simply iterate over sys.stdin instead of hf"? Do you mean `for line in sys.stdin`? – BiGYaN Oct 17 '11 at 09:38
  • 1
    Yes, `sys.stdin` is just a [file object](http://docs.python.org/library/sys.html?highlight=stdin#sys.stdin) that behaves like a file you have opened manually. – Tim Pietzcker Oct 17 '11 at 09:42
-1
if __name__ == '__main__':
    while True:
        try:
            a = input()
        except EOFError:
            break
        print(a)

This will read from stdin until EOF. To read from a file passed as an argument (your second command), you can use Tim's method, i.e.:

with open("myfile.txt", "r") as myfile:
    for line in myfile:
        print line
        # do something with the current line
– spicavigo