
I'm trying to read a file into pandas. The values are separated by spaces, but the number of spaces between them varies. I tried:

pd.read_csv('file.csv', delimiter=' ')

but it doesn't work.
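To see why a single-space delimiter fails, here is a minimal sketch using a small made-up in-memory sample instead of `file.csv`. Every individual space counts as a separator, so a run of two spaces yields an empty field and pandas creates a spurious all-NaN column.

```python
import io

import pandas as pd

# Hypothetical sample: values separated by a varying number of spaces
data = "1 2  3\n4 5  6\n"

# delimiter=' ' splits on every single space, so the double space between
# the 2nd and 3rd values produces an empty field in each row
df = pd.read_csv(io.StringIO(data), delimiter=" ", header=None)
print(df.shape)  # (2, 4): three real columns plus one empty column
print(df)
```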

smci
yemu
  • Possible duplicate of [How to make separator in read\_csv more flexible wrt whitespace?](http://stackoverflow.com/questions/15026698/how-to-make-separator-in-read-csv-more-flexible-wrt-whitespace) – e4c5 Jan 18 '17 at 12:44

3 Answers


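Add the `delim_whitespace=True` argument; it's faster than a regex. (Since pandas 2.2, `delim_whitespace` is deprecated in favor of `sep=r"\s+"`, which pandas also parses with its C engine.)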

HYRY
  • should add that, and remove `delimiter=' '` as they are mutually exclusive in recent versions. – matanster Aug 08 '18 at 13:05
  • @matanster: `delimiter=' '` is very brittle; it says to expect one and only one space. No tabs, newlines, multiple spaces, non-breaking whitespace, combinations of these, etc. `delimiter='\s+'` is what pandas recommends and is more robust. – smci Jan 16 '20 at 12:28
  • `sep="\s+"` argument also works – PeJota May 26 '22 at 13:55
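The robustness point in these comments can be checked with a short sketch (the sample mixing tabs and runs of spaces is made up for illustration): with a raw-string `sep=r"\s+"`, any run of whitespace counts as exactly one separator.

```python
import io

import pandas as pd

# Hypothetical sample mixing single spaces, multiple spaces and tabs
data = "a b\tc\n1   2 \t 3\n"

# r"\s+" matches any run of whitespace (spaces, tabs, ...), so each run
# is treated as a single separator and no empty columns appear
df = pd.read_csv(io.StringIO(data), sep=r"\s+")
print(df)
```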

You can use a regex as the delimiter:

pd.read_csv("whitespace.csv", header=None, delimiter=r"\s+")
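A related sketch for separators other than `\s+` (the sample data and the pattern `[,\s]+` are made up for illustration): pandas parses `\s+` with its C engine, but treats any other separator longer than one character as a regular expression that only the Python engine supports. Passing `engine="python"` explicitly avoids the `ParserWarning` pandas otherwise emits when it falls back.

```python
import io
import warnings

import pandas as pd

# Hypothetical sample where values are separated by commas and/or whitespace
data = "1, 2   3\n4 ,5\t6\n"

with warnings.catch_warnings():
    # Turn any ParserWarning into an error, to show that none is raised
    # when the Python engine is requested explicitly
    warnings.simplefilter("error", pd.errors.ParserWarning)
    df = pd.read_csv(io.StringIO(data), sep=r"[,\s]+", header=None, engine="python")

print(df)
```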
  • This helps when you have more than just a space as delimiter. In current versions one should add `engine = "python"` to avoid a warning. – Jürg Merlin Spaak Mar 20 '18 at 09:45
  • Sorry for commenting on an old reply here, but what does the `r` before `"\s+"` mean? – AlphaF20 Sep 01 '21 at 04:23
  • @AlphaF20 it means read as raw string literal: https://stackoverflow.com/questions/2081640/what-exactly-do-u-and-r-string-prefixes-do-and-what-are-raw-string-literals – PeJota May 26 '22 at 13:57
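A quick illustration of the raw-string answer: the `r` prefix stops Python from processing backslash escapes, so `r"\s+"` is the same three-character string as `"\\s+"`. Without the prefix, `"\s+"` still happens to work, but Python 3.12 and later emit a `SyntaxWarning` for the invalid escape `\s`.

```python
import re

raw = r"\s+"      # raw string: the backslash is kept as-is
escaped = "\\s+"  # the same pattern, written with an escaped backslash

print(raw == escaped)            # True
print(len(raw))                  # 3 characters: backslash, 's', '+'
print(re.split(raw, "a  b\tc"))  # ['a', 'b', 'c']
```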

If you can't get parsing to work with the accepted answer (e.g. if your text file contains non-uniform rows), it's worth trying Python's `csv` module. Here's an example using a user-defined dialect:

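import csv

my_file = 'file.csv'  # path to your whitespace-separated file

# skipinitialspace=True ignores the spaces that follow a delimiter,
# so a run of spaces acts as a single separator
csv.register_dialect('skip_space', delimiter=' ', skipinitialspace=True)
with open(my_file, 'r', newline='') as f:
    reader = csv.reader(f, dialect='skip_space')
    for item in reader:
        print(item)  # each row is a list of strings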
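To try the dialect approach without a file, here is a sketch using a made-up in-memory sample with rows of different lengths. The reader returns lists of strings; if you then want a DataFrame, passing the ragged rows to `pd.DataFrame` pads the short rows with missing values (converting the strings to numbers is left to you).

```python
import csv
import io

import pandas as pd

# A space delimiter plus skipinitialspace: spaces right after a delimiter
# are skipped instead of producing empty fields
csv.register_dialect("skip_space", delimiter=" ", skipinitialspace=True)

# Hypothetical sample with non-uniform rows and varying runs of spaces
data = "1  2   3\n4 5\n6    7  8  9\n"

rows = list(csv.reader(io.StringIO(data), dialect="skip_space"))
print(rows)  # [['1', '2', '3'], ['4', '5'], ['6', '7', '8', '9']]

# Short rows are padded with missing values when building the DataFrame
df = pd.DataFrame(rows)
print(df.shape)  # (3, 4)
```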
Pierz