I am trying to read all the files in a directory, but because their names contain spaces and accents, I get errors. (I have already read many posts on SO but cannot find an answer.)

This returns a list of files:

files = [y for x in os.walk(".") for y in glob(os.path.join(x[0], '*.pdf'))]

But when I try to open them one by one:

for file in files:
    with open(file,"r") as f:

I get this kind of error (I obfuscated the letters because the names are confidential):

IOError: [Errno 22] invalid mode ('r') or filename: '.\abcd?efgh (hijk? lmnop).pdf'

I believe the issue is caused by the accents, but since it is Python that gives me the file names, I don't understand why they are not compatible with "open()".

How can I fix this?

jww
  • Did you try it with `os.walk(u'.')`? – Nick stands with Ukraine Sep 03 '18 at 07:33
  • You're the man!!! It worked, thank you so much. – phil12345678910 Sep 03 '18 at 07:39
  • What platform are you on? If it's not Windows, this could be a sign of a deeper problem with your filesystems or mount tables that you should fix or you might see other problems later. – abarnert Sep 03 '18 at 07:50
  • Also, why are you using `glob` on the results of `walk`? Why not `file for root, dirs, files in os.walk(u'.') for file in files if os.path.splitext(file)[1] == '.pdf'`? – abarnert Sep 03 '18 at 07:51
  • *"... caused by the accents"* - I believe they are called *[diacritics](https://en.wikipedia.org/wiki/Diacritic)* (assuming more than just the accent is giving you trouble). – jww Sep 03 '18 at 10:03
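
The suggestions in these comments (a unicode root for `os.walk`, and filtering on the extension instead of calling `glob`) could be combined roughly as follows. This is only a sketch: the helper name `find_files_by_extension` and its parameters are hypothetical and not part of the original question.

```python
import os


def find_files_by_extension(root, extension):
    """Return paths under `root` whose extension equals `extension`.

    Hypothetical helper for illustration; `extension` includes the dot,
    for example u'.pdf'. The comparison ignores case.
    """
    matches = []
    # With a unicode root, Python 2 yields unicode names;
    # Python 3 yields str (unicode) names for any str root.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # os.path.splitext returns a (stem, ext) tuple,
            # so the extension is the second element.
            if os.path.splitext(name)[1].lower() == extension.lower():
                matches.append(os.path.join(dirpath, name))
    return matches
```

Calling `find_files_by_extension(u'.', u'.pdf')` would then return paths that can be passed straight to `open()`, since the names are never round-tripped through a lossy byte encoding.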

1 Answer

I now do this:

files = [y for x in os.walk(u'.') for y in glob(os.path.join(x[0], '*.'+extension))]

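Note the use of u'.' instead of ".": in Python 2, passing a unicode path to os.walk makes it return unicode file names, rather than byte strings in which characters that cannot be encoded are replaced by '?'.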

Nick stands with Ukraine
  • You also want `u'*.'`. And you probably also want `extension` to be a `unicode` rather than a `str`. – abarnert Sep 03 '18 at 07:50
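
Putting the answer and this comment together, a minimal sketch of the `glob`-based version with a unicode pattern might look like the following. The helper names `glob_files` and `read_all` are made up for illustration, and opening the files in binary mode is an assumption on my part (PDFs are binary files), not something stated in the thread.

```python
import os
from glob import glob


def glob_files(root, extension):
    """Hypothetical helper: glob every directory under `root` for *.<extension>.

    `extension` is given without the dot, for example u'pdf'.
    """
    # Unicode pattern, as suggested in the comment above.
    pattern = u'*.' + extension
    return [path
            for dirpath, _dirnames, _filenames in os.walk(root)
            for path in glob(os.path.join(dirpath, pattern))]


def read_all(paths):
    """Hypothetical helper: read each file as raw bytes, keyed by its path."""
    contents = {}
    for path in paths:
        # Binary mode: the content is not decoded as text.
        with open(path, 'rb') as f:
            contents[path] = f.read()
    return contents
```

Usage would be along the lines of `read_all(glob_files(u'.', u'pdf'))`. One caveat with this approach: `glob` treats `[`, `*` and `?` in the directory part of the pattern as wildcards, so directories whose names contain those characters may not match as expected.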