I'm trying to write code that reads through a list of files and counts how often a particular event occurs in each file, but I'm having a lot of trouble just reading the files.
I have gotten the frequency-counting code to work when I specify the file names myself, but I want to generalize it so that I don't have to edit the script every time I run it.
Below is my work-in-progress code for opening and reading the files in a folder:
import os

path = "/Users/Desktop/PracticeCode/TextFiles"
for filename in os.listdir(path):
    with open(filename, 'rU') as f:
        contents = f.read()
    print(filename)
    print(contents)
I don't know what 'rU' means, but I saw others use it to open files from a list. Using 'r' results in a similar error.
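For context on the mode string: in Python 3, the 'U' flag asked for universal newline handling, which plain 'r' text mode already provides by default. 'U' is deprecated and was removed in Python 3.11. The sketch below is only an illustration. The helper name `read_with_plain_r` is hypothetical, and UTF-8 is an assumed encoding. It writes raw bytes with mixed line endings to a temporary file and reads them back with 'r'.

```python
import os
import tempfile


def read_with_plain_r(raw_bytes):
    """Write raw bytes to a temporary file, then read them back in 'r' text mode."""
    fd, tmp_path = tempfile.mkstemp()
    try:
        # Write the bytes exactly as given, with no newline translation.
        with os.fdopen(fd, "wb") as f:
            f.write(raw_bytes)
        # Plain 'r' uses the default newline=None, which translates
        # '\r\n' and '\r' to '\n' on reading.
        with open(tmp_path, "r", encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(tmp_path)


text = read_with_plain_r(b"line one\r\nline two\rline three\n")
print(repr(text))
```

If that holds, switching between 'rU' and 'r' should not affect a decoding error, which would match the observation that both modes fail in a similar way.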
I expected to print the title and contents of each file in the folder, but I get the error below. I have no idea how to fix this and would appreciate any feedback.
I think the error message says that something is wrong with the file encoding. If so, can someone explain why I don't get this error when I specify the files explicitly?
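One way to test the encoding hypothesis is to check which entries in the folder actually fail to decode. os.listdir returns every entry, including hidden files that a hand-written list of names would leave out. The sketch below is a diagnostic only. The function name `find_undecodable` is hypothetical, and it assumes UTF-8 is the intended encoding.

```python
import os


def find_undecodable(folder, encoding="utf-8"):
    """Return the names of regular files in `folder` that fail to decode as `encoding`."""
    bad = []
    for name in sorted(os.listdir(folder)):
        # os.listdir returns bare names, so join them with the folder path.
        full_path = os.path.join(folder, name)
        if not os.path.isfile(full_path):
            continue  # skip subdirectories
        try:
            with open(full_path, "r", encoding=encoding) as f:
                f.read()
        except UnicodeDecodeError:
            bad.append(name)
    return bad
```

Calling `find_undecodable(path)` on the folder from the script would show whether the failure comes from the text files themselves or from some other entry in the folder.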
with open(filename, 'rU') as f:
Traceback (most recent call last):
  File "counting_code_2", line 8, in <module>
    contents = f.read()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte
Edit: Here are some lines from the file I've been using to develop the code. It's a text file of Pride and Prejudice.
The Project Gutenberg EBook of Pride and Prejudice, by Jane Austen
This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org
Title: Pride and Prejudice
Author: Jane Austen
Posting Date: August 26, 2008 [EBook #1342] Release Date: June, 1998 Last updated: February 15, 2015]
Language: English
Character set encoding: ASCII
* START OF THIS PROJECT GUTENBERG EBOOK PRIDE AND PREJUDICE *
Produced by Anonymous Volunteers
PRIDE AND PREJUDICE
By Jane Austen
Chapter 1
It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.
Edit 2: I added the following line to the code, just before the with statement:
if filename != '.DS_Store':
This removed the encoding error. However, I now get an indentation error after the call to read(). Is my syntax correct?
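For reference, here is a minimal sketch of how the nesting could look with the extra check in place. It is not a definitive version. The function name `print_folder_contents` and the `full_path` variable are hypothetical. It assumes the folder holds only UTF-8 (or ASCII) text files apart from .DS_Store, and it joins each name with the folder path so the script does not depend on the current working directory. Each new block (for, if, with) adds one level of indentation, and everything that belongs to a block must line up.

```python
import os


def print_folder_contents(path):
    """Print the name and contents of each file in `path`, skipping .DS_Store."""
    printed = []
    for filename in sorted(os.listdir(path)):
        if filename != ".DS_Store":
            # Everything that depends on the check is indented under the if.
            full_path = os.path.join(path, filename)
            with open(full_path, "r", encoding="utf-8") as f:
                # The body of the with block is indented one level further.
                contents = f.read()
            print(filename)
            print(contents)
            printed.append(filename)
    return printed
```

Calling `print_folder_contents("/Users/Desktop/PracticeCode/TextFiles")` would then print each file name followed by its text. The returned list of names is only there to make the sketch easy to check.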