How to replace content of a file?

Question

I have a bunch of HTML files whose markup I would like to fix with the help of bs4. But once I run the code, all the files end up empty (luckily I made a backup before running my script on the folder).

This is what I have so far:

from bs4 import BeautifulSoup
import os
for entry in os.scandir(path):
    if entry.is_file() and entry.path.endswith('html'):
        file = open(entry.path, 'w+')
        soup = BeautifulSoup(file, 'html.parser')
        file.write(soup.prettify())
        print(colored('Success', 'green'))
        file.close()

The expected result is that each file is read, prettified and saved.

Try printing `soup` or `soup.prettify()` before saving. Maybe it isn't loading/parsing correctly and is just outputting `None`? – Jab, Jan 15 '20 at 17:00

Saif Asif · Answer 1 · 2020-01-15T17:03:32.087

0

You have truncated the files by opening them with the `w+` mode, which empties a file as soon as it is opened. Take a look at this answer here, which explains in detail which mode you need.

More information can be found in the Python docs for 2.7 and for Python 3.
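The difference between the modes can be checked on a throwaway file. The following is only a minimal sketch: the temporary file and its sample content are made up for illustration and are not part of the original question.

```python
import os
import tempfile

# Create a throwaway file with some sample content (hypothetical data).
fd, demo_path = tempfile.mkstemp(suffix='.html')
with os.fdopen(fd, 'w') as f:
    f.write('<p>hello</p>')

# 'w+' truncates the file on open, so an immediate read returns nothing.
with open(demo_path, 'w+') as f:
    after_w_plus = f.read()

# Restore the content, then open with 'r+', which does not truncate.
with open(demo_path, 'w') as f:
    f.write('<p>hello</p>')
with open(demo_path, 'r+') as f:
    after_r_plus = f.read()

os.remove(demo_path)
print(repr(after_w_plus))  # prints ''
print(repr(after_r_plus))  # prints '<p>hello</p>'
```

This is why BeautifulSoup sees an empty document when it is handed a file object that was opened with `w+`.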

edited Jan 15 '20 at 17:03

answered Jan 15 '20 at 16:59


Alright, but I want to truncate the file and then save it with new content (prettified) – Adam Jan 15 '20 at 17:03
Then you need to read the file contents first, in `r`-only mode, and then open a new file in `w` mode to dump the new contents – Saif Asif Jan 15 '20 at 17:04
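One way to read "read first, then write a new file" is sketched below. It is an illustration under assumptions, not the commenter's code: `rewrite_file` and its `transform` parameter are hypothetical names, UTF-8 encoding is assumed, and the swap via `os.replace` is an extra step added so that the original file is only replaced once the new content has been written completely.

```python
import os
import tempfile

def rewrite_file(path, transform):
    """Read path, apply transform to its text, and replace the file with the result."""
    # Step 1: read everything in 'r' mode; nothing is truncated here.
    with open(path, 'r', encoding='utf-8') as src:
        original = src.read()
    new_text = transform(original)

    # Step 2: write the result to a new temporary file in the same directory.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as dst:
            dst.write(new_text)
        # Step 3: swap the new file into place over the original.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if writing or replacing failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With bs4 installed, the transform for this question could be something like `lambda html: BeautifulSoup(html, 'html.parser').prettify()`.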

score 0 · Accepted Answer · answered Jan 15 '20 at 17:03

Opening the file with "w+" deletes what is in it before you can read it. Solution:

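from bs4 import BeautifulSoup
from termcolor import colored  # assumption: colored() is the one from the termcolor package
import os
for entry in os.scandir(path):  # path: the folder holding the HTML files
    if entry.is_file() and entry.path.endswith('html'):
        # Read and parse first, while the file still has its contents.
        readFile = open(entry.path, 'r')
        soup = BeautifulSoup(readFile, 'html.parser')
        readFile.close()
        # Only now reopen for writing; 'w' truncates the file on open.
        writeFile = open(entry.path, 'w')
        writeFile.write(soup.prettify())
        writeFile.close()
        print(colored('Success', 'green'))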

Suyash · Answer 3 · 2020-01-15T17:26:59.943

You've used the 'w+' mode to open the file. This truncates the file, clearing all of its content as soon as it is opened.

Use 'r' to read file contents, then process them, and use 'w+' to overwrite the file with the processed contents.

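from bs4 import BeautifulSoup
from termcolor import colored  # assumption: colored() is the one from the termcolor package
import os
for entry in os.scandir(path):  # path: the folder holding the HTML files
    if entry.is_file() and entry.path.endswith('html'):
        # Read the whole file first; the handle is closed when the block ends.
        with open(entry.path, 'r') as f:
            contents = f.read()
        soup = BeautifulSoup(contents, 'html.parser')
        # 'w+' truncates on open, which is safe now that the contents are in memory.
        with open(entry.path, 'w+') as f:
            f.write(soup.prettify())
        print(colored('Success', 'green'))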
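As an alternative to opening the file twice, a single handle in 'r+' mode can read the old content and then overwrite it. This is only a sketch under assumptions: `rewrite_in_place` and `transform` are hypothetical names, UTF-8 encoding is assumed, and the demo file is made up for illustration.

```python
import os
import tempfile

def rewrite_in_place(path, transform):
    """Read, transform and overwrite a file through one 'r+' handle."""
    with open(path, 'r+', encoding='utf-8') as f:
        text = f.read()           # 'r+' does not truncate, so the content is still there
        new_text = transform(text)
        f.seek(0)                 # go back to the start before writing
        f.write(new_text)
        f.truncate()              # cut off leftovers if the new text is shorter

# Small demo on a throwaway file (hypothetical sample data).
fd, demo_path = tempfile.mkstemp(suffix='.html')
with os.fdopen(fd, 'w', encoding='utf-8') as f:
    f.write('<p>some long content</p>')

rewrite_in_place(demo_path, lambda html: '<p>short</p>')
with open(demo_path, encoding='utf-8') as f:
    result = f.read()
os.remove(demo_path)
print(result)  # prints <p>short</p>
```

The `truncate()` call matters whenever the new text is shorter than the old one; without it, the tail of the old content would remain in the file.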

For more info about modes of opening files in python see these resources:

Excellent StackOverflow answers

Manpagez

Python documentation
