Getting only one entry in csv. Python beautifulsoup ,requests, selenium

Question

Trying to scrape some data. Checking it with print and getting multiple prints..However, the CSV has only one entry. Can you help please? Thanks a lot.

import csv
import time
import requests
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options
from selenium import webdriver


job_Details = []
job_links = []



chrome_options = Options()
'''chrome_options.add_argument("--headless")'''
driver = webdriver.Chrome(executable_path='C:/bin/chromedriver.exe', options=chrome_options)
driver.get(f'https://remotejobs.world/')
'''SCROLL_PAUSE_TIME = 20'''
last_height = driver.execute_script("return document.body.scrollHeight")
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
divs = driver.find_elements_by_tag_name('h2')
for div in divs:
    job_Details.append(div)
    link = div.find_element_by_tag_name('a')
    job_links.append(link)

for job_detail, job_link in zip(job_Details, job_links):
    if job_detail and job_link:
        print(job_link.get_attribute('href'))
        print(job_detail.text)
        url = job_link.get_attribute('href')
        new_page = requests.get(url).text
        time.sleep(2)
        soup = BeautifulSoup(new_page, 'html.parser')
        job_desc = soup.find('div', class_='w-full md:w-2/3')
        if job_desc:
            print(job_desc.text) #Successful Prints.
            dict = {'Job_title and Company': job_detail.text, "Job link": job_link.get_attribute('href'),
                        "Job Details": job_desc.text}
            with open('remoteWORLD.csv', 'w') as f:
                w = csv.DictWriter(f, dict.keys())
                w.writeheader()
                w.writerow(dict)

Because you have `with open('remoteWORLD.csv', 'w') as f:` within your for loop so you'll only get the last entry in your CSV file. — Justin Ezequiel, Dec 02 '20 at 14:10
@JustinEzequiel When I move it outside it says Dict might be undefined? — Abhishek Rai, Dec 02 '20 at 14:12
You only need `dict` for the header row and the keys are fixed! And I suggest you rename `dict` as it shadows the builtin. — Justin Ezequiel, Dec 02 '20 at 14:25
@JustinEzequiel Can you please check..it is writing the first entry in the third column.. — Abhishek Rai, Dec 02 '20 at 15:05
Look at your `dict = {...}` line. `'Job_title and Company': job_detail.text` and `"Job Details": job_desc.text`. If that's wrong then you need to switch the values. — Justin Ezequiel, Dec 02 '20 at 15:16
@JustinEzequiel No, that's not wrong..I am saying ..it is not printing the job_desc.text at all.. — Abhishek Rai, Dec 02 '20 at 15:26
Do you mean `print(job_desc.text)` does not print? If so, then your condition `if job_desc:` is False meaning your soup.find(...) returned None which requires you to post a new question. — Justin Ezequiel, Dec 02 '20 at 15:30
@JustinEzequiel It prints it successfully, just not in the csv/dict. You can run it and check it yourself, if possible for you. — Abhishek Rai, Dec 02 '20 at 15:49
@JustinEzequiel It is kind of truncating the print..I don't know why..It's a comparatively larger amount of text..could that be the reason? — Abhishek Rai, Dec 02 '20 at 16:05
Open the CSV in a text editor. My guess is the "text" contains a whole bunch of whitespace and what you are using to view (Excel?) the CSV cannot display the data. — Justin Ezequiel, Dec 02 '20 at 16:09
@JustinEzequiel You are right..I couldn't have have thought of it myself though..would have remained stuck..Thanks a ton. — Abhishek Rai, Dec 02 '20 at 16:14
When in doubt, always open a text editor. I recommend Notepad++. — Justin Ezequiel, Dec 02 '20 at 16:15
@JustinEzequiel :) ..Checked it in NP++ ..I like it very much. — Abhishek Rai, Dec 02 '20 at 16:23

score 1 · Accepted Answer · answered Dec 02 '20 at 14:37

See below how simple my suggestion was.

with open('remoteWORLD.csv', 'w') as f:
    w = csv.DictWriter(f, ['Job_title and Company', "Job link", "Job Details"])
    w.writeheader()
    for job_detail, job_link in zip(job_Details, job_links):
        if job_detail and job_link:
            print(job_link.get_attribute('href'))
            print(job_detail.text)
            url = job_link.get_attribute('href')
            new_page = requests.get(url).text
            time.sleep(2)
            soup = BeautifulSoup(new_page, 'html.parser')
            job_desc = soup.find('div', class_='w-full md:w-2/3')
            if job_desc:
                print(job_desc.text) #Successful Prints.
                dict = {'Job_title and Company': job_detail.text, "Job link": job_link.get_attribute('href'),
                            "Job Details": job_desc.text}
                w.writerow(dict)

This is not writing the `job_desc.text` though..It is large amount of text..What do I need to change? — Abhishek Rai, Dec 02 '20 at 14:57

score 0 · Answer 2 · answered Dec 02 '20 at 14:13

0

Try using:

with open('remoteWORLD.csv', 'a+') as f:

The w will rewrite the file each time. The a means you can append (the + means it can be read as well as written).

Check here for more explanations: Difference between modes a, a+, w, w+, and r+ in built-in open function?

EDIT: or as Justin says, move it outside of the loop and write your lists job_Details and job_links to it

answered Dec 02 '20 at 14:13

ob9528

11
3

a+ works...I think ..will just confirm and accept. – Abhishek Rai Dec 02 '20 at 14:21
Sure if you like the header row repeated within your CSV. – Justin Ezequiel Dec 02 '20 at 14:26
@JustinEzequiel So, I need to write the list to dict? after moving out of loop? – Abhishek Rai Dec 02 '20 at 14:32
The line: `w.writeheader()` can be removed from the for loop if the header shouldn't be repeated, and replaced above with a 'w' instance of the file. – ob9528 Dec 02 '20 at 14:33

Getting only one entry in csv. Python beautifulsoup ,requests, selenium

2 Answers2