
I want to get some data from a webpage (https://www.evaschulze-aufgabenpool.de/index.php/s/smwP6ygck2SXRtF?path=%2FKlasse12) with Python and Selenium, but the content I want is generated dynamically, and to see all of it you have to scroll down the page. To be more specific, I want to get all the folder names shown on the website, but it doesn't work. My attempt to simply scroll down the whole page with Selenium also doesn't seem to work, and I don't know what I'm doing wrong or what else I could try. So my question is: how can I make sure I always get all of the dynamically generated folders on the website?

Here's the code I'm using:

from time import sleep
from selenium import webdriver

url = "https://www.evaschulze-aufgabenpool.de/index.php/s/smwP6ygck2SXRtF?path=%2FKlasse12"

driver = webdriver.Chrome("chromedriver.exe")
driver.get(url)
driver.maximize_window()

sleep(3)
for i in range(5):
    driver.execute_script("window.scrollTo(0, 1080)")
    sleep(3)

data = driver.find_element_by_tag_name("table")
data = data.find_elements_by_tag_name("tr")

for element in data:
    name = element.get_attribute("data-file")
    if name is not None:
        print(name)

driver.quit()
KlonAnon
  • If you want to do so with Selenium, here you can find how to make sure you have reached the bottom of the page, so you can then go through all the folders: https://stackoverflow.com/questions/32391303/how-to-scroll-to-the-end-of-the-page-using-selenium-in-python/32629481 – carlosvin Jan 10 '21 at 11:22
  • Yes, thanks, so I guess my question has already been answered – KlonAnon Jan 10 '21 at 11:36
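The technique from the linked answer — keep scrolling until the document height stops growing — can be sketched like this. This is only a sketch: it uses the Selenium 4 API (no explicit chromedriver path), and the pause length and round limit are arbitrary assumptions, not values verified against this site.

```python
from time import sleep

def scroll_to_bottom(driver, pause=2.0, max_rounds=30):
    """Keep scrolling down until the document height stops growing."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        sleep(pause)  # give lazy-loaded rows time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # no new content appeared, so this is the real bottom
        last_height = new_height

if __name__ == "__main__":
    from selenium import webdriver  # Selenium 4: no driver path argument needed

    driver = webdriver.Chrome()
    driver.get("https://www.evaschulze-aufgabenpool.de/index.php/s/smwP6ygck2SXRtF?path=%2FKlasse12")
    sleep(3)
    scroll_to_bottom(driver)
    for row in driver.find_elements("tag name", "tr"):
        name = row.get_attribute("data-file")
        if name is not None:
            print(name)
    driver.quit()
```

The key difference from the question's loop is that `window.scrollTo(0, 1080)` always jumps to the same fixed offset, while scrolling to `document.body.scrollHeight` and re-reading it each round follows the page as it grows.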

1 Answer


Hi, use the API that NextCloud offers to list your files (and much else). That way you get the answer in plain text. You can find examples here: Nextcloud list files using API

Instead of the curl tool, you can make the HTTP requests with the Python requests library.
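A rough Python sketch of this approach is below. Two caveats: a WebDAV directory listing uses the PROPFIND method rather than a plain GET, and the endpoint path (`public.php/webdav`) and auth scheme (the share token from the URL as the Basic-Auth username with an empty password) are assumptions based on Nextcloud's documented public-share WebDAV access, not something verified against this particular server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

def folder_names(propfind_xml):
    """Pull folder names out of a WebDAV PROPFIND (Depth: 1) response."""
    ns = {"d": "DAV:"}
    names = []
    for resp in ET.fromstring(propfind_xml).findall("d:response", ns):
        # Directories carry a <d:collection/> marker inside <d:resourcetype>.
        if resp.find(".//d:resourcetype/d:collection", ns) is not None:
            href = resp.find("d:href", ns).text.rstrip("/")
            names.append(unquote(href.rsplit("/", 1)[-1]))
    return names

if __name__ == "__main__":
    import requests  # third-party: pip install requests

    # Assumption: the share token from the question's URL doubles as the
    # Basic-Auth username for the public WebDAV endpoint.
    token = "smwP6ygck2SXRtF"
    url = "https://www.evaschulze-aufgabenpool.de/public.php/webdav/Klasse12"
    r = requests.request("PROPFIND", url, auth=(token, ""), headers={"Depth": "1"})
    print(folder_names(r.text))
```

Note that the Depth: 1 listing also includes the requested folder itself as its first entry, so you may want to skip that element.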

nycaff