I'm trying to parse a website with bs4. The page contains this script:

<script>

var _load_pages = [{"n":1,"w":"760","h":"1990","u":"url1"},{"n":2,"w":"760","h":"1990","u":"url2"},{"n":3,"w":"760","h":"1990","u":"url3"},{"n":4,"w":"760","h":"1990","u":"url4"},{"n":5,"w":"760","h":"1990","u":"url5"}];

</script>

I need help to get these URLs. I don't know what to use to get them.

Mark Rotteveel

1 Answer

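Try this. It looks for the script that assigns _load_pages, parses the array as JSON, and collects each "u" value: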

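import json
import re

from bs4 import BeautifulSoup

# html holds the page source, e.g. the text of an HTTP response
soup = BeautifulSoup(html, 'lxml')
pattern = re.compile(r'var _load_pages = (\[.*?\]);', re.S)
urls = []
for s in soup.find_all('script'):
    # s.string is None for empty or external scripts, so fall back to ''
    match = pattern.search(s.string or '')
    if match:
        # Parse the captured array as JSON and keep each entry's URL
        urls = [page['u'] for page in json.loads(match.group(1))]
        break
print(urls)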
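
The extraction step can also be pulled into a small helper that is easy to check without fetching a page. This is only a sketch: the name extract_page_urls is hypothetical, and it assumes the variable is called _load_pages and each entry stores its URL under "u", as in the question. The sample string is shortened from the question's script.

```python
import json
import re

# Captures the JSON array assigned to _load_pages (name taken from the question).
_LOAD_PAGES_RE = re.compile(r'var\s+_load_pages\s*=\s*(\[.*?\])\s*;', re.S)


def extract_page_urls(script_text):
    """Return the 'u' values from a _load_pages assignment, or [] if absent."""
    # Accept None so the result of a tag's .string can be passed in directly.
    match = _LOAD_PAGES_RE.search(script_text or '')
    if match is None:
        return []
    pages = json.loads(match.group(1))
    return [page['u'] for page in pages]


sample = (
    'var _load_pages = [{"n":1,"w":"760","h":"1990","u":"url1"},'
    '{"n":2,"w":"760","h":"1990","u":"url2"}];'
)
print(extract_page_urls(sample))  # ['url1', 'url2']
```

With bs4, the helper would be called on s.string for each script tag until it returns a non-empty list. The non-greedy match stops at the first "];" it sees, so it could cut the array short if a URL itself contained that sequence.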
Nanthakumar J J