Python 3.8 - get specific content from a website URL

Question

I searched a lot and couldn't find an answer. I only need a few digits from each link (the id at the end) and want to discard the rest of the URL.

Example:

https://tenor.com/view/cat-look-gif-19801862
https://tenor.com/view/4357-gif-18712819
https://tenor.com/view/gifs-away-gif-gif-8174489
https://tenor.com/view/spooky-vision-gif-18976398

What I need from each URL:

19801862 (first link)
18712819 (second link)
8174489 (third link)
18976398 (fourth link)

What I know is that these numbers (the GIF id) always come right after the final "gif-" in the URL, which may be useful. However, GIF names can themselves contain digits and the word "gif".

The term you need to search for is `web scraping`; there are a lot of tutorials. — mama, Jan 05 '21 at 23:05
It seems like you are literally asking to slice off the number at the end of each string? — Chris, Jan 05 '21 at 23:07
Does this answer your question? [Split a string by a delimiter in python](https://stackoverflow.com/questions/3475251/split-a-string-by-a-delimiter-in-python) — Chris, Jan 05 '21 at 23:08
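
The split-based approach suggested in the comment above can be sketched roughly as follows. This is a minimal illustration, assuming every URL ends in a hyphen followed by digits; the function name `gif_id_from_url` is made up for this example and does not come from the thread.

```python
def gif_id_from_url(url: str) -> int:
    """Return the numeric id after the last hyphen of the URL.

    Assumption: the URL ends in "-<digits>"; int() raises ValueError otherwise.
    """
    # rsplit with maxsplit=1 cuts only at the final hyphen, so digits or the
    # word "gif" earlier in the name do not affect the result.
    tail = url.rsplit("-", 1)[-1]
    return int(tail)


urls = [
    "https://tenor.com/view/cat-look-gif-19801862",
    "https://tenor.com/view/4357-gif-18712819",
    "https://tenor.com/view/gifs-away-gif-gif-8174489",
    "https://tenor.com/view/spooky-vision-gif-18976398",
]
for url in urls:
    print(gif_id_from_url(url))
```

Splitting from the right is what keeps names such as `4357-gif-...` from being mistaken for the id.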

score 0 · Answer 1 · answered Jan 05 '21 at 23:17

I found a way to do it.

For others that need a solution too, see here:

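link = "https://tenor.com/view/cat-look-gif-19801862"
numbers = []
for z in link:
    if z.isdigit():
        numbers.append(z)
    else:
        # any non-digit resets the list, so only the trailing digits survive
        numbers = []
# join the collected digits once the loop has finished
numbers = int("".join(numbers))
print(numbers)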

score 0 · Answer 2 · answered Jan 05 '21 at 23:27

If you have a string that contains many links and you want to detect the links and get the GIF id from the end of each one, you can use this code:

import re

links = '''
https://tenor.com/view/cat-look-gif-19801862
https://tenor.com/view/4357-gif-18712819
https://tenor.com/view/gifs-away-gif-gif-8174489
https://tenor.com/view/spooky-vision-gif-18976398
'''
# "." does not match newlines, so the greedy ".*" stays within one link
for x in re.finditer(r"tenor\.com/view/.*-(\d+)", links):
    the_id = x.group(1)
    print(the_id)

score 0 · Answer 3 · answered Jan 05 '21 at 23:38

The easiest way is to use the `re` (regular expression) module:

import re

pattern = re.compile(r'\d+')
link_list = ['https://tenor.com/view/cat-look-gif-19801862', 'https://tenor.com/view/4357-gif-18712819',
             'https://tenor.com/view/gifs-away-gif-gif-8174489', 'https://tenor.com/view/spooky-vision-gif-18976398']
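for i, x in enumerate(link_list):
    # findall returns every run of digits; the GIF id is the last one
    # (a name such as "4357-gif-..." contains an earlier run of digits)
    result = pattern.findall(x)
    print(f'{result[-1]} is link number {i}')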
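
As a variation on the regex idea, the pattern can be anchored to the end of the string so that digits inside the GIF name are never considered. The sketch below is only an illustration: the helper name `extract_gif_id` and the optional trailing slash are assumptions, not something stated in the question.

```python
import re
from typing import Optional

# A hyphen, then digits, then an optional trailing slash at the end of the URL.
GIF_ID = re.compile(r"-(\d+)/?$")


def extract_gif_id(url: str) -> Optional[str]:
    """Return the trailing numeric id as a string, or None if there is none."""
    match = GIF_ID.search(url)
    return match.group(1) if match else None


link_list = [
    "https://tenor.com/view/cat-look-gif-19801862",
    "https://tenor.com/view/4357-gif-18712819",
    "https://tenor.com/view/gifs-away-gif-gif-8174489",
    "https://tenor.com/view/spooky-vision-gif-18976398",
]
for link in link_list:
    print(extract_gif_id(link))
```

Returning `None` instead of raising lets the caller decide how to handle a URL that does not end in an id.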
