
I want to use the YouTube Data API v3 to extract video metadata (especially title and publish date) for all videos in a channel. Currently, I'm only able to extract details for the last 20,000 videos using the playlistItems() endpoint. Is there a way to extract metadata for more than 20,000 videos from a single channel?

Here's the Python code I'm using to extract metadata for those 20,000 videos.

from googleapiclient.discovery import build

youtube = build('youtube', 'v3', developerKey="YOUTUBE_API_KEY")
channelId = "CHANNEL_ID"

# Look up the channel's "uploads" playlist, which contains every video.
contentdata = youtube.channels().list(id=channelId, part='contentDetails').execute()
playlist_id = contentdata['items'][0]['contentDetails']['relatedPlaylists']['uploads']
videos = []
next_page_token = None

# Page through the uploads playlist, 50 items at a time.
while True:
    res = youtube.playlistItems().list(playlistId=playlist_id, part='snippet',
                                       maxResults=50, pageToken=next_page_token).execute()
    videos += res['items']
    next_page_token = res.get('nextPageToken')
    if next_page_token is None:
        break

# Collect the video ID of each playlist item.
video_ids = [video['snippet']['resourceId']['videoId'] for video in videos]
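
For completeness, the follow-up step that turns those IDs into titles and publish dates is a batched videos().list call. A minimal sketch (the snippet part carries title and publishedAt):

# Fetch title and publish date for the collected IDs, 50 per call
# (the per-request maximum for videos().list).
for i in range(0, len(video_ids), 50):
    res = youtube.videos().list(id=','.join(video_ids[i:i + 50]), part='snippet').execute()
    for item in res['items']:
        print(item['snippet']['title'], item['snippet']['publishedAt'])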

The solution to this problem could be either forcing the API to return metadata for more than 20,000 videos from a channel, or specifying a time period during which the videos were uploaded. That way, the code could be run repeatedly over multiple time periods to extract metadata for all videos.
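
For the time-period route, the search().list endpoint accepts publishedAfter and publishedBefore filters (RFC 3339 timestamps), so a channel's history can be walked window by window. A minimal sketch, with two caveats: search().list costs 100 quota units per call, and each query is itself capped at roughly 500 results, so the windows have to be kept narrow:

# Hypothetical helper: page through one upload window with search().list.
def videos_in_window(youtube, channel_id, start, end):
    ids, page_token = [], None
    while True:
        res = youtube.search().list(
            channelId=channel_id, type='video', order='date',
            publishedAfter=start, publishedBefore=end,
            part='id', maxResults=50, pageToken=page_token).execute()
        ids += [item['id']['videoId'] for item in res['items']]
        page_token = res.get('nextPageToken')
        if page_token is None:
            return ids

# e.g. videos_in_window(youtube, channelId, '2020-01-01T00:00:00Z', '2020-07-01T00:00:00Z')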

pmohanty
  • What is your question, please? – stvar May 22 '21 at 18:50
  • Apologies for making it confusing. I want to either extract metadata for more than 20,000 videos from a channel, or extract metadata for the videos a channel uploaded during a specified time period. Either of these would be a solution. – pmohanty May 22 '21 at 19:27
  • 1
    OK; then please edit your post above accordingly. – stvar May 22 '21 at 19:52
  • Done. Thanks for pointing that out. – pmohanty May 22 '21 at 20:55
  • I don't see anything in the code that causes this 20k-item limit. Can you please explain where you determined this limit? Is it because of a limited number of pages being available due to quota? – Dennis Jul 01 '21 at 22:27
  • @Dennis It's a limit imposed by the YouTube Data API. It's very hard to find any documentation about it, but you'll find similar issues posted by other users: https://www.javaer101.com/en/article/40866109.html – pmohanty Jul 06 '21 at 08:43

1 Answer


"The solution to this problem could be either forcing the API to return metadata for more than 20,000 videos from a channel, or specifying a time period during which the videos were uploaded. That way, the code could be run repeatedly over multiple time periods to extract metadata for all videos."

I tried this without success.

My workaround for the failing YouTube Data API backend is the following Python script. It works by faking the requests your browser makes when scrolling through the "Videos" tab of a YouTube channel.

import urllib.request, json, subprocess
from urllib.error import HTTPError

def getURL(url):
    # Fetch a URL, returning the body even on an HTTP error status.
    res = ""
    try:
        res = urllib.request.urlopen(url).read()
    except HTTPError as e:
        res = e.read()
    return res.decode('utf-8')

def execCmd(cmd):
    # Run a shell command and return its output (renamed from `exec`
    # to avoid shadowing the Python builtin).
    return subprocess.check_output(cmd, shell=True)

youtuberId = 'CHANNEL_ID'
videosIds = []
errorsCount = 0

def retrieveVideosFromContent(content):
    # Collect every 11-character videoId appearing in the raw response.
    global videosIds
    wantedPattern = '"videoId":"'
    content = content.replace('"videoId": "', wantedPattern).replace("'videoId': '", wantedPattern)
    for contentPart in content.split(wantedPattern):
        videoId = contentPart.split('"')[0]
        if videoId not in videosIds and len(videoId) == 11:
            videosIds += [videoId]

def scrape(token):
    # Replay the InnerTube "browse" request that the "Videos" tab issues when
    # scrolling, passing along the continuation token of the previous page.
    # YOUR_KEY can be obtained by browsing a channel's videos section (like
    # https://www.youtube.com/c/BenjaminLoison/videos) while checking the
    # "Network" tab of your browser's developer tools (for instance Ctrl+Shift+E).
    global errorsCount
    cmd = 'curl -s \'https://www.youtube.com/youtubei/v1/browse?key=YOUR_KEY\' -H \'Content-Type: application/json\' --data-raw \'{"context":{"client":{"clientName":"WEB","clientVersion":"2.20210903.05.01"}},"continuation":"' + token + '"}\''
    cmd = cmd.replace('"', '\\"').replace("\'", '"')
    content = execCmd(cmd).decode('utf-8')

    retrieveVideosFromContent(content)

    data = json.loads(content)
    if 'onResponseReceivedActions' not in data:
        # The backend sometimes returns an empty page; retry the same token.
        print("no token found, let's try again")
        errorsCount += 1
        return scrape(token)
    entry = data['onResponseReceivedActions'][0]['appendContinuationItemsAction']['continuationItems'][-1]
    if 'continuationItemRenderer' not in entry:
        # No further continuation: the oldest page has been reached.
        return ''
    return entry['continuationItemRenderer']['continuationEndpoint']['continuationCommand']['token']

# Load the channel's "Videos" tab and parse the embedded ytInitialData JSON.
url = 'https://www.youtube.com/channel/' + youtuberId + '/videos'
content = getURL(url)
content = content.split('var ytInitialData = ')[1].split(";</script>")[0]
dataFirst = json.loads(content)

retrieveVideosFromContent(content)

# Extract the first continuation token from the grid of videos.
token = dataFirst['contents']['twoColumnBrowseResultsRenderer']['tabs'][1]['tabRenderer']['content']['sectionListRenderer']['contents'][0]['itemSectionRenderer']['contents'][0]['gridRenderer']['items'][-1]['continuationItemRenderer']['continuationEndpoint']['continuationCommand']['token']

# Follow continuation tokens until none are left.
while True:
    videosIdsLen = len(videosIds)
    print(videosIdsLen, token)
    if token == '':
        break
    token = scrape(token)

print(videosIdsLen, videosIds)

Remember to replace the "CHANNEL_ID" and "YOUR_KEY" placeholders with your own values, and make sure the curl command is available from your shell.
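
If curl isn't an option, the same POST can be issued with the requests library instead. A sketch of a drop-in replacement for the curl call inside scrape(), assuming requests is installed:

import requests

def browse(token):
    # Hypothetical replacement for the curl invocation above; substitute
    # YOUR_KEY the same way as before.
    url = 'https://www.youtube.com/youtubei/v1/browse?key=YOUR_KEY'
    payload = {
        'context': {'client': {'clientName': 'WEB',
                               'clientVersion': '2.20210903.05.01'}},
        'continuation': token,
    }
    return requests.post(url, json=payload).text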

Benjamin Loison