15

This is somewhat of a duplicate of "Does YouTube API forbid to download video captions if you are not it's owner?" and "Get YouTube captions", which basically say that it's not possible to download captions via the YouTube API unless you are the owner or third-party contributions are enabled. However, my question is: how do sites like http://downsub.com/ or http://www.lilsubs.com/ have access to all captions?

In other words, when I access the YouTube API myself (even with the youtubepartner and youtube.force-ssl scopes), I can only download the captions of some videos. Others fail with 403: The permissions associated with the request are not sufficient to download the caption track. The request might not be properly authorized, or the video order might not have enabled third-party contributions for this caption. Yet when I try those same videos on these other sites, it works fine. I'm assuming they are using the YouTube API to access the captions, but what special sauce are they using? Some special partner key? A different API version? Are they just scraping from the videos themselves or something?

ryanbrainard
  • Do you have a link to an example where you cannot get the captions yourself but can get them via the mentioned sites? – Janis S. Oct 24 '17 at 15:10
  • @JanisS. Here's an example: https://youtu.be/0db1_qWZjRA, which resolves to caption id zMTLb41gaOS5LWeeAi0ribdiUBImBdqb, and then fails with a 403 – ryanbrainard Oct 24 '17 at 15:17
  • Thank you for the comments about the unofficial `timedtext`. That'll probably work for my use case; however, it does not seem to support `kind=asr` (i.e. auto-generated captions) without a signature. Other sites like downsub.com also include these. How are they doing that? Here's an example: https://www.youtube.com/watch?v=vx6NCUyg1NE Only English and Indonesian work without a key. ASR captions also aren't listed here: https://www.youtube.com/api/timedtext?v=vx6NCUyg1NE&lang=en&type=list. – ryanbrainard Oct 25 '17 at 10:58
  • Please check my updated answer. – Janis S. Oct 26 '17 at 01:54

3 Answers

17

Send a GET request to:

http://video.google.com/timedtext?lang={LANG}&v={VIDEOID}

Example for your video in comment: http://video.google.com/timedtext?lang=ko&v=0db1_qWZjRA

Let's look at another example of yours, i.e. https://www.youtube.com/watch?v=7068mw-6lmI (and I agree with the differentiation point in your comment).

Multiple subtitle tracks are available for this video:

  • English
  • Korean
  • Spanish
  • Korean (auto-generated), also called asr (automatic speech recognition)

These names are the values of the subtitle name parameter (e.g., name=English).

lang is the language code, optionally with a region (e.g., es-MX). In your example: https://www.youtube.com/api/timedtext?lang=es-MX&v=7068mw-6lmI&name=Spanish

If a subtitle track is available, it is possible to translate from it using the tlang parameter.

https://www.youtube.com/api/timedtext?lang=en&v=7068mw-6lmI&name=English&tlang=lv
https://www.youtube.com/api/timedtext?lang=ko&v=7068mw-6lmI&name=Korean&tlang=lv

My guess is that this is what these sites use, i.e. translation of an available subtitle track (you can confirm this by giving one of these sites a video without any subtitle track).

As for asr, a signature always seems to be needed, but as long as one of the subtitle tracks is available, you could use that for translation. E.g. for the example in your comment on the question:

https://www.youtube.com/api/timedtext?lang=en&v=vx6NCUyg1NE&tlang=lv

The last example looks special in that both of its subtitle tracks are asr (checked with Chrome -> Inspect -> Network), so you need to omit the subtitle name parameter. Unfortunately, this difference is not visible in the YouTube video's settings wheel.
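The parameters discussed in this answer (lang, v, name, tlang) can be put together programmatically. The following is only a minimal sketch: build_timedtext_url is a hypothetical helper name, the endpoint is the unofficial one quoted above and may change or stop working at any time, and the code only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Unofficial endpoint quoted in the answer; not a documented API.
TIMEDTEXT_ENDPOINT = "https://www.youtube.com/api/timedtext"


def build_timedtext_url(video_id, lang, name=None, tlang=None):
    """Build a timedtext URL (hypothetical helper, for illustration only).

    name  -- subtitle track name, needed for some tracks (e.g. "English")
    tlang -- target language to translate the track into (e.g. "lv")
    """
    params = {"lang": lang, "v": video_id}
    if name is not None:
        params["name"] = name
    if tlang is not None:
        params["tlang"] = tlang
    return f"{TIMEDTEXT_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Rebuilds one of the translation URLs from the answer.
    print(build_timedtext_url("7068mw-6lmI", "en", name="English", tlang="lv"))
```

Leaving name unset covers the case where both tracks are asr and the name parameter has to be omitted.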

Janis S.
4

There is this unofficial API used by YouTube:

https://www.youtube.com/api/timedtext?lang={LANG}&v={VIDEO_ID}

LANG here is the ISO 639-1 two-letter language code. For your example it would be:

https://www.youtube.com/api/timedtext?lang=ko&v=0db1_qWZjRA

You can check it in the browser's Network tab while toggling the closed caption button.

Bertrand Martel
  • Thanks, this is the best answer so far, but please see my comment about ASR captions. Happen to know? https://stackoverflow.com/questions/46864428/how-do-some-sites-download-youtube-captions#comment80807861_46864428 – ryanbrainard Oct 25 '17 at 13:05
  • Any idea why the `name` param is required on some videos even though `lang` is already provided? For example, this URL `https://www.youtube.com/api/timedtext?v=7068mw-6lmI&lang=ko&name=Korean` will not work without `name=Korean`. Other ones are fine. I'm thinking it might have something to do w/ the ASR captions on this video since there's also auto-generated Korean captions, so perhaps it's to differentiate, but just a guess. – ryanbrainard Oct 25 '17 at 16:41
  • Looking at the list of available subtitles indicates when it's required, but not why. My guess is that it's related to the YT v2 > v3 upgrade. Examples: https://www.youtube.com/api/timedtext?v=7068mw-6lmI&type=list and https://www.youtube.com/api/timedtext?v=dhwpLACAls8&type=list – Flint Jun 13 '18 at 00:02
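To illustrate the point about the type=list output, here is a rough sketch of how one might read such a track list and decide whether to send name. Everything about the XML layout is an assumption: the sample document, the track element, and the name and lang_code attributes are hypothetical and not confirmed anywhere in this thread, as is the rule that a non-empty name means the parameter is required.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a `type=list` response. The element and attribute
# names below are assumed for illustration and may not match the real output.
SAMPLE_TRACK_LIST = """
<transcript_list docid="0">
  <track id="0" name="Korean" lang_code="ko"/>
  <track id="1" name="" lang_code="en"/>
</transcript_list>
""".strip()


def tracks_from_list(xml_text):
    """Return one {"lang", "name"} dict per <track> element."""
    root = ET.fromstring(xml_text)
    return [
        {"lang": track.get("lang_code", ""), "name": track.get("name", "")}
        for track in root.iter("track")
    ]


def query_params(track, video_id):
    """Build timedtext query parameters, sending `name` only when it is set.

    Assumption: a track listed with a non-empty name needs `name` in the query.
    """
    params = {"v": video_id, "lang": track["lang"]}
    if track["name"]:
        params["name"] = track["name"]
    return params


if __name__ == "__main__":
    for track in tracks_from_list(SAMPLE_TRACK_LIST):
        print(query_params(track, "7068mw-6lmI"))
```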
2

A 2022 answer:

Option 1: Fetch the video page with curl: curl -L "https://youtu.be/YbJOTdZBX1g". Search for timedtext in the result and you will find a URL. Replace \u0026 with & in it to get the link for the subtitle.
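The search-and-replace step of Option 1 could be scripted roughly as follows. This is a sketch under assumptions: extract_timedtext_urls is a hypothetical helper, and it assumes the URL appears in the page source as a double-quoted string on the www.youtube.com host. It works on page text you have already downloaded (for example with the curl command above) and does no network access itself.

```python
import re

# Assumed shape: a timedtext URL inside a double-quoted string in the page
# source, running up to the closing quote.
TIMEDTEXT_RE = re.compile(r'https://www\.youtube\.com/api/timedtext[^"\s]*')


def extract_timedtext_urls(page_source):
    """Return the distinct timedtext URLs found in page_source.

    The escaped ampersand (a literal backslash followed by u0026) is
    replaced with "&", as described in Option 1.
    """
    urls = []
    for raw in TIMEDTEXT_RE.findall(page_source):
        url = raw.replace("\\u0026", "&")
        if url not in urls:
            urls.append(url)
    return urls


if __name__ == "__main__":
    # Made-up fragment standing in for the downloaded page source.
    sample = '{"baseUrl":"https://www.youtube.com/api/timedtext?v=YbJOTdZBX1g\\u0026lang=en"}'
    print(extract_timedtext_urls(sample))
```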

Option 2: Use the yt-dlp package:

# For installing see: https://github.com/yt-dlp/yt-dlp#with-pip
from yt_dlp import YoutubeDL

ydl_opts = {
    "skip_download": True,
    "writesubtitles": True,
    "subtitleslangs": ["all", "-live_chat"],
    # Looks like formats available are vtt, ttml, srv3, srv2, srv1, json3
    "subtitlesformat": "json3",
    # You can skip the following option
    "sleep_interval_subtitles": 1,
}
with YoutubeDL(ydl_opts) as ydl:
    ydl.download(["YbJOTdZBX1g"])
C-Y