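Hello. This is a feature request rather than a bug, methinks.
Have you looked at extracting captions from a live stream? If you look at any example of a live stream (https://www.youtube.com/whitehouse) while the stream is live (that part is key), there are auto-generated subtitles delivered in the videoplayback file that streams in, embedded, e.g.:
https://r6---sn-8xgp1vo-p5qy.googlevideo.com/videoplayback?expire=1614211486&ei=PpU2YM-ULYm98wTm0L_gDA&ip=71.246.232.10&id=yhxmnlGtJ-g.1&itag=386&source=yt_live_broadcast&requiressl=yes&mh=zc&mm=44,29&mn=sn-8xgp1vo-p5qy,sn-p5qs7nel&ms=lva,rdu&mv=m&mvi=6&pl=18&initcwndbps=1717500&vprv=1&live=1&hang=1&noclen=1&xtags=lang=en:ttkind=asr&mime=text/mp4&ns=aD6U7aY6idhNPyXEqiXu6K0F&gir=yes&mt=1614189620&fvip=6&keepalive=yes&fexp=23983797&beids=9466586&c=WEB&n=lmOMV3MuzrpzRQ&sparams=expire,ei,ip,id,itag,source,requiressl,vprv,live,hang,noclen,xtags,mime,ns,gir&sig=AOq0QJ8wRAIgd0qHHqBF3aRir-pw93UKhFNuFxrlpe6OqyMerxsZ4JsCIHZK74UbKX7ig08-egt6vMDzP6g_7EhOyuOOoUXAkSVW&lsparams=mh,mm,mn,ms,mv,mvi,pl,initcwndbps&lsig=AG3C_xAwRAIgHa9tABbFKMiVQSnLLWa7iO_iu7pcVtrea43G-zdfGBUCIGbqOL15uN0-32Yki8s5vwXD2XDkvCBUgntS54w9xvjc&alr=yes&cpn=LW2TAYe5jfbjzMjx&cver=2.20210223.09.00&sq=664
Expired, of course, but as an example, the payload here is: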
�ftyp��moovlmvhd�_�����@�(mvex trex���}trak\tkhd���@��mdia mdhd�_�UÄ!hdlrtextÐminf$dinf�dref�url �˜stblHstsd�8tx3g�
ftab�stts�stsc�stco�stsz��Vnmhd��emsghttp://youtube.com/streaming/metadata/segment/102015�ˆ°D«Sequence-Number: 664
Stream-Finished: F
Ingestion-Walltime-Us: 1614189870022158
Stream-Duration-Us: 3320017000
Max-Dvr-Duration-Us: 14400000000
Target-Duration-Us: 5000000
Encoding-Alias: L1_Ag
Xmoof�mfhd@traf�tfhd���ß’�tfdt�ÏYz�trun��`�^mdat<?xml version="1.0" encoding="utf-8" ?><timedtext format="3">
<body>
<p t="0" d="345">what's in the Declassified
report or when it comes out</p>
<p t="345" d="3750">because many elements of Italy
two years ago when when it was</p>
<p t="4095" d="910">first first came out if you come
to the conclusion that there</p>
</body>
</timedtext>
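For anyone poking at this, the plain-text metadata lines in the payload above (Sequence-Number, Target-Duration-Us, and so on) can probably be pulled out without a full MP4 parser. A minimal sketch, assuming they always appear as "Name: value" lines somewhere before the XML body; `parse_segment_headers` is a made-up helper name, not part of any existing library:

```python
import re

# Assumption: the metadata is stored as "Name: value" text lines inside the
# binary segment, as it appears in the dump above. Nothing here is a
# documented format.
HEADER_RE = re.compile(rb"([A-Z][A-Za-z-]+): ([^\r\n]+)")


def parse_segment_headers(payload: bytes) -> dict:
    """Collect the 'Name: value' lines that precede the XML body."""
    # Only look at the part before the embedded XML so caption text
    # containing a colon is never mistaken for a header.
    head = payload.split(b"<?xml", 1)[0]
    headers = {}
    for match in HEADER_RE.finditer(head):
        name = match.group(1).decode("ascii")
        value = match.group(2).decode("ascii", "replace").strip()
        headers[name] = value
    return headers
```

Values are returned as strings; converting Sequence-Number or the microsecond fields to integers is left to the caller, since it is not clear every field is always numeric.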
The timedtext is embedded in the file:
<?xml version="1.0" encoding="utf-8" ?><timedtext format="3">
<body>
<p t="0" d="345">what's in the Declassified
report or when it comes out</p>
<p t="345" d="3750">because many elements of Italy
two years ago when when it was</p>
<p t="4095" d="910">first first came out if you come
to the conclusion that there</p>
</body>
</timedtext>
It's not TTMLv3, but we can tell this text is associated with sequence #664 from the URL (and from the Sequence-Number line in the payload). The "t" attribute appears to be a millisecond offset relative to the start of the sequence chunk, and "d" appears to be the duration. But even without those, the stream of text is there. Note that it doesn't appear by default. It appears that "xtags" has to be listed in the "sparams" parameter of the URL to get the live captioning, but if you try to insert it by hand, it seems to mess up the hash/key associated with the URL, so captions need to be switched on some other way (cc_load_policy=1 in the URL does NOT seem to work).
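To illustrate the reading of "t" and "d" described above, here is a rough sketch that cuts the XML out of a raw segment and turns each `<p>` into a cue. It assumes "t" and "d" are both milliseconds and that the XML sits between `<?xml` and `</timedtext>`; `parse_timedtext` is a hypothetical name used only for this example:

```python
import xml.etree.ElementTree as ET

END_TAG = b"</timedtext>"


def parse_timedtext(payload: bytes) -> list:
    """Return (start_ms, duration_ms, text) tuples from a raw segment.

    Assumption: "t" is an offset in milliseconds from the start of the
    chunk and "d" is a duration in milliseconds, as guessed in the issue.
    """
    start = payload.find(b"<?xml")
    end = payload.find(END_TAG)
    if start == -1 or end == -1:
        return []  # no embedded caption document in this segment
    root = ET.fromstring(payload[start:end + len(END_TAG)])
    cues = []
    for p in root.iter("p"):
        start_ms = int(p.get("t", "0"))
        duration_ms = int(p.get("d", "0"))
        # Collapse the line breaks inside a cue into single spaces.
        text = " ".join("".join(p.itertext()).split())
        cues.append((start_ms, duration_ms, text))
    return cues
```

Turning these chunk-relative offsets into stream-wide timestamps would need the start time of each sequence, which this sketch deliberately does not guess at.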
youtube-dl et al. don't recognize this, since it's not being delivered as a standalone subtitle file. They act as if there are no subtitles on the live stream, because the track doesn't identify itself as a subtitles file.
Thoughts?
Hi @frisch1, I would definitely say that this is a feature request and not a bug. Sounds interesting, but I don't see myself implementing this anytime soon, as this module is mostly used for data-science purposes and I don't really see the use-case for livestreams. However, if you want to contribute this feature I'd be happy to merge it. Deserializing the response probably isn't a big deal, you just gotta find out how to scrape the URL you'll have to call to actually get that response. Let me know if you have that figured out and are interested in contributing it, so we can have a chat on how to implement this into the current API 😊
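On the "find the URL" part, one small piece that can be sketched is recognizing a caption segment URL once a list of candidate videoplayback URLs is available. This is only a guess based on the single example in the issue (mime=text/mp4, an xtags parameter, and xtags listed in sparams); `looks_like_caption_track` is a made-up name and the heuristic may not hold for other streams:

```python
from urllib.parse import parse_qs, urlsplit


def looks_like_caption_track(url: str) -> bool:
    """Heuristic check for a live caption segment URL.

    Assumption: caption URLs carry mime=text/mp4 and an xtags parameter
    that is also named in the signed "sparams" list, as in the one
    example from the issue.
    """
    query = parse_qs(urlsplit(url).query)
    mime = query.get("mime", [""])[0]
    signed_params = query.get("sparams", [""])[0].split(",")
    return mime == "text/mp4" and "xtags" in query and "xtags" in signed_params
```

This does not solve the signing problem mentioned in the issue; it only filters URLs that were already produced with xtags included.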