Is there a transcription API? #453

Closed
4 tasks done
bezaleel22 opened this issue Jul 21, 2023 · 2 comments
Labels
question Further information is requested

Comments

@bezaleel22

Question

Does this library support fetching video transcripts from YouTube? If not, it would be nice to have this feature. Thanks for this great work.

Other details

No response

Checklist

  • I am running the latest version.
  • I checked the documentation and found no answer.
  • I have searched the existing issues and made sure this is not a duplicate.
  • I have provided sufficient information.
bezaleel22 added the question (Further information is requested) label Jul 21, 2023
bezaleel22 changed the title from "Transcription API" to "I there a transcription API?" Jul 21, 2023
@tomByrer

tomByrer commented Aug 1, 2023

It might not be as convenient, but I have my own fork of a YouTube transcript downloader. I forked it so I could include basic HTML formatting and shape the JSON the way I wanted.
https://github.com/tomByrer/youtube-captions-scraper

@seomikewaltman

seomikewaltman commented Aug 14, 2023

You can extend the Innertube class to do it.

```js
// InnerYouTube.js
import { Innertube, Utils } from 'youtubei.js';

export default class InnerYouTube extends Innertube {

  // Scrape the watch page for the params token that get_transcript expects.
  async getTranscriptsParameters(id) {
    if (!id) throw new Utils.MissingParamError('Video id is missing');

    const uri = `/watch?v=${id}`;
    const response = await this.session.http
      .fetch(uri, { method: 'GET', baseURL: 'https://www.youtube.com' })
      .then((r) => r.text());

    const params = Utils.getStringBetweenStrings(response, 'getTranscriptEndpoint":{"params":"', '"}}}},');
    if (params) return params;

    throw new Utils.ParsingError(`getTranscriptEndpoint not found for ${id}`);
  }

  // POST the params, plus the session's client context, to the InnerTube get_transcript endpoint.
  async getTranscript(id) {
    if (!id) throw new Utils.MissingParamError('Video id is missing');

    const params = await this.getTranscriptsParameters(id);
    // The API key is held by the Session, not by the Innertube instance itself.
    const url = `/get_transcript?key=${this.session.key}`;
    const context = this.session.context;
    const opts = {
      method: 'POST',
      body: JSON.stringify({ context, params }),
      baseURL: 'https://www.youtube.com/youtubei/v1',
    };

    const response = await this.session.http
      .fetch(url, opts)
      .then((r) => r.json());

    return response;
  }
}
```
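The getStringBetweenStrings() call relies on the closing delimiter `"}}}},`, which depends on how deeply the endpoint object happens to be nested in the page. Here is a minimal sketch of a less position-dependent extraction. It assumes the watch page still embeds `"getTranscriptEndpoint":{"params":"<token>"}`, as the delimiters above imply. `extractTranscriptParams` is a hypothetical helper, not part of youtubei.js.

```js
// Hypothetical helper (not part of youtubei.js): extract the get_transcript
// "params" token from watch-page HTML. Assumes the page embeds JSON shaped like
// "getTranscriptEndpoint":{"params":"<token>"}, as the delimiters above imply.
function extractTranscriptParams(html) {
  // Anchor on the opening delimiter only and capture everything up to the closing quote,
  // so the match does not depend on how many braces follow the token.
  const match = html.match(/"getTranscriptEndpoint":\{"params":"([^"]+)"/);
  if (!match) {
    throw new Error('getTranscriptEndpoint not found');
  }
  return match[1];
}
```

Inside the class, `extractTranscriptParams(response)` could stand in for the getStringBetweenStrings() line. You would then keep or swap the thrown error type to taste.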

Usage is a bit different, since the library builds instances through a static create() method rather than the constructor.
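Why not just call InnerYouTube.create()? Assume the base create() constructs the base class directly, which is what the comment implies. In that case, a subclass that merely inherits create() gets a plain base instance back, without getTranscript(). The stand-in classes below (`Base`, `Extended`, `NaiveExtended`, and `createSession` are all hypothetical, not youtubei.js code) show the pitfall and one way around it: overriding the static factory in the subclass.

```js
// Hypothetical stand-in for Session.create(): resolves to a session object.
async function createSession(options = {}) {
  return { options };
}

class Base {
  constructor(session) {
    this.session = session;
  }
  // Mirrors the assumed pattern: the factory hard-codes the base class.
  static async create(options = {}) {
    return new Base(await createSession(options));
  }
}

// Inherits Base.create(), so NaiveExtended.create() still resolves to a Base.
class NaiveExtended extends Base {
  getTranscript(id) {
    return `transcript for ${id}`;
  }
}

// Overrides the factory so it builds the subclass instead.
class Extended extends Base {
  static async create(options = {}) {
    return new Extended(await createSession(options));
  }
  getTranscript(id) {
    return `transcript for ${id}`;
  }
}
```

With youtubei.js, the equivalent override would presumably be `static async create(options) { return new InnerYouTube(await Session.create(options)); }`. That is the same construction the usage code that follows performs by hand.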

```js
import { Session } from 'youtubei.js';
import InnerYouTube from './InnerYouTube.js';

// Normal way
// const innertube = await Innertube.create(options_go_here);

// Extending youtubei.js with your own class: create the Session yourself and pass it to the constructor
const innertube = new InnerYouTube(await Session.create(options_go_here));
const transcripts_json = await innertube.getTranscript(videoId);

// parse the JSON as you see fit
```
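A minimal sketch of the parsing step follows. It assumes, based on observed responses rather than any documented contract, that each transcript line arrives as a `transcriptSegmentRenderer` object. Each such object is assumed to have `startMs`/`endMs` strings and its text split across `snippet.runs[].text`. Rather than hard-coding the full chain of renderer wrappers, the helper walks the whole response and collects those nodes in document order. `parseTranscriptSegments` is a hypothetical name.

```js
// Hypothetical helper: flatten a get_transcript JSON response into plain segments.
// Field names (transcriptSegmentRenderer, startMs, endMs, snippet.runs) are assumptions.
function parseTranscriptSegments(json) {
  const segments = [];
  const stack = [json];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node === null || typeof node !== 'object') continue;

    const seg = node.transcriptSegmentRenderer;
    if (seg) {
      // Join the text runs of one line into a single string.
      const text = (seg.snippet?.runs ?? []).map((r) => r.text).join('');
      segments.push({ startMs: Number(seg.startMs), endMs: Number(seg.endMs), text });
      continue;
    }

    // Push children in reverse so they are popped, and therefore collected, in original order.
    const children = Array.isArray(node) ? node : Object.values(node);
    for (let i = children.length - 1; i >= 0; i--) stack.push(children[i]);
  }
  return segments;
}
```

For example, `parseTranscriptSegments(transcripts_json)` would yield an array of `{ startMs, endMs, text }` objects, or an empty array if the response shape has changed.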

Because this fetches the youtube.com/watch page to get the parameters for the youtubei/v1 get_transcript endpoint, you will get reCAPTCHA'd if you make a lot of these requests per minute. You'll want to use a proxy.

```js
import { Session } from 'youtubei.js';
import { ProxyAgent, fetch } from 'undici';
import InnerYouTube from './InnerYouTube.js';

const proxyClient = new ProxyAgent('http://yourproxyhere');

// Route every request the session makes through the proxy via undici's dispatcher option
const innertube = new InnerYouTube(await Session.create({
  fetch: async (input, init) => fetch(input, { ...init, dispatcher: proxyClient }),
}));
const transcripts_json = await innertube.getTranscript(videoId);
```
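Since the trigger is request volume per minute, spacing requests out alongside the proxy may reduce how often the watch-page fetch gets challenged. YouTube does not publish the threshold, so any numbers here are guesses. Here is a minimal sliding-window limiter sketch (`createRateLimiter` and `reserve` are hypothetical names). It returns how long to wait before each request so that at most `maxRequests` start within any `windowMs` span.

```js
// Hypothetical helper: sliding-window limiter. reserve() records a request start
// and returns the delay (ms) to wait so that at most maxRequests start per windowMs.
function createRateLimiter(maxRequests, windowMs) {
  const starts = []; // scheduled start times, oldest first (kept sorted)
  return {
    reserve(now = Date.now()) {
      // Forget starts that have already left the window.
      while (starts.length > 0 && now - starts[0] >= windowMs) {
        starts.shift();
      }
      let start = now;
      if (starts.length >= maxRequests) {
        // Wait until the maxRequests-th most recent start falls out of the window.
        start = starts[starts.length - maxRequests] + windowMs;
      }
      starts.push(start);
      return start - now;
    },
  };
}
```

Before each getTranscript() call, something like `await new Promise((resolve) => setTimeout(resolve, limiter.reserve()))` would wait out the returned delay. Pick the limits for `createRateLimiter(...)` by experiment.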

4 participants