Keeps returning the exact same error whether or not the video has subtitles #357

Closed
linuxfandudeguy opened this issue Nov 27, 2024 · 4 comments


@linuxfandudeguy


### To Reproduce

Steps to reproduce the behavior:

Enter any URL into the API I am building, which uses the library:
yt

### What code / CLI command are you executing?

I am running:

```python
from urllib.parse import urlparse, parse_qs

from flask import Flask, request, jsonify
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter
from youtube_transcript_api._errors import NoTranscriptFound, VideoUnavailable

app = Flask(__name__)

def get_video_id(url):
    """
    Extracts the video ID from a YouTube URL
    (youtube.com/watch?v=<id> or youtu.be/<id>).
    """
    if "://" not in url:
        url = "https://" + url  # urlparse only finds the host when a scheme is present
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if host == "youtu.be":
        # Short links carry the ID in the path; query params such as ?si=... are ignored
        video_id = parsed.path.lstrip("/")
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        # Watch URLs carry the ID in the "v" query parameter
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    else:
        video_id = ""
    if not video_id:
        raise ValueError("Invalid YouTube URL")
    return video_id

def extract_transcript(video_url, language='en'):
    """
    Extracts the transcript from a YouTube video using YouTube Transcript API.
    Prefers a human-written transcript and falls back to an auto-generated one.
    """
    try:
        # Inside the try block so an invalid URL is reported instead of crashing
        video_id = get_video_id(video_url)

        # List all available transcripts for this video
        available_transcripts = YouTubeTranscriptApi.list_transcripts(video_id)

        # Print available transcripts for debugging (can be removed later)
        print("Available transcripts:", available_transcripts)

        # `language in available_transcripts` is always False: iterating a
        # TranscriptList yields Transcript objects, not language-code strings.
        # find_transcript() tries each code in order, preferring a manually
        # created transcript over an auto-generated one. ('auto' is not a
        # language code, so it is not passed.)
        transcript = available_transcripts.find_transcript([language, 'en', 'en-US'])

        # fetch() returns a list of {'text', 'start', 'duration'} dicts;
        # TextFormatter joins their 'text' fields with newlines.
        return TextFormatter().format_transcript(transcript.fetch())

    except VideoUnavailable:
        return "This video is unavailable."
    except NoTranscriptFound:
        return f"No transcript found for {language} or auto-generated subtitles."
    except Exception as e:
        return f"Error extracting transcript: {e}"

@app.route('/api/transcript', methods=['GET'])
def get_transcript():
    """
    API endpoint to fetch YouTube video subtitles/transcripts.
    Expects 'video_url' as a query parameter and optional 'lang' (default 'en') for language.
    """
    # Get the YouTube video URL and language from the query parameters
    video_url = request.args.get('video_url')
    language = request.args.get('lang', 'en')  # Default to 'en' if no language is provided

    if not video_url:
        return jsonify({"error": "Missing 'video_url' parameter"}), 400

    # Extract transcript from the provided YouTube URL
    transcript = extract_transcript(video_url, language)

    # Return the transcript or error message
    return jsonify({"transcript": transcript})

if __name__ == '__main__':
    app.run(debug=True)
```
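A related bug, separate from the reported error (which is raised earlier, inside `list_transcripts`): a check of the form `language in available_transcripts` can never be true, so the `else` branch always runs. The sketch below uses hypothetical stand-in classes (`FakeTranscript`, `FakeTranscriptList`) that mimic what is assumed to be the 0.6.x behavior of the library's `TranscriptList`, namely that iterating it yields transcript objects and that it defines no `__contains__`. Python's `in` then falls back to iteration and compares the string against each object.

```python
# Hypothetical stand-ins; assumption: like TranscriptList in 0.6.x, iterating
# the list yields transcript objects rather than language codes.
class FakeTranscript:
    def __init__(self, language_code):
        self.language_code = language_code

class FakeTranscriptList:
    def __init__(self, transcripts):
        self._transcripts = transcripts

    def __iter__(self):
        # With no __contains__ defined, `x in obj` falls back to this iterator
        return iter(self._transcripts)

transcripts = FakeTranscriptList([FakeTranscript("en"), FakeTranscript("de")])

# Compares "en" against FakeTranscript objects, so this is never True
print("en" in transcripts)

# Comparing against the language codes behaves as intended
print("en" in {t.language_code for t in transcripts})
```

With the real library, `find_transcript([...])` avoids the manual membership test altogether.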

### Which Python version are you using?
Python 3.13.0

### Which version of youtube-transcript-api are you using?
youtube-transcript-api 0.6.3

# Expected behavior

I expected to get the auto-generated transcript as JSON.

# Actual behavior

Instead, I received the following error message:

> Error extracting transcript:
> Could not retrieve a transcript for the video https://www.youtube.com/watch?v=BUmTEFDSTrQ! This is most likely caused by:
>
> Subtitles are disabled for this video
>
> If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!

The video does have subtitles, so I have no idea why I get this error, and the exact same error occurs for every video I try.
@jdepoix
Owner

jdepoix commented Nov 27, 2024

duplicate of #303

jdepoix closed this as completed on Nov 27, 2024
@tikene

tikene commented Nov 30, 2024

I was having the same issue and ended up writing a similar program. It may not work in your case, though, since it also adds the subtitles to the video. It's more barebones than this repo, but it works: https://github.com/tikene/video-caption-and-translate. I'll delete this if linking other repos isn't allowed. By the way, you do need an OpenAI API key, but it's pretty cheap to use.

@jdepoix
Owner

jdepoix commented Dec 1, 2024

@tikene It's worth noting that this solution does not retrieve YouTube's original transcript; it uses OpenAI's Whisper model to transcribe the video. Whisper isn't great at transcribing noisy audio, so be aware that the transcript quality probably isn't as good as YouTube's original transcripts. Also, using the OpenAI API for this will probably be more expensive than paying for a proxy, tbh.
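The maintainer's comparison implies that routing requests through a paid proxy is the intended workaround for this error. The thread does not state the cause; the assumption here is that YouTube refuses transcript requests based on the IP address they come from. The following is a minimal sketch, assuming that `YouTubeTranscriptApi.list_transcripts` in 0.6.x accepts a requests-style `proxies` dict. `PROXY_URL`, `build_proxies`, and `list_transcripts_via_proxy` are hypothetical names, and the proxy URL is a placeholder for your provider's.

```python
# Hypothetical placeholder; replace with the URL your proxy provider gives you
PROXY_URL = "http://user:password@proxy.example.com:8080"

def build_proxies(proxy_url):
    """Return a requests-style proxies dict that sends HTTP and HTTPS via proxy_url."""
    return {"http": proxy_url, "https": proxy_url}

def list_transcripts_via_proxy(video_id, proxy_url=PROXY_URL, list_transcripts=None):
    """List a video's transcripts, routing the request through a proxy.

    `list_transcripts` can be injected (e.g. for testing). By default it is
    YouTubeTranscriptApi.list_transcripts, assumed here to accept `proxies` (0.6.x).
    """
    if list_transcripts is None:
        # Imported lazily so build_proxies() works without the library installed
        from youtube_transcript_api import YouTubeTranscriptApi
        list_transcripts = YouTubeTranscriptApi.list_transcripts
    return list_transcripts(video_id, proxies=build_proxies(proxy_url))
```

For example, `list_transcripts_via_proxy("BUmTEFDSTrQ").find_transcript(["en"]).fetch()` would then fetch the English transcript through the proxy. Injecting the listing function keeps the proxy wiring testable without making network calls.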

@tikene

tikene commented Dec 1, 2024


Absolutely correct. My implementation requires an OpenAI API key and costs around 2 cents per 2 minutes of video. My use case is mostly translating videos that don't already have manually added captions, which it does pretty well (certainly better than YouTube's auto-translation feature), so I think the two projects complement each other nicely. If a video already has proper, manually added translated captions, it's definitely better and cheaper to use yours.
