Video imagery to text (Closed Captioning) #11770
Comments
🤖 Thank you for your interest in contributing to LangChain! Your proposed feature to generate .srt files with both subtitles and visual scene descriptions sounds like a valuable addition to the framework. We appreciate your motivation to enhance video accessibility and provide a more inclusive viewing experience. It's great to see your commitment to addressing the needs of the hearing-impaired community. Please go ahead and work on implementing this feature. We look forward to reviewing your pull request in November. If you have any questions or need any guidance during the process, feel free to ask. Good luck with your school project!

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
@dosu-bot What would the next steps look like? We should probably start by creating an outline for the implementation of this feature. Since LangChain already has an image analyzer and a speech-to-text feature, the main goal would be to effectively break down a video into frames and then make sure the output of each frame is combined into something that is accurate and makes sense. Also, can I add my group members (GitHub accounts) to this issue, just so we are all on the same page?
🤖 Hello,

It's great to hear that you're interested in contributing to LangChain! Your feature idea sounds very promising and would be a valuable addition to the project.

To start implementing this feature, you can follow the steps outlined in the CONTRIBUTING.md file. This includes forking the repository, creating a new branch, implementing the feature, committing and pushing your changes, and finally opening a pull request.

As you've mentioned, LangChain already has an image analyzer and a speech-to-text feature. Your main task would be to break down a video into frames and then combine the output of each frame into something that makes sense. This will likely involve building on those existing image-captioning and speech-to-text components.

For the .srt file format: each subtitle entry consists of a sequence number, the start and end times of the subtitle, the subtitle text itself, and a blank line indicating the end of the subtitle entry.

To add your group members to this issue, you can navigate to the issue in the LangChain repository and click on the 'assign yourself' link on the right side of the issue page. If you do not have the necessary permissions to do this, you can comment on the issue asking to be assigned.

Please remember that all code contributions must pass the formatting, linting, and testing checks before they can be merged. You can run these checks locally with `make format`, `make lint`, and `make test`.

I hope this helps! If you have any further questions, feel free to ask. Best of luck with your project!
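For illustration, a minimal .srt file following that structure might look like this (the timestamps and text below are invented for the example):

```
1
00:00:01,000 --> 00:00:04,000
[Scene: a busy street at night]
Hello, and welcome to the video.

2
00:00:04,500 --> 00:00:07,250
Today we'll talk about closed captioning.
```

Each entry is separated from the next by a single blank line, and the timestamp format is `HH:MM:SS,mmm` with a comma (not a period) before the milliseconds.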
@dosu-bot Hello, I am also part of the group contributing to this issue. Can you please assign me to this issue? |
@dosu-bot Hello! When trying to use the ImageCaptionLoader, we were following its documentation. We get the error `NoneType object is not callable` at the line `processor = BlipProcessor.from_pretrained(self.blip_processor)`.
🤖 The error message `NoneType object is not callable` typically indicates that you're trying to call a `None` object as if it were a function or method. In your case, the error is happening at the line `processor = BlipProcessor.from_pretrained(self.blip_processor)`, which suggests that `BlipProcessor` itself is `None` at that point. Here are a few potential reasons for this issue:

1. The `transformers` package may not be installed in the environment you're running, or the installed version may be too old to provide `BlipProcessor`.
2. The value you pass as `blip_processor` (the checkpoint name) may be `None` or invalid.
3. There may be a version mismatch between `transformers` and the BLIP checkpoint you are loading.

If all the above checks pass, then the issue might be with how the ImageCaptionLoader initializes the processor internally. Unfortunately, I couldn't find specific details about how the loader handles a missing or failed import, so inspecting its source code directly would be the next step.
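As a quick way to narrow this down outside of LangChain, you could try loading the processor and model directly from `transformers`. A minimal sketch, assuming the commonly used Salesforce BLIP checkpoint (substitute whatever checkpoint name you pass as `blip_processor`):

```python
# Sanity check: if these imports or calls fail, the problem is in the
# transformers installation or the checkpoint name, not in ImageCaptionLoader.
from transformers import BlipProcessor, BlipForConditionalGeneration

checkpoint = "Salesforce/blip-image-captioning-base"  # assumed checkpoint name
processor = BlipProcessor.from_pretrained(checkpoint)
model = BlipForConditionalGeneration.from_pretrained(checkpoint)
print(type(processor), type(model))
```

If this snippet works but the loader still fails, the problem is likely in how the checkpoint name reaches the loader rather than in `transformers` itself.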
@dosu-bot We created a new chain using some new dependencies that the other existing chains don't have. Thus, when running the tests, they fail because they cannot find the specified module. For example, we are using OpenCV to split videos into frames. How should we add the opencv dependency to LangChain so that it works alongside the other chains? We keep getting `ModuleNotFoundError: No module named 'cv2'` for every existing chain when trying to run the tests.
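For reference, LangChain generally treats heavy packages like OpenCV as optional dependencies: the import happens inside the function or method that needs it, with a helpful error if the package is missing, so that other chains and their tests are unaffected. A minimal sketch of that pattern applied to frame extraction (the function name and sampling rate are illustrative, not part of the PR):

```python
def extract_frames(video_path: str, frames_per_second: float = 1.0):
    """Yield (timestamp_seconds, frame) pairs sampled from a video file."""
    try:
        import cv2  # optional dependency: imported lazily, LangChain-style
    except ImportError:
        raise ImportError(
            "opencv-python is required for frame extraction. "
            "Install it with `pip install opencv-python`."
        )

    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS is unknown
    step = max(int(fps / frames_per_second), 1)  # keep every Nth frame

    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % step == 0:
            yield index / fps, frame
        index += 1
    capture.release()
```

In the LangChain repository itself, such packages are typically added to the optional test extras in `pyproject.toml` (e.g. the `extended_testing` group) rather than to the core dependencies, so the base install stays lightweight.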
Hi, @A2113S, I'm helping the LangChain team manage their backlog and am marking this issue as stale. The issue was raised by you to request the implementation of a feature using Langchain's image_captions.py and audio_speech_to_text.py to generate .srt files for closed captioning, aiming to enhance accessibility for the hearing-impaired community. The issue gained interest from dosubot, LunarECL, nicoledroi1, and ifritzler, who provided guidance and assistance on the implementation steps, dependencies, and error troubleshooting. The students from the University of Toronto are aiming to contribute to Langchain for a school project and have received support and instructions from the maintainers of the repository. Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days. Thank you!
…14059) Description: Video imagery to text (Closed Captioning) This pull request introduces the VideoCaptioningChain, a tool for automated video captioning. It processes audio and video to generate subtitles and closed captions, merging them into a single SRT output. Issue: #11770 Dependencies: opencv-python, ffmpeg-python, assemblyai, transformers, pillow, torch, openai Tag maintainer: @baskaryan @hwchase17 Hello! We are a group of students from the University of Toronto (@LunarECL, @TomSadan, @nicoledroi1, @A2113S) who want to make a contribution to the LangChain community! We have run make format, make lint and make test locally before submitting the PR. To our knowledge, our changes do not introduce any new errors. Thank you for taking the time to review our PR! --------- Co-authored-by: Bagatur <[email protected]>
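For readers who want to try the merged chain, usage might look roughly like the sketch below. The import path, constructor, and run parameters are assumptions drawn from the PR description and its listed dependencies, not verified API:

```python
# Hypothetical usage sketch; parameter names and the import path are assumed.
from langchain.chat_models import ChatOpenAI
from langchain_experimental.video_captioning import VideoCaptioningChain  # assumed path

chain = VideoCaptioningChain(llm=ChatOpenAI())
srt_content = chain.run(
    video_file_path="lecture.mp4",         # assumed input key
    assemblyai_key="YOUR_ASSEMBLYAI_KEY",  # AssemblyAI is a listed dependency
)
with open("lecture.srt", "w") as f:
    f.write(srt_content)
```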
Feature request
Implement a feature using Langchain's image_captions.py and audio_speech_to_text.py to produce .srt files. This system will provide both subtitles and visual scene descriptions, essentially creating closed captioning.
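To make the intended output concrete, here is a minimal sketch of the merging step, in which timed speech transcripts and timed scene descriptions are interleaved into a single SRT document. The `Segment` structure and function names are illustrative, not LangChain API:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str     # transcript line or scene description

def to_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def merge_to_srt(speech: list[Segment], scenes: list[Segment]) -> str:
    """Interleave speech and scene segments by start time into one SRT string."""
    entries = sorted(speech + scenes, key=lambda s: s.start)
    blocks = []
    for i, seg in enumerate(entries, start=1):
        blocks.append(
            f"{i}\n{to_timestamp(seg.start)} --> {to_timestamp(seg.end)}\n{seg.text}\n"
        )
    return "\n".join(blocks)

# Example: one transcribed line plus one scene description.
print(merge_to_srt(
    speech=[Segment(1.0, 4.0, "Hello, and welcome to the video.")],
    scenes=[Segment(0.0, 4.0, "[Scene: a busy street at night]")],
))
```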
Motivation
Current closed captioning primarily captures spoken content, omitting crucial visual elements of videos. This lack of comprehensive description hinders a full understanding, especially for the hearing-impaired community. Our goal is to elevate video accessibility by integrating both visual and auditory components in closed captioning, ensuring a more inclusive viewing experience.
Your contribution
We are students at the University of Toronto, interested in contributing to Langchain for a school project. We would like to submit a PR that implements this feature sometime in November.