feat: add methods for fetching transcript content directly
Thoroldvix authored Jun 13, 2024
1 parent 28078d7 commit 88075ba
Showing 5 changed files with 321 additions and 152 deletions.
64 changes: 34 additions & 30 deletions README.md
@@ -81,8 +81,11 @@ for [finding specific transcripts](#find-transcripts) by language or by type (ma
 TranscriptList transcriptList = youtubeTranscriptApi.listTranscripts("videoId");
 
 // Iterate over transcript list
-for(Transcript transcript : transcriptList) {
-System.out.println(transcript);
+for(
+Transcript transcript :transcriptList){
+System.out.
+
+println(transcript);
 }
 
 // Find transcript in specific language
@@ -290,48 +293,49 @@ Playlists and channels information is retrieved from
 the [YouTube V3 API](https://developers.google.com/youtube/v3/docs/),
 so you will need to provide API key for all methods.
 
+All methods take a `TranscriptRequest` object as a parameter,
+which contains the following fields:
+
+- `apiKey` - YouTube API key.
+- `stopOnError` (optional, defaults to `true`) - Whether to stop on the first error or continue. If `true`, the method will
+fail fast by throwing an error if one of the transcripts could not be retrieved;
+otherwise it will ignore failed transcripts.
+
+- `cookies` (optional) - Path to [cookies.txt](#cookies) file.
+
+All methods return a map which contains the video ID as a key and the corresponding result as a value.
+
 ```java
 // Create a new default PlaylistsTranscriptApi instance
 PlaylistsTranscriptApi playlistsTranscriptApi = TranscriptApiFactory.createDefaultPlaylistsApi();
 
+// Create request object
+TranscriptRequest request = new TranscriptRequest("apiKey");
+
 // Retrieve all available transcripts for a given playlist
-Map<String, TranscriptList> transcriptLists = playlistsTranscriptApi.listTranscriptsForPlaylist(
-"playlistId",
-"apiKey",
-true);
+Map<String, TranscriptList> playlistTranscriptLists = playlistsTranscriptApi.listTranscriptsForPlaylist("playlistId", request);
 
 // Retrieve all available transcripts for a given channel
-Map<String, TranscriptList> transcriptLists = playlistsTranscriptApi.listTranscriptsForChannel(
-"channelName",
-"apiKey",
-true);
+Map<String, TranscriptList> channelTranscriptLists = playlistsTranscriptApi.listTranscriptsForChannel("channelName", request);
 ```
 
-As you can see, there is also a boolean flag `continueOnError`, which tells whether to continue if transcript retrieval
-fails for a video or not. For example, if it's set to `true`, all transcripts that could not be retrieved will be
-skipped, if
-it's set to `false`, operation will fail fast on the first error.
-
-All methods are also have overloaded versions which accept path to [cookies.txt](#cookies) file.
+Same as with the `YoutubeTranscriptApi`, you can also fetch transcript content directly
+using [fallback languages](#use-fallback-language) if needed.
 
 ```java
-// Retrieve all available transcripts for a given playlist
-Map<String, TranscriptList> transcriptLists = playlistsTranscriptApi.listTranscriptsForPlaylist(
-"playlistId",
-"apiKey",
-true,
-"path/to/cookies.txt"
-);
+// Create request object
+TranscriptRequest request = new TranscriptRequest("apiKey");
 
-// Retrieve all available transcripts for a given channel
-Map<String, TranscriptList> transcriptLists = playlistsTranscriptApi.listTranscriptsForChannel(
-"channelName",
-"apiKey",
-true,
-"path/to/cookies.txt"
-);
+// Retrieve transcript content for all videos in a playlist
+Map<String, TranscriptContent> playlistTranscripts = playlistsTranscriptApi.getTranscriptsForPlaylist("playlistId", request);
+
+// Retrieve transcript content for all videos in a channel
+Map<String, TranscriptContent> channelTranscripts = playlistsTranscriptApi.getTranscriptsForChannel("channelName", request, "en", "de");
 ```
 
+> **Note:** If you want to get transcript content in a different format, refer
+> to [Use Formatters](#use-formatters).
+
 ## 🤓 How it works
 
 Within each YouTube video page, there exists JSON data containing all the transcript information, including an
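The `stopOnError` field described in the README decides between failing fast and skipping videos whose transcripts cannot be retrieved. The following is a minimal, self-contained sketch of those two behaviours, not the library's implementation: `StopOnErrorSketch`, `Fetcher` and `fetchAll` are hypothetical names, and a plain `String` stands in for the per-video result (a `TranscriptList` or `TranscriptContent` in the real API).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the stopOnError semantics; none of these names come from the library.
class StopOnErrorSketch {

    /** Stand-in for a per-video retrieval call that may fail. */
    interface Fetcher {
        String fetch(String videoId) throws Exception;
    }

    /**
     * Fetches a result for every video ID, keyed by video ID.
     * If stopOnError is true, the first failure is rethrown (fail fast);
     * otherwise failed videos are skipped and are absent from the returned map.
     */
    static Map<String, String> fetchAll(List<String> videoIds, Fetcher fetcher, boolean stopOnError) throws Exception {
        Map<String, String> results = new LinkedHashMap<>();
        for (String videoId : videoIds) {
            try {
                results.put(videoId, fetcher.fetch(videoId));
            } catch (Exception e) {
                if (stopOnError) {
                    throw e; // fail fast on the first error
                }
                // stopOnError == false: ignore this video and continue with the next one
            }
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        Fetcher fetcher = id -> {
            if (id.equals("bad")) {
                throw new IllegalStateException("no transcript for " + id);
            }
            return "transcript of " + id;
        };
        List<String> ids = List.of("a", "bad", "b");

        // Skip mode: prints {a=transcript of a, b=transcript of b}
        System.out.println(fetchAll(ids, fetcher, false));

        // Fail-fast mode: the exception for "bad" propagates
        try {
            fetchAll(ids, fetcher, true);
        } catch (IllegalStateException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

In skip mode the failed videos are simply missing from the map, so a caller could compare the returned key set with the requested IDs to see which videos were dropped.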
@@ -8,6 +8,11 @@
  * Retrieves transcripts for all videos in a playlist, or all videos for a specific channel.
  * <p>
  * Playlists and channel videos are retrieved from the YouTube API, so you will need to have a valid api key to use this.
+ * <p>
+ * All methods take a {@link TranscriptRequest} object as a parameter, which contains API key, cookies file path (optional), and stop on error flag (optional, defaults to true).
+ * If cookies are not provided, the API will not be able to access age restricted videos, see <a href="https://github.com/Thoroldvix/youtube-transcript-api#cookies">Cookies</a>.
+ * <p>
+ * {@link TranscriptRequest} also contains a flag to stop on error, or continue on error.
  * </p>
  * <p>
  * To get implementation for this interface see {@link TranscriptApiFactory}
@@ -16,56 +21,59 @@
 public interface PlaylistsTranscriptApi {
 
     /**
-     * Retrieves transcript lists for all videos in the specified playlist using provided API key and cookies file from a specified path.
+     * Retrieves transcript lists for all videos in the specified playlist.
      *
-     * @param playlistId The ID of the playlist
-     * @param apiKey API key for the YouTube V3 API (see <a href="https://developers.google.com/youtube/v3/getting-started">Getting started</a>)
-     * @param continueOnError Whether to continue if transcript retrieval fails for a video. If true, all transcripts that could not be retrieved will be skipped,
-     * otherwise an exception will be thrown.
-     * @param cookiesPath The file path to the text file containing the authentication cookies. Used in the case if some videos are age restricted see {<a href="https://github.com/Thoroldvix/youtube-transcript-api#cookies">Cookies</a>}
+     * @param playlistId The ID of the playlist
+     * @param request {@link TranscriptRequest} request object containing API key, cookies file path, and stop on error flag
      * @return A map of video IDs to {@link TranscriptList} objects
      * @throws TranscriptRetrievalException If the retrieval of the transcript lists fails
      */
-    Map<String, TranscriptList> listTranscriptsForPlaylist(String playlistId, String apiKey, String cookiesPath, boolean continueOnError) throws TranscriptRetrievalException;
+    Map<String, TranscriptList> listTranscriptsForPlaylist(String playlistId, TranscriptRequest request) throws TranscriptRetrievalException;
 
 
     /**
-     * Retrieves transcript lists for all videos in the specified playlist using provided API key.
+     * Retrieves transcript lists for all videos for the specified channel.
      *
-     * @param playlistId The ID of the playlist
-     * @param apiKey API key for the YouTube V3 API (see <a href="https://developers.google.com/youtube/v3/getting-started">Getting started</a>)
-     * @param continueOnError Whether to continue if transcript retrieval fails for a video. If true, all transcripts that could not be retrieved will be skipped,
-     * otherwise an exception will be thrown.
+     * @param channelName The name of the channel
+     * @param request {@link TranscriptRequest} request object containing API key, cookies file path, and stop on error flag
      * @return A map of video IDs to {@link TranscriptList} objects
      * @throws TranscriptRetrievalException If the retrieval of the transcript lists fails
      */
-    Map<String, TranscriptList> listTranscriptsForPlaylist(String playlistId, String apiKey, boolean continueOnError) throws TranscriptRetrievalException;
+    Map<String, TranscriptList> listTranscriptsForChannel(String channelName, TranscriptRequest request) throws TranscriptRetrievalException;
 
 
     /**
-     * Retrieves transcript lists for all videos for the specified channel using provided API key and cookies file from a specified path.
+     * Retrieves transcript content for all videos in the specified playlist.
      *
-     * @param channelName The name of the channel
-     * @param apiKey API key for the YouTube V3 API (see <a href="https://developers.google.com/youtube/v3/getting-started">Getting started</a>)
-     * @param cookiesPath The file path to the text file containing the authentication cookies. Used in the case if some videos are age restricted see {<a href="https://github.com/Thoroldvix/youtube-transcript-api#cookies">Cookies</a>}
-     * @param continueOnError Whether to continue if transcript retrieval fails for a video. If true, all transcripts that could not be retrieved will be skipped,
-     * otherwise an exception will be thrown.
-     * @return A map of video IDs to {@link TranscriptList} objects
-     * @throws TranscriptRetrievalException If the retrieval of the transcript lists fails
-     * @throws TranscriptRetrievalException If the retrieval of the transcript lists fails
+     * @param playlistId The ID of the playlist
+     * @param request {@link TranscriptRequest} request object containing API key, cookies file path, and stop on error flag
+     * @param languageCodes A varargs list of language codes in descending priority.
+     * <p>
+     * For example:
+     * </p>
+     * If this is set to {@code ("de", "en")}, it will first attempt to fetch the German transcript ("de"), and then fetch the English
+     * transcript ("en") if the former fails. If no language code is provided, it uses English as the default language.
+     * @return A map of video IDs to {@link TranscriptContent} objects
+     * @throws TranscriptRetrievalException If the retrieval of the transcript fails
      */
-    Map<String, TranscriptList> listTranscriptsForChannel(String channelName, String apiKey, String cookiesPath, boolean continueOnError) throws TranscriptRetrievalException;
+    Map<String, TranscriptContent> getTranscriptsForPlaylist(String playlistId,
+                                                             TranscriptRequest request,
+                                                             String... languageCodes) throws TranscriptRetrievalException;
 
 
     /**
-     * Retrieves transcript lists for all videos for the specified channel using provided API key.
+     * Retrieves transcript content for all videos for the specified channel.
      *
-     * @param channelName The name of the channel
-     * @param apiKey API key for the YouTube V3 API (see <a href="https://developers.google.com/youtube/v3/getting-started">Getting started</a>)
-     * @param continueOnError Whether to continue if transcript retrieval fails for a video. If true, all transcripts that could not be retrieved will be skipped,
-     * otherwise an exception will be thrown.
-     * @return A map of video IDs to {@link TranscriptList} objects
-     * @throws TranscriptRetrievalException If the retrieval of the transcript lists fails
+     * @param channelName The name of the channel
+     * @param request {@link TranscriptRequest} request object containing API key, cookies file path, and stop on error flag
+     * @param languageCodes A varargs list of language codes in descending priority.
+     * <p>
+     * For example:
+     * </p>
+     * If this is set to {@code ("de", "en")}, it will first attempt to fetch the German transcript ("de"), and then fetch the English
+     * transcript ("en") if the former fails. If no language code is provided, it uses English as the default language.
+     * @return A map of video IDs to {@link TranscriptContent} objects
+     * @throws TranscriptRetrievalException If the retrieval of the transcript fails
      */
-    Map<String, TranscriptList> listTranscriptsForChannel(String channelName, String apiKey, boolean continueOnError) throws TranscriptRetrievalException;
+    Map<String, TranscriptContent> getTranscriptsForChannel(String channelName, TranscriptRequest request, String... languageCodes) throws TranscriptRetrievalException;
 }
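The `languageCodes` Javadoc above describes a priority list with English as the default. A minimal sketch of that selection rule follows; `LanguageFallbackSketch` and `pickLanguage` are hypothetical names, and returning `Optional.empty()` when nothing matches is an assumption made for illustration (the interface itself reports failures through `TranscriptRetrievalException`). It also shows why, with a varargs parameter, each code has to be passed as a separate argument.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the documented fallback rule; not part of the library.
class LanguageFallbackSketch {

    /**
     * Returns the first requested language code that is available, trying the
     * codes in the order given (descending priority). With no codes, "en" is used.
     */
    static Optional<String> pickLanguage(Set<String> available, String... languageCodes) {
        String[] wanted = languageCodes.length == 0 ? new String[] {"en"} : languageCodes;
        for (String code : wanted) {
            if (available.contains(code)) {
                return Optional.of(code);
            }
        }
        return Optional.empty(); // none of the requested languages is available
    }

    public static void main(String[] args) {
        Set<String> available = Set.of("en", "fr");
        System.out.println(pickLanguage(available, "de", "en")); // Optional[en]: "de" missing, falls back to "en"
        System.out.println(pickLanguage(available));             // Optional[en]: English default
        // "en, de" is a single varargs element, so it is looked up as one (non-existent) code.
        System.out.println(pickLanguage(available, "en, de"));   // Optional.empty
    }
}
```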
67 changes: 67 additions & 0 deletions lib/src/main/java/io/github/thoroldvix/api/TranscriptRequest.java
@@ -0,0 +1,67 @@
package io.github.thoroldvix.api;

/**
 * Request object for retrieving transcripts from {@link PlaylistsTranscriptApi}.
 * <p>
 * Contains the API key required for the YouTube V3 API,
 * and optionally a file path to the text file containing the authentication cookies.
 * If cookies are not provided, the API will not be able to access age-restricted videos.
 * Also contains a flag to stop on error or continue on error, which defaults to {@code true} if not provided.
 * </p>
 */
public class TranscriptRequest {
    private final String apiKey;
    private final String cookiesPath;
    private final boolean stopOnError;

    /**
     * Creates a new instance of {@link TranscriptRequest}.
     *
     * @param apiKey      API key for the YouTube V3 API (see <a href="https://developers.google.com/youtube/v3/getting-started">Getting started</a>)
     * @param cookiesPath The file path to the text file containing the authentication cookies. Used when some videos are age-restricted (see <a href="https://github.com/Thoroldvix/youtube-transcript-api#cookies">Cookies</a>)
     * @param stopOnError Whether to stop if transcript retrieval fails for a video. If false, all transcripts that could not be retrieved will be skipped;
     *                    otherwise an exception will be thrown on the first error.
     */
    public TranscriptRequest(String apiKey, String cookiesPath, boolean stopOnError) {
        if (apiKey == null || apiKey.isBlank()) {
            throw new IllegalArgumentException("API key cannot be null or blank");
        }
        this.apiKey = apiKey;
        this.cookiesPath = cookiesPath;
        this.stopOnError = stopOnError;
    }

    public TranscriptRequest(String apiKey, String cookiesPath) {
        this(apiKey, cookiesPath, true);
    }

    public TranscriptRequest(String apiKey) {
        this(apiKey, null, true);
    }

    public TranscriptRequest(String apiKey, boolean stopOnError) {
        this(apiKey, null, stopOnError);
    }

    /**
     * @return API key for the YouTube V3 API (see <a href="https://developers.google.com/youtube/v3/getting-started">Getting started</a>)
     */
    public String getApiKey() {
        return apiKey;
    }

    /**
     * @return The file path to the text file containing the authentication cookies. Used when some videos are age-restricted (see <a href="https://github.com/Thoroldvix/youtube-transcript-api#cookies">Cookies</a>)
     */
    public String getCookiesPath() {
        return cookiesPath;
    }

    /**
     * @return Whether to stop if transcript retrieval fails for a video. If false, all transcripts that could not be retrieved will be skipped;
     * otherwise an exception will be thrown on the first error.
     */
    public boolean isStopOnError() {
        return stopOnError;
    }
}
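`TranscriptRequest` implements its optional parameters with telescoping constructors that all delegate to the three-argument one. On Java 16 or later (an assumption about the target version; `String.isBlank` in the class above already requires Java 11), the same immutable value type could be sketched as a record. `TranscriptRequestRecord` is a hypothetical name and not part of the library; it only mirrors the defaults and validation shown above.

```java
// Hypothetical record-based equivalent of TranscriptRequest (requires Java 16+).
record TranscriptRequestRecord(String apiKey, String cookiesPath, boolean stopOnError) {

    // Compact canonical constructor: same null/blank check as the class above.
    TranscriptRequestRecord {
        if (apiKey == null || apiKey.isBlank()) {
            throw new IllegalArgumentException("API key cannot be null or blank");
        }
    }

    // Convenience constructors mirroring the class: no cookies and fail fast by default.
    TranscriptRequestRecord(String apiKey) {
        this(apiKey, null, true);
    }

    TranscriptRequestRecord(String apiKey, String cookiesPath) {
        this(apiKey, cookiesPath, true);
    }

    TranscriptRequestRecord(String apiKey, boolean stopOnError) {
        this(apiKey, null, stopOnError);
    }

    public static void main(String[] args) {
        TranscriptRequestRecord request = new TranscriptRequestRecord("apiKey");
        // Prints: TranscriptRequestRecord[apiKey=apiKey, cookiesPath=null, stopOnError=true]
        System.out.println(request);
    }
}
```

A record generates the accessors, `equals`, `hashCode` and `toString`, but its accessors are named `apiKey()` and `stopOnError()` rather than `getApiKey()` and `isStopOnError()`, so adopting it would change the public API of the class.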