Transcripts #16

Closed
tomrossi7 opened this issue Oct 12, 2020 · 13 comments

Comments

@tomrossi7
Contributor

tomrossi7 commented Oct 12, 2020

  1. I would suggest that podcast:transcript and podcast:captions are really the same thing provided in different formats. The way we have approached it with Buzzsprout makes use of the type attribute. I know this may go against your Goal Note on "keep existing conventions" #2, but it accurately captures what is being represented and avoids creating a new tag whenever people want to use another format for transcripts, e.g. JSON or WebVTT.

  2. Transcript language seems redundant with the language of the podcast which may be better captured with podcast:language.

@theDanielJLewis

  1. But the idea of captions is for time-based transcripts. I can see uses for both fields. Look at how Apple TV supports "What did they say?" It jumps back a few seconds and turns on captions for a few seconds. That kind of time-based transcript could be helpful even for those without impaired hearing. This requires an extra layer of data beyond transcripts, but maybe that layer can be included inside transcripts. We would need to make captions fully SRT-compatible.
  2. While this sounds good and could be the assumption, there could be cases where someone provides a translated transcript of their podcast episode. Then, that would need a separate transcript tag and separate language.
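
The "What did they say?" behaviour described in point 1 can be sketched against an SRT file. This is only an illustration, assuming a plain SubRip layout (optional index line, a `start --> end` timing line, then text); the names `Cue`, `parse_srt`, and `replay_cues` are hypothetical and not part of any proposal in this thread.

```python
import re
from dataclasses import dataclass
from typing import List

# SRT timestamps look like 00:01:02,500 (hours:minutes:seconds,milliseconds).
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Cue:
    start: float  # seconds from the start of the episode
    end: float    # seconds from the start of the episode
    text: str


def _seconds(stamp: str) -> float:
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, secs, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + secs + millis / 1000


def parse_srt(srt: str) -> List[Cue]:
    """Parse SRT text into cues; blocks without a timing line are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):  # cues are blank-line separated
        lines = block.splitlines()
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue
        start, end = lines[timing].split("-->")
        # Ignore anything after the end timestamp on the timing line.
        cues.append(Cue(_seconds(start), _seconds(end.split()[0]),
                        "\n".join(lines[timing + 1:])))
    return cues


def replay_cues(cues: List[Cue], position: float, lookback: float = 10.0) -> List[Cue]:
    """Return the cues that overlap the last `lookback` seconds before `position`."""
    window_start = max(0.0, position - lookback)
    return [c for c in cues if c.end > window_start and c.start < position]
```

A player would call `replay_cues` with the current playback position, seek back by the same lookback, and show the returned cues for a few seconds.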

@daveajones
Contributor

daveajones commented Oct 12, 2020

@tomrossi7 I looked at the link you provided and I see your meaning. I think it comes down to name confusion with the tag. Are you saying that the aggregators would know this element refers to captions by looking for type="application/srt"? That works, but it feels semantically strange to take intent from a MIME type. Is there a way to firm that up with something like the following?

<podcast:transcript type="application/srt" rel="captions">https://host.com/1.srt</podcast:transcript>

I agree with @theDanielJLewis on point 2. I heard a podcast yesterday (a random one on castcoverage) and they started the show saying that this was their first English-language episode. That show would really need the ability to separate the language of the show from the language of the transcript.

@tomrossi7
Contributor Author

Great idea on point #2! We don't support transcripts in other languages at this point, but that is a great idea for the future!

I believe all transcripts are time-based; the only thing that changes is the "fidelity". Even our HTML transcripts have timestamps automatically inserted based on monologues. I just don't know if it warrants another tag when really it points to the exact same resource represented in another format, which is already captured in the type.

@daveajones what would other rel options be for a transcript other than captions? Buzzsprout currently serves up feeds in HTML, SRT, and JSON. So is this the proposed RSS standard?

<podcast:transcript type="application/srt"  rel="captions">https://host.com/1.srt</podcast:transcript>
<podcast:transcript type="application/json" rel="captions">https://host.com/1.json</podcast:transcript>
<podcast:transcript type="application/html" rel="captions">https://host.com/1.html</podcast:transcript>

This is really exciting since we are in the midst of rolling this out to players currently!

@daveajones
Contributor

daveajones commented Oct 12, 2020

@tomrossi7 Yes, that's the idea!

At this point in time, rel="captions" would act like a binary flag, since the only use cases we're addressing are transcripts and closed captions. But adding the attribute seems like the right way to future-proof it. If rel="captions" is missing, the tag would be assumed to contain a link to a plain text transcription. If rel="captions" exists, the tag would be referring to a time-coded file as you are showing.

I'm assuming none of us want an actual transcript in the XML. That's insanity. So, these would all be links like you are showing in your examples.
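
A minimal sketch of how an aggregator might apply that rule to the element form shown so far (URL as the node value). The namespace URI below is a placeholder assumption, and `classify` and `transcript_links` are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder namespace URI; use whatever the feed actually declares.
PODCAST_NS = "https://example.com/podcast-namespace"


def classify(element: ET.Element) -> str:
    """Return "captions" when rel="captions" is present, else "transcript"."""
    rel = (element.get("rel") or "").strip().lower()
    return "captions" if rel == "captions" else "transcript"


def transcript_links(item_xml: str) -> list:
    """List (url, type, kind) for each podcast:transcript in one <item>."""
    item = ET.fromstring(item_xml)
    links = []
    for el in item.iter(f"{{{PODCAST_NS}}}transcript"):
        url = (el.text or "").strip()  # the link is the node value in this form
        if url:
            links.append((url, el.get("type"), classify(el)))
    return links
```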

@tomrossi7
Contributor Author

Did you look at our examples though? We have time codes even in HTML; it's just the format or type that is changing. The aggregator can choose which type they want to ingest and we (as the producer) don't make any assumptions about how they will use the various representations. We can definitely add the rel attribute, I just hate to add more markup that may not be necessary 😬. Definitely don't want to include actual transcripts in the XML.
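
As a rough illustration of "the aggregator can choose which type", a consumer could rank the advertised types and take the first one it supports. The preference order and the name `pick_transcript` are assumptions for this sketch; the MIME strings are the ones used in the examples above.

```python
from typing import Iterable, Optional, Sequence, Tuple

# Hypothetical preference: machine-readable cue formats first, HTML last.
DEFAULT_PREFERENCE = ("application/srt", "application/json", "application/html")


def pick_transcript(
    links: Iterable[Tuple[str, str]],
    preference: Sequence[str] = DEFAULT_PREFERENCE,
) -> Optional[Tuple[str, str]]:
    """Pick one (url, type) pair from the feed according to `preference`.

    `links` holds (url, type) pairs as declared in the feed. Returns None
    when the feed offers nothing the consumer understands.
    """
    by_type = {}
    for url, mime in links:
        # Keep the first URL seen for each type.
        by_type.setdefault(mime.strip().lower(), url)
    for mime in preference:
        if mime in by_type:
            return by_type[mime], mime
    return None
```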

@snookfin
Contributor

Since we're all living in this XML world, doesn't it make sense to use a self-closing tag?

<podcast:transcript url="https://host.com/1.srt" type="application/srt" rel="captions" />

The URL of the transcript is not content. This is an inherently empty element.

tomrossi7 added a commit to tomrossi7/podcast-namespace that referenced this issue Oct 12, 2020
These are the changes we've been discussing in issue Podcastindex-org#16. What do you think?
@daveajones
Contributor

> Did you look at our examples though? We have time codes even in HTML; it's just the format or type that is changing. The aggregator can choose which type they want to ingest and we (as the producer) don't make any assumptions about how they will use the various representations. We can definitely add the rel attribute, I just hate to add more markup that may not be necessary 😬. Definitely don't want to include actual transcripts in the XML.

Yes, I saw them. They look good. But I'm thinking about this issue in reverse. What if a podcast item declares this:

<podcast:transcript url="https://host.com/1.html" type="application/html" />

... according to your spec.

How is the podcast app going to know whether that HTML document is simply a straight transcript like this, or a time-encoded HTML transcript like yours? The MIME type declares only the underlying format, which can be ambiguous. Having an attribute that captures intent, like rel="captions", lets the app know that this is HTML, but it'll still have time codes in it because that's the whole point of a captions file.

I hate extra attributes as much as anyone. It's Goal 2 after all. But, you are currently generating those, so you know what to expect. Once that tag gets in the wild, it could be populated by other "transcripts". And, those are going to be all over the map. The aggregators and apps need a hint here about what they are about to consume.

@tomrossi7
Contributor Author

Yeah, I totally agree that transcripts provided in HTML format can be anything in the world. Even within Buzzsprout, our HTML format varies wildly depending on what the podcaster provides. If you really wanted to parse it, you would rather have it in a standard like JSON or SRT. I was just saying that a transcript is a transcript and doesn't seem to warrant a separation between a podcast:transcript and podcast:captions. The real separation is the type.

I'm happy to go along with other, just wanted to provide my 2 cents from our experience.

@daveajones
Contributor

> Yeah, I totally agree that transcripts provided in HTML format can be anything in the world. Even within Buzzsprout, our HTML format varies wildly depending on what the podcaster provides. If you really wanted to parse it, you would rather have it in a standard like JSON or SRT. I was just saying that a transcript is a transcript and doesn't seem to warrant a separation between a podcast:transcript and podcast:captions. The real separation is the type.
>
> I'm happy to go along with other, just wanted to provide my 2 cents from our experience.

Ah, I got you.

@daveajones
Contributor

> Since we're all living in this XML world, doesn't it make sense to use a self-closing tag?
>
> <podcast:transcript url="https://host.com/1.srt" type="application/srt" rel="captions" />
>
> The URL of the transcript is not content. This is an inherently empty element.

I hate empty tags. It's a personal preference; they're less readable to me. But even the RSSv2 spec itself is inconsistent on this point, having a URL as the node value of <link> and a url="[url]" attribute for <enclosure>, so it's probably not a hill worth dying on.

You guys have that tag in production already, correct?

If so, maybe we make it an empty tag to match what you are already doing, and just make rel="captions" the new optional attribute. It'll be a tiny code change for y'all, and everyone else gets a transcript/caption tag. :-)

@snookfin
Contributor

> You guys have that tag in production already, correct?

Yes, we do and it's already been picked up by Podcast Addict. They are using it to display real-time captions.

@tomrossi7
Contributor Author

@daveajones off-topic, but we could make this spec for both XML and JSON. One of the issues with XML is that it's just so verbose. If enough hosts adopt JSON, maybe eventually the industry will turn?

@daveajones
Contributor

Let's move forward with that then. We'll just merge your tag into the spec as-is and add the rel="captions" attribute as optional, for use when a potentially ambiguous file type (something non-SRT) is meant specifically as a closed caption. And we'll add the language="" attribute as optional, for when the language of the referenced file doesn't match the RSS language tag.
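
Read as consumer logic, that resolution might look like the sketch below: a self-closing tag with a url attribute, SRT treated as captions by itself, rel="captions" as the hint for other types, and language falling back to the channel's `<language>`. The namespace URI is a placeholder assumption and `read_transcripts` is a hypothetical helper, not part of the spec.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder namespace URI; substitute the one the feed declares.
PODCAST_NS = "https://example.com/podcast-namespace"
TRANSCRIPT_TAG = f"{{{PODCAST_NS}}}transcript"


def read_transcripts(rss_xml: str) -> list:
    """Collect transcript links from every <item> of an RSS document."""
    channel = ET.fromstring(rss_xml).find("channel")
    if channel is None:
        return []
    feed_language = channel.findtext("language")  # None if the feed omits it
    found = []
    for item in channel.iter("item"):
        for el in item.iter(TRANSCRIPT_TAG):
            url = el.get("url")
            if not url:
                continue  # nothing to fetch
            mime = el.get("type")
            found.append({
                "url": url,
                "type": mime,
                # SRT is unambiguous; other types need the explicit rel hint.
                "captions": mime == "application/srt" or el.get("rel") == "captions",
                # language is only expected when it differs from the feed's.
                "language": el.get("language") or feed_language,
            })
    return found
```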
