Transcripts #16
@tomrossi7 I looked at the link you provided and I see your meaning. I think it comes down to name confusion with the tag. Are you saying that the aggregators would know this element refers to captions by looking for type="application/srt"? That works, but it feels semantically strange to take intent from a MIME type. Is there a way to firm that up with something like the following?
I agree with @theDanielJLewis on point 2. I heard a podcast yesterday (a random one on castcoverage), and they started the show by saying that this was their first English-language episode. That show would really need the ability to separate the language of the show from the language of the transcript.
Great idea on point #2! We don't support transcripts in other languages at this point, but that is a great idea for the future! I believe all transcripts are time-based; the only thing that changes is the "fidelity". Even our HTML transcripts have timestamps automatically inserted based on monologues. I just don't know if it warrants another tag when really it points to the exact same resource represented in another format that is captured in the @daveajones what would other
This is really exciting, since we are in the midst of rolling this out to players right now!
@tomrossi7 Yes, that's the idea! At this point, rel="captions" would act like a binary flag, since the only use cases we're addressing are transcripts and closed captions. But adding the attribute seems like the right way to future-proof it. If rel="captions" is missing, the tag would be assumed to contain a link to a plain-text transcription. If rel="captions" exists, the tag would be referring to a time-coded file, as you are showing. I'm assuming none of us want an actual transcript in the XML. That's insanity. So these would all be links, like the ones in your examples.
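A minimal Python sketch of how an app could apply the rule described above when reading a feed. It assumes the `podcast:` prefix is bound to the namespace URI `https://podcastindex.org/namespace/1.0` (not stated in this thread) and uses the attribute names proposed here (`url`, `type`, `rel`); the `classify_transcript` helper and the `.txt` URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI bound to the podcast: prefix (not given in this thread).
PODCAST_NS = "https://podcastindex.org/namespace/1.0"
NSMAP = {"podcast": PODCAST_NS}


def classify_transcript(elem: ET.Element) -> str:
    """Hypothetical helper: apply the proposed rel="captions" rule.

    rel="captions" present -> a time-coded captions file.
    rel missing            -> assumed to be a plain-text transcript.
    """
    if elem.tag != f"{{{PODCAST_NS}}}transcript":
        raise ValueError(f"not a podcast:transcript element: {elem.tag}")
    return "captions" if elem.get("rel") == "captions" else "transcript"


item = ET.fromstring(f"""<item xmlns:podcast="{PODCAST_NS}">
  <podcast:transcript url="https://host.com/1.srt" type="application/srt" rel="captions" />
  <podcast:transcript url="https://host.com/1.txt" type="text/plain" />
</item>""")

for t in item.findall("podcast:transcript", NSMAP):
    print(t.get("url"), classify_transcript(t))
```

Note that the decision comes from `rel` alone, not from the MIME type, which is the point being argued: the type says what the file is, `rel` says what it is for.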
Did you look at our examples, though? We have time codes even in HTML; it's just the format or type that is changing. The aggregator can choose which type they want to ingest, and we (as the producer) don't make any assumptions about how they will use the various representations. We can definitely add the
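To illustrate "the aggregator can choose which type they want to ingest", here is a hedged Python sketch in which an item lists the same transcript in several formats and a consumer picks the first MIME type from its own preference list. The namespace URI, the `pick_transcript` helper, the `PREFERRED_TYPES` order, and the `.html` URL are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence

# Assumption: namespace URI bound to the podcast: prefix (not given in this thread).
PODCAST_NS = "https://podcastindex.org/namespace/1.0"
NSMAP = {"podcast": PODCAST_NS}

# Hypothetical preference order for one aggregator; every consumer chooses its own.
PREFERRED_TYPES = ("application/json", "application/srt", "text/html")


def pick_transcript(item: ET.Element,
                    preferred: Sequence[str] = PREFERRED_TYPES) -> Optional[str]:
    """Return the URL of the most-preferred transcript format offered, or None."""
    offered = {}
    for t in item.findall("podcast:transcript", NSMAP):
        # If a type is listed twice, keep the first URL the producer declared.
        offered.setdefault(t.get("type"), t.get("url"))
    for mime_type in preferred:
        if mime_type in offered:
            return offered[mime_type]
    return None


item = ET.fromstring(f"""<item xmlns:podcast="{PODCAST_NS}">
  <podcast:transcript url="https://host.com/1.html" type="text/html" />
  <podcast:transcript url="https://host.com/1.srt" type="application/srt" />
</item>""")

print(pick_transcript(item))  # https://host.com/1.srt
```

The producer makes no assumptions here: it only lists what it has, and the ranking lives entirely on the consumer side.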
Since we're all living in this XML world, doesn't it make sense to use a self-closing tag? `<podcast:transcript url="https://host.com/1.srt" type="application/srt" rel="captions" />` The URL of the transcript is not content. This is an inherently empty element.
These are the changes we've been discussing in issue Podcastindex-org#16. What do you think?
Yes, I saw them. They look good. But I'm thinking about this issue from the reverse direction. What if a podcast item declares this:
... according to your spec. How is the podcast app going to know whether that HTML document is simply a straight transcript like this, or a time-encoded HTML transcript like yours? The MIME type declares only the underlying format, which can be ambiguous. An attribute that captures intent, like rel="captions", lets the app know that this is HTML but that it will still have time codes in it, because that's the whole point of a captions file. I hate extra attributes as much as anyone. It's Goal 2, after all. But you are currently generating those, so you know what to expect. Once that tag gets out into the wild, it could be populated by other "transcripts", and those are going to be all over the map. The aggregators and apps need a hint about what they are about to consume.
Yeah, I totally agree that transcripts provided in HTML format can be anything in the world. Even within Buzzsprout, our HTML format varies wildly depending on what the podcaster provides. If you really wanted to parse it, you would rather have it in a standard format like JSON or SRT. I was just saying that a transcript is a transcript and doesn't seem to warrant a separation between a I'm happy to go along with other, just wanted to provide my 2 cents from our experience.
Ah, I got you.
I hate empty tags. It's a personal preference; they are less readable to me. But even the RSSv2 spec itself is inconsistent on this point - having a url for the node value of You guys have that tag in production already, correct? If so, maybe we make it an empty tag to match what you are already doing, and just make rel="captions" the new optional attribute. It'll be a tiny code change for y'all, and everyone else gets a transcript/caption tag. :-)
Yes, we do, and it's already been picked up by Podcast Addict. They are using it to display real-time captions.
@daveajones Off-topic, but we could make this spec for both XML and JSON. One of the issues with XML is that it's just so verbose. If enough hosts adopt JSON, maybe eventually the industry will turn?
Then let's move forward with that. We'll just merge your tag into the spec as-is and add the rel="captions" attribute as optional, for when a potentially ambiguous filetype (something non-SRT) is meant specifically as a closed caption file. And we'll add the language="" attribute as optional, for when the language of the referenced file doesn't match the RSS language tag.
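A hedged Python sketch of how a host could emit the tag under the rules agreed in that comment: the element stays empty (so it serializes as a self-closing tag), `rel="captions"` is added only for a non-SRT captions file, and `language` only when it differs from the feed's `<language>`. The namespace URI and the `make_transcript` helper are assumptions, not part of any published spec or Buzzsprout code.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: namespace URI bound to the podcast: prefix (not given in this thread).
PODCAST_NS = "https://podcastindex.org/namespace/1.0"
ET.register_namespace("podcast", PODCAST_NS)

SRT = "application/srt"


def make_transcript(url: str, mime_type: str, channel_language: str,
                    language: Optional[str] = None,
                    is_captions: bool = False) -> ET.Element:
    """Hypothetical helper: build an empty <podcast:transcript/> element,
    adding the two optional attributes only when the agreed rules call for them."""
    elem = ET.Element(f"{{{PODCAST_NS}}}transcript", {"url": url, "type": mime_type})
    # rel="captions" only when the file is meant as captions and its type
    # (anything other than SRT) does not already make that clear.
    if is_captions and mime_type != SRT:
        elem.set("rel", "captions")
    # language only when it differs from the feed-level <language> value.
    if language and language.lower() != channel_language.lower():
        elem.set("language", language)
    return elem


html_captions = make_transcript("https://host.com/1.html", "text/html", "en",
                                language="es", is_captions=True)
srt_captions = make_transcript("https://host.com/1.srt", SRT, "en",
                               language="en", is_captions=True)

# The elements have no text content, so ElementTree serializes them self-closed.
print(ET.tostring(html_captions, encoding="unicode"))
print(ET.tostring(srt_captions, encoding="unicode"))
```

The SRT example ends up with only `url` and `type`, which matches the tag already in production; the extra attributes appear only in the ambiguous or non-default cases.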
I would suggest that `podcast:transcript` and `podcast:captions` are really the same thing provided in different formats. The way we have approached it with Buzzsprout makes use of the XML `type` attribute. I know this may go against your Goal Note on "keep existing conventions" #2, but it really does accurately capture what is being represented and avoids creating a new tag when people want to make use of another format for transcripts, e.g. JSON or WebVTT. Transcript language seems redundant with the language of the podcast, which may be better captured with `podcast:language`.
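The redundancy argument can be sketched from the consumer side: if a transcript with no language of its own simply inherits the feed-level `<language>`, the attribute only needs to appear for exceptions. This is a minimal Python illustration under assumptions: the namespace URI, the `transcript_languages` helper, and the sample feed are hypothetical, and the per-transcript `language` attribute is the optional one discussed in this thread.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Assumption: namespace URI bound to the podcast: prefix (not given in this thread).
PODCAST_NS = "https://podcastindex.org/namespace/1.0"
NSMAP = {"podcast": PODCAST_NS}


def transcript_languages(channel: ET.Element) -> Dict[str, Optional[str]]:
    """Hypothetical helper: map each transcript URL to its effective language,
    i.e. its own language attribute if set, otherwise the channel's <language>."""
    default = channel.findtext("language")
    languages = {}
    for item in channel.findall("item"):
        for t in item.findall("podcast:transcript", NSMAP):
            languages[t.get("url")] = t.get("language", default)
    return languages


rss = ET.fromstring(f"""<rss version="2.0" xmlns:podcast="{PODCAST_NS}">
  <channel>
    <language>en-us</language>
    <item><podcast:transcript url="https://host.com/1.srt" type="application/srt" /></item>
    <item><podcast:transcript url="https://host.com/2.srt" type="application/srt" language="es" /></item>
  </channel>
</rss>""")

print(transcript_languages(rss.find("channel")))
```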