image alt text is lost during parsing #2
Comments
This sounds similar to the language parsing brainstorm at: http://microformats.org/wiki/microformats2-parsing-brainstorming#Parse_language_information
Continuing on the brainstorming around how to include languages one could imagine:
<img class="u-photo" src="globe.gif" alt="spinning globe animation" lang="en">
Parsed as:
{
  "photo": [
    {
      "value": "http://example.com/globe.gif",
      "alt": "spinning globe animation",
      "lang": "en"
    }
  ]
}
As all implementations should already have the expectation of receiving an object rather than a string and to use the value of that object rather than the string, so adding such an additional Important for parsing libraries to also distinguish between an empty I know @glennjones has already implemented experimental And @gRegorLove made a So there's something to build upon there experience wise. |
After some discussion I'm not as sure anymore on the similarity in parsing – could be that this is rather a special case of fallback content for embedded content: https://html.spec.whatwg.org/multipage/dom.html#fallback-content One should maybe consider The resulting value from whatever parsing one ends up with could probably though be represented similarly as has been suggested for |
To preserve alt text (and indeed all accessibility markup) you can use e-content. |
First, I think "can use e-content" is not solving the problem, but rather "kicking the can down the road". It is not a solution to the alt text parsing problem, but a way of deferring responsibility for parsing alt text to every microformats JSON consuming application, which is unreasonable, since the reason such an application uses microformats JSON in the first place is that it does not want to have to parse the HTML. Thus saying "just parse the HTML from e-content" (which is essentially what "you can use e-content ... To preserve alt text (and indeed all accessibility markup)" amounts to) ignores the very incentives of the microformats JSON consuming application. Second, lang and alt are similar in that they are both extra information on the element, but the resemblance stops there. "lang" is both rarely used (in comparison to "alt"), and can often be auto-implied from the content, whereas "alt" can nearly never be implied, and is thus more important to solve. That being said, if a solution for "alt" works for "lang", that would be a nice side effect (but it's not a "must have"). |
I'm not sure how much to brainstorm in a GH issue and how much to recommend a specific course of action. Feels weird to brainstorm in a threaded medium (GitHub issue) which is the opposite of what you want (collaborative iteration in-place on a brainstorm). @aaronpk suggested a hybrid approach of collaborative iterative brainstorming on the wiki. Here is a start on some specific ideas for approaches (and problems therein): |
The change as described in the brainstorm conversation here: Any implementation of this change would (should) be paired with a major version # change to give consumers a chance to adjust their consuming code |
If there is no non-empty alt attribute should the original parsed format be used? Secondly, does this not in some way conflict with the use of "value" in e-* type parsing where value is a plaintext representation and html is the actual representation? |
@kartikprabhu wrote:
Then existing behavior.
I don't see what you are talking about. Can you provide a code example that demonstrates this conflict? |
@tantek Consider the following example
<div class="h-entry">
  <p class="p-name e-content"><span>Hello World</span></p>
  <img class="u-photo" src="globe.gif" alt="spinning globe animation">
</div>
which under the new rules would give the parsed mf2 as
{
  "type": [
    "h-entry"
  ],
  "properties": {
    "name": [
      "Hello World"
    ],
    "photo": [
      {
        "value": "http://example.com/globe.gif",
        "alt": "spinning globe animation"
      }
    ],
    "content": [
      {
        "html": "<span>Hello World</span>",
        "value": "Hello World"
      }
    ]
  }
}
from the above one can see that for |
I remember @notenoughneon built a system that uses HTML files with Microformats as a data store: PURR I'd love to get her feedback on whether this new data structure would cause any problems with that model. |
That's interesting, @kartikprabhu. I had not really thought of it as an alternative, but more of a default. For |
My understanding of the parsing rules was that Basically as a consumer, you can always use the value in |
@gRegorLove @aaronpk good points. I guess I was thinking of |
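A minimal sketch of the consumer-side point made above, assuming the parsed output is a plain Python dict shaped like the JSON in this thread. The helper name `plain_value` is hypothetical and not part of mf2py or any other parser: a property value is either a bare string or an object that carries a `value` key, so one check covers both the proposed `u-photo` object and the existing `e-content` object.

```python
def plain_value(prop):
    """Return the plain string form of one parsed mf2 property value.

    Assumption: `prop` is either a str, or a dict with a "value" key,
    as in the u-photo and e-content examples in this thread.
    """
    if isinstance(prop, dict):
        return prop["value"]
    return prop


# Properties copied from the h-entry example earlier in the thread.
properties = {
    "name": ["Hello World"],
    "photo": [{"value": "http://example.com/globe.gif",
               "alt": "spinning globe animation"}],
    "content": [{"html": "<span>Hello World</span>", "value": "Hello World"}],
}

# Flatten every property to its plain values, ignoring "alt" and "html".
flat = {name: [plain_value(v) for v in values]
        for name, values in properties.items()}
```

A consumer that only needs the URL or the plain text can work from `flat`, while one that cares about `alt` or `html` reads the extra keys from the original objects.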
It sounds like we have a fairly good consensus around a particular proposal, and any apparent conflicts have been explained or resolved. Would someone like to take a crack at suggested minimal spec edits to implement the proposal? |
Re: @voxpelli point / question / counterproposal for "fallback", this isn't about "fallback" this is about capturing what the author authored, specifically on the element with the microformats property name being parsed. re: audio & video - they don't do content based fallback, their contents are only for older browsers that have no support for those elements at all. re: object - it's a different case entirely since its contents allow rich markup. if you want an object's contents, can already get them with an "e-*" property on the object. if there are others with specific use-cases, we can address them as necessary. |
@tantek I'm not really against the solution, it was after all what I proposed initially. The discussion I referenced above, but failed to link, was this one: https://chat.indieweb.org/microformats/2016-07-12#t1468345415448000 After there having "considered the difference" I concluded that the difference between It specifically says the following in that spec about
So fallback content is still about what the author has authored – if the author has given specific fallback content then that fallback content should be forwarded – we are talking about the same thing.. In practice it probably makes sense to use I still do wonder though why it wouldn't work to just say that a
And actually even this:
Don't they all convey the very same thing from the perspective of HTML? |
It makes sense to use "alt" as the name because it's a 1:1 mapping of the value of the alt attribute.
Is an artificial example, not real world, you would just use an img.
Would be properly marked up by putting u-photo on both photos provided:
which would then provide the alt for the second photo. |
I'm okay with just doing the img alt parsing as it makes for a simpler mf2 parsing spec. I still don't fully understand the criticism in regards to the alt text not being fallback content, but let's leave that. (The object tag linking to an SVG is not an artificial example but one usually brought up as one of the major ways to include SVG. See eg: https://css-tricks.com/using-svg/#article-header-id-11) |
i currently have a use case, snarfed/bridgy#756, that's blocked on this. the composite object |
On http://microformats.org/wiki/index.php?title=microformats2-parsing&oldid=66695#parsing_a_u-_property Replace:
With:
|
Absence of So I suggest the following modification to @gRegorLove 's suggestion
|
LGTM. Think my only addition now is to ensure the
|
as proof of concept, this has been implemented in experimental version of mf2py for explicit

Example 0
<div class="h-entry">
  <p class="p-name e-content"><span>Hello World</span></p>
  <img class="u-photo" src="globe.gif">
</div>
has
[
  "globe.gif"
]

Example 1
<div class="h-entry">
  <p class="p-name e-content"><span>Hello World</span></p>
  <img class="u-photo" src="globe.gif" alt="spinning globe animation">
</div>
has
[
  {
    "alt": "spinning globe animation",
    "value": "globe.gif"
  }
]

Example 2
<div class="h-entry">
  <p class="p-name e-content"><span>Hello World</span></p>
  <img class="u-photo" src="globe.gif" alt="">
</div>
has
[
  {
    "alt": "",
    "value": "globe.gif"
  }
] |
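The three examples above can be reproduced with a small sketch built on Python's standard `html.parser`. This is not the mf2py implementation, only an illustration of the rule they demonstrate: an `<img class="u-photo">` with an `alt` attribute (even an empty one) yields an object, and one without `alt` yields a bare string. The names `PhotoSketch` and `parse_photos` are hypothetical, and, like the example output above, the sketch leaves `src` unresolved rather than making it an absolute URL.

```python
from html.parser import HTMLParser


class PhotoSketch(HTMLParser):
    """Collect explicit u-photo values from <img> tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.photos = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "u-photo" not in classes or "src" not in attrs:
            return
        src = attrs["src"] or ""
        if "alt" in attrs:
            # alt is present, possibly empty: emit an object so the
            # empty-alt case stays distinguishable from the no-alt case.
            self.photos.append({"value": src, "alt": attrs["alt"] or ""})
        else:
            # No alt attribute at all: keep the existing string output.
            self.photos.append(src)


def parse_photos(html):
    parser = PhotoSketch()
    parser.feed(html)
    parser.close()
    return parser.photos
```

Keeping the bare string when `alt` is absent means existing markup without `alt` parses exactly as before; only images that carry an `alt` attribute change shape.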
Is there a specific reason why this change shouldn’t also be applied to implied photos? Haven’t seen this mentioned in the discussion yet, but if a spec edit is coming up, this might be worth addressing? It wasn’t too long ago implied properties were updated to better match the parsing algo of their explicit counterparts. |
@Zegnat I don't see any reason not to apply this to implied photo too. However, this was not discussed so I left it out. Also, currently it is only for a |
Neither do I, but I didn’t want to assume as I haven’t been part of the conversation. I think we should try not to introduce too many differences between implied and explicit properties, by which I mean that if I add the |
I would disagree with applying this only to explicit u-photo, I think that would result in a surprise to web authors. The simpler model is to handle "alt" for u-photo regardless of whether it is implicit or explicit. In addition, why shouldn’t it apply to any use of u-* with an img? E.g. "u-featured" on an img should also pick up any alt attribute. In short, I’d rather NOT go through multiple proposal/consensus/prototype/changes to get "alt" to work properly. I’d rather we figure out how "alt" should work and change the parsing spec once to handle it. Note the issue name "image alt text is lost during parsing" is not specific to u-photo. Let’s fix this for any use of any image (img) tags in the parsing spec. (Originally published at: http://tantek.com/2018/147/t1/) |
Here are the proposed changes to the spec to account for Add a new section 1.5 with title "parse an
in http://microformats.org/wiki/microformats2-parsing#parsing_a_u-_property break the second step into the following
in http://microformats.org/wiki/microformats2-parsing#parsing_for_implied_properties for implied photo change
step 3 to
step 5 to
|
experimental mf2py now implements the above algorithm under the flag cc: @snarfed |
woo, can't wait to try it! |
This has been in mf2py for a while now, and used by granary/bridgy. @snarfed, any feedback on it? For reference, here's the granary diff: I noticed the need for the type check in snarfed/granary@05a7818#diff-7c6b8da7f499d633036e0bcdd9819a95R445 since only images with |
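The type check mentioned here could look roughly like the following on the consumer side. This is a sketch under the assumption that a photo value is either a bare URL string or an object with `value` and `alt` keys, as in the examples earlier in the thread; it is not the granary code, and the helper name `photo_url_and_alt` is hypothetical.

```python
def photo_url_and_alt(photo):
    """Split one parsed photo value into (url, alt).

    alt is None when the parser emitted a bare string (no alt attribute),
    and may be "" when the author wrote an explicitly empty alt.
    """
    if isinstance(photo, dict):
        return photo.get("value"), photo.get("alt")
    return photo, None


photos = [
    "globe.gif",
    {"value": "globe.gif", "alt": "spinning globe animation"},
    {"value": "globe.gif", "alt": ""},
]
pairs = [photo_url_and_alt(p) for p in photos]
```

Returning `None` rather than `""` for the string case keeps "no alt attribute" distinct from "empty alt".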
thanks for the nudge @sknebel! yup, granary and bridgy are using this feature happily. details in snarfed/bridgy#756. here's a recent example of a bridgy publish to twitter with alt text:
fine by me to close this issue if you all want! |
I’ve incorporated the proposed issue 2 spec changes (see diff: http://microformats.org/wiki/index.php?title=microformats2-parsing&diff=66965&oldid=66956), please review "PROPOSED" text in:
(Originally published at: http://tantek.com/2018/364/t2/) |
Makes sense to me! |
test in mf2py for img parsing with alt: test examples: https://github.com/microformats/mf2py/blob/master/test/examples/experimental/img_with_alt.html expected results: below in python code form from mf2py
|
Resolution: issue 2 proposal accepted. No objections in above discussion, and positive opinions (👍) from a few implementors on the proposal. Proposal text incorporated into spec and reviewed. Proposal implementation in mf2py parser, test cases provided, and Brid.gy verification that mf2py implementation satisfies use-case for the issue is sufficient to demonstrate implementability and utility, all as noted/linked in issue thread. Edited specification accordingly. Thanks everyone for all the hard work on this one! Took a while but I think it was worth it to get it just right. (Originally published at: http://tantek.com/2018/365/t6/) |
As this has been accepted should the big warning under "Uploading a photo with alt text" in https://www.w3.org/TR/micropub/#json-syntax have been updated to clarify that this is now the standard? I'd be happy to raise the PR (Originally published at: https://www.jvt.me/mf2/2020/07/vbncg/) |
PR already exists: w3c/Micropub#116 |
This example illustrates the loss of image alt text during microformats parsing.
This will occur any time the <img> tag appears outside of other microformats properties. This means it's impossible for a consumer of the parsed h-entry to reconstruct a representation of the post that includes the alt text.
This is blocking w3c/Micropub#34