Sanitizing/correcting of (script) tag(s) #102
Ruh roh, looks like this may be granary's fault after all. I see this in https://granary.io/twitter/@me/@all/@app/1341785945364811776?format=html&... :
<div class="e-content p-name">
The "Let's use client-side JavaScript rendering for everything! A web page should be a single <script> tag!" crowd have held the developer marketing advantage for far too long
</div>
Argh. I'll look into it.
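The snippet shows the tweet text written into the HTML without escaping, so the literal `<script>` becomes a real start tag. A minimal sketch of the kind of fix needed, assuming Python; `render_entry` is a hypothetical helper for illustration, not granary's actual code:

```python
import html

def render_entry(text: str) -> str:
    """Hypothetical helper: wrap plain-text tweet content in an e-content div.

    html.escape() turns &, < and > (and quotes, since quote=True by default)
    into character references, so a literal "<script>" in the tweet stays
    text instead of opening a script element in the generated page.
    """
    return '<div class="e-content p-name">\n' + html.escape(text) + "\n</div>"

tweet = (
    'The "Let\'s use client-side JavaScript rendering for everything! '
    'A web page should be a single <script> tag!" crowd have held the '
    "developer marketing advantage for far too long"
)
print(render_entry(tweet))
```

Escaping at the point where plain text enters the HTML keeps the producer responsible for well-formed output, instead of relying on every consumer's parser to recover.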
I deployed a fix to granary and checked XRay; it seems happy now. Feel free to retry!
Fixes aaronpk/XRay#102. Background discussion: https://chat.indieweb.org/microformats/2021-01-10#t1610238353449500. Also fixes a bug in test_testdata.py that was preventing a bunch of tests from running (argh); they're now broken, and I'll fix them in upcoming commits.
@snarfed Indeed, that fixes it. However, the problem remains in XRay too, or maybe it's in the mf2 parsing; it's too early in the morning to debug now :) Payload to test is below: I added an article before and after that tweet, with the unescaped characters again. If you paste that, you only get two items. Pinging @aaronpk as well.
So it's probably in the mf2 library: pasting that payload at http://pin13.net/mf2/ only returns 2 items. Pinging @Zegnat and @gRegorLove.
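Why only two items come back: once a raw `<script>` start tag appears, HTML parsers treat everything up to a matching `</script>` as script text, so any `<article>` tags after it are never seen as elements. The sketch below demonstrates this with Python's standard-library `html.parser` as a stand-in; it is not the parser XRay or the mf2 library actually uses, and `ArticleCounter`, `count_articles` and `feed_page` are hypothetical helpers. The three-entry page only mimics the payload described above (an article before and after the tweet).

```python
import html
from html.parser import HTMLParser

class ArticleCounter(HTMLParser):
    """Counts the <article> start tags the parser actually recognizes."""

    def __init__(self) -> None:
        super().__init__()
        self.articles = 0

    def handle_starttag(self, tag, attrs):
        if tag == "article":  # tag names arrive lowercased
            self.articles += 1

def count_articles(page: str) -> int:
    parser = ArticleCounter()
    parser.feed(page)
    parser.close()
    return parser.articles

def feed_page(middle_text: str) -> str:
    """Hypothetical three-entry page with the tweet text in the middle entry."""
    entries = ["first article", middle_text, "third article"]
    return "".join(
        f'<article class="h-entry"><div class="e-content p-name">{t}</div></article>'
        for t in entries
    )

raw = "A web page should be a single <script> tag!"
print(count_articles(feed_page(raw)))               # 2: the third article is swallowed
print(count_articles(feed_page(html.escape(raw))))  # 3: escaped text stays text
```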
Updated the title to better reflect what's going on.
@swentel I'm not certain whether the mf2 parser should be correcting HTML (or to what extent). With an unmatched
True, there's nothing left to do here.
It looks like there might be a problem when a 'script' tag is included in a tweet. That's only an assumption at the moment; the problem might be in granary too.
This is the tweet: https://twitter.com/simonw/status/1341785945364811776
(I got that because adactio retweeted it.)
I'm using granary to generate an HTML version of my Twitter followers, which is then parsed by XRay. When I tested it with only adactio (and @self) and pasted the body into https://xray.p3k.app/, the output was strange: every article ended up inside a single content property.
So this might be a granary problem, although as far as I can see granary's output looks fine.