URL previews include Unicode punycode, which causes issues with URLs in the text body #14068
Comments
The Unicode quotation marks are being added by Twitter itself, not Synapse. This isn't something that the Synapse team will fix, since we want to avoid adding special handling for certain websites.
$ curl "https://twitter.com/mischiefanimals/status/1576904037449969664" -H "User-Agent: Synapse (bot; +https://github.com/matrix-org/synapse)" | less
...<meta data-rh="true" content="<E2><80><9C>https://t.co/fVP8YWHS2j<E2><80><9D>" property="og:description"/>...
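To illustrate the point above, here is a minimal sketch (not Synapse's actual parsing code) that pulls `og:description` out of an HTML fragment shaped like the curl output and shows that the wrapping characters are U+201C and U+201D, whose UTF-8 bytes `less` displays as `<E2><80><9C>` and `<E2><80><9D>`. The names `OgDescriptionParser`, `parse_og_description`, and `SAMPLE_HTML` are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser
from typing import Optional

# Hypothetical sample, modelled on the <meta> tag in the curl output above.
SAMPLE_HTML = (
    '<meta data-rh="true" '
    'content="\u201chttps://t.co/fVP8YWHS2j\u201d" '
    'property="og:description"/>'
)


class OgDescriptionParser(HTMLParser):
    """Collects the content of the first og:description meta tag."""

    def __init__(self) -> None:
        super().__init__()
        self.description: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags also reach this method via handle_startendtag.
        if tag != "meta" or self.description is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == "og:description":
            self.description = attr_map.get("content")


def parse_og_description(html: str) -> Optional[str]:
    """Return the og:description value, or None if the tag is absent."""
    parser = OgDescriptionParser()
    parser.feed(html)
    return parser.description


description = parse_og_description(SAMPLE_HTML)
# The first and last characters are the curly quotes served by the page.
first_bytes = description[0].encode("utf-8")   # b'\xe2\x80\x9c'
last_bytes = description[-1].encode("utf-8")   # b'\xe2\x80\x9d'
```

Because the quotes are already present in the served HTML, any generic Open Graph scraper would return them as part of the description.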
@squahtx you already have "special" handling for bits of Twitter: https://github.com/matrix-org/synapse/blob/develop/synapse/res/providers.json - in theory adding all of Twitter to that whitelist would resolve this.
That file is essentially unused now; moments don't exist. We backed out the changes to use oEmbed for all of Twitter in #11985 because it gives significantly worse results. It seems that Twitter itself is giving a better description in the oEmbed though (the Synapse parsed version would be):
vs. what can be pulled from their OpenGraph tags in the HTML, which as @squahtx pulled above is:
We should be preferring this, see: synapse/synapse/rest/media/v1/preview_url_resource.py Lines 382 to 387 in 303b40b
Ah, I see what's going on -- Twitter doesn't have oEmbed autodiscovery enabled, so we are only scraping the HTML in this case. If we were to add the Twitter URLs back to the
We could always scrape the given URL, but also check oEmbed info if available. That would be reasonable in terms of "try to find the most info possible", but would result in duplicate queries in some situations... it would treat autodiscovery of oEmbed more similarly to the hard-coded providers list though. 🤷
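The "always scrape, then also check oEmbed" idea could be sketched roughly as follows. This is an illustration under stated assumptions, not how Synapse is implemented: `build_preview`, `scrape_html`, and `fetch_oembed` are hypothetical names, and letting non-empty oEmbed fields override scraped ones is an assumed precedence chosen only for the example.

```python
from typing import Callable, Dict, Optional

Preview = Dict[str, str]


def build_preview(
    url: str,
    scrape_html: Callable[[str], Preview],
    fetch_oembed: Callable[[str], Optional[Preview]],
) -> Preview:
    """Combine scraped Open Graph data with oEmbed data when available.

    Both callables are hypothetical stand-ins for real network fetches.
    Calling both is what leads to the duplicate queries mentioned above.
    """
    # Always start from what the HTML scrape found.
    preview = dict(scrape_html(url))

    # Then consult oEmbed; None means no oEmbed endpoint is known.
    oembed = fetch_oembed(url)
    if oembed:
        for key, value in oembed.items():
            # Assumption: non-empty oEmbed values take precedence.
            if value:
                preview[key] = value
    return preview


# Usage with fake fetchers, so the example needs no network access.
def fake_scrape(url: str) -> Preview:
    return {"og:title": "A tweet", "og:description": "\u201chttps://t.co/x\u201d"}


def fake_oembed(url: str) -> Optional[Preview]:
    return {"og:description": "https://t.co/x", "og:title": ""}


merged = build_preview("https://example.com/status/1", fake_scrape, fake_oembed)
```

With these fake fetchers the merged preview keeps the scraped title and takes the description from oEmbed; if `fetch_oembed` returns `None`, the scraped data is returned unchanged.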
I downgraded to tolerable since you can easily click the link being previewed, and uncommon since I think this is an odd example where a URL was given as the only tweet content (which seems a bit weird -- but do shout if this seems incorrect).
Description
element-hq/element-web#23432
Synapse injects “ (5o0a) punycode into og:description of Twitter URLs, leading to broken links.
Steps to reproduce
Forwarded Element issue:
Steps to reproduce
""
Outcome
What did you expect?
links work
What happened instead?
Demo URLs for reference:
https://twitter.com/FXNetworks/status/1577704289476128771
https://twitter.com/mischiefanimals/status/1576904037449969664
Operating system
arch
Application version
Element Nightly version: 2022100501 Olm version: 3.2.12
How did you install the app?
aur
Homeserver
private
Synapse Version
1.68.0
Installation Method
Docker (matrixdotorg/synapse)
Platform
debian, matrix-docker-ansible-deploy
Relevant log output
Anything else that would be useful to know?
https://user-images.githubusercontent.com/2403652/194146661-044758f9-fefd-4744-9ef2-bd3aec094d40.png