Reduce load of preview fetching on third-party servers #23662
Comments
For my part, I think the mitigation that best balances being non-controversial, effective, and easy to implement is robots.txt. It allows server operators fine-grained control, so that they can indicate which routes are expensive to render. For my servers, I have had expensive routes excluded in robots.txt for a long time to keep crawlers away; this has been a well-known mitigation for automated requests to expensive routes for some time now. I also think that someone within Mastodon should be willing to take responsibility for implementing these mitigations. I am generally sympathetic to the incentives of FOSS projects and the nature of the volunteer labor pool, but when software is causing disruptions to other services, it's important for the maintainers of that software to accept responsibility for it -- expecting the victims of a DDoS to put up the labor to mitigate the DDoS is just rubbing salt into the wound. If no one is available to do the work, fetching previews should simply be disabled entirely until such a time as a mitigation can be implemented.
As a random passerby I can say that the server the author of a message belongs to should be responsible for retrieving this information (subject to a timeout) and only then propagating the message along with this data. Other servers should not retrieve the data themselves. You have to trust it, sure, but you already have to trust it's not altering messages… And you might as well block bad actors on your server when that happens.
I wrote a document on this topic a few months ago which I think covers it quite extensively: https://gist.github.com/renchap/3ae0df45b7b4534f98a8055d91d52186 I tried to highlight the related challenges, as well as the possible implementations and their drawbacks.
Perhaps a small point, but what happens if the original web page linked to is updated in the meantime so that the preview changes (as sometimes happens with news articles, for example)?
Another path that could be taken here is to have a link preview service, which can cache the preview. That'd mean that rather than every Mastodon server going to the origin server to prepare a preview, the Mastodon servers would delegate that to a trusted preview service. E.g., a lot of large companies use a service like https://embed.ly for generating previews. Perhaps there's room for a setup based on low-cost infrastructure (e.g., Cloudflare Workers or similar on-demand compute environments) that could be deployed? That way a server admin could choose to trust one of the preview services instead of having their instance do the fetching (of course, you could always opt out, with the caveat that your service would potentially amplify traffic). The logic here is that there'd be fewer preview services deployed than Mastodon servers, and that'd prevent the thundering herd. Trust would be between Mastodon server admins and the preview services; a preview service that doesn't provide accurate previews would quickly find its reputation trashed and instances would switch to a different provider. Edit: this is Solution 5 in @renchap's gist
A few more thoughts:
To lessen the impact of an untrustworthy server, something that might make sense is to pre-fetch the link while the author is writing their post, offer them a preview, and have them select it. Then, somehow serialize that preview when federating it. On the receiving end, if you receive a post with a preview:

- Whenever a local user authors a post with a link that has a known but non-trusted preview, fetch the link yourself to get a trusted preview.
- Whenever a non-trusted preview is eligible for trending, fetch the link yourself and update the trusted preview.

Compared to the current state, this would require changes in the REST API to choose which link to preview (if any at all), possibly breaking in that clients not implementing the change may not get previews generated, and S2S protocol changes to communicate that information. It would introduce an attack vector in which a remote user could generate a misleading preview for their own posts, but without introducing a cache poisoning issue. An attacker could still purposefully send two conflicting previews to a server to cause it to fetch the link, but it should greatly reduce the impact of normal usage.
One option for @renchap's "Solution 6: Design and implement a protocol for websites to provide a signed preview" would be for Open Graph to define an "integrity" structured property on image/video/audio properties to allow for zero-trust redistribution of bandwidth-heavy media resources, similar to what's used for Subresource Integrity.
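For example, such markup might look like this (the `og:image:integrity` property name is purely hypothetical, sketched by analogy with SRI, and not part of any spec):

```html
<meta property="og:image" content="https://example.com/media/photo.jpg" />
<!-- Hypothetical structured property carrying an SRI-style digest of the image,
     so anyone redistributing the file can prove it is byte-identical to the
     original the publisher referenced. -->
<meta property="og:image:integrity" content="sha256-47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=" />
```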
Unfortunately, the Facebook and Google groups linked for discussion from the official Open Graph page seem to be derelict and defunct, respectively, so I'm not sure of the best way to get this proposed and adopted. Edit: Another option would be to use the HTTP
+1. Also, having this information and federating it would improve compatibility with federated news aggregator platforms like Kbin/Lemmy, which don't have a reliable way of extracting the "best link" from an incoming Mastodon post (LemmyNet/lemmy#2889 (comment)). This information could be added to a status as an explicitly attached Link.
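A sketch of what that might look like on an ActivityPub Note, using standard ActivityStreams vocabulary (the exact layout is illustrative, not a settled convention):

```json
{
  "type": "Note",
  "content": "<p>Worth reading: https://example.com/article</p>",
  "attachment": [
    {
      "type": "Link",
      "href": "https://example.com/article",
      "mediaType": "text/html",
      "name": "Example article title"
    }
  ]
}
```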
This would be helpful in determining both whether a link preview should be generated at all, and which link the author intended to have shown as a preview (in the case of multiple links in a post). (Presumably the preview property could also be added to an attached Link for federating the other card preview information?)
@ryanfb FWIW, in theory Open Graph uses a subset of RDFa, though which subset exactly was never clear to me (full RDFa is a bear to implement). Whether there's a good place to talk to people interested in such things, I don't know; Facebook's relationship with the wider Web has gotten a lot less cuddly since 2010. Perhaps those who once maintained the spec repo could be contacted?
Some questions for the group:
There are many more questions that could be asked, particularly from the perspectives of readers and admins, but I thought I'd start with these. The point is that the discussion so far seems somewhat focused on the current implementation, which treats link previews as an automatic aid to readers, whereas I believe link previews should primarily be driven by posters during authoring, since this is the path to full trust in the content. As a post creator, I want to see link previews, and I don't want them to change after I have posted: they should be authored content. As other people on this thread have indicated, some of the answers to the technical questions become clearer when this is the starting point of the feature.
Yet another site – this time a major bookmarking service – is considering blocking Mastodon because of the DDoS effect: https://twitter.com/pinboard/status/1698775481292832978 Please fix this.
1. Moot. It should be up to the preference of the person viewing the post, and the client they use. Also, can you stop a client from creating a link preview if it wants to?
2. Expect? No, it depends on whether the client I'm using has the capability. It's certainly a nice-to-have, though.
3. Yes.
4. Moot, because of (3).
5. If by non-authored content you mean previews, then I think it's clear that link previews aren't part of the authored content. Right? Link previews are a thing across the web; they display what's on the linked page. They're not part of the post.
6. See (1).
After 6 years of no movement on Mastodon's DDoS-by-design, and a string of outages caused by people just tooting links to SourceHut, I have blacklisted Mastodon User-Agents across SourceHut's services.
This is on our roadmap for the next version, FYI; expect some news soon.
That is excellent news; I will be keeping an eye out. Thanks!
Solutions 2–6 are all variants of the same idea, a shared cache. Solution 1 ("On-demand generation of previews") is a variant of the idea of just adding a delay. Further variants are simple to come up with and implement. For example, the most obvious way to reduce the intensity of the load is to introduce a random delay, where each instance would wait e.g. between 0 seconds and 30 minutes before doing exactly what it does now. (I'm assuming this is rather simple to implement with scheduled jobs, but I might be wrong; see the sketch below.) It's possible to come up with other triggers and thresholds that would consider the instance's specific circumstances to reduce the number of "wasteful" preview generations (for example, on many single-user instances a preview may never be seen). It might also be possible to collect statistics to measure the extent of the current waste and estimate the impact of the alternatives (for example, instrument Mastodon by adding metrics for the hit rates of the previews, as you'd do for a cache).
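For illustration, a minimal sketch of such a randomized delay as a Sidekiq-style scheduled job; the worker and service names here are hypothetical, not Mastodon's actual ones:

```ruby
class FetchPreviewWorker
  include Sidekiq::Worker

  def perform(status_id)
    # Hypothetical service that fetches the page and stores the preview card.
    GeneratePreviewService.new.call(status_id)
  end
end

# On receiving a federated status containing a link, schedule the fetch at a
# random point within the next 30 minutes instead of fetching immediately.
FetchPreviewWorker.perform_in(rand(30 * 60).seconds, status.id)
```

Spreading the fetches over a longer window trades preview freshness for a flatter load curve on the origin server.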
I run a small server and use relays to fill the federated timeline. Most of those posts are probably never displayed, so fetching previews only for posts on the other timelines (home, trending, lists, local) and replies to those posts would greatly reduce the load caused by servers like mine. For the federated timeline, fetch previews for the last 200 messages or so when a user requests its display.
For the shared cache idea, the server would still need to check the original source. The cached version could be
How about a different take: warm preview generation based on active users, and cold "grey" cache distribution to less active instances. What I mean by this:
¹ By "many more" I mean relative scale. For example, an instance with 1 active user and an instance with 100 active users would be in "the same range of active users"; an instance with 101 would be considered to have "many more" and so would generate its own preview. With this, we would have a pretty limited first series of requests, as mostly bigger instances will do the first request. Other instances will have a preview ASAP, as a grey preview that changes to an actual preview as soon as someone clicks on it. Smaller instances also won't go down from cache requests, as only same-sized and smaller instances will even pull the cache. We still face the problems of a distributed cache, but as the cached version is shown grey and will probably only last for a while, this shouldn't be a huge problem, especially because only smaller instances with fewer active users will see them.
It would be nice to have a solution for this problem...
Based on the available data, in my humble opinion the jitter delay is simply a defect that was added to try to game the system and slow it down, and it should be fully removed, as it distracts from the real issue: creating an effective caching solution and not re-rendering data that has not changed. The core issue here seems to be that the generated data shouldn't be rendered more than once before being added to a cache with a long TTL, so that it doesn't need to be rendered again and requests for it can simply be served from that cache.

Any thoughts about a global cache that spans multiple TLDs or domains need to also consider a strong HMAC and an algorithm table that can be independently validated by third parties using the named validation mechanism, to protect against injection and to enforce a chain of trust on the data content, so that cached records/data blobs fetched from it can be validated as really coming from the true origin via the HMAC and shown to be what they claim to be, or marked as corrupted/edited/malicious. At that point, simply adding the original content's last-updated timestamp would let you create a distributed cache able to help the entire system scale (domain, resource URL, HMAC, byte size, timestamp), because your primary key would include the origin's version data, the HMAC data for its identity, the byte size, and the most recent timestamp. This would also make expiration of old records easier.

However, I don't believe a distributed cache is really needed here for the bulk of servers, and if it did exist, solving the initial issue of not generating the data multiple times on the local instance would make that cache more effective long term, because it could then simply consume those records instead of forcing the server to recreate them. The system shouldn't be hitting your database to get or generate this data blob more than once if it hasn't been edited anyway.

tl;dr: Wouldn't adding an X-second TTL-based cache, adding every new post to it, and having the fediverse get the data from that cache be enough? Ensure the cache has the resources it needs based on the number of new posts on your server, and when you generate links that get pumped out to the fediverse, direct requests to the cache.
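On the origin-server side, that tl;dr might look something like this minimal Rails-style sketch (the generate_preview_payload helper is hypothetical): the preview data is rendered at most once per version of the page and served from cache thereafter:

```ruby
# Hypothetical helper on the origin site: render the preview payload once per
# (url, last_modified) version, then serve every further request from cache.
def preview_payload(url, last_modified)
  Rails.cache.fetch(["preview", url, last_modified], expires_in: 24.hours) do
    generate_preview_payload(url) # expensive render; runs only on a cache miss
  end
end
```

Keying on the last-modified timestamp means an edit to the page naturally produces a fresh cache entry, while the stampede of identical fetches after a post federates is served from the same cached record.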
The AT Protocol (Bluesky) uses IPFS-style content addressing to fetch content by hash from anywhere; at its foundation, IPFS is a distributed caching system.
I'd go further than that: don't fetch a preview unless a post in the federated timeline that doesn't meet your criteria has been opened. I would also add fetching of preview cards for saved hashtags, although that should already be included in the home timeline fetch by default. I'm fetching so many posts a day and will never see most of them on my self-hosted instance unless I do a search. I have come to rely on my instance as my first stop for searching, even over Google. Can't wait for the ability to do query-string searches so I can set it up in my browser as a default search provider. See Help Wanted: Use /search for search.
After being pushed back version after version, this was quietly removed from the roadmap last week. This is very disappointing, to say the least. The thundering herd behaviour is obviously wrong.
@dpk the event here is a bit misleading; all milestones were removed: https://github.com/mastodon/mastodon/milestones?state=open
I removed the milestones as they were not indicative of our real roadmap. We did more research on this topic in the last few weeks and, if possible with the time we have, will try to move forward on it for 4.4.
While waiting for the Proper Solution™, it looks to me as if there's a way for us Fediverse instance owners to voluntarily lessen the problem. The issue here doesn't seem to be with the top 10 or top 50 instances, where most users reside, but with the fact that there are so many smaller instances, even one-person ones (I run a family instance myself, for example). We, the "small instance owners", could perhaps implement mitigation purely on our side right now? If so, a helpful guide that we can spread throughout #MastoAdmin et al. would be a great start. From this thread it looks like the 0–60 second random jitter is a place to look into, but I'm sure there are more ways. I already change the post length limit in my Docker build, and adding more things to patch there seems easy.
So?
It would help to fetch Brotli-(pre)compressed HTML pages (`Accept-Encoding: gzip, br`) and alternate image media (e.g. WebP, `Accept: image/webp,*/*`) where available, to reduce bandwidth. When I recognise an incoming stampede request by its lack of a Referer, I serve a lower-fi JPEG or PNG, but I could also serve a much smaller lower-fi WebP at little effort if the second of these (Accept) were handled. Yes, reducing the overall number and density of requests would be the main win, but making each request less expensive would also help, and is orthogonal.
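A minimal sketch of sending such a header from the fetching side, using plain Net::HTTP (Mastodon's actual HTTP client and headers differ):

```ruby
require "net/http"
require "uri"

uri = URI("https://example.com/article")
req = Net::HTTP::Get.new(uri)
# Advertise compressed encodings so the origin can serve precompressed pages.
req["Accept-Encoding"] = "gzip, br"

res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
  http.request(req)
end
```

One caveat: once you set Accept-Encoding yourself, Net::HTTP no longer transparently decompresses the response, so the client must decode gzip or Brotli itself (Brotli e.g. via the brotli gem).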
Steps to reproduce the problem
Posting a link on Mastodon causes hundreds or thousands of federated servers to fetch the preview details all at once, causing a high load for the target server and effectively creating the by now well-known "Mastodon DDoS" effect. For static websites this may not be much of an issue, but it routinely brings down larger pages or those with dynamic server-rendered content.
Expected behaviour
No DDoS
Actual behaviour
DDoS
Detailed description
The discussion at #4486 has been going on for six years and is quite a mess. I'm opening this ticket to start fresh and collect the information scattered throughout the thread into one place so that the discussion doesn't have to keep going in circles.
Rationale
It is the responsibility of software like Mastodon to be a good neighbor on the internet. DDoSing others is not being a good neighbor! It's important to figure out how to prevent this issue from occurring.
IMPORTANT: Let's make this discussion better than the last one. For Mastodon devs: don't blame the victims. It's not the website's fault that Mastodon is DDoSing them. For server operators: Mastodon is a volunteer-run free software project. Be mindful of that.
Present mitigations
Currently, a random jitter of between 0 and 60 seconds is added after a federated server is made aware of a post which includes a link, before it fetches the preview details. This does not seem to be sufficient to prevent the DDoS effect from occurring.
Suggested mitigations
The discussion has focused on two main suggestions for a fix.
Federating previews
The original poster's instance can fetch the preview details and attach them to the message, federating the preview details without requiring other servers to fetch it themselves. Criticism of this solution is mainly focused around the fact that Mastodon is a zero-trust environment, so instance A cannot trust instance B's word that a preview accurately represents the URL.
Because the original post is always fetched when a post is federated, the trust space can be reduced to the origin server alone; intermediates need not be trusted. Opinions on the depth of this problem have ranged from "it's no different from posting an image" to "we absolutely cannot trust anyone ever for any reason".
Answers proposed to the objections of trust have included random sampling, wherein the preview is fetched 1 in N times; if it's found to be inconsistent with the federated preview, some action can be taken, such as setting a flag on that instance which causes future (and past?) previews to be fetched unconditionally from that server, automatically making a report to the instance admins of suspected foul play, or federating the flag so that other servers can force a sample from that instance when foul play is suspected. A sketch of the sampling idea follows below.
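As an illustration only (every name here is hypothetical; this is not Mastodon code):

```ruby
SAMPLE_ONE_IN = 20 # spot-check roughly 5% of federated previews

# Occasionally fetch the link ourselves and compare the result against the
# preview card that was federated along with the post.
def maybe_verify_preview(status)
  return unless rand(SAMPLE_ONE_IN).zero?

  local_card = fetch_preview_card(status.url)        # our own fetch
  return if local_card == status.federated_preview   # consistent, all good

  # Inconsistent: stop trusting this instance's previews, re-check everything
  # it sends from now on, and let the admins know.
  flag_instance_for_unconditional_fetches(status.origin_domain)
  notify_admins_of_suspected_foul_play(status)
end
```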
Any of these changes would likely involve a slow roll-out across the fediverse, and link previews might sometimes not work for older clients as a consequence. My take: I believe this is quite acceptable; it's a small price to pay for correcting this behavior. User experience does not outweigh "don't DDoS people". Furthermore, a slow roll-out naturally implies that the problem does not get fixed overnight, but rather that the behavior corrects gradually over time as the fix is rolled out across the fediverse -- not an issue imo.
Reducing load
No federated previews, but instances don't immediately fetch the preview. Most to least effective mitigations along this line of thought:
Some combination of these mitigations is also possible; for instance, the jitter could be increased to five minutes, but with the fetch done immediately if the post shows up in the UI.
Specifications
n/a, all versions affected