For link aggregators like Reddit, an individual ripper shouldn't have to know every kind of URL that could show up; Ripme as a whole, however, broadly does know how to handle them.
It might be a good idea to have a table of rate-limiting rules on URLs that applies globally (to all rippers), meaning we don't rip a new URL for a given domain until that domain's rate-limit interval has expired, so that rate limiting isn't implemented piecemeal in each ripper's logic.
Broadly speaking, we don't have any real rate-limiting system in place. Any rate limiting has been ad hoc, simply delaying when a link gets added to the queue, which isn't actually the same as delaying after a download completes. For example, if the "rate limit" delay is 3 seconds before a link is added to the download queue but downloads take 10 seconds, we pretty quickly have no real rate limiting in place at all.
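A minimal sketch of what such a global table could look like, assuming a domain-keyed map of minimum gaps and last-completion timestamps. All class and method names here (`RateLimitTable`, `setRule`, `waitIfNeeded`, `recordCompletion`) are illustrative, not existing Ripme APIs:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical global rate-limit table, shared across all rippers.
// Keyed by domain so each ripper doesn't need its own ad hoc delay logic.
public class RateLimitTable {
    // Minimum gap required between rips, per domain.
    private final Map<String, Duration> rules = new ConcurrentHashMap<>();
    // When the last download for each domain actually COMPLETED.
    private final Map<String, Instant> lastCompletion = new ConcurrentHashMap<>();

    public void setRule(String domain, Duration minGap) {
        rules.put(domain, minGap);
    }

    // Called before ripping a URL: blocks until the domain's rate-limit
    // interval (measured from the last completed download) has expired.
    public void waitIfNeeded(String domain) throws InterruptedException {
        Duration minGap = rules.get(domain);
        if (minGap == null) return;                 // no rule for this domain
        Instant last = lastCompletion.get(domain);
        if (last == null) return;                   // nothing ripped yet
        long sleepMs = Duration.between(Instant.now(), last.plus(minGap)).toMillis();
        if (sleepMs > 0) Thread.sleep(sleepMs);
    }

    // Called only after a download completes, so the gap is measured from
    // completion rather than from when the link was queued.
    public void recordCompletion(String domain) {
        lastCompletion.put(domain, Instant.now());
    }
}
```

Measuring from completion rather than enqueue time is what avoids the 3-second-delay-versus-10-second-download problem described above.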
metaprime changed the title from "Add a globally-applicable table for rate limiting rules on URLs" to "[Proposal] Add a globally-applicable table for rate limiting rules on URLs" on Jan 6, 2025.
Also, consider the update scenario: self-rate-limiting shouldn't be necessary if we check a URL, discover that we already have it, and don't actually rip it. So we should have a mechanism in place to apply the rate-limiting rules only after we successfully rip something.
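The update-scenario check above could be as simple as consulting the download history before touching the rate limiter, so re-running an already-complete album incurs no delays. This is a hypothetical sketch; the class and method names are illustrative and not part of Ripme:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical update check: URLs already in the download history are
// skipped without consulting the rate limiter at all, so only successful
// new rips ever trigger a rate-limit delay.
public class UpdateCheck {
    private final Set<String> history = new HashSet<>();

    // Returns true if the URL is new and should be ripped (and rate-limited);
    // false if we already have it and can skip both the download and the delay.
    public boolean shouldRip(String url) {
        return history.add(url); // Set.add() returns false if already present
    }
}
```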