Create the url_cache index on local_media_repository as a background update, so that we can detect whether we are on sqlite or not and create a partial or complete index accordingly.

To avoid running the cleanup job before we have built the index, add a bailout which will defer the cleanup if the bg updates are still running.

Fixes #2572.
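A minimal sketch of the engine check this describes. The table and column names come from this PR's discussion, but the helper name, the exact index name, and the `WHERE` clause are illustrative assumptions rather than the actual Synapse code:

```python
def url_cache_index_sql(is_sqlite: bool) -> str:
    """Build the CREATE INDEX statement for the url cache expiry job.

    Hypothetical helper; the index name and WHERE clause are assumptions.
    """
    if is_sqlite:
        # SQLite (at the versions then supported) can't be relied on for
        # partial indexes, so fall back to indexing the whole table.
        return (
            "CREATE INDEX local_media_repository_url_idx"
            " ON local_media_repository(created_ts)"
        )
    # Postgres can build a partial index covering only url-cache rows,
    # which are the only rows the expiry job ever queries.
    return (
        "CREATE INDEX local_media_repository_url_idx"
        " ON local_media_repository(created_ts)"
        " WHERE url_cache IS NOT NULL"
    )
```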
Hum. We're going to end up doing the background update on the main synapse, which means my carefully-laid plans not to do the cleanup until the index has been rebuilt will be for naught. Any better ideas? (Not that it really matters, since matrix.org only has a thousand rows in this table; still it feels like a shame for anybody else who might be running the media_repo in a worker)
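For what it's worth, the deferral can key off shared database state rather than anything in-process: Synapse records pending updates in the `background_updates` table, which is empty once everything has run, so whichever process does the cleanup can observe whether the index build has finished. A hedged sketch of such a check (not the actual store method):

```python
import sqlite3


def has_completed_background_updates(conn: sqlite3.Connection) -> bool:
    # background_updates holds one row per pending update, so an empty
    # table means everything (including the index build) has finished.
    row = conn.execute("SELECT 1 FROM background_updates LIMIT 1").fetchone()
    return row is None
```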
Don't we need to remove the delta file for creating the index?
@@ -426,8 +431,7 @@ def _expire_url_cache_data(self):
 
         yield self.store.delete_url_cache_media(removed_media)
 
-        if removed_media:
-            logger.info("Deleted %d media from url cache", len(removed_media))
+        logger.info("Deleted %d media from url cache", len(removed_media))
Why have we removed the conditional? We added it so that we wouldn't spam the logs with lots of pointless log lines.
Because it makes a nice closure to:

    logger.info("Running url preview cache expiry")
I don't think that two log lines every ten seconds constitutes a significant amount of spam. It's useful to know it's running; if nothing else it highlights the fact that it was running four times every ten seconds (leading to #2701).
People specifically added a PR as it was annoying, fwiw. Once every ten seconds isn't too bad by itself, but we have quite a few background tasks that should probably be logging to the same extent.
Personally, I'd probably put the first log line after we ask the DB for expired URLs and then log both if any expired media ids were returned or we're at debug logging.
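A rough sketch of that suggestion (illustrative only; the helper and its argument are mine, not code from this PR):

```python
import logging

logger = logging.getLogger(__name__)


def log_expiry_run(expired_media_ids: list) -> None:
    # Only emit the INFO lines when there is actually something to
    # delete, or when we're at debug level anyway.
    if expired_media_ids or logger.isEnabledFor(logging.DEBUG):
        logger.info("Running url preview cache expiry")
        logger.info(
            "Deleting %d expired url cache media", len(expired_media_ids)
        )
```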
> People specifically added a PR as it was annoying, fwiw.

wat? people added a PR?

> we have quite a few background tasks that should probably be logging to the same extent.

Probably. I'm not going to fix them in this PR though.

> Personally, I'd probably put the first log line after we ask the DB for expired URLs and then log both if any expired media ids were returned or we're at debug logging.

I'm really unhappy about background tasks that run silently. Not least because it leads to the sort of fuck-up we had here where it's running much more often than we think it is and it's only good luck which stops the processes racing against each other and doing real damage.
Not really, other than making …
https://github.com/matrix-org/synapse/pull/2697/files#diff-487f7332efcd28fdefeb96ce1ef0dd8eR19 removes the offending update. I don't think we want to remove the whole file.
That sounds sane. I'll look into it.
Oh, I'm blind.
…so that the right thing happens on workers.
@erikjohnston ptal
…13657)

Media downloaded as part of a URL preview is normally deleted after two days. However, while a background database migration is running, the process is stopped. A long-running database migration can therefore cause the media store to fill up with old preview files. This logic was added in #2697 to make sure that we didn't try to run the expiry without an index on `local_media_repository.created_ts`; the original logic that needs that index was added in #2478 (in `get_url_cache_media_before`, as amended by 93247a4), and is still present. Given that the background update was added before Synapse v1.0.0, just drop this check and assume the index exists.
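In spirit, the query that needs that index is a "media older than the threshold" lookup. A hedged sketch against the table and column names mentioned above, with the two-day threshold from the commit message (the function's real signature in Synapse may differ):

```python
import sqlite3

EXPIRY_MS = 2 * 24 * 60 * 60 * 1000  # two days, per the commit message


def get_url_cache_media_before(conn: sqlite3.Connection, now_ms: int) -> list:
    # Relies on the (possibly partial) index on created_ts built by this
    # PR's background update; media_id is assumed from the table schema.
    rows = conn.execute(
        "SELECT media_id FROM local_media_repository"
        " WHERE created_ts < ? AND url_cache IS NOT NULL",
        (now_ms - EXPIRY_MS,),
    ).fetchall()
    return [row[0] for row in rows]
```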