-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Track number of hosts affected by the rate limiter #13541
Track number of hosts affected by the rate limiter #13541
Conversation
@@ -0,0 +1 @@ | |||
Add metrics to track how the rate limiter is affecting requests (sleep/reject). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same changelog as #13534 so they merge together.
…littlemods/metrics-rate-limit-affected-host-count Conflicts: synapse/util/ratelimitutils.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable; minor suggestions
…-host-count Conflicts: synapse/util/ratelimitutils.py
Thanks for the review @erikjohnston and @reivilibre 🐘 |
lambda: sum( | ||
ratelimiter.should_reject() | ||
for ratelimiter in self.ratelimiters.values() | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to always evalulate to 0
.
Even though we see individual requests being rejected:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a note, when I log should_reject()
in Complement, I see should_reject queue_size=0 self.reject_limit=99999
(the high reject_limit
is from docker/complement/conf/workers-shared-extra.yaml.j2#L36
) for all of the calls. Not sure I am exercising this enough in parallel to see it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to visualise this at https://grafana.matrix.org/d/000000012/synapse?orgId=1&viewPanel=225&var-datasource=default&var-bucket_size=$__auto_interval_bucket_size&var-instance=matrix.org&var-job=All&var-index=All but I still see 0 for all rejected
metrics, even with #13649.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These numbers look consistent with data I saw when investigating in the manhole (sum(x.should_reject() for x in hs.get_federation_ratelimiter().ratelimiters.values())
). It might be that we need to wait some time for queues to grow before we start rejecting requests. I'd like to leave confirming/investigating this on your plate @MadLittleMods if that's ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DMRobertson Looks like #13649 hasn't fixed it if it has already been rolled out to matrix.org
😩 (still all 0)
Here are the rate limiter graph section I have:
https://grafana.matrix.org/d/dYoRgTgVz/messages-timing?orgId=1&from=1661854473240&to=1661876073240&viewPanel=220
I'll create an actual issue to track this more long-term now ⏩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#13670 ⏩
Synapse 1.66.0rc1 (2022-08-23) ============================== This release removes the ability for homeservers to delegate email ownership verification and password reset confirmation to identity servers. This removal was originally planned for Synapse 1.64, but was later deferred until now. See the [upgrade notes](https://matrix-org.github.io/synapse/v1.66/upgrade.html#upgrading-to-v1660) for more details. Features -------- - Improve validation of request bodies for the following client-server API endpoints: [`/account/password`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3accountpassword), [`/account/password/email/requestToken`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3accountpasswordemailrequesttoken), [`/account/deactivate`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3accountdeactivate) and [`/account/3pid/email/requestToken`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3account3pidemailrequesttoken). ([\#13188](#13188), [\#13563](#13563)) - Add forgotten status to [Room Details Admin API](https://matrix-org.github.io/synapse/latest/admin_api/rooms.html#room-details-api). ([\#13503](#13503)) - Add an experimental implementation for [MSC3852 (Expose user agents on `Device`)](matrix-org/matrix-spec-proposals#3852). ([\#13549](#13549)) - Add `org.matrix.msc2716v4` experimental room version with updated content fields. Part of [MSC2716 (Importing history)](matrix-org/matrix-spec-proposals#2716). ([\#13551](#13551)) - Add support for compression to federation responses. ([\#13537](#13537)) - Improve performance of sending messages in rooms with thousands of local users. ([\#13522](#13522), [\#13547](#13547)) Bugfixes -------- - Faster room joins: make `/joined_members` block whilst the room is partial stated. ([\#13514](#13514)) - Fix a bug introduced in Synapse 1.21.0 where the [`/event_reports` Admin API](https://matrix-org.github.io/synapse/develop/admin_api/event_reports.html) could return a total count which was larger than the number of results you can actually query for. ([\#13525](#13525)) - Fix a bug introduced in Synapse 1.52.0 where sending server notices fails if `max_avatar_size` or `allowed_avatar_mimetypes` is set and not `system_mxid_avatar_url`. ([\#13566](#13566)) - Fix a bug where the `opentracing.force_tracing_for_users` config option would not apply to [`/sendToDevice`](https://spec.matrix.org/v1.3/client-server-api/#put_matrixclientv3sendtodeviceeventtypetxnid) and [`/keys/upload`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3keysupload) requests. ([\#13574](#13574)) Improved Documentation ---------------------- - Add `openssl` example for generating registration HMAC digest. ([\#13472](#13472)) - Tidy up Synapse's README. ([\#13491](#13491)) - Document that event purging related to the `redaction_retention_period` config option is executed only every 5 minutes. ([\#13492](#13492)) - Add a warning to retention documentation regarding the possibility of database corruption. ([\#13497](#13497)) - Document that the `DOCKER_BUILDKIT=1` flag is needed to build the docker image. ([\#13515](#13515)) - Add missing links in `user_consent` section of configuration manual. ([\#13536](#13536)) - Fix the doc and some warnings that were referring to the nonexistent `custom_templates_directory` setting (instead of `custom_template_directory`). ([\#13538](#13538)) Deprecations and Removals ------------------------- - Remove the ability for homeservers to delegate email ownership verification and password reset confirmation to identity servers. See [upgrade notes](https://matrix-org.github.io/synapse/v1.66/upgrade.html#upgrading-to-v1660) for more details. Internal Changes ---------------- - Update the rejected state of events during de-partial-stating. ([\#13459](#13459)) - Avoid blocking lazy-loading `/sync`s during partial joins due to remote memberships. Pull remote memberships from auth events instead of the room state. ([\#13477](#13477)) - Refuse to start when faster joins is enabled on a deployment with workers, since worker configurations are not currently supported. ([\#13531](#13531)) - Allow use of both `@trace` and `@tag_args` stacked on the same function. ([\#13453](#13453)) - Instrument the federation/backfill part of `/messages` for understandable traces in Jaeger. ([\#13489](#13489)) - Instrument `FederationStateIdsServlet` (`/state_ids`) for understandable traces in Jaeger. ([\#13499](#13499), [\#13554](#13554)) - Track HTTP response times over 10 seconds from `/messages` (`synapse_room_message_list_rest_servlet_response_time_seconds`). ([\#13533](#13533)) - Add metrics to track how the rate limiter is affecting requests (sleep/reject). ([\#13534](#13534), [\#13541](#13541)) - Add metrics to time how long it takes us to do backfill processing (`synapse_federation_backfill_processing_before_time_seconds`, `synapse_federation_backfill_processing_after_time_seconds`). ([\#13535](#13535), [\#13584](#13584)) - Add metrics to track rate limiter queue timing (`synapse_rate_limit_queue_wait_time_seconds`). ([\#13544](#13544)) - Update metrics to track `/messages` response time by room size. ([\#13545](#13545)) - Refactor methods in `synapse.api.auth.Auth` to use `Requester` objects everywhere instead of user IDs. ([\#13024](#13024)) - Clean-up tests for notifications. ([\#13471](#13471)) - Add some miscellaneous comments to document sync, especially around `compute_state_delta`. ([\#13474](#13474)) - Use literals in place of `HTTPStatus` constants in tests. ([\#13479](#13479), [\#13488](#13488)) - Add comments about how event push actions are rotated. ([\#13485](#13485)) - Modify HTML template content to better support mobile devices' screen sizes. ([\#13493](#13493)) - Add a linter script which will reject non-strict types in Pydantic models. ([\#13502](#13502)) - Reduce the number of tests using legacy TCP replication. ([\#13543](#13543)) - Allow specifying additional request fields when using the `HomeServerTestCase.login` helper method. ([\#13549](#13549)) - Make `HomeServerTestCase` load any configured homeserver modules automatically. ([\#13558](#13558))
Synapse 1.66.0 (2022-08-31) =========================== No significant changes since 1.66.0rc2. This release removes the ability for homeservers to delegate email ownership verification and password reset confirmation to identity servers. This removal was originally planned for Synapse 1.64, but was later deferred until now. See the [upgrade notes](https://matrix-org.github.io/synapse/v1.66/upgrade.html#upgrading-to-v1660) for more details. Deployments with multiple workers should note that the direct TCP replication configuration was deprecated in Synapse v1.18.0 and will be removed in Synapse v1.67.0. In particular, the TCP `replication` [listener](https://matrix-org.github.io/synapse/v1.66/usage/configuration/config_documentation.html#listeners) type (not to be confused with the `replication` resource on the `http` listener type) and the `worker_replication_port` config option will be removed . To migrate to Redis, add the [`redis` config](https://matrix-org.github.io/synapse/v1.66/workers.html#shared-configuration), then remove the TCP `replication` listener from config of the master and `worker_replication_port` from worker config. Note that a HTTP listener with a `replication` resource is still required. See the [worker documentation](https://matrix-org.github.io/synapse/v1.66/workers.html) for more details. Synapse 1.66.0rc2 (2022-08-30) ============================== Bugfixes -------- - Fix a bug introduced in Synapse 1.66.0rc1 where the new rate limit metrics were misreported (`synapse_rate_limit_sleep_affected_hosts`, `synapse_rate_limit_reject_affected_hosts`). ([\matrix-org#13649](matrix-org#13649)) Synapse 1.66.0rc1 (2022-08-23) ============================== Features -------- - Improve validation of request bodies for the following client-server API endpoints: [`/account/password`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3accountpassword), [`/account/password/email/requestToken`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3accountpasswordemailrequesttoken), [`/account/deactivate`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3accountdeactivate) and [`/account/3pid/email/requestToken`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3account3pidemailrequesttoken). ([\matrix-org#13188](matrix-org#13188), [\matrix-org#13563](matrix-org#13563)) - Add forgotten status to [Room Details Admin API](https://matrix-org.github.io/synapse/latest/admin_api/rooms.html#room-details-api). ([\matrix-org#13503](matrix-org#13503)) - Add an experimental implementation for [MSC3852 (Expose user agents on `Device`)](matrix-org/matrix-spec-proposals#3852). ([\matrix-org#13549](matrix-org#13549)) - Add `org.matrix.msc2716v4` experimental room version with updated content fields. Part of [MSC2716 (Importing history)](matrix-org/matrix-spec-proposals#2716). ([\matrix-org#13551](matrix-org#13551)) - Add support for compression to federation responses. ([\matrix-org#13537](matrix-org#13537)) - Improve performance of sending messages in rooms with thousands of local users. ([\matrix-org#13522](matrix-org#13522), [\matrix-org#13547](matrix-org#13547)) Bugfixes -------- - Faster room joins: make `/joined_members` block whilst the room is partial stated. ([\matrix-org#13514](matrix-org#13514)) - Fix a bug introduced in Synapse 1.21.0 where the [`/event_reports` Admin API](https://matrix-org.github.io/synapse/develop/admin_api/event_reports.html) could return a total count which was larger than the number of results you can actually query for. ([\matrix-org#13525](matrix-org#13525)) - Fix a bug introduced in Synapse 1.52.0 where sending server notices fails if `max_avatar_size` or `allowed_avatar_mimetypes` is set and not `system_mxid_avatar_url`. ([\matrix-org#13566](matrix-org#13566)) - Fix a bug where the `opentracing.force_tracing_for_users` config option would not apply to [`/sendToDevice`](https://spec.matrix.org/v1.3/client-server-api/#put_matrixclientv3sendtodeviceeventtypetxnid) and [`/keys/upload`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3keysupload) requests. ([\matrix-org#13574](matrix-org#13574)) Improved Documentation ---------------------- - Add `openssl` example for generating registration HMAC digest. ([\matrix-org#13472](matrix-org#13472)) - Tidy up Synapse's README. ([\matrix-org#13491](matrix-org#13491)) - Document that event purging related to the `redaction_retention_period` config option is executed only every 5 minutes. ([\matrix-org#13492](matrix-org#13492)) - Add a warning to retention documentation regarding the possibility of database corruption. ([\matrix-org#13497](matrix-org#13497)) - Document that the `DOCKER_BUILDKIT=1` flag is needed to build the docker image. ([\matrix-org#13515](matrix-org#13515)) - Add missing links in `user_consent` section of configuration manual. ([\matrix-org#13536](matrix-org#13536)) - Fix the doc and some warnings that were referring to the nonexistent `custom_templates_directory` setting (instead of `custom_template_directory`). ([\matrix-org#13538](matrix-org#13538)) Deprecations and Removals ------------------------- - Remove the ability for homeservers to delegate email ownership verification and password reset confirmation to identity servers. See [upgrade notes](https://matrix-org.github.io/synapse/v1.66/upgrade.html#upgrading-to-v1660) for more details. Internal Changes ---------------- - Update the rejected state of events during de-partial-stating. ([\matrix-org#13459](matrix-org#13459)) - Avoid blocking lazy-loading `/sync`s during partial joins due to remote memberships. Pull remote memberships from auth events instead of the room state. ([\matrix-org#13477](matrix-org#13477)) - Refuse to start when faster joins is enabled on a deployment with workers, since worker configurations are not currently supported. ([\matrix-org#13531](matrix-org#13531)) - Allow use of both `@trace` and `@tag_args` stacked on the same function. ([\matrix-org#13453](matrix-org#13453)) - Instrument the federation/backfill part of `/messages` for understandable traces in Jaeger. ([\matrix-org#13489](matrix-org#13489)) - Instrument `FederationStateIdsServlet` (`/state_ids`) for understandable traces in Jaeger. ([\matrix-org#13499](matrix-org#13499), [\matrix-org#13554](matrix-org#13554)) - Track HTTP response times over 10 seconds from `/messages` (`synapse_room_message_list_rest_servlet_response_time_seconds`). ([\matrix-org#13533](matrix-org#13533)) - Add metrics to track how the rate limiter is affecting requests (sleep/reject). ([\matrix-org#13534](matrix-org#13534), [\matrix-org#13541](matrix-org#13541)) - Add metrics to time how long it takes us to do backfill processing (`synapse_federation_backfill_processing_before_time_seconds`, `synapse_federation_backfill_processing_after_time_seconds`). ([\matrix-org#13535](matrix-org#13535), [\matrix-org#13584](matrix-org#13584)) - Add metrics to track rate limiter queue timing (`synapse_rate_limit_queue_wait_time_seconds`). ([\matrix-org#13544](matrix-org#13544)) - Update metrics to track `/messages` response time by room size. ([\matrix-org#13545](matrix-org#13545)) - Refactor methods in `synapse.api.auth.Auth` to use `Requester` objects everywhere instead of user IDs. ([\matrix-org#13024](matrix-org#13024)) - Clean-up tests for notifications. ([\matrix-org#13471](matrix-org#13471)) - Add some miscellaneous comments to document sync, especially around `compute_state_delta`. ([\matrix-org#13474](matrix-org#13474)) - Use literals in place of `HTTPStatus` constants in tests. ([\matrix-org#13479](matrix-org#13479), [\matrix-org#13488](matrix-org#13488)) - Add comments about how event push actions are rotated. ([\matrix-org#13485](matrix-org#13485)) - Modify HTML template content to better support mobile devices' screen sizes. ([\matrix-org#13493](matrix-org#13493)) - Add a linter script which will reject non-strict types in Pydantic models. ([\matrix-org#13502](matrix-org#13502)) - Reduce the number of tests using legacy TCP replication. ([\matrix-org#13543](matrix-org#13543)) - Allow specifying additional request fields when using the `HomeServerTestCase.login` helper method. ([\matrix-org#13549](matrix-org#13549)) - Make `HomeServerTestCase` load any configured homeserver modules automatically. ([\matrix-org#13558](matrix-org#13558)) # -----BEGIN PGP SIGNATURE----- # # iQGzBAABCgAdFiEEWMTnW8Z8khaaf90R+84KzgcyGG8FAmMPT8QACgkQ+84Kzgcy # GG9CUAv+Pv/iDpE2jKlV7zQ/cagaKCGsFK5jy0+K9Wr215nP89tuhU37bJXsgvVu # GP3A8k1c/ENPhXwYHLCnnxV3jick1FuVE0W6h0j2PMYeIGNCQhDswytnsQO4JExg # fGLL4ygCzpe8bFX9+mhIM4z8xkZjZX3lIa8CN2LtRLIo0m7qoT1ZWqdt7kAjj5yL # XMk+3Y1yq/Y4SHHqgKurBNdwNcwnv7ynchWxTYa12WVTINt26dLV0Syk3p8u2SLl # 5YNzcDs2TAM7+VxAu7E0AQl426+Ufi122Oj1ZBUG2FxTPLH8Xr18cN2M/at6WxoX # 8pOkGiuahKKvahw1iCoHAGIC66gFIPxBE9xW4R2SKrQtG4sDuKJI0kvunRV8+cy5 # TuJ9cmdDmJR2vj3P3OULqLXGkWsGNJqfZZF8OWkHEI8LUIXZLrAZocFtlonkr9rV # Y8r8LxL8Id1rbHAnCXcJnYdaJ6ol0RIObDFpitY/D8BDUONVw/byeOyAEkq/XPrZ # Ke/9K8sy # =eg1L # -----END PGP SIGNATURE----- # gpg: Signature made Wed Aug 31 13:10:44 2022 BST # gpg: using RSA key 58C4E75BC67C92169A7FDD11FBCE0ACE0732186F # gpg: Can't check signature: No public key # Conflicts: # synapse/api/auth.py # synapse/push/baserules.py # synapse/push/bulk_push_rule_evaluator.py # synapse/push/push_rule_evaluator.py # synapse/storage/databases/main/event_push_actions.py # tests/server_notices/test_resource_limits_server_notices.py
Track number of hosts affected by the rate limiter so we can differentiate one really noisy homeserver from a general ratelimit tuning problem across the federation.
Follow-up to #13534
Part of #13356
Pull Request Checklist
EventStore
toEventWorkerStore
.".code blocks
.Pull request includes a sign off(run the linters)