-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Filter out unwanted user_agents from udv. #16124
Conversation
AND udv.timestamp IS NULL | ||
AND u.user_agent !='sync-v3-proxy-' | ||
AND u.user_agent !='matrix-rust-sdk' | ||
AND users.is_guest=0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the definition of the user_ips
table it appears that there is one row per user/device/ip, and as such we only store one user agent per device. The upshot of this is that we won't necessarily pick up e.g. Element X
user agents if its been clobbered by a sliding sync user agent, which I think will materially skew the stats here?
I wonder if we should instead be filtering out these user agents on insertion instead?
I'm still happy to merge this as is for a first cut if it'd be useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch. No point in merging as is then. I'll edit to remove the UAs from insertion.
Note the 'matrix-rust-sdk' entries generated from element x should reduce to zero following element-hq/element-x-ios#1507
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR I think you want to do the filtering here:
synapse/synapse/storage/databases/main/client_ips.py
Lines 573 to 604 in 54a51ff
async def insert_client_ip( | |
self, | |
user_id: str, | |
access_token: str, | |
ip: str, | |
user_agent: str, | |
device_id: Optional[str], | |
now: Optional[int] = None, | |
) -> None: | |
if not now: | |
now = int(self._clock.time_msec()) | |
key = (user_id, access_token, ip) | |
try: | |
last_seen = self.client_ip_last_seen.get(key) | |
except KeyError: | |
last_seen = None | |
# Rate-limited inserts | |
if last_seen is not None and (now - last_seen) < LAST_SEEN_GRANULARITY: | |
return | |
self.client_ip_last_seen.set(key, now) | |
if self._update_on_this_worker: | |
await self.populate_monthly_active_users(user_id) | |
self._batch_row_update[key] = (user_agent, device_id, now) | |
else: | |
# We are not the designated writer-worker, so stream over replication | |
self.hs.get_replication_command_handler().send_user_ip( | |
user_id, access_token, ip, user_agent, device_id, now | |
) |
Revert "filter out unwanted user_agents" This reverts commit 6fabc26.
I reworked the PR to focus on exclusion of the proxy user_agent from user_ip. I dropped filtering out the rust-sdk user agent firstly because there are legitimate non-EX use cases where the rust-sdk UA is correct and secondly because the bug in EX has been fixed. |
PR to filter out references to sliding sync proxy and rust-sdk from the user_daily_visits table to ensure that Element X can be represented fully.
Pull Request Checklist
EventStore
toEventWorkerStore
.".code blocks
.(run the linters)