Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Filter out unwanted user_agents from udv. #16124

Merged
merged 5 commits into from
Aug 23, 2023

Conversation

neilisfragile
Copy link
Contributor

@neilisfragile neilisfragile commented Aug 16, 2023

PR to filter out references to sliding sync proxy and rust-sdk from the user_daily_visits table to ensure that Element X can be represented fully.

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Pull request includes a sign off
  • Code style is correct
    (run the linters)

@neilisfragile neilisfragile requested a review from a team as a code owner August 16, 2023 20:49
AND udv.timestamp IS NULL
AND u.user_agent !='sync-v3-proxy-'
AND u.user_agent !='matrix-rust-sdk'
AND users.is_guest=0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the definition of the user_ips table it appears that there is one row per user/device/ip, and as such we only store one user agent per device. The upshot of this is that we won't necessarily pick up e.g. Element X user agents if its been clobbered by a sliding sync user agent, which I think will materially skew the stats here?

I wonder if we should instead be filtering out these user agents on insertion instead?

I'm still happy to merge this as is for a first cut if it'd be useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch. No point in merging as is then. I'll edit to remove the UAs from insertion.

Note the 'matrix-rust-sdk' entries generated from element x should reduce to zero following element-hq/element-x-ios#1507

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR I think you want to do the filtering here:

async def insert_client_ip(
self,
user_id: str,
access_token: str,
ip: str,
user_agent: str,
device_id: Optional[str],
now: Optional[int] = None,
) -> None:
if not now:
now = int(self._clock.time_msec())
key = (user_id, access_token, ip)
try:
last_seen = self.client_ip_last_seen.get(key)
except KeyError:
last_seen = None
# Rate-limited inserts
if last_seen is not None and (now - last_seen) < LAST_SEEN_GRANULARITY:
return
self.client_ip_last_seen.set(key, now)
if self._update_on_this_worker:
await self.populate_monthly_active_users(user_id)
self._batch_row_update[key] = (user_agent, device_id, now)
else:
# We are not the designated writer-worker, so stream over replication
self.hs.get_replication_command_handler().send_user_ip(
user_id, access_token, ip, user_agent, device_id, now
)

@neilisfragile
Copy link
Contributor Author

I reworked the PR to focus on exclusion of the proxy user_agent from user_ip. I dropped filtering out the rust-sdk user agent firstly because there are legitimate non-EX use cases where the rust-sdk UA is correct and secondly because the bug in EX has been fixed.

@erikjohnston erikjohnston merged commit ec662bb into develop Aug 23, 2023
@erikjohnston erikjohnston deleted the neilj/filter_out_unwanted_ua branch August 23, 2023 13:00
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants