-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Matomo might create too many visits when using userId feature #7691
Comments
(here are notes from Slack discussion): Current analysis is that:
Making this change means a few important things will be affected
|
No issue for the Android SDK. We don't hash/overwrite the visitor-id client side. By default every call to the Tracker contains a user-id (one per app install) and a visitor-id (per app session). |
I would really appreciate if the current implementation gets changed again, as it makes no sense from a non-technical point of view to split a visit when a user sings in or out. In my opinion the user_id should be an information attached to a visitor allowing to aggregate visitors (who are - at least to some extent - browsers on devices) to a single user. |
See also: Incorrect browser logged when user switches browsers when using userId #7785 |
The discussion here is "which assumption should be enforced at the time we process incoming data". I'd like to at least propose that we enforce neither at runtime. In most cases, the assumptions made by Piwik are reasonable... but they have the side-effect of masking the actual, underlying data so they preclude analysis under different assumptions. Another example of this is a prior forum discussion on referrers. In my proposal, Piwik simply stores all the facts (userid, visitorid, time, referrer, device, etc.) at runtime. We then delegate the various analytical decisions (like the one debated here) to the reporting system. The default reporting system can select one... someone could write a module to display the other. No one is precluded from analyzing the data in the way that makes the most sense to them. As an added benefit, eliminating runtime analysis should enhance performance. Obviously, there's an offsetting cost when the report is run, but that can be managed any number of ways (and only when it's actually required). |
Yep!
Absolutely
Yep, should enhance tracker performance (or at least not make it slower).
Yep, I think this is why it is built the way it is currently. There were some performance tweaks in mind for faster archiving. Agreed on all this :) |
tentatively moving this to |
Any updates on this guys? The error seems to persist. |
Hi, |
@saurtar So far this is not fixed AFAIK. It should be an edge case and not the norm but this issue can still occur |
FYI we'll likely work on & merge this PR soon, which would in theory help a lot with this issue: #12742 When setting or resetting User ID, do not update the Visitor ID in the first party cookie |
This pull request completely separates userId from visitorId as a per-site setting:
My change will create a new visit if the userid changes. But only if the previous userId is non-empty and the new userId is non-empty.
If we still want to do that (fall back to userId as visitorId if visitorId is missing), I will adjust my pull request. |
Thanks Michael for the update. Can you confirm if this would be fixed/merged soon? This is holding us to use the UserID feature with the combination of redis worker queues (16 in number). |
Hi guys. From what I understand, I'm getting the feeling that Visitor ID / User ID logic is based on the assumption that often multiple people would sign in / out, and hence logging in or out should be tracked as if it were different person before / after log out. At least it seems to be the case for actions prior to the login -> even if they were recorded without user_id, they are not attributed to the user who just logged in. Nowadays, at least based on what I know from my friends and my colleagues, it is much more often that only one singe person uses the device. And even if someone else uses same device and browser, they will often use incognito mode (where it is good to record new visitor). It would be great to have an option for something like At least in our business, in that way we would more accurately track actions, and avoid seeing what seems to me distorted reports where I guess the unique visitors could be double (same person before + after login). |
Since #14360 was release as part of 3.13.0, should this issue be closed? Or possibly, closed after removing the reference to it in the QueuedTracking settings page? 🙂 |
Will do for now :) |
This is especially a problem for Log Importer and QueuedTracking, but can happen with normal tracking as well. It's hard to explain but I will try :)
It's a problem when a user logs in and turns from a visitor into a user or when a user logs out and becomes a visitor again. It is a problem when the requests are not inserted in the exactly same order as they were sent.
Imagine the following tracking requests:
We will create a new visit for tracking request
1
. So far so good. If then for some reason3
is processed before2
, a second visit will be created. Why? When a userId is detected, we use theuid
as visitorId and we do overwrite theidvisitor
of all past visits (in this case of request1
). Meaning when the second tracking requests is executed, it won't find an existingidvisitor
as theuid
does not exist there and it will create a new visit.When is this a problem? As mentioned this is especially a problem when using log importer or queuedTracking with multiple workers / recorders. Both split requests into a different queues to process them in parallel see: https://github.com/piwik/piwik-log-analytics/blob/master/import_logs.py#L1642-L1651 and https://github.com/piwik/plugin-QueuedTracking/blob/multi_test/Queue/Manager.php#L161-L177 . This means once a
uid
is set, a request might go into a different queue than the one withoutuid
and they can be likely processed in different order.Same problem can occur if someone has for example multiple PHP nodes with load balancing etc. but it is less likely and it would be - realistically - only one request affected and all following would be fine. Still it can create one additional visit.
The text was updated successfully, but these errors were encountered: