Matomo might create too many visits when using userId feature #7691

tsteur · 2015-04-16T21:56:02Z

This is especially a problem for Log Importer and QueuedTracking, but can happen with normal tracking as well. It's hard to explain but I will try :)

It's a problem when a user logs in and turns from a visitor into a user, or when a user logs out and becomes a visitor again. It occurs when the requests are not inserted in exactly the same order as they were sent.

Imagine the following tracking requests:

1: http://apache.piwik/piwik.php?action_name=foo&_id=visitorId&idsite=1 // visitor
2: http://apache.piwik/piwik.php?action_name=bar&_id=visitorId&idsite=1 // visitor pageview
3: http://apache.piwik/piwik.php?action_name=foo&_id=visitorId&idsite=1&uid=5 // logs in

We will create a new visit for tracking request 1. So far so good. If, for some reason, request 3 is then processed before request 2, a second visit will be created. Why? When a userId is detected, we use the uid as visitorId and we overwrite the idvisitor of all past visits (in this case, the visit of request 1). This means that when the second tracking request is executed, it won't find an existing idvisitor, because the stored idvisitor is now derived from the uid and no longer matches the request's _id, so it will create a new visit.
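The ordering problem above can be reproduced with a small simulation. This is only a sketch under simplifying assumptions: `process` and `uid_to_visitor_id` are hypothetical names, visits are matched purely by idvisitor, and the visit timeout is ignored. It is not Matomo's tracker code.

```python
import hashlib


def uid_to_visitor_id(uid):
    """Hypothetical stand-in for deriving a visitor ID from a user ID."""
    return hashlib.sha1(uid.encode()).hexdigest()[:16]


def process(requests):
    """Process requests in the given order and return the resulting visits."""
    visits = []  # each visit: {"idvisitor": str, "actions": [str, ...]}
    for req in requests:
        cookie_id, uid = req["_id"], req.get("uid")
        if uid is not None:
            effective_id = uid_to_visitor_id(uid)
            # Overwrite the idvisitor of past visits recorded under the cookie ID.
            for visit in visits:
                if visit["idvisitor"] == cookie_id:
                    visit["idvisitor"] = effective_id
        else:
            effective_id = cookie_id
        # Reuse a visit with the same idvisitor, otherwise create a new one.
        match = next((v for v in visits if v["idvisitor"] == effective_id), None)
        if match is None:
            match = {"idvisitor": effective_id, "actions": []}
            visits.append(match)
        match["actions"].append(req["action_name"])
    return visits


r1 = {"action_name": "foo", "_id": "visitorId"}              # visitor
r2 = {"action_name": "bar", "_id": "visitorId"}              # visitor pageview
r3 = {"action_name": "foo", "_id": "visitorId", "uid": "5"}  # logs in

in_order = process([r1, r2, r3])
out_of_order = process([r1, r3, r2])
print(len(in_order), len(out_of_order))  # prints "1 2"
```

Processed in the order they were sent, the three requests end up in one visit. With request 3 handled before request 2, request 2 no longer finds a visit under its cookie ID and a second visit appears.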

When is this a problem? As mentioned, this is especially a problem when using the log importer or QueuedTracking with multiple workers / recorders. Both split requests into different queues to process them in parallel, see: https://github.com/piwik/piwik-log-analytics/blob/master/import_logs.py#L1642-L1651 and https://github.com/piwik/plugin-QueuedTracking/blob/multi_test/Queue/Manager.php#L161-L177 . This means that once a uid is set, a request might go into a different queue than the requests without a uid, and they are likely to be processed in a different order.
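To illustrate why a login can move a visitor's requests to another queue, here is a minimal sketch. It assumes requests are assigned to a queue by hashing an effective visitor key; `effective_visitor_key`, `queue_for` and the use of CRC32 are illustrative assumptions, not the code behind the two links above.

```python
import zlib

NUM_QUEUES = 4  # hypothetical number of workers / recorders


def effective_visitor_key(req):
    """Key used for queue assignment: the uid if present, else the cookie ID."""
    return "uid:" + req["uid"] if req.get("uid") else "id:" + req["_id"]


def queue_for(req, num_queues=NUM_QUEUES):
    """Map a request to a queue index by hashing its effective visitor key."""
    return zlib.crc32(effective_visitor_key(req).encode()) % num_queues


before_login = {"action_name": "bar", "_id": "visitorId"}
after_login = {"action_name": "foo", "_id": "visitorId", "uid": "5"}

# Both requests come from the same browser, but they are hashed on different
# keys, so nothing guarantees that they land in the same queue.
print(effective_visitor_key(before_login), queue_for(before_login))
print(effective_visitor_key(after_login), queue_for(after_login))
```

Requests within one queue keep their order, but two queues are drained independently, so the request carrying the uid can overtake an earlier request that does not carry it.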

The same problem can occur if someone has, for example, multiple PHP nodes behind a load balancer, but it is less likely and, realistically, only one request would be affected and all following requests would be fine. Still, it can create one additional visit.

mattab · 2015-04-17T00:45:07Z

(here are notes from Slack discussion):

Current analysis is that:

we may want to change radically the behaviour of user_id tracking and revert a technical decision made during initial implementation of user ID ( Accurate User Detection cross devices: User ID (set in JS and all other clients) #3490 User ID implementation #6169 )
we cannot offer both behaviors (e.g. via an INI setting) as it would be too complicated, therefore we want to take a product decision, and understand and document the behavior
what we want to change is that instead of setting the visitor_id to a hash of the user_id, we would leave the visitor_id as it is from the first party cookie
- this would complement Fix User ID Segmentation #6954 Segmentation by User ID #6959 where we decoupled User ID from Visitor ID for the Custom Segment user id
- also in piwik.js and PiwikTracker do not set Visitor id as a hash of user id ie. revert When User ID is used, set the first party cookie UUID to this User ID #7167 (note: Android/iOS SDKs + C# client etc. may need to change too)

Edit: for GDPR, when exporting data for a user id, we need to make sure that the exported data does not cover multiple users. So we should make sure that a given visit does not cover 2 users, and we would create a new visit if a different user id comes in.

Making this change means a few important things will be affected

whenever a user (1) clears cookies or (2) connects via multiple devices simultaneously, currently, sessions opened on each device will be recorded in the same visit
- after the change, each simultaneous visit on separate devices would each create a new visit (but with the same user id)
- the User ID user guide will need to be updated as we change how User id works, especially this part: "How requests with a User ID are tracked > Same user from multiple device use case: [....]"
if several users connect on the same device, within 30min, the same visit would be re-used and only the latest User Id would be kept in the visit (currently, we create a new visit for each separate user id)
in the Tracking API, to be friendly to devs who only want to use the user id uid and don't want to care about the visitor id _id, we'd ensure that _id defaults to the user id hash so multiple actions for the same user id are still tracked in the same visit
making the change would include revert When User ID is used, set the first party cookie UUID to this User ID #7167 Store visitorID related to userID to cookies #6838 which discussed the case with trust_visitors_cookies=1 -> maybe we could explain how User id would work in the user id user guide
we would no longer have to update old visitorId's when someone logs in (refs When a User Id is set, try to attach User ID to existing visit before user logged in #6313)
it will be helpful that when working on this we also include "User id Signing out use case" raised & discussed in UserID "Signing out use-case" - actions still attributed to the same Visitor #7556 (When UserID is set to empty string, actions maybe added to the same UserID visit #7368 Allow setting empty userId, refs #7402 #7518)
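The proposed visitor ID handling from the notes above could be sketched roughly as follows. `resolve_visitor_id` is a hypothetical helper, and the 16-character SHA-1 prefix is an assumption made for illustration; the notes only say "user id hash".

```python
import hashlib


def resolve_visitor_id(params):
    """Pick the visitor ID for a tracking request under the proposed behaviour.

    params is a dict of Tracking API parameters (_id = visitor id, uid = user id).
    """
    visitor_id = params.get("_id")
    user_id = params.get("uid")
    if visitor_id:
        # Keep the visitor ID from the first party cookie, even if a uid is set.
        return visitor_id
    if user_id:
        # No _id sent: default to a hash of the user ID so that multiple
        # actions for the same user are still tracked in the same visit.
        return hashlib.sha1(user_id.encode()).hexdigest()[:16]
    return None  # neither ID present; other visitor detection would apply


print(resolve_visitor_id({"_id": "visitorId", "uid": "5"}))  # prints "visitorId"
print(resolve_visitor_id({"uid": "5"}))
```

With this rule a login no longer changes the visitor ID, so there is no need to rewrite the idvisitor of earlier visits.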

d4rken · 2015-04-18T11:01:56Z

also in piwik.js and PiwikTracker do not set Visitor id as a hash of user id ie. revert #7167 (note: Android/iOS SDKs + C# client etc. may need to change too)

No issue for the Android SDK. We don't hash/overwrite the visitor-id client side. By default every call to the Tracker contains a user-id (one per app install) and a visitor-id (per app session).

andre-hh · 2015-04-26T19:19:28Z

I would really appreciate it if the current implementation were changed again, as it makes no sense from a non-technical point of view to split a visit when a user signs in or out. In my opinion the user_id should be a piece of information attached to a visitor, allowing visitors (who are, at least to some extent, browsers on devices) to be aggregated into a single user.

mattab · 2015-04-29T23:40:43Z

See also: Incorrect browser logged when user switches browsers when using userId #7785
where a user expected that Piwik would track two visits when the same user uses the website across two devices. (currently those clicks on those two devices would appear in same visit)

claytondaley · 2015-05-05T12:29:59Z

The discussion here is "which assumption should be enforced at the time we process incoming data". I'd like to at least propose that we enforce neither at runtime.

In most cases, the assumptions made by Piwik are reasonable... but they have the side-effect of masking the actual, underlying data so they preclude analysis under different assumptions. Another example of this is a prior forum discussion on referrers.

In my proposal, Piwik simply stores all the facts (userid, visitorid, time, referrer, device, etc.) at runtime. We then delegate the various analytical decisions (like the one debated here) to the reporting system. The default reporting system can select one... someone could write a module to display the other. No one is precluded from analyzing the data in the way that makes the most sense to them.

As an added benefit, eliminating runtime analysis should enhance performance. Obviously, there's an offsetting cost when the report is run, but that can be managed any number of ways (and only when it's actually required).
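As a rough illustration of this proposal, the sketch below stores raw events and only decides at report time how to group them into visits. The event fields, `count_visits`, and the two key functions are hypothetical; the 30-minute timeout is the one mentioned earlier in this thread.

```python
from itertools import groupby

VISIT_TIMEOUT = 30 * 60  # seconds


def count_visits(events, key):
    """Count visits at report time, grouping raw events by the given key."""
    ordered = sorted(events, key=lambda e: (key(e), e["ts"]))
    visits = 0
    for _, group in groupby(ordered, key=key):
        last_ts = None
        for event in group:
            # A gap longer than the timeout starts a new visit within a group.
            if last_ts is None or event["ts"] - last_ts > VISIT_TIMEOUT:
                visits += 1
            last_ts = event["ts"]
    return visits


def by_device(event):
    """One assumption: a visit belongs to a browser / device."""
    return event["visitor_id"]


def by_user(event):
    """Another assumption: a visit belongs to a user, across devices."""
    return event["user_id"] or event["visitor_id"]


# Raw facts stored at tracking time, with no visit decision made yet.
events = [
    {"visitor_id": "laptop", "user_id": "5", "ts": 0},
    {"visitor_id": "phone", "user_id": "5", "ts": 60},
    {"visitor_id": "laptop", "user_id": "5", "ts": 120},
]

print(count_visits(events, by_device), count_visits(events, by_user))  # prints "2 1"
```

The same stored events give two visits under the per-device assumption and one under the per-user assumption, which is the flexibility argued for here.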

tsteur · 2015-05-05T21:47:31Z

In my proposal, Piwik simply stores all the facts (userid, visitorid, time, referrer, device, etc.) at runtime.

Yep!

The default reporting system can select one... someone could write a module to display the other

Absolutely

As an added benefit, eliminating runtime analysis should enhance performance.

Yep, should enhance tracker performance (or at least not make it slower).

Obviously, there's an offsetting cost when the report is run

Yep, I think this is why it is built the way it is currently. There were some performance tweaks in mind for faster archiving.

Agreed on all this :)

mattab · 2015-07-14T16:15:49Z

tentatively moving this to Short term as it sounds like we should make this change.

kachkaev · 2015-10-23T19:21:16Z

Any updates on this guys? The error seems to persist.

saurtar · 2016-03-21T14:53:31Z

Hi,
we recently started testing Piwik for high traffic, which requires more than 1 worker to make sense, and since we are using UserID it seems Piwik is as good as dead for us? Or was this issue fixed and someone just forgot to write about it?
I'm commenting here since the Piwik web UI (plugin settings) points here and says that only 1 worker should be run because of this bug (#7691).

tsteur · 2016-03-28T19:22:18Z

@saurtar So far this is not fixed AFAIK. It should be an edge case and not the norm but this issue can still occur

mattab · 2018-04-19T05:26:18Z

FYI we'll likely work on & merge this PR soon, which would in theory help a lot with this issue: #12742 When setting or resetting User ID, do not update the Visitor ID in the first party cookie

MichaelRoosz · 2018-10-19T14:41:44Z

This pull request completely separates userId from visitorId as a per-site setting:
#13620

if several users connect on the same device, within 30min, the same visit would be re-used and only the latest User Id would be kept in the visit (currently, we create a new visit for each separate user id)

My change will create a new visit if the userid changes. But only if the previous userId is non-empty and the new userId is non-empty.

in the Tracking API, to be friendly to devs who only want to use the user id uid and don't want to care about the visitor id _id, we'd ensure that _id defaults to the user id hash so multiple actions for the same user id are still tracked in the same visit

If we still want to do that (fall back to userId as visitorId if visitorId is missing), I will adjust my pull request.
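The new-visit rule described in this comment can be written as a small predicate. This is a sketch of the stated rule only; `should_create_new_visit` is a hypothetical name and not a function from the pull request.

```python
def should_create_new_visit(previous_user_id, new_user_id):
    """Return True only when both user IDs are non-empty and they differ."""
    return (
        bool(previous_user_id)
        and bool(new_user_id)
        and previous_user_id != new_user_id
    )


print(should_create_new_visit("alice", "bob"))    # prints "True"
print(should_create_new_visit("", "alice"))       # prints "False" (login)
print(should_create_new_visit("alice", ""))       # prints "False" (logout)
print(should_create_new_visit("alice", "alice"))  # prints "False"
```

Under this rule, logging in or out within the same visit does not split the visit; only a switch from one known user to another does.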

siva538 · 2019-05-20T15:25:50Z

Thanks, Michael, for the update. Can you confirm whether this will be fixed/merged soon? This is holding us back from using the UserID feature in combination with Redis worker queues (16 in number).

peachp · 2019-07-29T20:45:11Z

Hi guys. From what I understand, I'm getting the feeling that the Visitor ID / User ID logic is based on the assumption that multiple people often sign in / out on the same device, and hence the visitor before and after a login or logout should be tracked as if they were a different person. At least that seems to be the case for actions prior to the login: even though they were recorded without a user_id, they are not attributed to the user who just logged in.

Nowadays, at least based on what I know from my friends and my colleagues, it is much more common that only one single person uses the device. And even if someone else uses the same device and browser, they will often use incognito mode (where it is good to record a new visitor).

It would be great to have an option for something like
"Attribute actions without User ID to when User ID is set"
...or to put it differently
"Don't change Visitor ID when setting User ID"

At least in our business, in that way we would more accurately track actions, and avoid seeing what seems to me distorted reports where I guess the unique visitors could be double (same person before + after login).

benwarfield-usds · 2019-12-12T19:19:16Z

Since #14360 was released as part of 3.13.0, should this issue be closed? Or possibly, closed after removing the reference to it in the QueuedTracking settings page? 🙂

tsteur · 2019-12-12T20:24:35Z

Will do for now :)

tsteur mentioned this issue Apr 17, 2015

Possibility to use multiple workers to insert requests from Redis to DB matomo-org/plugin-QueuedTracking#6

Merged

mattab mentioned this issue Apr 29, 2015

Incorrect browser logged when user switches browsers when using userId #7785

Closed

mattab added the Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change. label Jul 14, 2015

mattab added this to the Short term milestone Jul 14, 2015

mattab mentioned this issue Nov 16, 2015

UserID "Signing out use-case" - actions still attributed to the same Visitor #7556

Closed

mattab mentioned this issue Jan 2, 2017

Users are converting but not retaining pre-conversion history in visitor log #11112

Closed

ghost mentioned this issue Jul 27, 2017

A new visit is created on the Database when changing the userId field from the Android-SDK #11903

Open

mattab mentioned this issue Apr 19, 2018

When setting or resetting User ID, do not update the Visitor ID in the first party cookie #12742

Merged

mattab modified the milestones: Backlog (Help wanted), Priority Backlog (Help wanted) May 13, 2018

MichaelRoosz mentioned this issue Apr 24, 2019

UserID no longer overwrites VisitorId #14360

Merged

mattab changed the title ~~Piwik might create too many visits when using userId feature~~ Matomo might create too many visits when using userId feature Nov 2, 2019

tsteur closed this as completed Dec 12, 2019

mattab removed this from the Priority Backlog (Help wanted) milestone Dec 10, 2023

mattab added this to the Backlog (Help wanted) milestone Dec 10, 2023

innocraft-automation removed this from the Backlog (Help wanted) milestone Sep 13, 2024
