synapse/storage/__init__.py
Outdated
WHERE last_seen > ?
"""
txn.execute(sql, (yesterday_start_time,))
user_visits = txn.fetchall()
This is probably okay? If not, I can call fetchone() on each iteration of the subsequent loop.
@@ -48,3 +48,4 @@ env/
*.config

.vscode/
.ropeproject/
Unrelated to this PR; I just want to ignore the Atom Python IDE files.
I would be tempted to work out the active users for today as you go, rather than for yesterday, i.e. every hour run something like:

INSERT INTO user_daily_visits (user_id, device_id, timestamp)
SELECT user_id, device_id, $today
FROM user_ips AS u
LEFT JOIN user_daily_visits USING (user_id, device_id)
WHERE last_seen > $today AND timestamp IS NULL

(The join ensures that we don't insert duplicate values.)
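A minimal sketch of the incremental insert suggested above, using Python's sqlite3 module against simplified stand-in tables (the schemas and the function name insert_new_daily_visits are assumptions for illustration, not Synapse's real ones). Two details differ from the SQL as written in the comment and are also assumptions: the join additionally matches on today's timestamp, so that a visit recorded on an earlier day does not suppress today's row, and a GROUP BY collapses multiple user_ips rows for the same user and device.

```python
import sqlite3


def insert_new_daily_visits(conn, today_start_ms):
    """Insert one row per (user_id, device_id) seen today and not yet recorded.

    Hypothetical helper: table and column names follow the review comment,
    but the schemas are simplified stand-ins.
    """
    sql = """
        INSERT INTO user_daily_visits (user_id, device_id, timestamp)
        SELECT u.user_id, u.device_id, ?
        FROM user_ips AS u
        LEFT JOIN user_daily_visits AS udv
            ON u.user_id = udv.user_id
            AND u.device_id = udv.device_id
            AND udv.timestamp = ?      -- assumption: only match today's rows
        WHERE u.last_seen > ? AND udv.timestamp IS NULL
        GROUP BY u.user_id, u.device_id  -- one row even if several IPs matched
    """
    conn.execute(sql, (today_start_ms, today_start_ms, today_start_ms))
    conn.commit()
```

Running the helper repeatedly within the same day should add rows only for users and devices that have not yet been recorded for that day, which is what makes an hourly schedule safe.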
…cohort_analytics
…tead insert incrementally through the day
Note the change in the SQL to handle duplicates on insert (where multiple entries did not match previous entries in user_daily_visits).
The aim is to keep track of when it was last called and only query from that point in time.
Looks good! I'd just switch around when you set self._last_user_visit_update, but otherwise it's good.
synapse/storage/__init__.py
Outdated
# frequently

now = datetime.datetime.utcnow()
self._last_user_visit_update = int(time.mktime(now.timetuple())) * 1000
You can just use self.clock.time_msec()
In fact, you probably want to get this before the query, so you do:
now = self.clock.time_msec()
txn.execute(sql, (
today_start, today_start,
self._last_user_visit_update,
now,
))
self._last_user_visit_update = now
This ensures there won't be any overlaps or missed updates.
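The ordering described above can be sketched in isolation as follows. This is an illustration only: VisitWindowTracker, its time_msec argument (standing in for self.clock.time_msec) and the execute callback (standing in for the txn.execute call) are hypothetical names, not part of Synapse.

```python
class VisitWindowTracker:
    """Tracks the time window covered by each incremental update (sketch)."""

    def __init__(self, time_msec):
        # time_msec: zero-argument callable returning the current time in ms.
        self._time_msec = time_msec
        self._last_user_visit_update = time_msec()

    def run_update(self, execute):
        # Read the clock once, before the query runs.
        now = self._time_msec()
        # The query covers exactly [last update, now].
        execute(self._last_user_visit_update, now)
        # Reuse the same value as the next lower bound, so consecutive
        # windows share an endpoint and leave no gap between them.
        self._last_user_visit_update = now
```

Because the same now value closes one window and opens the next, time spent executing the query cannot open a gap between consecutive windows.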
synapse/storage/__init__.py
Outdated
# where if the user logs in at 23:59 and overwrites their
# last_seen at 00:01 then they will not be counted in the
# previous day's stats - it is important that the query is run
# to minimise this case.
Do you mean:
it is important that the query is run often...
synapse/storage/__init__.py
Outdated

txn.execute(sql, (today_start, today_start,
self._last_user_visit_update,
today_start + a_day_in_milliseconds))
(The indenting style we use tends not to use this continuation indent. See the other comment for how it normally looks.)
Create a new table to store daily user visit information which can be used in cohort analysis.
Rather than inserting on each user action, slurp the user_ips table daily.
Not sure how best to test this - any pointers?