Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Cohort analytics #3163

Merged
merged 13 commits into from
May 16, 2018
Merged

Cohort analytics #3163

merged 13 commits into from
May 16, 2018

Conversation

neilisfragile
Copy link
Contributor

Create a new table to store daily user visit information which can be used in cohort analysis.

Rather than insert on each user action, instead slurp the user_ips table daily.

Not sure how best to test this - any pointers?

WHERE last_seen > ?
"""
txn.execute(sql, (yesterday_start_time,))
user_visits = txn.fetchall()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably okay? If not I can fetchone on each iteration of the subsequent loop

@@ -48,3 +48,4 @@ env/
*.config

.vscode/
.ropeproject/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated to this PR just want to ignore Atom python ide

@turt2live turt2live mentioned this pull request May 1, 2018
@erikjohnston
Copy link
Member

I would be tempted to work out the active users for today as you go rather than yesterday. I.e. every hour running something like:

INSERT INTO user_daily_visits (user_id, device_id, timestamp)
SELECT user_id, device_id, $today
FROM user_ips AS u
LEFT JOIN user_daily_visits USING (user_id, device_id)
WHERE last_seen > $today AND timestamp IS NULL

(The join to ensure that we don't insert duplicate values)

@neilisfragile
Copy link
Contributor Author

Note change in SQL to handle case of duplicates on insert (where there are multiple entries that did not match previous entries in user_daily_visits)

@erikjohnston erikjohnston removed their assignment May 15, 2018
The aim is to keep track of when it was last called and only query from that point in time
Copy link
Member

@erikjohnston erikjohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I'd just switch around when you set self._last_user_visit_update, but otherwise is good

# frequently

now = datetime.datetime.utcnow()
self._last_user_visit_update = int(time.mktime(now.timetuple())) * 1000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just use self.clock.time_msec()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, You probably want to get this before the query, so you do:

now = self.clock.time_msec()
txn.execute(sql, (
    today_start, today_start,
    self._last_user_visit_update,
    now,
))
self._last_user_visit_update = now

This ensures there isn't going to be any overlaps or missed updates.

# where if the user logs in at 23:59 and overwrites their
# last_seen at 00:01 then they will not be counted in the
# previous day's stats - it is important that the query is run
# to minimise this case.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean:

it is important that the query is run often...


txn.execute(sql, (today_start, today_start,
self._last_user_visit_update,
today_start + a_day_in_milliseconds))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The identing style we use tends to not use this continuation indent. See other comment for how it normally looks)

@neilisfragile neilisfragile merged commit dc8930e into develop May 16, 2018
@neilisfragile neilisfragile deleted the cohort_analytics branch May 16, 2018 10:09
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants