Calling posthog.identify() starts a new session #247

Closed
1 task done
krschacht opened this issue Jun 9, 2021 · 6 comments

@krschacht

Bug description

When a session is being recorded with a series of events, as soon as the user logs in (and posthog.identify() is called with their user_id), PostHog starts a new session for this user.

Expected behavior

Conceptually, I expect the data model of a "session" with a unique session_id to be one continuous sequence of actions with no 30-minute gap of inactivity. Here is a screenshot of a test user. This is one session, yet you can see that the moment the user_id was passed, PostHog started grouping the subsequent events as a new session.

https://share.getcloudapp.com/jkuPKjpn
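The session definition above (a new session starts after 30 or more minutes of inactivity) can be sketched as follows. This is a minimal illustration assuming the events belong to a single user and are already sorted by time; the function name `assign_sessions` is hypothetical and not part of PostHog or posthog-js.

```python
from datetime import datetime, timedelta

# Inactivity threshold described in the issue (assumed: a gap of exactly 30 min also splits).
SESSION_GAP = timedelta(minutes=30)


def assign_sessions(timestamps, gap=SESSION_GAP):
    """Return a session index for each timestamp (assumed sorted ascending).

    A new session starts at the first event and whenever the time since the
    previous event is at least `gap`.
    """
    session_ids = []
    current = -1
    previous = None
    for ts in timestamps:
        if previous is None or ts - previous >= gap:
            current += 1  # inactivity gap (or first event): open a new session
        session_ids.append(current)
        previous = ts
    return session_ids


t0 = datetime(2021, 6, 9, 12, 0)
events = [t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=20),
          t0 + timedelta(minutes=55), t0 + timedelta(minutes=60)]
print(assign_sessions(events))  # [0, 0, 0, 1, 1]
```

Note that nothing in this definition depends on which ID the user is known by, which is why a login alone should not split the session.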

How to reproduce

  1. Open incognito window, visit site
  2. Click around and find this session in PostHog
  3. Log in and watch it record all subsequent events in this session as if they belonged to a new session

Environment

  • self-hosted PostHog, version/commit: I'm not sure how to confirm what version I'm running, but it lives at data.vizz.com so maybe you can tell from the page source?

Thank you for your bug report – we love squashing them!

@mariusandra mariusandra transferred this issue from PostHog/posthog Jun 15, 2021
@yakkomajuri
Contributor

yakkomajuri commented Jul 8, 2021

Hey @krschacht! Thanks for reporting. I can confirm this is the case.


For our own team:

This was moved here but I'm not actually sure it belongs in posthog-js.

I believe posthog-js actually does the right thing by associating new events with the newly set ID; the problem seems to be that our sessions are based on distinct IDs, not person IDs. There's probably a discussion to be had here about what makes more sense, both conceptually and practically.
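A small sketch of why the grouping key matters, using a hypothetical event stream in which `identify()` switches the distinct ID from an anonymous value to the user's ID partway through. The `person_for` dictionary is an assumed stand-in for a distinct-ID-to-person mapping, and `count_sessions` is a hypothetical helper; neither is a PostHog API.

```python
from collections import defaultdict
from datetime import datetime, timedelta

GAP = timedelta(minutes=30)  # assumed inactivity threshold


def count_sessions(events, key):
    """Count gap-based sessions after grouping events by key(event)."""
    groups = defaultdict(list)
    for event in events:
        groups[key(event)].append(event["timestamp"])
    total = 0
    for timestamps in groups.values():
        timestamps.sort()
        # The first event in each group opens a session; every gap >= GAP opens another.
        total += 1 + sum(1 for a, b in zip(timestamps, timestamps[1:]) if b - a >= GAP)
    return total


t0 = datetime(2021, 6, 9, 12, 0)
# Anonymous browsing, then login: identify() switches the distinct ID.
events = [
    {"distinct_id": "anon-123", "timestamp": t0},
    {"distinct_id": "anon-123", "timestamp": t0 + timedelta(minutes=2)},
    {"distinct_id": "user-42", "timestamp": t0 + timedelta(minutes=3)},
    {"distinct_id": "user-42", "timestamp": t0 + timedelta(minutes=6)},
]
# Hypothetical mapping: both distinct IDs resolve to the same person after identify().
person_for = {"anon-123": "person-1", "user-42": "person-1"}

print(count_sessions(events, key=lambda e: e["distinct_id"]))             # 2
print(count_sessions(events, key=lambda e: person_for[e["distinct_id"]]))  # 1
```

Keyed by distinct ID, the login splits one visit into two sessions; keyed by person, the six minutes of activity stay together, matching the behavior the issue expects.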

@yakkomajuri
Contributor

yakkomajuri commented Jul 8, 2021

After looking into this a bit more, I think it might be related to our duplicate persons problem.

@macobo
Contributor

macobo commented Jul 8, 2021

The sessions query indeed aggregates based on distinctId not personId. Proof: https://github.com/PostHog/posthog/blob/master/ee/clickhouse/sql/sessions/list.py#L95-L97

The meta-issue that needs to be solved here is what a session even is and which use cases it serves: PostHog/posthog#4884

@krschacht
Author

krschacht commented Jul 8, 2021

Got it, I have noticed this duplicate person problem too.

In case it's helpful, I resolved both of these issues for myself internally with some complicated SQL, which might help as you work on a fix. This is very far from a PR, just some raw SQL:

-- Flag the first event of each session: the first event for a distinct_id,
-- or any event that follows 30+ minutes of inactivity.
CREATE VIEW sessionized_events AS
SELECT
    device.id AS device_id,
    CAST(event.properties ->> '$device_id' AS varchar) AS device_key,
    event.*,
    CASE WHEN
        COALESCE(lag(timestamp) OVER (ORDER BY event.distinct_id, timestamp) <= timestamp - interval '30 min', true)
        OR COALESCE(lag(event.distinct_id) OVER (ORDER BY event.distinct_id, timestamp) <> event.distinct_id, true)
    THEN true ELSE false END AS session_start_at
FROM
    posthog.posthog_event AS event,
    posthog.posthog_person AS device,
    posthog.posthog_persondistinctid AS device_link
WHERE event.distinct_id = device_link.distinct_id AND device_link.person_id = device.id
ORDER BY event.distinct_id, event.timestamp;

And then:

-- Turn the session-start flags into session IDs with a running count.
CREATE VIEW events_with_sessions AS
SELECT
    id,
    device_id,
    device_key::uuid,
    device_key AS device_key_str,
    COUNT(*) FILTER (WHERE session_start_at) OVER (ORDER BY device_id, timestamp) AS session_id,
    session_start_at,
    event,
    site_url,
    elements_hash,
    elements,
    properties,
    team_id,
    timestamp,
    created_at
FROM sessionized_events
ORDER BY device_id, timestamp;

Finally:

-- One row per session.
CREATE VIEW sessions AS
SELECT
    session_id AS id,
    MIN(device_id) AS device_id,
    CAST(MIN(device_key_str) AS uuid) AS device_key,
    MIN(device_key_str) AS device_key_str,
    MIN(timestamp) AS first_event_at,
    MAX(timestamp) AS last_event_at,
    COUNT(*) AS events_count
FROM
    events_with_sessions
GROUP BY session_id;

This doesn't fix it within the app, but I have some custom dashboards that show me correct sessions thanks to these queries.
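For anyone who wants to try this "flag, running count, aggregate" approach without a PostHog database, the sketch below runs the same three steps against an in-memory SQLite database. It is an adaptation, not the exact queries above: the toy `events` table already carries a `person_id` column (standing in for the join through `posthog_persondistinctid`), the `lag()` window is partitioned by `person_id` instead of being ordered by `distinct_id`, so a distinct-ID change on login does not by itself open a new session, timestamps are plain integer minutes, and `SUM(...)` replaces `COUNT(*) FILTER (...)` for portability. It assumes SQLite 3.25 or newer for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    person_id TEXT,
    distinct_id TEXT,
    ts INTEGER  -- minutes since an arbitrary origin (simplifying assumption)
);
INSERT INTO events (person_id, distinct_id, ts) VALUES
    ('p1', 'anon-123', 0),
    ('p1', 'anon-123', 2),
    ('p1', 'user-42', 3),    -- identify() happened here
    ('p1', 'user-42', 50),   -- 47-minute gap: new session
    ('p2', 'anon-999', 10);

-- Step 1: flag each person's first event and any event after 30+ minutes of inactivity.
CREATE VIEW sessionized_events AS
SELECT *,
       COALESCE(ts - LAG(ts) OVER (PARTITION BY person_id ORDER BY ts) >= 30, 1)
           AS session_start
FROM events;

-- Step 2: a running count of session starts gives every event a session ID.
CREATE VIEW events_with_sessions AS
SELECT *,
       SUM(session_start) OVER (ORDER BY person_id, ts
                                ROWS UNBOUNDED PRECEDING) AS session_id
FROM sessionized_events;

-- Step 3: one row per session.
CREATE VIEW sessions AS
SELECT session_id,
       MIN(person_id) AS person_id,
       MIN(ts) AS first_event_at,
       MAX(ts) AS last_event_at,
       COUNT(*) AS events_count
FROM events_with_sessions
GROUP BY session_id;
""")

rows = conn.execute("SELECT * FROM sessions ORDER BY session_id").fetchall()
for row in rows:
    print(row)
# (1, 'p1', 0, 3, 3)
# (2, 'p1', 50, 50, 1)
# (3, 'p2', 10, 10, 1)
```

The `ROWS UNBOUNDED PRECEDING` frame makes the running count advance one row at a time even if two events share the same timestamp, which the default `RANGE` frame would treat as peers.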

@posthog-bot
Collaborator

This issue hasn't seen activity in two years! If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.

@posthog-bot
Collaborator

This issue was closed due to lack of activity. Feel free to reopen if it's still relevant.

@posthog-bot posthog-bot closed this as not planned (stale) Jul 25, 2023