possible duplication of telemetry events #19

Closed
edublancas opened this issue Nov 17, 2022 · 1 comment · Fixed by #20

@edublancas
Contributor

We recently ran a workshop, and most attendees ran the code locally. Afterwards, we saw an uptick of events in PostHog that didn't match the number of attendees: we had about 20 attendees but saw more than 100 new users in PostHog. I suspect this is related to #15, but I'm unsure.

The workshop covers a few of our projects, but I suspect the problem is in the "Running experiments in parallel" section, which runs this pipeline.

We need to investigate whether this is still an issue and fix it.

FYI: @idomic

@yafimvo
Collaborator

yafimvo commented Nov 21, 2022

Yes, it seems the problem happens when we use the parallel executor and check whether it's the first usage.

def check_first_time_usage():
    """
    The function checks for first time usage if the conf file exists and the
    uid file doesn't exist.
    """
    internal = Internal()
    first_time = internal.first_time
    internal.first_time = False
    return first_time

The default values of the Internal config file are:

class Internal(Config):
    """
    Internal file to store settings (not intended to be modified by the
    user)
    """
    last_version_check: datetime.datetime = None
    uid: str
    first_time: bool = True

    def uid_default(self):
        return str(uuid4())

With the parallel executor, each process that sees an empty config file gets the default values and creates a new user ID. Since we call internal = Internal() in many places, this happens often.
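To illustrate the mechanism, here is a minimal sketch (not the project's actual code). It assumes a JSON settings file and a hypothetical `load_or_default` helper that falls back to fresh defaults, including a new uid, when the file is missing or empty. The workers are simulated sequentially: all of them read before any of them writes, which mimics processes starting at the same time.

```python
import json
import tempfile
from pathlib import Path
from uuid import uuid4


def load_or_default(path):
    """Hypothetical loader: stored settings, or fresh defaults if the file is missing/empty."""
    if path.exists() and path.read_text().strip():
        return json.loads(path.read_text())
    # No usable file: fall back to defaults, which includes a brand-new uid.
    return {"uid": str(uuid4()), "first_time": True}


def simulate_workers(path, n_workers):
    # Phase 1: every worker reads before any worker has written.
    snapshots = [load_or_default(path) for _ in range(n_workers)]
    # Phase 2: each worker persists what it saw. The last write wins on disk,
    # but each worker already holds (and would report events under) its own uid.
    for settings in snapshots:
        path.write_text(json.dumps(settings))
    return [settings["uid"] for settings in snapshots]


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "internal.json"
    uids = simulate_workers(config_path, n_workers=5)
    distinct_uids = len(set(uids))
    print(f"{distinct_uids} distinct uids from {len(uids)} workers")
```

In this sketch every simulated worker ends up with its own uid, so a single machine would show up as several new users.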

I implemented a solution in this PR.

I tried to reduce the number of times we write to this config file and changed the check_first_time_usage logic to read directly from the configuration file.
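As a rough sketch of the general idea (write the uid once, and make the first-time check read-only), the snippet below uses exclusive file creation. This is an assumption about one way to do it, not what the PR does; `get_or_create_uid`, `is_first_time_usage`, and the single-file layout are hypothetical names introduced for illustration.

```python
import tempfile
import time
from pathlib import Path
from uuid import uuid4


def get_or_create_uid(path, retries=50, delay=0.01):
    """Return the uid stored at `path`, generating one only if the file does not exist."""
    try:
        # Mode "x" raises FileExistsError if the file is already there,
        # so at most one caller gets to generate and write the uid.
        with open(path, "x") as f:
            uid = str(uuid4())
            f.write(uid)
            return uid
    except FileExistsError:
        pass
    # Another caller created the file; it may not have finished writing yet,
    # so retry briefly until the content is readable.
    for _ in range(retries):
        uid = Path(path).read_text().strip()
        if uid:
            return uid
        time.sleep(delay)
    raise RuntimeError(f"uid file {path} exists but is empty")


def is_first_time_usage(path):
    """Read-only check: first usage means the uid file has not been created yet."""
    return not Path(path).exists()


with tempfile.TemporaryDirectory() as tmp:
    uid_path = Path(tmp) / "uid"
    first_time_before = is_first_time_usage(uid_path)
    uids = [get_or_create_uid(uid_path) for _ in range(5)]
    first_time_after = is_first_time_usage(uid_path)
```

Because the check no longer writes anything and the uid is created at most once, repeated calls return the same uid instead of generating new ones.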

I ran the workshop locally; the change seems to solve the duplication issue and didn't break the tests.
This is my user: b76b83ed-8075-4e5b-ad8c-766aa4e773e9 (~186 events)
