Bug with CH persons double click on retention table #2794

Closed
paolodamico opened this issue Dec 16, 2020 · 12 comments · Fixed by #3108
Labels
bug Something isn't working right P0 Critical, breaking issue (page crash, missing functionality)

Comments

@paolodamico
Contributor

Bug description

We have an issue in which the table of users for each cohort does not contain the same number of users as the table reports.

Example

(Source)

Table shows 6 users in cohort:

Persons list shows a lot more than 6 users in cohort:

Expected behavior

The persons list and the retention table should match.

How to reproduce

See description above.

Environment

Cloud, Clickhouse.

Additional context

Currently blocking internal personas analysis.

Thank you for your bug report – we love squashing them!

@paolodamico
Contributor Author

This seems to be happening in other graphs too. See example

@EDsCODE
Member

EDsCODE commented Jan 19, 2021

#2992 addresses the problem for the line graph example. The retention issue is still open.

@paolodamico
Contributor Author

I don't know if I've discovered another bug or if it's the same bug manifesting in another way. See here: the table says there are 21 users, but then the table contains way more than 21 users (~90).

It's also worth mentioning that I'm experiencing a ton of timeouts when getting the persons list (about 8 or 9 times out of 10).

@paolodamico
Contributor Author

There still seems to be a bug here (unsure if it's a one-off thing): I opened this graph today and got no users.

@paolodamico paolodamico reopened this Feb 11, 2021
@EDsCODE EDsCODE added the P0 Critical, breaking issue (page crash, missing functionality) label Feb 11, 2021
@yakkomajuri
Contributor

This is still not really working, and it's essential for my work following up with onboarded users that might be churning. Right now I have no idea who the users are.

@yakkomajuri
Contributor

Are these the same?

#3018

@EDsCODE
Member

EDsCODE commented Feb 12, 2021

Don't think so. Going to revisit this today.

@EDsCODE
Member

EDsCODE commented Feb 12, 2021

Update on this:
#3331 and #3332 fix the blank results and the overload of results that surpassed the counts.

What's still unintuitive now is that I'm filtering out people who don't exist in Postgres. I think there's a sync issue between Postgres and ClickHouse where a user exists in ClickHouse but doesn't exist at all in Postgres. So the query accounts for users in ClickHouse, but when I try to compile the list with person details, it can't find the user in Postgres. I've opted to omit these missing rows and just return a list of the users that do exist. This will again result in a count discrepancy, where the actual number of rows will sometimes be less than the number shown at the top.

I could match the returned row count with the totals at the top of the modal, but then there would be a difference between the number shown in the visualization table and the numbers shown in the modal.
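
To make the discrepancy concrete, here is a minimal sketch of the behavior described above. It is not the actual PostHog code: the function name `build_person_list`, the in-memory dict standing in for Postgres, and the sample IDs are all hypothetical. It only illustrates how omitting persons that are missing from Postgres makes the returned list shorter than the count computed in ClickHouse.

```python
from typing import Dict, List, Tuple


def build_person_list(
    clickhouse_ids: List[str],
    postgres_people: Dict[str, dict],
) -> Tuple[List[dict], List[str]]:
    """Hypothetical helper: resolve person IDs counted in ClickHouse
    against person details stored in Postgres.

    Returns the persons that were found and the IDs that were omitted
    because no matching Postgres row exists.
    """
    found: List[dict] = []
    missing: List[str] = []
    for person_id in clickhouse_ids:
        person = postgres_people.get(person_id)
        if person is None:
            # Present in ClickHouse but absent from Postgres: omit the row.
            missing.append(person_id)
        else:
            found.append(person)
    return found, missing


# Made-up sample data: ClickHouse counts four persons in the cohort,
# but only two of them have a row in Postgres.
clickhouse_ids = ["p1", "p2", "p3", "p4"]
postgres_people = {
    "p1": {"id": "p1", "name": "Ann"},
    "p3": {"id": "p3", "name": "Bob"},
}

found, missing = build_person_list(clickhouse_ids, postgres_people)
reported_count = len(clickhouse_ids)  # number shown at the top of the modal
returned_rows = len(found)            # rows actually listed
```

With this sample data the modal would report 4 persons but list only 2, and the gap equals the number of IDs that could not be resolved in Postgres.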

@mariusandra
Collaborator

Just guessing here, but could there be some issue with running the ClickHouse person table ALTER queries somewhere?

One possible explanation is that these queries silently failed or were skipped for the Python ingestion for a while, but worked with the plugin server ingestion. This would also plausibly explain the outage, as the server was just not able to handle this many newly seen ALTER queries.

The other option is that the plugin server skipped a lot of them when ingesting, especially when ClickHouse went down. It possibly wasn't fully down, but the person queries failed at first.

What's the date range for the faulty data?

@yakkomajuri
Contributor

@mariusandra I mean, the issue was reported back in December. Was the plugin server in cloud at all back then?

@mariusandra
Collaborator

Nope, it's only been doing something for the last 2 weeks.

@Twixes
Member

Twixes commented Mar 15, 2021

I'll close this since it appears the original problem is not relevant anymore. A related but different problem should get its own issue.

@Twixes Twixes closed this as completed Mar 15, 2021

5 participants