
More granular incremental stats #3164

Merged: amCap1712 merged 13 commits from popularity-incremental-granular into master on Feb 11, 2025

Conversation

amCap1712 (Member)

Currently, the entirety of the incremental dumps is checked to filter entities whose stats/popularity need to be recomputed. This can be further optimized by storing the latest `created` timestamp of the listens in the incremental dumps when a stat is run. The next time the stat runs, only listens with a higher `created` value (added through newer incremental dumps) are considered for the filter.

Note that the incremental aggregate is still computed from the entire incremental listens dump; only the filter is made more granular.
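As an illustration of the approach, here is a minimal PySpark-style sketch. The helper names, the `artist_mbid` column, and where the bookkeeping timestamp is persisted are assumptions made for the example, not the actual ListenBrainz implementation.

```python
from datetime import datetime
from typing import Optional

from pyspark.sql import DataFrame, functions as F


def get_entities_to_recompute(incremental_listens: DataFrame,
                              last_created: Optional[datetime]) -> DataFrame:
    """Build the filter of entities whose stats need recomputation.

    Only listens added since the previous stats run (i.e. with a higher
    `created` value) contribute to the filter; the incremental aggregate
    itself is still computed from the full incremental dump.
    """
    new_listens = incremental_listens
    if last_created is not None:
        new_listens = new_listens.where(F.col("created") > F.lit(last_created))
    # hypothetical entity key; the real filter depends on the stat being computed
    return new_listens.select("artist_mbid").distinct()


def get_latest_created(incremental_listens: DataFrame) -> Optional[datetime]:
    """Find the newest `created` timestamp in the incremental dump so it can
    be stored when the stat runs and reused as `last_created` next time."""
    row = incremental_listens.agg(F.max("created").alias("latest_created")).first()
    return row["latest_created"] if row else None
```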

amCap1712 requested a review from mayhem on February 3, 2025 14:17
mayhem (Member) left a comment

Once the question about the duplicated function is resolved, good to go.

```diff
@@ -91,11 +94,13 @@ def get_table_prefix(self) -> str:
     def get_base_path(self) -> str:
         return LISTENBRAINZ_POPULARITY_DIRECTORY

-    def get_filter_aggregate_query(self, existing_aggregate: str, incremental_aggregate: str) -> str:
+    def get_filter_aggregate_query_coarse(self, existing_aggregate: str, incremental_aggregate: str,
```
mayhem (Member):

this function has the same name as the one above. Intended?

amCap1712 (Member, Author):

There are two classes in this file, so the method is defined once in each.
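For context, a schematic sketch of the layout being described; the class names and method bodies below are placeholders, and the real signatures take additional parameters that are elided in the diff above.

```python
# Placeholder classes: this only illustrates why the same method name
# legitimately appears twice in this file, once per class.
class PopularityProvider:
    def get_filter_aggregate_query_coarse(self, existing_aggregate: str,
                                          incremental_aggregate: str) -> str:
        # hypothetical coarse filter query for this provider's aggregate
        return f"SELECT * FROM {existing_aggregate}"


class ListenerPopularityProvider:
    def get_filter_aggregate_query_coarse(self, existing_aggregate: str,
                                          incremental_aggregate: str) -> str:
        # the second class defines its own method of the same name
        return f"SELECT * FROM {incremental_aggregate}"
```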

amCap1712 force-pushed the popularity-incremental-granular branch from 08be738 to 3d9ed9a on February 10, 2025 15:55.
amCap1712 merged commit f866646 into master on Feb 11, 2025 (1 check failed).
amCap1712 deleted the popularity-incremental-granular branch on February 11, 2025 19:56.