More granular incremental stats #3164
Once the question about the duplicated function is resolved, good to go.
@@ -91,11 +94,13 @@ def get_table_prefix(self) -> str:
    def get_base_path(self) -> str:
        return LISTENBRAINZ_POPULARITY_DIRECTORY

    def get_filter_aggregate_query(self, existing_aggregate: str, incremental_aggregate: str) -> str:
    def get_filter_aggregate_query_coarse(self, existing_aggregate: str, incremental_aggregate: str,
This function has the same name as the one above. Is that intended?
There are two classes in this file, so the function is defined once in each.
Currently, every listen in the incremental dumps is checked when filtering for entities whose stats or popularity need to be recomputed. This can be optimized further by storing the latest created timestamp of the listens in the incremental dumps each time a stat is run. The next time that stat runs, only listens with a higher created value (added by newer incremental dumps) are considered for the filter. Note that the incremental aggregate is still computed from the entire incremental listens dump; only the filter is made more granular.
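The idea described above can be illustrated with a minimal plain-Python sketch. It is not the code from this PR: the function name `run_incremental_stat`, the `last_created_by_stat` store, and the listen dictionaries with `entity` and `created` keys are all hypothetical stand-ins, and the real implementation presumably expresses this as queries over the dumps rather than in-memory loops.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for wherever the latest "created"
# timestamp is persisted between stat runs.
last_created_by_stat = {}


def run_incremental_stat(stat_name, existing_aggregate, incremental_listens):
    """Return (combined counts for changed entities, set of changed entities).

    Assumes each listen is a dict with "entity" and "created" keys.
    """
    last_created = last_created_by_stat.get(stat_name)

    # Granular filter: only listens newer than the stored timestamp
    # mark an entity as needing recomputation.
    changed = {
        listen["entity"]
        for listen in incremental_listens
        if last_created is None or listen["created"] > last_created
    }

    # The incremental aggregate is still computed from ALL incremental listens.
    incremental_aggregate = Counter(l["entity"] for l in incremental_listens)

    # Recompute only the entities selected by the filter.
    combined = {
        entity: existing_aggregate.get(entity, 0) + incremental_aggregate[entity]
        for entity in changed
    }

    # Remember the latest created timestamp for the next run of this stat.
    if incremental_listens:
        last_created_by_stat[stat_name] = max(
            l["created"] for l in incremental_listens
        )
    return combined, changed
```

On a first run every entity present in the incremental listens is selected. On a later run, after a newer incremental dump has added listens, only the entities touched by those newer listens are selected, while their counts still include the older incremental listens.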
08be738 to 3d9ed9a