
More granular incremental stats #3164

Merged: amCap1712 merged 13 commits from popularity-incremental-granular into master on Feb 11, 2025

Conversation

amCap1712 (Member)

Currently, the entirety of the incremental dumps is checked to filter entities whose stats/popularity need to be recomputed. This can be further optimized by storing the latest `created` timestamp of the listens in the incremental dumps when a stat is run. The next time the stat runs, only listens with a higher `created` value (added through newer incremental dumps) are considered for the filter.

Note that the incremental aggregate is still computed from the entire incremental listens dump; only the filter is made more granular.
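As an illustration of the approach, here is a minimal PySpark-style sketch. The helper names, the `artist_mbid` column, and where the bookkeeping timestamp is persisted are assumptions made for the example, not the actual ListenBrainz implementation.

```python
from datetime import datetime
from typing import Optional

from pyspark.sql import DataFrame, functions as F


def get_entities_to_recompute(incremental_listens: DataFrame,
                              last_created: Optional[datetime]) -> DataFrame:
    """Build the filter of entities whose stats need recomputation.

    Only listens added since the previous stats run (i.e. with a higher
    `created` value) contribute to the filter; the incremental aggregate
    itself is still computed from the full incremental dump.
    """
    new_listens = incremental_listens
    if last_created is not None:
        new_listens = new_listens.where(F.col("created") > F.lit(last_created))
    # hypothetical entity key; the real filter depends on the stat being computed
    return new_listens.select("artist_mbid").distinct()


def get_latest_created(incremental_listens: DataFrame) -> Optional[datetime]:
    """Find the newest `created` timestamp in the incremental dump so it can
    be stored when the stat runs and reused as `last_created` next time."""
    row = incremental_listens.agg(F.max("created").alias("latest_created")).first()
    return row["latest_created"] if row else None
```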

amCap1712 requested a review from mayhem on February 3, 2025 14:17
mayhem (Member) left a comment

Once the question about the duplicated function is resolved, good to go.

```diff
@@ -91,11 +94,13 @@ def get_table_prefix(self) -> str:
     def get_base_path(self) -> str:
         return LISTENBRAINZ_POPULARITY_DIRECTORY

-    def get_filter_aggregate_query(self, existing_aggregate: str, incremental_aggregate: str) -> str:
+    def get_filter_aggregate_query_coarse(self, existing_aggregate: str, incremental_aggregate: str,
```
mayhem (Member):

this function has the same name as the one above. Intended?

amCap1712 (Member, Author):

There are two classes in this file, so the method is defined once in each.
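For context, a schematic sketch of the layout being described; the class names and method bodies below are placeholders, and the real signatures take additional parameters that are elided in the diff above.

```python
# Placeholder classes: this only illustrates why the same method name
# legitimately appears twice in this file, once per class.
class PopularityProvider:
    def get_filter_aggregate_query_coarse(self, existing_aggregate: str,
                                          incremental_aggregate: str) -> str:
        # hypothetical coarse filter query for this provider's aggregate
        return f"SELECT * FROM {existing_aggregate}"


class ListenerPopularityProvider:
    def get_filter_aggregate_query_coarse(self, existing_aggregate: str,
                                          incremental_aggregate: str) -> str:
        # the second class defines its own method of the same name
        return f"SELECT * FROM {incremental_aggregate}"
```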

amCap1712 force-pushed the popularity-incremental-granular branch from 08be738 to 3d9ed9a on February 10, 2025 15:55.
amCap1712 merged commit f866646 into master on Feb 11, 2025 (1 check failed).
amCap1712 deleted the popularity-incremental-granular branch on February 11, 2025 19:56.