Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce MV insert memory usage #126

Open
bmtcril opened this issue Oct 21, 2024 · 4 comments
Open

Reduce MV insert memory usage #126

bmtcril opened this issue Oct 21, 2024 · 4 comments
Assignees

Comments

@bmtcril
Copy link
Contributor

bmtcril commented Oct 21, 2024

We should do a memory pass on MV inserts and see which insert paths could use tweaking. It seems like there are methods to limit the number of rows being considered on the right side of joins in MVs as documented here: https://stackoverflow.com/questions/75243697/joining-large-tables-in-clickhouse-out-of-memory-or-slow

Where possible we should implement that or projections to speed up inserts and reduce memory consumption. This comes out of an issue reporting this error:

Code: 241. DB::Exception: Memory limit (total) exceeded: would use 14.49 GiB (attempt to allocate chunk of 4238898 bytes), maximum: 13.83 GiB. OvercommitTracker decision: Query was selected to stop by OvercommitTracker.: While executing JoiningTransform: while pushing to view xapi.subsection_problem_engagement_mv (b6e66e2d-e108-463c-ae7b-17ac579daf95): while pushing to view xapi.problem_events_mv (74481e14-1e07-42a2-986e-f7a269a79079): while pushing to view xapi.xapi_events_all_parsed_mv (2831c02c-9fca-494b-b8f6-3d22dc921e72): While executing WaitForAsyncInsert. (MEMORY_LIMIT_EXCEEDED) (version 23.8.2.7 (official build))

Similar errors have been seen with watched_video_duration:

aspects-job-20241023085611-n9mht aspects 13:59:30  34 of 47 ERROR creating sql materialized_view model `xapi`.`watched_video_duration`  [ERROR in 21.92s]
aspects-job-20241023085611-n9mht aspects 13:59:30  
aspects-job-20241023085611-n9mht aspects 13:59:30  Finished running 4 dictionary models, 26 materialized view models, 17 view models in 0 hours 3 minutes and 0.98 seconds (180.98s).
aspects-job-20241023085611-n9mht aspects 13:59:30  
aspects-job-20241023085611-n9mht aspects 13:59:30  Completed with 1 error and 0 warnings:
aspects-job-20241023085611-n9mht aspects 13:59:30  
aspects-job-20241023085611-n9mht aspects 13:59:30    Database Error in model watched_video_duration (models/video/watched_video_duration.sql)
aspects-job-20241023085611-n9mht aspects   Code: 241.
aspects-job-20241023085611-n9mht aspects   DB::Exception: Memory limit (total) exceeded: would use 13.86 GiB (attempt to allocate chunk of 5071098 bytes), maximum: 13.80 GiB. OvercommitTracker decision: Memory overcommit isn't used. Waiting time or overcommit denominator are set to zero.: While executing JoiningTransform. Stack trace:
@bmtcril bmtcril changed the title Reduct MV insert memory usage Reduce MV insert memory usage Oct 24, 2024
@saraburns1 saraburns1 self-assigned this Nov 13, 2024
@saraburns1 saraburns1 moved this from Ready for Work to Doing in Data Working Group Nov 13, 2024
@saraburns1
Copy link
Contributor

watched_video_duration has been updated to use a parameterized view instead of materialized which allows superset to run the query on a subset of data and eliminates the memory issues.

dbt model - https://github.com/openedx/aspects-dbt/blob/main/models/video/watched_video_duration.sql#L25-L36
superset dataset - https://github.com/openedx/tutor-contrib-aspects/blob/main/tutoraspects/templates/openedx-assets/queries/watched_video_duration.sql#L36-L38

@saraburns1
Copy link
Contributor

We can do similar changes to other high memory models. I'll compile a list of highest memory queries and queries that have been moved out of dbt into superset which we may be able to move back

@bmtcril
Copy link
Contributor Author

bmtcril commented Dec 2, 2024

Do you think this is a thing we can actually use to remove most of the SQL pre-where queries from Superset? It seems like it would work.

@saraburns1
Copy link
Contributor

I don't see why not. It might take some refactoring of the queries if we need to use parameters but shouldn't be too bad

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Doing
Development

No branches or pull requests

2 participants