
feat: add performance metrics commands #657

Merged · 11 commits into main on Mar 18, 2024
Conversation

Ian2012
Contributor

@Ian2012 Ian2012 commented Mar 15, 2024

Description

This PR makes the following changes:

  1. Restores the query_context param in charts so that chart data can be queried programmatically.
  2. Fixes the datasets' Jinja templates so that data is only filtered in the dashboard context where the filters are defined.
  3. Adds a performance-metrics command that queries the data for all charts on the SUPERSET_EMBEDDABLE_DASHBOARDS dashboards and gathers the data needed for performance measurements. Every query performed (via clickhouse-connect) is tagged with a custom http_user_agent of the form aspects-{ASPECTS_VERSION}-{UUID[0:6]}, so those queries can be retrieved and matched later.
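A minimal sketch of how such a user-agent string could be built (the version value and helper name here are illustrative assumptions, not the PR's actual code):

```python
import uuid

# Assumption: illustrative version string; in Aspects the real value comes
# from the ASPECTS_VERSION configuration.
ASPECTS_VERSION = "1.0.0"

def performance_user_agent(version: str = ASPECTS_VERSION) -> str:
    """Build a unique http_user_agent so the queries issued by one
    measurement run can be matched later in ClickHouse's system.query_log."""
    run_id = str(uuid.uuid4())[:6]  # short prefix of a random UUID
    return f"aspects-{version}-{run_id}"
```

The short UUID prefix keeps the agent readable while still making each run easy to isolate by matching on the http_user_agent column of system.query_log.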

The output is sorted from slowest to fastest. Example output:

1. Distribution Of Hints Per Correct Answer
Superset time: 2.207999 (s).
Query duration: 1.613 (s).
Result rows: 0
Memory Usage (MB): 378.3172836303711
Row count (superset) 0
Filters: []

2. Responses Per Problem
Superset time: 1.856917 (s).
Query duration: 1.314 (s).
Result rows: 1
Memory Usage (MB): 367.7551851272583
Row count (superset) 1
Filters: []

3. Distribution Of Attempts
Superset time: 1.845082 (s).
Query duration: 1.334 (s).
Result rows: 0
Memory Usage (MB): 367.78697681427
Row count (superset) 0
Filters: [{'column': 'success'}]

4. Watched Video Segments
Superset time: 1.041357 (s).
Query duration: 0.284 (s).
Result rows: 12
Memory Usage (MB): 98.69888496398926
Row count (superset) 12
Filters: [{'column': 'started_at'}]

5. Course Grade Distribution
Superset time: 1.02784 (s).
Query duration: 0.442 (s).
Result rows: 1
Memory Usage (MB): 190.35809421539307
Row count (superset) 1
Filters: [{'column': 'emission_time'}]

6. Distribution Of Problem Grades
Superset time: 0.622855 (s).
Query duration: 0.431 (s).
Result rows: 1
Memory Usage (MB): 216.47877311706543
Row count (superset) 1
Filters: [{'column': 'emission_time'}]

7. Distribution Of Responses
Superset time: 0.519835 (s).
Query duration: 0.318 (s).
Result rows: 3
Memory Usage (MB): 113.22687339782715
Row count (superset) 3
Filters: [{'column': 'emission_time'}]

8. Transcripts / Captions Per Video
Superset time: 0.503895 (s).
Query duration: 0.344 (s).
Result rows: 0
Memory Usage (MB): 106.25424003601074
Row count (superset) 0
Filters: [{'column': 'emission_time'}]

9. Watches Per Video
Superset time: 0.460999 (s).
Query duration: 0.314 (s).
Result rows: 3
Memory Usage (MB): 106.36078453063965
Row count (superset) 3
Filters: [{'column': 'emission_time'}]

10. Currently Enrolled Learners Per Day
Superset time: 0.428259 (s).
Query duration: 0.04 (s).
Result rows: 0
Memory Usage (MB): 4.363988876342773
Row count (superset) 0
Filters: [{'column': 'enrollment_status'}, {'column': 'enrollment_status_date'}]

11. Enrollment Events Per Day
Superset time: 0.17767 (s).
Query duration: 0.01 (s).
Result rows: 0
Memory Usage (MB): 0.27298736572265625
Row count (superset) 0
Filters: [{'column': 'emission_time'}]

12. Enrollments By Enrollment Mode
Superset time: 0.157921 (s).
Query duration: 0.012 (s).
Result rows: 0
Memory Usage (MB): 0.272613525390625
Row count (superset) 0
Filters: [{'column': 'enrollment_status'}, {'column': 'emission_time'}]

13. Posts per user
Superset time: 0.12636 (s).
Query duration: 0.009 (s).
Result rows: 0
Memory Usage (MB): 0.35162353515625
Row count (superset) 0
Filters: []

14. Distinct forum users
Superset time: 0.123969 (s).
Query duration: 0.007 (s).
Result rows: 1
Memory Usage (MB): 0.2674407958984375
Row count (superset) 1
Filters: [{'column': 'emission_time'}]
...
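The per-chart report above can be modeled with a small record type, and the slowest-first ordering reduces to a single sorted call. The class and field names below are hypothetical, not the PR's actual data structures:

```python
from dataclasses import dataclass

@dataclass
class ChartMetrics:
    name: str
    superset_time: float   # wall-clock seconds for the Superset API call
    query_duration: float  # seconds ClickHouse reports for the query itself
    result_rows: int
    memory_mb: float

def sort_report(metrics: list[ChartMetrics]) -> list[ChartMetrics]:
    # Slowest first, matching the numbered output above
    return sorted(metrics, key=lambda m: m.superset_time, reverse=True)
```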

The following chart and its dataset are created to visualize the queries performed by this tool:

[image: screenshot of the performance-metrics chart]

Pending changes

  • Perform each query 5-10 times and calculate the median, average, and other metrics.
  • Support different testing scenarios with injected filters.
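For the first pending item, aggregating repeated runs could look like this with the standard library (the function name and result shape are assumptions, not the PR's implementation):

```python
import statistics

def summarize_runs(durations: list[float]) -> dict:
    """Summarize repeated timings (in seconds) of one chart query."""
    return {
        "runs": len(durations),
        "min": min(durations),
        "median": statistics.median(durations),
        "mean": statistics.mean(durations),
        "max": max(durations),
    }
```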

@openedx-webhooks

Thanks for the pull request, @Ian2012! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Mar 15, 2024
@Ian2012 Ian2012 requested review from bmtcril and pomegranited March 15, 2024 19:17
Base automatically changed from cag/fix-dev to main March 15, 2024 19:17
@Ian2012 Ian2012 force-pushed the cag/performance-metrics branch from db3cc91 to 612f5e0 Compare March 15, 2024 19:25
@Ian2012 Ian2012 force-pushed the cag/performance-metrics branch from 612f5e0 to 5a2dcba Compare March 15, 2024 19:25
Contributor

@bmtcril bmtcril left a comment

This is great, I'm excited to start getting some real measurements on these that we can iterate on!

tutoraspects/commands_v0.py (review comment outdated, resolved)
@@ -3,10 +3,10 @@ with grades as (
from {{ DBT_PROFILE_TARGET_DATABASE }}.fact_grades
where grade_type = 'course'
{% raw %}
-{% if filter_values('course_name') != [] %}
+{% if get_filters('course_name', remove_filter=True) != [] && filter_values('course_name') != [] %}
Contributor

I'm curious about the effect of remove_filter here. Did it have a performance impact? @SoryRawyer do you think it might cause problems with our predicate pushdown workarounds?

Contributor

My understanding of remove_filter is that it prevents Superset from adding the filter to the final generated SQL. I've used it in the past when we want to filter on a column in a subquery but don't want to include that column in the final result. What's not clear to me is how this would affect subsequent calls to filter_values('course_name') within this query.

Additionally, the get_filters check feels redundant; I can't tell if there will be a case where that returns a different answer than the comparison on filter_values('course_name').

Contributor Author

They shouldn't be in the same condition; I was testing some changes. The final template should be:

{% if get_filters('course_name', remove_filter=True) != [] %}
  # Dashboard context: a filter on course_name is defined
  {% if filter_values('course_name') != [] %}
  # Do the filter thing
  {% else %}
  1=0
  {% endif %}
{% else %}
  # No filters defined (such as in the SQL Lab, programmatic access,
  # or when creating charts): do nothing, no filter
{% endif %}

Contributor Author

get_filters returns the filters applied to a column as a list of entries containing the comparator and the value. See https://superset.apache.org/docs/installation/sql-templating/#available-macros

@Ian2012 Ian2012 force-pushed the cag/performance-metrics branch from f5feea7 to cbf3bf9 Compare March 18, 2024 14:57
Contributor

@bmtcril bmtcril left a comment

LGTM, remember to squash 😅

@Ian2012 Ian2012 merged commit 59139d6 into main Mar 18, 2024
10 checks passed
@Ian2012 Ian2012 deleted the cag/performance-metrics branch March 18, 2024 16:28
@openedx-webhooks

@Ian2012 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.
