
feat: add performance metrics commands #657

Merged · 11 commits into main on Mar 18, 2024
Conversation

Ian2012
Contributor

@Ian2012 Ian2012 commented Mar 15, 2024

Description

This PR makes the following changes:

  1. Restores the query_context param in charts so that chart data can be queried programmatically.
  2. Fixes the datasets' Jinja templates so that data is only filtered in the dashboard context where the filters are defined.
  3. Adds a performance-metrics command that queries the data for all charts on the SUPERSET_EMBEDDABLE_DASHBOARDS dashboards and gathers the data needed for performance measurements. Every query performed (via clickhouse-connect) is tagged with a custom http_user_agent of the form aspects-{ASPECTS_VERSION}-{UUID[0:6]}, so those queries can be retrieved and matched later.
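A minimal sketch of how such a user-agent string could be built (the version value and helper name here are illustrative assumptions, not the PR's actual code):

```python
import uuid

# Assumption: illustrative version string; in Aspects the real value comes
# from the ASPECTS_VERSION configuration.
ASPECTS_VERSION = "1.0.0"

def performance_user_agent(version: str = ASPECTS_VERSION) -> str:
    """Build a unique http_user_agent so the queries issued by one
    measurement run can be matched later in ClickHouse's system.query_log."""
    run_id = str(uuid.uuid4())[:6]  # short prefix of a random UUID
    return f"aspects-{version}-{run_id}"
```

The short UUID prefix keeps the agent readable while still making each run easy to isolate by matching on the http_user_agent column of system.query_log.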

The output is sorted from slowest to fastest. Example output:

1. Distribution Of Hints Per Correct Answer
Superset time: 2.207999 (s).
Query duration: 1.613 (s).
Result rows: 0
Memory Usage (MB): 378.3172836303711
Row count (superset) 0
Filters: []

2. Responses Per Problem
Superset time: 1.856917 (s).
Query duration: 1.314 (s).
Result rows: 1
Memory Usage (MB): 367.7551851272583
Row count (superset) 1
Filters: []

3. Distribution Of Attempts
Superset time: 1.845082 (s).
Query duration: 1.334 (s).
Result rows: 0
Memory Usage (MB): 367.78697681427
Row count (superset) 0
Filters: [{'column': 'success'}]

4. Watched Video Segments
Superset time: 1.041357 (s).
Query duration: 0.284 (s).
Result rows: 12
Memory Usage (MB): 98.69888496398926
Row count (superset) 12
Filters: [{'column': 'started_at'}]

5. Course Grade Distribution
Superset time: 1.02784 (s).
Query duration: 0.442 (s).
Result rows: 1
Memory Usage (MB): 190.35809421539307
Row count (superset) 1
Filters: [{'column': 'emission_time'}]

6. Distribution Of Problem Grades
Superset time: 0.622855 (s).
Query duration: 0.431 (s).
Result rows: 1
Memory Usage (MB): 216.47877311706543
Row count (superset) 1
Filters: [{'column': 'emission_time'}]

7. Distribution Of Responses
Superset time: 0.519835 (s).
Query duration: 0.318 (s).
Result rows: 3
Memory Usage (MB): 113.22687339782715
Row count (superset) 3
Filters: [{'column': 'emission_time'}]

8. Transcripts / Captions Per Video
Superset time: 0.503895 (s).
Query duration: 0.344 (s).
Result rows: 0
Memory Usage (MB): 106.25424003601074
Row count (superset) 0
Filters: [{'column': 'emission_time'}]

9. Watches Per Video
Superset time: 0.460999 (s).
Query duration: 0.314 (s).
Result rows: 3
Memory Usage (MB): 106.36078453063965
Row count (superset) 3
Filters: [{'column': 'emission_time'}]

10. Currently Enrolled Learners Per Day
Superset time: 0.428259 (s).
Query duration: 0.04 (s).
Result rows: 0
Memory Usage (MB): 4.363988876342773
Row count (superset) 0
Filters: [{'column': 'enrollment_status'}, {'column': 'enrollment_status_date'}]

11. Enrollment Events Per Day
Superset time: 0.17767 (s).
Query duration: 0.01 (s).
Result rows: 0
Memory Usage (MB): 0.27298736572265625
Row count (superset) 0
Filters: [{'column': 'emission_time'}]

12. Enrollments By Enrollment Mode
Superset time: 0.157921 (s).
Query duration: 0.012 (s).
Result rows: 0
Memory Usage (MB): 0.272613525390625
Row count (superset) 0
Filters: [{'column': 'enrollment_status'}, {'column': 'emission_time'}]

13. Posts per user
Superset time: 0.12636 (s).
Query duration: 0.009 (s).
Result rows: 0
Memory Usage (MB): 0.35162353515625
Row count (superset) 0
Filters: []

14. Distinct forum users
Superset time: 0.123969 (s).
Query duration: 0.007 (s).
Result rows: 1
Memory Usage (MB): 0.2674407958984375
Row count (superset) 1
Filters: [{'column': 'emission_time'}]
...
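The per-chart report above can be modeled with a small record type, and the slowest-first ordering reduces to a single sorted call. The class and field names below are hypothetical, not the PR's actual data structures:

```python
from dataclasses import dataclass

@dataclass
class ChartMetrics:
    name: str
    superset_time: float   # wall-clock seconds for the Superset API call
    query_duration: float  # seconds ClickHouse reports for the query itself
    result_rows: int
    memory_mb: float

def sort_report(metrics: list[ChartMetrics]) -> list[ChartMetrics]:
    # Slowest first, matching the numbered output above
    return sorted(metrics, key=lambda m: m.superset_time, reverse=True)
```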

The following chart and its dataset are created to visualize the queries performed by this tool:

[image: screenshot of the performance-metrics chart]

Pending changes

  • Perform each query 5-10 times and calculate the median, average, and other metrics.
  • Support different testing scenarios with injected filters.
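For the first pending item, aggregating repeated runs could look like this with the standard library (the function name and result shape are assumptions, not the PR's implementation):

```python
import statistics

def summarize_runs(durations: list[float]) -> dict:
    """Summarize repeated timings (in seconds) of one chart query."""
    return {
        "runs": len(durations),
        "min": min(durations),
        "median": statistics.median(durations),
        "mean": statistics.mean(durations),
        "max": max(durations),
    }
```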

@openedx-webhooks

Thanks for the pull request, @Ian2012! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Mar 15, 2024
@Ian2012 Ian2012 requested review from bmtcril and pomegranited March 15, 2024 19:17
Base automatically changed from cag/fix-dev to main March 15, 2024 19:17
@Ian2012 Ian2012 force-pushed the cag/performance-metrics branch from db3cc91 to 612f5e0 Compare March 15, 2024 19:25
@Ian2012 Ian2012 force-pushed the cag/performance-metrics branch from 612f5e0 to 5a2dcba Compare March 15, 2024 19:25
Contributor

@bmtcril bmtcril left a comment

This is great, I'm excited to start getting some real measurements on these that we can iterate on!

tutoraspects/commands_v0.py (review comment outdated, resolved)
@@ -3,10 +3,10 @@ with grades as (
from {{ DBT_PROFILE_TARGET_DATABASE }}.fact_grades
where grade_type = 'course'
{% raw %}
-{% if filter_values('course_name') != [] %}
+{% if get_filters('course_name', remove_filter=True) != [] && filter_values('course_name') != [] %}
Contributor

I'm curious about the effect of remove_filter here. Did it have a performance impact? @SoryRawyer do you think it might cause problems with our predicate pushdown workarounds?

Contributor

My understanding of remove_filter is that it prevents Superset from adding the filter to the final generated SQL. I've used it in the past when we want to filter on a column in a subquery but don't want to include that column in the final result. What's not clear to me is how this would affect subsequent calls to filter_values('course_name') within this query.

Additionally, the get_filters check feels redundant; I can't tell if there will be a case where that returns a different answer than the comparison on filter_values('course_name').

Contributor Author

They shouldn't be in the same condition; I was testing some changes. The final template should be:

{% if get_filters('course_name', remove_filter=True) != [] %}
  # Dashboard context: a filter on course_name is defined
  {% if filter_values('course_name') != [] %}
  # Do the filter thing
  {% else %}
  1=0
  {% endif %}
{% else %}
  # No filters defined (such as in the SQL Lab, programmatic access,
  # or when creating charts): do nothing, no filter
{% endif %}

Contributor Author

get_filters returns the filters applied to a column as a list of entries containing the comparator and the value. See https://superset.apache.org/docs/installation/sql-templating/#available-macros

@Ian2012 Ian2012 force-pushed the cag/performance-metrics branch from f5feea7 to cbf3bf9 Compare March 18, 2024 14:57
Contributor

@bmtcril bmtcril left a comment

LGTM, remember to squash 😅

@Ian2012 Ian2012 merged commit 59139d6 into main Mar 18, 2024
10 checks passed
@Ian2012 Ian2012 deleted the cag/performance-metrics branch March 18, 2024 16:28
@openedx-webhooks

@Ian2012 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.
