-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize block date retrieval for course outlines. #99
Optimize block date retrieval for course outlines. #99
Conversation
return f"edx-when.content_dates:{course_key}:{query_dict_str}" | ||
published_version_str = '' | ||
if published_version: | ||
published_version_str = published_version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is published_version
supposed to be a str
at this point, or still the BSON obj?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, published_version
would be a str
at this point. But since I still haven't gotten my hands on an actual published_version
, I'm not yet sure what type it will be.
I'm trying to get hold of a published_version
here:
https://github.com/edx/edx-platform/blob/7e146e04879944c956ee89fb5ae126ae0e3371c7/openedx/core/djangoapps/course_date_signals/handlers.py#L130
And haven't had any luck yet...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a str
at this point - verified it.
edx_when/api.py
Outdated
version = published_version | ||
if published_version: | ||
version, __ = models.CourseVersion.objects.get_or_create(published_version=published_version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor and optional, but personally, I find it easier to digest this when it's broken out something like:
version = None
if published_version:
version, __ = models.CourseVersion.objects.get_or_create(published_version=published_version)
or
if published_version:
version, __ = models.CourseVersion.objects.get_or_create(published_version=published_version)
else:
version = None
Mostly because it's easier for my head to grok that version
can be None
or this model object, but it's weird when I think that it can also be a string, but only for a really limited time.
edx_when/models.py
Outdated
|
||
class Meta: | ||
"""Django Metadata.""" | ||
|
||
unique_together = ('policy', 'location', 'field') | ||
constraints = [ | ||
models.UniqueConstraint(fields=('location', 'block_type'), name='unique_location_block_type'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have a unique constraint, I think the extra index is redundant.
@@ -96,11 +113,19 @@ class ContentDate(models.Model): | |||
location = UsageKeyField(null=True, default=None, db_index=True, max_length=255) | |||
field = models.CharField(max_length=255, default='') | |||
active = models.BooleanField(default=True) | |||
block_type = models.CharField(max_length=255, null=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably also want a composite index on (course_id, block_type)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added below.
2232f61
to
61798b4
Compare
needs_save = True | ||
if existing_date.block_type != block_id.block_type: | ||
existing_date.block_type = block_id.block_type | ||
needs_save = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the new block_type
field is added as NULL, this code will lazily fill the field with block types.
qset = models.ContentDate.objects.filter(course_id=course_id, active=True, **rel_lookup) | ||
if subsection_and_higher_only: | ||
# Include NULL block_type values as well because of lazy rollout. | ||
qset = qset.filter(Q(block_type__in=('course', 'chapter', 'sequential', None))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This filter is the new one that limits the queryset to the block types shown. None
is also included because of the lazy filling of the block_type
field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a condition on which we can remove the None? Or just always keep because we won't ever be able to guarantee everything has it filled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR adds the new column with null=True
to avoid DB locking on deployment. And I've chosen to do a lazy load of the column upon course publish. With this strategy, it's possible that some courses will never be published and won't have values for block_type ever - unless we populate the value for all courses using some asynchronous process. Once a course is published, no None
values would be expected for block_type.
Summary: With the current plan, just always keep.
qset = cache.get(external_cache_key) if use_cached else None | ||
raw_results_cache_key = _content_dates_cache_key( | ||
course_id, rel_lookup, subsection_and_higher_only, published_version | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the cache key for the raw, unprocessed query results before the additional processing below.
# query set to be returned. | ||
processed_results_cache_key = _processed_results_cache_key( | ||
course_id, user_id, schedule, allow_relative_dates, subsection_and_higher_only, published_version | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the cache key for the queried and processed results.
edx_when/api.py
Outdated
cache_key = _content_dates_cache_key( | ||
course_key, query_dict, subsection_and_higher_only, course_version | ||
) | ||
TieredCache.delete_all_tiers(cache_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cache clearing got a bit uglier because of the new additions to the cache key. 😿
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it need to be called though? If the version information is encoded into the cache key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have just removed it - but the fact that a None
published_version can still be sent in made me think that the code still should clear the cache in that case. (I also hadn't made the edx-platform PR yet which uses the new edx-when, so I wasn't sure at the time if a published course version could be passed-in for all calls. I've since found out that a course version can be passed-in everywhere.)
Also, in the edx-platform PR, I'm getting the published version from course.course_version
in most places. And I don't understand completely when I can be assured that attribute will be present and filled-in. In Old Mongo courses, wouldn't that attribute be None
or non-existent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I did my best to understand this, but I'm honestly not particularly well versed in caching so feel free to take all my comments with a grain of salt. I think it would be good to wait (if possible) for Mike Terry to return from vacation so he can also take a look. He is back Thursday
qset = models.ContentDate.objects.filter(course_id=course_id, active=True, **rel_lookup) | ||
if subsection_and_higher_only: | ||
# Include NULL block_type values as well because of lazy rollout. | ||
qset = qset.filter(Q(block_type__in=('course', 'chapter', 'sequential', None))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a condition on which we can remove the None? Or just always keep because we won't ever be able to guarantee everything has it filled?
|
||
class Meta: | ||
"""Django Metadata.""" | ||
|
||
unique_together = ('policy', 'location', 'field') | ||
indexes = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Add course_id and location here using the Index format instead of db_index on the field definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting - I'll try this and see how it changes the migration file.
@@ -296,14 +364,17 @@ def get_overrides_for_user(course_id, user): | |||
yield {'location': udate.content_date.location, 'actual_date': udate.actual_date} | |||
|
|||
|
|||
def set_date_for_block(course_id, block_id, field, date_or_timedelta, user=None, reason='', actor=None): | |||
def set_date_for_block( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this method also be clearing the cache when a new date is set? I know we do it when we clear dates, but wouldn't this be a similar process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assumption I'd made here is that a set_date_for_block
would be preceded by a course publish - and therefore a new published_version
would be passed-in. This new published_version
would leave the old cached value behind. Clearing all course dates on blocks might not necessarily be preceded by a course publish.
edx_when/models.py
Outdated
@@ -84,6 +84,16 @@ def clean(self): | |||
raise ValidationError(_("Absolute and relative dates cannot both be used")) | |||
|
|||
|
|||
class CourseVersion(models.Model): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have this as its own model rather than just a field on a ContentDate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! Because I just can't help myself from normalizing DB data? 😸 It would also work fine as a field on ContentDate
- I arbitrarily chose to use a new model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, @Dillon-Dumesnil ! I'll discuss with my team about waiting for a merge. We'd like to get a PR approval from an owner of this repo ideally before proceeding.
qset = models.ContentDate.objects.filter(course_id=course_id, active=True, **rel_lookup) | ||
if subsection_and_higher_only: | ||
# Include NULL block_type values as well because of lazy rollout. | ||
qset = qset.filter(Q(block_type__in=('course', 'chapter', 'sequential', None))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR adds the new column with null=True
to avoid DB locking on deployment. And I've chosen to do a lazy load of the column upon course publish. With this strategy, it's possible that some courses will never be published and won't have values for block_type ever - unless we populate the value for all courses using some asynchronous process. Once a course is published, no None
values would be expected for block_type.
Summary: With the current plan, just always keep.
@@ -296,14 +364,17 @@ def get_overrides_for_user(course_id, user): | |||
yield {'location': udate.content_date.location, 'actual_date': udate.actual_date} | |||
|
|||
|
|||
def set_date_for_block(course_id, block_id, field, date_or_timedelta, user=None, reason='', actor=None): | |||
def set_date_for_block( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assumption I'd made here is that a set_date_for_block
would be preceded by a course publish - and therefore a new published_version
would be passed-in. This new published_version
would leave the old cached value behind. Clearing all course dates on blocks might not necessarily be preceded by a course publish.
edx_when/api.py
Outdated
def set_date_for_block(course_id, block_id, field, date_or_timedelta, user=None, reason='', actor=None): | ||
def set_date_for_block( | ||
course_id, block_id, field, date_or_timedelta, | ||
user=None, reason='', actor=None, published_version=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: while the published_version
is saved on each ContentDate
object, it's not used as a filter when querying course dates (though it is returned for each block).
edx_when/models.py
Outdated
@@ -84,6 +84,16 @@ def clean(self): | |||
raise ValidationError(_("Absolute and relative dates cannot both be used")) | |||
|
|||
|
|||
class CourseVersion(models.Model): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! Because I just can't help myself from normalizing DB data? 😸 It would also work fine as a field on ContentDate
- I arbitrarily chose to use a new model.
|
||
class Meta: | ||
"""Django Metadata.""" | ||
|
||
unique_together = ('policy', 'location', 'field') | ||
indexes = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting - I'll try this and see how it changes the migration file.
edx_when/models.py
Outdated
@@ -96,18 +106,23 @@ class ContentDate(models.Model): | |||
location = UsageKeyField(null=True, default=None, db_index=True, max_length=255) | |||
field = models.CharField(max_length=255, default='') | |||
active = models.BooleanField(default=True) | |||
block_type = models.CharField(max_length=255, null=True) | |||
course_version = models.ForeignKey(CourseVersion, null=True, on_delete=models.CASCADE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ormsbee In thinking more about this field, does it need to be present at all? The code currently does not use it to filter against. It's stored for ContentDate
rows but never used to filter against in the current code - so it's really there just as a piece of info that might be useful for future debugging. The course_version/published_version is currently only used to invalidate old cache keys upon a new course publish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds fine to remove.
3d16a0e
to
dc00e3d
Compare
model_name='contentdate', | ||
index=models.Index(fields=['location'], name='edx_when_location_idx'), | ||
), | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the SQL which this migration will run - generated with ./manage.py cms sqlmigrate edx_when 0008
:
BEGIN;
--
-- Add field block_type to contentdate
--
ALTER TABLE `edx_when_contentdate` ADD COLUMN `block_type` varchar(255) NULL;
--
-- Alter field course_id on contentdate
--
DROP INDEX `edx_when_contentdate_course_id_e6c39fdc` ON `edx_when_contentdate`;
--
-- Alter field location on contentdate
--
DROP INDEX `edx_when_contentdate_location_485206ea` ON `edx_when_contentdate`;
--
-- Create index edx_when_course_block_type_idx on field(s) course_id, block_type of model contentdate
--
CREATE INDEX `edx_when_course_block_type_idx` ON `edx_when_contentdate` (`course_id`, `block_type`);
--
-- Create index edx_when_course_id_idx on field(s) course_id of model contentdate
--
CREATE INDEX `edx_when_course_id_idx` ON `edx_when_contentdate` (`course_id`);
--
-- Create index edx_when_location_idx on field(s) location of model contentdate
--
CREATE INDEX `edx_when_location_idx` ON `edx_when_contentdate` (`location`);
COMMIT;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah whoops. I didn't mean for additional SQL to be executed via the switch to models.Index vs the db_index parameter. Happy to have you keep it, but if you'd rather not keep in these additional changes, totally understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually going to remove it - just to avoid risk by doing less index creation and dropping for this PR. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the index changes as mentioned - the SQL that will now be executed is below:
BEGIN;
--
-- Add field block_type to contentdate
--
ALTER TABLE `edx_when_contentdate` ADD COLUMN `block_type` varchar(255) NULL;
--
-- Create index edx_when_course_block_type_idx on field(s) course_id, block_type of model contentdate
--
CREATE INDEX `edx_when_course_block_type_idx` ON `edx_when_contentdate` (`course_id`, `block_type`);
COMMIT;
@Dillon-Dumesnil To clarify the concerns about approving this PR now:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for all the nits late in the game, but hey, they're nits. Do with them what you will. Biggest question is probably regarding the functionality (in the comment about subsection_and_higher_only variable name), but trust y'all's decision on that.
Thanks for this PR and all of the little improvements you made throughout the repo. We always appreciate a PR that leaves the place better than it found it and this certainly hits that!
@@ -26,15 +25,27 @@ | |||
FIELDS_TO_EXTRACT = ('due', 'start', 'end') | |||
|
|||
|
|||
def _content_dates_cache_key(course_key, query_dict): | |||
"""Memcached key for ContentDates given course_key and filter args.""" | |||
def _content_dates_cache_key(course_key, query_dict, subsection_and_higher_only, published_version): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I don't love the name subsection_and_higher_only
. It's so specific to a use case, but I don't ave any better ideas. I do appreciate the change from outline_only
as I know we discussed that before and that it bled business logic.
I feel like ideally, this feature would be able to just specify different block types and have those returned and then the cache key would reflect those. I'm not sure how feasible/realistic that is so this is all just a nit and me thinking about different ideas. But being able to say "get me only units" or "sections and subsections" and pass in ['vertical']
or ['chapter', 'sequential']
would reduce any specificity of this feature. And enable the outline to still say, give me ['course', 'chapter', 'sequential']
. Or maybe even ['course', 'sequential']
(I'm not sure we even use the section dates on the outline).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not in 😍 with the name subsection_or_higher_only
either - but it's the best I could think of and it's at least explicit without mentioning outlines. I agree with your proposed ideal solution as well. But I pulled back from that solution (passing in a list of block types) to avoid the extra scope & testing - and KISS/YAGNI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's fair. Thank you!
edx_when/api.py
Outdated
query_dict_str = ".".join( | ||
sorted( | ||
"{},{}".format(key, value or "") | ||
for key, value in query_dict.items() | ||
) | ||
) | ||
return f"edx-when.content_dates:{course_key}:{query_dict_str}" | ||
high_level_block_str = '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: switching this variable to be more like the string it will actually be. It's the only spot we use high_level_block
vs subsection_higher
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree - I'll change it.
edx_when/api.py
Outdated
@@ -122,9 +135,32 @@ def _get_end_dates_from_content_dates(qset): | |||
return end_datetime, cutoff_datetime | |||
|
|||
|
|||
def _processed_results_cache_key( | |||
course_id, user_id, schedule, allow_relative_dates, | |||
subsection_and_higher_only, published_version): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super Nit: bring the ):
to be in line with the function name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do here and elsewhere in the file.
edx_when/api.py
Outdated
return f"edx-when.content_dates:{course_key}:{query_dict_str}" | ||
high_level_block_str = '' | ||
if subsection_and_higher_only: | ||
high_level_block_str = 'subsection_higher' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if it's just me not understanding cache keys particularly well, but why use subsection_higher
here, but in _processed_results_cache_key
, you use subsection_and_higher_only
. I also realize the keys are different anyway (:
vs .
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing that out! I'll change this to stay consistent.
This method changes currently-cached rows - but the cache key contains the course's published version. | ||
And the calling method is *only* invoked upon course publish, which guarantees that a new course published | ||
version will be used to get course dates after this call is made, invalidating existing cache entries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this comment. I'm sure it will be a life-saver in the future
@@ -327,21 +394,14 @@ def set_date_for_block(course_id, block_id, field, date_or_timedelta, user=None, | |||
existing_date.active = True | |||
except models.ContentDate.DoesNotExist as error: | |||
if user: | |||
# A UserDate creation below requires an existing ContentDate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for these awesome comments! Always good to have some fresh eyes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took awhile to figure it out! So thought I'd capture it.
@@ -318,6 +377,14 @@ def set_date_for_block(course_id, block_id, field, date_or_timedelta, user=None, | |||
else: | |||
date_kwargs = {'abs_date': date_or_timedelta} | |||
|
|||
def _set_content_date_policy(date_kwargs, existing_content_date): | |||
# Race conditions were creating multiple DatePolicies w/ the same values. Handle that case. | |||
existing_policies = list(models.DatePolicy.objects.filter(**date_kwargs).order_by('id')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just me wondering, and no obligation at all to test or try or change anything, but I wonder why do the list and [0] vs just using .first()
on the query set. Not sure the error it throws if none (maybe none) but maybe an option. Not at all for you to do, just commenting while in my head.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to https://docs.djangoproject.com/en/2.2/ref/models/querysets/#first, using first()
would work just as well. I simply copied the existing code, removing the never-thrown exception.
edx_when/api.py
Outdated
needs_save = True | ||
|
||
if user and not user.is_anonymous: | ||
# Course block date is for a particular user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super Nit: Put the comment here and on line 419 above the if/else lines (one line higher) rather than just below. I personally always interpret python comments on what is directly beneath so seeing this comment and then the UserDate below, I was like yeah, cool, checks out. But then on line 419, the if statement didn't click with the comment and I realized it was applying to the whole else block. It's a small thing, feel free to ignore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
@@ -318,6 +377,14 @@ def set_date_for_block(course_id, block_id, field, date_or_timedelta, user=None, | |||
else: | |||
date_kwargs = {'abs_date': date_or_timedelta} | |||
|
|||
def _set_content_date_policy(date_kwargs, existing_content_date): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for DRYing it up in here
model_name='contentdate', | ||
index=models.Index(fields=['location'], name='edx_when_location_idx'), | ||
), | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah whoops. I didn't mean for additional SQL to be executed via the switch to models.Index vs the db_index parameter. Happy to have you keep it, but if you'd rather not keep in these additional changes, totally understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for the re-review, @Dillon-Dumesnil ! I've addressed all your comments and will incorporate most of them.
@@ -26,15 +25,27 @@ | |||
FIELDS_TO_EXTRACT = ('due', 'start', 'end') | |||
|
|||
|
|||
def _content_dates_cache_key(course_key, query_dict): | |||
"""Memcached key for ContentDates given course_key and filter args.""" | |||
def _content_dates_cache_key(course_key, query_dict, subsection_and_higher_only, published_version): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not in 😍 with the name subsection_or_higher_only
either - but it's the best I could think of and it's at least explicit without mentioning outlines. I agree with your proposed ideal solution as well. But I pulled back from that solution (passing in a list of block types) to avoid the extra scope & testing - and KISS/YAGNI.
edx_when/api.py
Outdated
return f"edx-when.content_dates:{course_key}:{query_dict_str}" | ||
high_level_block_str = '' | ||
if subsection_and_higher_only: | ||
high_level_block_str = 'subsection_higher' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing that out! I'll change this to stay consistent.
edx_when/api.py
Outdated
query_dict_str = ".".join( | ||
sorted( | ||
"{},{}".format(key, value or "") | ||
for key, value in query_dict.items() | ||
) | ||
) | ||
return f"edx-when.content_dates:{course_key}:{query_dict_str}" | ||
high_level_block_str = '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree - I'll change it.
edx_when/api.py
Outdated
cache_key += '.with-rel-dates' | ||
if subsection_and_higher_only: | ||
cache_key += '.subsection_and_higher_only' | ||
# Always add published version to the cache key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove - states the obvious.
edx_when/api.py
Outdated
use_cached: bool (optional) - skips cache lookups (but not saves) if False | ||
schedule: Schedule obj (optional) - override for a user's enrollment Schedule, used | ||
for relative date calculations | ||
subsection_and_higher_only: bool (optional) - used to generate course outlines; only returns dates for blocks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
@@ -318,6 +377,14 @@ def set_date_for_block(course_id, block_id, field, date_or_timedelta, user=None, | |||
else: | |||
date_kwargs = {'abs_date': date_or_timedelta} | |||
|
|||
def _set_content_date_policy(date_kwargs, existing_content_date): | |||
# Race conditions were creating multiple DatePolicies w/ the same values. Handle that case. | |||
existing_policies = list(models.DatePolicy.objects.filter(**date_kwargs).order_by('id')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to https://docs.djangoproject.com/en/2.2/ref/models/querysets/#first, using first()
would work just as well. I simply copied the existing code, removing the never-thrown exception.
@@ -327,21 +394,14 @@ def set_date_for_block(course_id, block_id, field, date_or_timedelta, user=None, | |||
existing_date.active = True | |||
except models.ContentDate.DoesNotExist as error: | |||
if user: | |||
# A UserDate creation below requires an existing ContentDate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took awhile to figure it out! So thought I'd capture it.
edx_when/api.py
Outdated
needs_save = True | ||
|
||
if user and not user.is_anonymous: | ||
# Course block date is for a particular user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
model_name='contentdate', | ||
index=models.Index(fields=['location'], name='edx_when_location_idx'), | ||
), | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually going to remove it - just to avoid risk by doing less index creation and dropping for this PR. Thanks!
edx_when/api.py
Outdated
@@ -122,9 +135,32 @@ def _get_end_dates_from_content_dates(qset): | |||
return end_datetime, cutoff_datetime | |||
|
|||
|
|||
def _processed_results_cache_key( | |||
course_id, user_id, schedule, allow_relative_dates, | |||
subsection_and_higher_only, published_version): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do here and elsewhere in the file.
… for use in a course outline, which don't need block dates below the subsection level of a course. Also, rework the block date caching to use TieredCache, as well as using the current published version of a course in the cache key to remove the need for cache invalidation. TNL-8061
dc00e3d
to
b9930ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a quick post-merge comment to say this looks good and thank you Julia for your work!
@@ -1,7 +1,7 @@ | |||
""" | |||
Your project description goes here. | |||
Central source of course block dates for the LMS. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:chefkiss:
Description:
When a course outline is requested, currently all block dates in the course are returned. This PR adds an optional optimization that allows only the dates for outline-relevant block types to be returned (course, chapter, sequential), which should greatly improve response times.
Also, caching is re-worked to use the TieredCache which caches data in both the ephemeral request cache as well as Django's long-term cache.
I've also created an edx-platform PR which uses this edx-when version for testing and proving out these changes - it's here:
https://github.com/edx/edx-platform/pull/28551
JIRA: https://openedx.atlassian.net/browse/TNL-8061
Dependencies: None
Merge deadline: None
Installation instructions: None
Testing instructions:
The change has been deployed to this sandbox:
https://edx-when-juliasq.sandbox.edx.org/
in order to test normal operation.
Reviewers:
Merge checklist:
Post merge:
finished.
Author concerns: I've added notes throughout.