Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize block date retrieval for course outlines. #99

Merged

Conversation

doctoryes
Copy link
Contributor

@doctoryes doctoryes commented Aug 24, 2021

Description:

When a course outline is requested, currently all block dates in the course are returned. This PR adds an optional optimization that allows only the dates for outline-relevant block types to be returned (course, chapter, sequential), which should greatly improve response times.

Also, caching is re-worked to use the TieredCache which caches data in both the ephemeral request cache as well as Django's long-term cache.

I've also created an edx-platform PR which uses this edx-when version for testing and proving out these changes - it's here:
https://github.com/edx/edx-platform/pull/28551

JIRA: https://openedx.atlassian.net/browse/TNL-8061

Dependencies: None

Merge deadline: None

Installation instructions: None

Testing instructions:

The change has been deployed to this sandbox:
https://edx-when-juliasq.sandbox.edx.org/
in order to test normal operation.

Reviewers:

Merge checklist:

  • All reviewers approved
  • CI build is green
  • Version bumped
  • Changelog record added
  • Documentation updated (not only docstrings)
  • Commits are squashed
  • PR author is listed in AUTHORS

Post merge:

  • Create a tag
  • Check new version is pushed to PyPi after tag-triggered build is
    finished.
  • Delete working branch (if not needed anymore)

Author concerns: I've added notes throughout.

@doctoryes doctoryes changed the title Juliasq/tnl 8061 optimize outline block data retrieval Optimize block date retrieval for course outlines. Aug 25, 2021
return f"edx-when.content_dates:{course_key}:{query_dict_str}"
published_version_str = ''
if published_version:
published_version_str = published_version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is published_version supposed to be a str at this point, or still the BSON obj?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, published_version would be a str at this point. But since I still haven't gotten my hands on an actual published_version, I'm not yet sure what type it will be.

I'm trying to get hold of a published_version here:
https://github.com/edx/edx-platform/blob/7e146e04879944c956ee89fb5ae126ae0e3371c7/openedx/core/djangoapps/course_date_signals/handlers.py#L130
And haven't had any luck yet...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a str at this point - verified it.

edx_when/api.py Outdated Show resolved Hide resolved
edx_when/api.py Outdated
Comment on lines 86 to 90
version = published_version
if published_version:
version, __ = models.CourseVersion.objects.get_or_create(published_version=published_version)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor and optional, but personally, I find it easier to digest this when it's broken out something like:

        version = None
        if published_version:
            version, __ = models.CourseVersion.objects.get_or_create(published_version=published_version)

or

        if published_version:
            version, __ = models.CourseVersion.objects.get_or_create(published_version=published_version)
        else:
            version = None

Mostly because it's easier for my head to grok that version can be None or this model object, but it's weird when I think that it can also be a string, but only for a really limited time.


class Meta:
"""Django Metadata."""

unique_together = ('policy', 'location', 'field')
constraints = [
models.UniqueConstraint(fields=('location', 'block_type'), name='unique_location_block_type'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a unique constraint, I think the extra index is redundant.

@@ -96,11 +113,19 @@ class ContentDate(models.Model):
location = UsageKeyField(null=True, default=None, db_index=True, max_length=255)
field = models.CharField(max_length=255, default='')
active = models.BooleanField(default=True)
block_type = models.CharField(max_length=255, null=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably also want a composite index on (course_id, block_type)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added below.

@doctoryes doctoryes force-pushed the juliasq/TNL_8061_optimize_outline_block_data_retrieval branch 6 times, most recently from 2232f61 to 61798b4 Compare August 27, 2021 16:51
needs_save = True
if existing_date.block_type != block_id.block_type:
existing_date.block_type = block_id.block_type
needs_save = True
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the new block_type field is added as NULL, this code will lazily fill the field with block types.

qset = models.ContentDate.objects.filter(course_id=course_id, active=True, **rel_lookup)
if subsection_and_higher_only:
# Include NULL block_type values as well because of lazy rollout.
qset = qset.filter(Q(block_type__in=('course', 'chapter', 'sequential', None)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This filter is the new one that limits the queryset to the block types shown. None is also included because of the lazy filling of the block_type field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a condition on which we can remove the None? Or just always keep because we won't ever be able to guarantee everything has it filled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds the new column with null=True to avoid DB locking on deployment. And I've chosen to do a lazy load of the column upon course publish. With this strategy, it's possible that some courses will never be published and won't have values for block_type ever - unless we populate the value for all courses using some asynchronous process. Once a course is published, no None values would be expected for block_type.

Summary: With the current plan, just always keep.

qset = cache.get(external_cache_key) if use_cached else None
raw_results_cache_key = _content_dates_cache_key(
course_id, rel_lookup, subsection_and_higher_only, published_version
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the cache key for the raw, unprocessed query results before the additional processing below.

# query set to be returned.
processed_results_cache_key = _processed_results_cache_key(
course_id, user_id, schedule, allow_relative_dates, subsection_and_higher_only, published_version
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the cache key for the queried and processed results.

edx_when/api.py Outdated
cache_key = _content_dates_cache_key(
course_key, query_dict, subsection_and_higher_only, course_version
)
TieredCache.delete_all_tiers(cache_key)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cache clearing got a bit uglier because of the new additions to the cache key. 😿

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to be called though? If the version information is encoded into the cache key?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have just removed it - but the fact that a None published_version can still be sent in made me think that the code still should clear the cache in that case. (I also hadn't made the edx-platform PR yet which uses the new edx-when, so I wasn't sure at the time if a published course version could be passed-in for all calls. I've since found out that a course version can be passed-in everywhere.)

Also, in the edx-platform PR, I'm getting the published version from course.course_version in most places. And I don't understand completely when I can be assured that attribute will be present and filled-in. In Old Mongo courses, wouldn't that attribute be None or non-existent?

@doctoryes doctoryes marked this pull request as ready for review August 27, 2021 22:40
Copy link
Contributor

@Dillon-Dumesnil Dillon-Dumesnil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I did my best to understand this, but I'm honestly not particularly well versed in caching so feel free to take all my comments with a grain of salt. I think it would be good to wait (if possible) for Mike Terry to return from vacation so he can also take a look. He is back Thursday

qset = models.ContentDate.objects.filter(course_id=course_id, active=True, **rel_lookup)
if subsection_and_higher_only:
# Include NULL block_type values as well because of lazy rollout.
qset = qset.filter(Q(block_type__in=('course', 'chapter', 'sequential', None)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a condition on which we can remove the None? Or just always keep because we won't ever be able to guarantee everything has it filled?


class Meta:
"""Django Metadata."""

unique_together = ('policy', 'location', 'field')
indexes = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Add course_id and location here using the Index format instead of db_index on the field definition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - I'll try this and see how it changes the migration file.

@@ -296,14 +364,17 @@ def get_overrides_for_user(course_id, user):
yield {'location': udate.content_date.location, 'actual_date': udate.actual_date}


def set_date_for_block(course_id, block_id, field, date_or_timedelta, user=None, reason='', actor=None):
def set_date_for_block(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this method also be clearing the cache when a new date is set? I know we do it when we clear dates, but wouldn't this be a similar process?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption I'd made here is that a set_date_for_block would be preceded by a course publish - and therefore a new published_version would be passed-in. This new published_version would leave the old cached value behind. Clearing all course dates on blocks might not necessarily be preceded by a course publish.

@@ -84,6 +84,16 @@ def clean(self):
raise ValidationError(_("Absolute and relative dates cannot both be used"))


class CourseVersion(models.Model):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have this as its own model rather than just a field on a ContentDate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! Because I just can't help myself from normalizing DB data? 😸 It would also work fine as a field on ContentDate - I arbitrarily chose to use a new model.

Copy link
Contributor Author

@doctoryes doctoryes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, @Dillon-Dumesnil ! I'll discuss with my team about waiting for a merge. We'd like to get a PR approval from an owner of this repo ideally before proceeding.

qset = models.ContentDate.objects.filter(course_id=course_id, active=True, **rel_lookup)
if subsection_and_higher_only:
# Include NULL block_type values as well because of lazy rollout.
qset = qset.filter(Q(block_type__in=('course', 'chapter', 'sequential', None)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds the new column with null=True to avoid DB locking on deployment. And I've chosen to do a lazy load of the column upon course publish. With this strategy, it's possible that some courses will never be published and won't have values for block_type ever - unless we populate the value for all courses using some asynchronous process. Once a course is published, no None values would be expected for block_type.

Summary: With the current plan, just always keep.

@@ -296,14 +364,17 @@ def get_overrides_for_user(course_id, user):
yield {'location': udate.content_date.location, 'actual_date': udate.actual_date}


def set_date_for_block(course_id, block_id, field, date_or_timedelta, user=None, reason='', actor=None):
def set_date_for_block(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption I'd made here is that a set_date_for_block would be preceded by a course publish - and therefore a new published_version would be passed-in. This new published_version would leave the old cached value behind. Clearing all course dates on blocks might not necessarily be preceded by a course publish.

edx_when/api.py Outdated
def set_date_for_block(course_id, block_id, field, date_or_timedelta, user=None, reason='', actor=None):
def set_date_for_block(
course_id, block_id, field, date_or_timedelta,
user=None, reason='', actor=None, published_version=None):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: while the published_version is saved on each ContentDate object, it's not used as a filter when querying course dates (though it is returned for each block).

@@ -84,6 +84,16 @@ def clean(self):
raise ValidationError(_("Absolute and relative dates cannot both be used"))


class CourseVersion(models.Model):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! Because I just can't help myself from normalizing DB data? 😸 It would also work fine as a field on ContentDate - I arbitrarily chose to use a new model.


class Meta:
"""Django Metadata."""

unique_together = ('policy', 'location', 'field')
indexes = [
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - I'll try this and see how it changes the migration file.

@@ -96,18 +106,23 @@ class ContentDate(models.Model):
location = UsageKeyField(null=True, default=None, db_index=True, max_length=255)
field = models.CharField(max_length=255, default='')
active = models.BooleanField(default=True)
block_type = models.CharField(max_length=255, null=True)
course_version = models.ForeignKey(CourseVersion, null=True, on_delete=models.CASCADE)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ormsbee In thinking more about this field, does it need to be present at all? The code currently does not use it to filter against. It's stored for ContentDate rows but never used to filter against in the current code - so it's really there just as a piece of info that might be useful for future debugging. The course_version/published_version is currently only used to invalidate old cache keys upon a new course publish.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds fine to remove.

@doctoryes doctoryes force-pushed the juliasq/TNL_8061_optimize_outline_block_data_retrieval branch 3 times, most recently from 3d16a0e to dc00e3d Compare August 31, 2021 20:34
model_name='contentdate',
index=models.Index(fields=['location'], name='edx_when_location_idx'),
),
]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the SQL which this migration will run - generated with ./manage.py cms sqlmigrate edx_when 0008:

BEGIN;
--
-- Add field block_type to contentdate
--
ALTER TABLE `edx_when_contentdate` ADD COLUMN `block_type` varchar(255) NULL;
--
-- Alter field course_id on contentdate
--
DROP INDEX `edx_when_contentdate_course_id_e6c39fdc` ON `edx_when_contentdate`;
--
-- Alter field location on contentdate
--
DROP INDEX `edx_when_contentdate_location_485206ea` ON `edx_when_contentdate`;
--
-- Create index edx_when_course_block_type_idx on field(s) course_id, block_type of model contentdate
--
CREATE INDEX `edx_when_course_block_type_idx` ON `edx_when_contentdate` (`course_id`, `block_type`);
--
-- Create index edx_when_course_id_idx on field(s) course_id of model contentdate
--
CREATE INDEX `edx_when_course_id_idx` ON `edx_when_contentdate` (`course_id`);
--
-- Create index edx_when_location_idx on field(s) location of model contentdate
--
CREATE INDEX `edx_when_location_idx` ON `edx_when_contentdate` (`location`);
COMMIT;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah whoops. I didn't mean for additional SQL to be executed via the switch to models.Index vs the db_index parameter. Happy to have you keep it, but if you'd rather not keep in these additional changes, totally understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually going to remove it - just to avoid risk by doing less index creation and dropping for this PR. Thanks!

Copy link
Contributor Author

@doctoryes doctoryes Sep 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the index changes as mentioned - the SQL that will now be executed is below:

BEGIN;
--
-- Add field block_type to contentdate
--
ALTER TABLE `edx_when_contentdate` ADD COLUMN `block_type` varchar(255) NULL;
--
-- Create index edx_when_course_block_type_idx on field(s) course_id, block_type of model contentdate
--
CREATE INDEX `edx_when_course_block_type_idx` ON `edx_when_contentdate` (`course_id`, `block_type`);
COMMIT;

@doctoryes
Copy link
Contributor Author

@Dillon-Dumesnil To clarify the concerns about approving this PR now:

  • Is @mikix 's approval a pre-requisite for merging?
  • Would your team feel comfortable with us merging/deploying this change based on our own review - and having @mikix review the changes post-deploy?
    • Naturally, our team would assume responsibility for the deployment and any needed fixes/rollback.
      Thanks!

Copy link
Contributor

@Dillon-Dumesnil Dillon-Dumesnil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for all the nits late in the game, but hey, they're nits. Do with them what you will. Biggest question is probably regarding the functionality (in the comment about subsection_and_higher_only variable name), but trust y'all's decision on that.
Thanks for this PR and all of the little improvements you made throughout the repo. We always appreciate a PR that leaves the place better than it found it and this certainly hits that!

@@ -26,15 +25,27 @@
FIELDS_TO_EXTRACT = ('due', 'start', 'end')


def _content_dates_cache_key(course_key, query_dict):
"""Memcached key for ContentDates given course_key and filter args."""
def _content_dates_cache_key(course_key, query_dict, subsection_and_higher_only, published_version):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I don't love the name subsection_and_higher_only. It's so specific to a use case, but I don't ave any better ideas. I do appreciate the change from outline_only as I know we discussed that before and that it bled business logic.

I feel like ideally, this feature would be able to just specify different block types and have those returned and then the cache key would reflect those. I'm not sure how feasible/realistic that is so this is all just a nit and me thinking about different ideas. But being able to say "get me only units" or "sections and subsections" and pass in ['vertical'] or ['chapter', 'sequential'] would reduce any specificity of this feature. And enable the outline to still say, give me ['course', 'chapter', 'sequential']. Or maybe even ['course', 'sequential'] (I'm not sure we even use the section dates on the outline).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not in 😍 with the name subsection_or_higher_only either - but it's the best I could think of and it's at least explicit without mentioning outlines. I agree with your proposed ideal solution as well. But I pulled back from that solution (passing in a list of block types) to avoid the extra scope & testing - and KISS/YAGNI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's fair. Thank you!

edx_when/api.py Outdated
query_dict_str = ".".join(
sorted(
"{},{}".format(key, value or "")
for key, value in query_dict.items()
)
)
return f"edx-when.content_dates:{course_key}:{query_dict_str}"
high_level_block_str = ''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: switching this variable to be more like the string it will actually be. It's the only spot we use high_level_block vs subsection_higher

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree - I'll change it.

edx_when/api.py Outdated
@@ -122,9 +135,32 @@ def _get_end_dates_from_content_dates(qset):
return end_datetime, cutoff_datetime


def _processed_results_cache_key(
course_id, user_id, schedule, allow_relative_dates,
subsection_and_higher_only, published_version):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super Nit: bring the ): to be in line with the function name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do here and elsewhere in the file.

edx_when/api.py Outdated
return f"edx-when.content_dates:{course_key}:{query_dict_str}"
high_level_block_str = ''
if subsection_and_higher_only:
high_level_block_str = 'subsection_higher'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if it's just me not understanding cache keys particularly well, but why use subsection_higher here, but in _processed_results_cache_key, you use subsection_and_higher_only. I also realize the keys are different anyway (: vs .).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out! I'll change this to stay consistent.

Comment on lines +105 to +108
This method changes currently-cached rows - but the cache key contains the course's published version.
And the calling method is *only* invoked upon course publish, which guarantees that a new course published
version will be used to get course dates after this call is made, invalidating existing cache entries.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this comment. I'm sure it will be a life-saver in the future

@@ -327,21 +394,14 @@ def set_date_for_block(course_id, block_id, field, date_or_timedelta, user=None,
existing_date.active = True
except models.ContentDate.DoesNotExist as error:
if user:
# A UserDate creation below requires an existing ContentDate.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for these awesome comments! Always good to have some fresh eyes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took awhile to figure it out! So thought I'd capture it.

@@ -318,6 +377,14 @@ def set_date_for_block(course_id, block_id, field, date_or_timedelta, user=None,
else:
date_kwargs = {'abs_date': date_or_timedelta}

def _set_content_date_policy(date_kwargs, existing_content_date):
# Race conditions were creating multiple DatePolicies w/ the same values. Handle that case.
existing_policies = list(models.DatePolicy.objects.filter(**date_kwargs).order_by('id'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just me wondering, and no obligation at all to test or try or change anything, but I wonder why do the list and [0] vs just using .first() on the query set. Not sure the error it throws if none (maybe none) but maybe an option. Not at all for you to do, just commenting while in my head.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://docs.djangoproject.com/en/2.2/ref/models/querysets/#first, using first() would work just as well. I simply copied the existing code, removing the never-thrown exception.

edx_when/api.py Outdated
needs_save = True

if user and not user.is_anonymous:
# Course block date is for a particular user.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super Nit: Put the comment here and on line 419 above the if/else lines (one line higher) rather than just below. I personally always interpret python comments on what is directly beneath so seeing this comment and then the UserDate below, I was like yeah, cool, checks out. But then on line 419, the if statement didn't click with the comment and I realized it was applying to the whole else block. It's a small thing, feel free to ignore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

@@ -318,6 +377,14 @@ def set_date_for_block(course_id, block_id, field, date_or_timedelta, user=None,
else:
date_kwargs = {'abs_date': date_or_timedelta}

def _set_content_date_policy(date_kwargs, existing_content_date):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for DRYing it up in here

model_name='contentdate',
index=models.Index(fields=['location'], name='edx_when_location_idx'),
),
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah whoops. I didn't mean for additional SQL to be executed via the switch to models.Index vs the db_index parameter. Happy to have you keep it, but if you'd rather not keep in these additional changes, totally understand.

Copy link
Contributor Author

@doctoryes doctoryes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the re-review, @Dillon-Dumesnil ! I've addressed all your comments and will incorporate most of them.

@@ -26,15 +25,27 @@
FIELDS_TO_EXTRACT = ('due', 'start', 'end')


def _content_dates_cache_key(course_key, query_dict):
"""Memcached key for ContentDates given course_key and filter args."""
def _content_dates_cache_key(course_key, query_dict, subsection_and_higher_only, published_version):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not in 😍 with the name subsection_or_higher_only either - but it's the best I could think of and it's at least explicit without mentioning outlines. I agree with your proposed ideal solution as well. But I pulled back from that solution (passing in a list of block types) to avoid the extra scope & testing - and KISS/YAGNI.

edx_when/api.py Outdated
return f"edx-when.content_dates:{course_key}:{query_dict_str}"
high_level_block_str = ''
if subsection_and_higher_only:
high_level_block_str = 'subsection_higher'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out! I'll change this to stay consistent.

edx_when/api.py Outdated
query_dict_str = ".".join(
sorted(
"{},{}".format(key, value or "")
for key, value in query_dict.items()
)
)
return f"edx-when.content_dates:{course_key}:{query_dict_str}"
high_level_block_str = ''
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree - I'll change it.

edx_when/api.py Outdated
cache_key += '.with-rel-dates'
if subsection_and_higher_only:
cache_key += '.subsection_and_higher_only'
# Always add published version to the cache key.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove - states the obvious.

edx_when/api.py Outdated
use_cached: bool (optional) - skips cache lookups (but not saves) if False
schedule: Schedule obj (optional) - override for a user's enrollment Schedule, used
for relative date calculations
subsection_and_higher_only: bool (optional) - used to generate course outlines; only returns dates for blocks
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

@@ -318,6 +377,14 @@ def set_date_for_block(course_id, block_id, field, date_or_timedelta, user=None,
else:
date_kwargs = {'abs_date': date_or_timedelta}

def _set_content_date_policy(date_kwargs, existing_content_date):
# Race conditions were creating multiple DatePolicies w/ the same values. Handle that case.
existing_policies = list(models.DatePolicy.objects.filter(**date_kwargs).order_by('id'))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://docs.djangoproject.com/en/2.2/ref/models/querysets/#first, using first() would work just as well. I simply copied the existing code, removing the never-thrown exception.

@@ -327,21 +394,14 @@ def set_date_for_block(course_id, block_id, field, date_or_timedelta, user=None,
existing_date.active = True
except models.ContentDate.DoesNotExist as error:
if user:
# A UserDate creation below requires an existing ContentDate.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took awhile to figure it out! So thought I'd capture it.

edx_when/api.py Outdated
needs_save = True

if user and not user.is_anonymous:
# Course block date is for a particular user.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

model_name='contentdate',
index=models.Index(fields=['location'], name='edx_when_location_idx'),
),
]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually going to remove it - just to avoid risk by doing less index creation and dropping for this PR. Thanks!

edx_when/api.py Outdated
@@ -122,9 +135,32 @@ def _get_end_dates_from_content_dates(qset):
return end_datetime, cutoff_datetime


def _processed_results_cache_key(
course_id, user_id, schedule, allow_relative_dates,
subsection_and_higher_only, published_version):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do here and elsewhere in the file.

… for use in a

course outline, which don't need block dates below the subsection level of a course.
Also, rework the block date caching to use TieredCache, as well as using the current
published version of a course in the cache key to remove the need for cache invalidation.

TNL-8061
@doctoryes doctoryes force-pushed the juliasq/TNL_8061_optimize_outline_block_data_retrieval branch from dc00e3d to b9930ec Compare September 1, 2021 15:47
@doctoryes doctoryes merged commit 5b233d5 into master Sep 1, 2021
@doctoryes doctoryes deleted the juliasq/TNL_8061_optimize_outline_block_data_retrieval branch September 1, 2021 17:53
Copy link
Contributor

@mikix mikix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick post-merge comment to say this looks good and thank you Julia for your work!

@@ -1,7 +1,7 @@
"""
Your project description goes here.
Central source of course block dates for the LMS.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:chefkiss:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants