-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Tagging] Reduce the number of queries used in content tagging REST API #185
[Tagging] Reduce the number of queries used in content tagging REST API #185
Comments
FYI @bradenmacdonald |
I tested this with different users ("content_creatorA", "instructorA", "library_staffA", etc..) and the number goes to 19-25. Also, it seems that is a cache between the tests because this number varies between tests (the first user gets 25 calls an the next ones 19) |
@rpenido @bradenmacdonald I was able to make some small improvements on the "taxonomy list" endpoint today, but there's still way too many queries happening:
I can work on the prefetch caching, but that could take some time.. Do you have any other ideas I can try? |
I think one culprit is related to our class TaxonomySerializer(UserPermissionsSerializerMixin, serializers.ModelSerializer):
"""
Serializer for the Taxonomy model.
"""
# ...
def to_representation(self, instance):
"""
Cast the taxonomy before serialize
"""
instance = instance.cast()
return super().to_representation(instance) class Taxonomy(models.Model):
# ...
def cast(self):
"""
Returns the current Taxonomy instance cast into its taxonomy_class.
If no taxonomy_class is set, or if we're unable to import it, then it just returns self.
"""
try:
TaxonomyClass = self.taxonomy_class
if TaxonomyClass and not isinstance(self, TaxonomyClass):
return TaxonomyClass().copy(self)
except ImportError:
# Log error and continue
log.exception(
f"Unable to import taxonomy_class for {self}: {self._taxonomy_class}"
)
return self
# ...
def copy(self, taxonomy: Taxonomy) -> Taxonomy:
"""
Copy the fields from the given Taxonomy into the current instance.
"""
self.id = taxonomy.id
self.name = taxonomy.name
self.description = taxonomy.description
self.enabled = taxonomy.enabled
self.allow_multiple = taxonomy.allow_multiple
self.allow_free_text = taxonomy.allow_free_text
self.visible_to_authors = taxonomy.visible_to_authors
self._taxonomy_class = taxonomy._taxonomy_class # pylint: disable=protected-access
return self We are losing the cache the way we copy the
|
We got a good start here. I don't know if we can do this with O(1), but we are not that bad if it's O(N), with the number of taxonomies. I think we don't have O(taxonomy*org), right? Awesome investigation here @pomegranited! Good work! I think we should also check this with our most common user profile. Unfortunately, the staff is one of our best-case scenarios. |
I think you mean query 7? Query 8 actually looks totally unnecessary to me - why is it loading tag details for a "taxonomy list" endpoint? As far as I know, we don't include any actual tag data in the response. I'm also unclear on why queries 8,9,10 are only happening for taxonomy 3 and not the other taxonomies in the response. 11/12/14/17 are weird... they are checking if |
3 is probably a system-defined taxonomy, and we are calling .cast() on it (and losing the prefetched cache). |
Good catch! |
We progress! Have reduced the number of queries executed by the "taxonomy list" endpoint from 23 to 11! @rpenido you called it when you said:
I'm now copying the prefetch cache when casting, and that dropped 2 queries.
Oops, yes.
We include the
Agreed, fixed here. Since we're prefetching the The remaining queries look valid to me:
|
Nothing to do here. Query 5 is the only tagging-related query logged ; all the others are overhead from the CMS.
|
Was able to drop 2 queries off both the content_tagging and oel_tagging views:
|
"As a content author, I need content tagging APIs to respond quickly and efficiently, so that I don't waste time waiting for the UIs to load."
As part of openedx/edx-platform#34004 we had tests that verified that the number of query counts made by the content taxonomy REST APIs were not increased by adding user permissions to the response. This revealed that the content tagging REST APIs make too many queries when compiling their responses:
A likely cause for these discrepancies is the org-based rules logic, and permission checks being run too often. But there may also be query sets that would benefit from an added
select_related
.See also this comment about cross-org tagging and taxonomy admins -- address this issue too.
Acceptance criteria
The text was updated successfully, but these errors were encountered: