-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
System defined taxonomies #67
System defined taxonomies #67
Conversation
Thanks for the pull request, @ChrisChV! Please note that it may take us up to several weeks or months to complete a review and merge your PR. Feel free to add as much of the following information to the ticket as you can:
All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here. Please let us know once your PR is ready for our review and all tests are green. |
* Adds Taxonomy.object_tag_class property, which can be overridden by Taxonomy subclasses to provide their own ObjectTag subclasses. * Replaces the Taxonomy._set_tag method with an ObjectTag.tag_ref property setter, so that ObjectTag subclasses can dynamically create Tags as needed during tag_object() * Moves the tag_class_model and tag_class_value properties to a new ModelObjectTag abstract class, so the ModelSystemDefinedTaxonomy can be simpler. * Updates tests for coverage
removing the need for _get_tags_query_set
3d23976
to
0c74bb4
Compare
- Taxonomy.system_defined updated to be a hardcoded property. - TaxonomyAdmin updated to avoid edit and delete system taxonomies. - TagAdmin updated to avoid edit and delete tags from system taxonomies. - TagAmdin updated to avoid create Tags on system taxonomies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 This is looking great @ChrisChV !
I left some suggestions and a nits, but if you could address them, I think it's ready for upstream review. Feel free to merge open-craft#8 or close and rewrite as you see fit?
- I tested this with the PR test instructions, and by ensuring code coverage for automated tests.
- I read through the code and tests
-
I checked for accessibility issuesN/A - Includes documentation
- Commit structure follows OEP-0051
""" | ||
Returns True if the instance is valid | ||
""" | ||
return super()._check_tag(object_tag) and self._check_instance(object_tag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to think about post-MVP:
I totally agree that for a ModelObjectTag to be fully valid, the instance it refers to needs to exist.
But I'm worried about the extra database call required to check this -- e.g if we decide to hide "invalid" object tags from learners, then we could be calling this method a lot.
So I don't think this needs changing right now, but when we know more about how we want to use tags in the LMS, we might decide to change this down the track.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During day-to-day operations, does it even matter if there are ModelObjectTags that refer to non-existent objects?
Certainly when displaying tags for a particular (existing) content object, I think we want to be able to hide invalid tags whose taxonomy doesn't exist or doesn't match. But if there are a bunch of extra ObjectTags in the database pointing to deleted content, does it really matter? We can just have some kind of cleanup script that runs once in a while, to delete them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry, I thought this was talking about tagging non-existing content but it's talking about verifying if the Tag is valid (e.g. an author tag on an XBlock is invalid if the author has been deleted). Yeah we do want to check that.
to ensure that we're using the correct Taxonomy and ObjectTag class types
and fixes nit
Thanks for applying those changes @ChrisChV ! I think this is ready for @ormsbee and/or @bradenmacdonald 's eyes now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably won't get time for a full review today, but I started reading through the code.
@@ -431,6 +447,7 @@ class ObjectTag(models.Model): | |||
id = models.BigAutoField(primary_key=True) | |||
object_id = case_insensitive_char_field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed this, but why is object_id
case insensitive? Shouldn't it be case sensitive? abc
and ABC
are usually different IDs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok.. I think that came from a misunderstanding about content UsageKeys.. I thought they were case insensitive.
@ChrisChV Do you mind to fix this here? Sorry there's been so many little mistakes like this..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bradenmacdonald Are you ok with the other case-insensitive char fields defined here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix that in a small, focused, separate PR. Just add it to the list you're making. I don't want to muddy the waters or delay this PR any longer than I already have, and smaller focused PRs are usually better.
I am not sure about _name
but I guess it makes sense to be case-insensitive.
If _value
holds IDs it should be case-sensitive I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return True | ||
|
||
|
||
class ModelObjectTag(ObjectTag): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this interact with ContentObjectTag
?
I don't really like that sometimes the type of taxonomy (system-defined or not) determines which subclass of ObjectTag
to use (ModelObjectTag
), and other times the type of content (XBlock or Course) determines it (ContentObjectTag
). I know that they're both driven by the taxonomy actually, since ContentTaxnonomy is a type of Taxonomy too, but it seems like there's some conceptual confusion. (I'm not asking for changes just yet, but trying to understand what the idea is here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.. maybe ModelObjectTag's functionality should be in a mixin instead, so that we can have ModelContentObjectTags..
But that probably also means that the SystemTaxonomy / ModelSystemTaxonomy functionality should be in a mixin too, same reason.. is this getting to complicated?
I really want to do some refactoring/simplification of the Taxonomy vs ObjectTag distinctions we ended up with.. but we don't have time for it right now. :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleanup task: openedx/modular-learning#85
#. Tags are created on the fly when new ObjectTags are added. | ||
#. Tag.external_id we store an identifier from the instance (eg. User.pk). | ||
#. Tag.value we store a human readable representation of the instance (eg. User.username). | ||
#. Resync the tags to re-fetch the value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I definitely like the approach described here ^^
But what I'm still caught up on is ModelObjectTag
. Do we need to subclass ObjectTag and push this logic into it? From reading the approach described here, and from what we discussed, I assumed we'd happily just use the normal ObjectTag
or even ContentObjectTag
. Because we're creating Tag
instances as needed, and the regular ObjectTag
already works correctly with Tag
instances in Taxonomies.
Of course you need some logic to:
- list available tags even before they've been created as
Tag
instances: e.g. list all known users or languages → this belongs in the Taxonomy subclass but I don't think it's even necessary for this project, because we never need to display the "available" tags of system taxonomies to users. The tags will get auto-created by the system and the system already knows what user/language is in use. There's no UI and no dropdown menu to show. - auto-create
Tag
instances when tagging an object, if they don't yet exist. Sure, but this logic could live anywhere - in theedx-platform
code that's applying the User and Language tags, in thetag_object()
API, in theTaxonomy.tag_object()
API - I don't think it is necessary to put this into anObjectTag
subclass, and doing so creates a lot of complexity.- Think MVP, and what's the simplest solution? Just auto-create the tag in the same place you apply the tag to some content. No special behavior needed in this repo at all. In fact I almost wonder if the User Taxonomy is just a regular closed taxonomy that's not editable, and the platform code can create
Tag
s in it just-in-time... What other functionality do we need here?
- Think MVP, and what's the simplest solution? Just auto-create the tag in the same place you apply the tag to some content. No special behavior needed in this repo at all. In fact I almost wonder if the User Taxonomy is just a regular closed taxonomy that's not editable, and the platform code can create
- auto-delete
Tag
instances when the upstream object is deleted: we don't really need to worry about this for now, as languages won't be deleted and user deletion is very rare. In fact, I don't think user deletion even really happens; it's just retirement/anonymization that happens per GDPR. So in the future we could add a signal listener that detects "user retirement" and then deletes anyTag
andObjectTag
s associated with the user, but I would leave that out of scope for now. We don't even really know how/where such user tags will be used yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with everything you've said here, except for:
Just auto-create the tag in the same place you apply the tag to some content. No special behavior needed in this repo at all.
I think providing a ModelTaxonomy
(or mixin) that handles Tag auto-creation is broadly useful.
And I think it belongs in this openedx-learning library, because it's not content-specific and could be useful when tagging other types of objects, like people or forum posts. However, if it makes more sense for this to live in edx-platform for the MVP, I'm OK with that, we can always move it into openedx-learning later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list available tags even before they've been created as Tag instances: e.g. list all known users or languages → this belongs in the Taxonomy subclass but I don't think it's even necessary for this project, because we never need to display the "available" tags of system taxonomies to users. The tags will get auto-created by the system and the system already knows what user/language is in use. There's no UI and no dropdown menu to show.
That's true except for language, it's the only system-defined taxonomy that can be "edited" in a content by the content author.
I think providing a ModelTaxonomy (or mixin) that handles Tag auto-creation is broadly useful.
I agree with Jill too, in addition to the auto creation, it checks that the instance (User, Organization) exists to see if the tag is valid.
auto-delete Tag instances when the upstream object is deleted: we don't really need to worry about this for now, as languages won't be deleted and user deletion is very rare. In fact, I don't think user deletion even really happens; it's just retirement/anonymization that happens per GDPR. So in the future we could add a signal listener that detects "user retirement" and then deletes any Tag and ObjectTags associated with the user, but I would leave that out of scope for now. We don't even really know how/where such user tags will be used yet.
I agree, the simplest thing is to check if the user or organization exists.
To clarify: the ModelTaxonomy is not used for languages, it is only used for the taxonomy of users and organizations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think providing a ModelTaxonomy (or mixin) that handles Tag auto-creation is broadly useful.
👍🏻 Sure, good point. That's totally reasonable, and that sounds like a mixin to me. After all, in the platform if we keep the "ContentTaxonomy" vs "regular Taxonomy" distinction, it would have to be a mixin, to allow Content tagging with auto-creation of tags.
That's true except for language, it's the only system-defined taxonomy that can be "edited" in a content by the content author.
Oh OK. I've asked for clarification on that because I heard the opposite. But in any case, it seems we agree that "list available tags" should live in the Taxonomy. (I think it should be similar to what you implemented in LanguageTaxonomy.get_tags() -> List[Tag]
but we'll probably eventually need an API that returns tag strings/IDs, not Tag
objects because they may not exist yet. Imagine doing an auto-complete to tag users and there are 1,000,000 users in the system - we don't want to create Tag
objects for each user unless they're actually used as a tag. Or maybe that would be a separate search_available_tags
API. In any case, future PR.)
I agree with Jill too, in addition to the auto creation, it checks that the instance (User, Organization) exists to see if the tag is valid.
That's fine. I do think however that belongs in the Taxonomy, not the ObjectTag itself. Because it's mostly a question of validating the User/Org then creating the corresponding Tag
instance. The ObjectTag can validate itself with the relation to the Tag
and doesn't need any special logic to handle ongoing validation related to the User/Org. If necessary, the Taxonomy can have some code to keep the Tag
instances in sync with the User/Org models, but I think we can leave that out of scope for now to keep it simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think providing a ModelTaxonomy (or mixin) that handles Tag auto-creation is broadly useful.
That's totally reasonable, and that sounds like a mixin to me. After all, in the platform if we keep the "ContentTaxonomy" vs "regular Taxonomy" distinction, it would have to be a mixin, to allow Content tagging with auto-creation of tags.
👍
Or maybe that would be a separate search_available_tags API. In any case, future PR.)
Yep, agreed -- and note we don't need to search model tags in MVP (User and Organization system taxonomies are our only model system taxonomies, and neither are editable by the content authors).
I do think however that belongs in the Taxonomy, not the ObjectTag itself...but I think we can leave that out of scope for now to keep it simple.
Cool, thanks for your clarifications here @bradenmacdonald . I'll note this on my "cleanup" task: openedx/modular-learning#85
@@ -119,16 +138,16 @@ class TestModelTagTaxonomy(TestTagTaxonomyMixin, TestCase): | |||
|
|||
def test_system_defined(self): | |||
assert not self.taxonomy.system_defined | |||
assert self.system_taxonomy.system_defined | |||
assert self.system_taxonomy.cast().system_defined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needing to remember this .cast
is likely to be a problem in the future.
My recommendation (for you to consider in the future!) is that you add a decorator like @auto_cast
to the base class properties like .system_defined
that will detect when the developer forgot to .cast()
and either auto-cast or throw an exception. Probably the latter to be more predictable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so: I don't think there's a good justification for ModelObjectTag
and I'd like to see you either prove that it's important to the design despite the complexity it adds, or just remove it. I think everything else we've landed on makes sense.
You can merge this now and remove ModelObjectTag soon in a separate PR, or remove it now, or justify why we want to keep it.
One thing to think about for the future: I think that our conceptual model may be missing a key piece: a model to track how a Right now, there's no place to say "I want to use the Language Taxonomy to tag Content Objects belonging to X organization" or "I want users from org X to use OrgXTaxonomy to tag Course objects belonging to the organization" or "I want users from org Y to use the SpokenLanguage taxonomy to tag learners". If we had a That would solve the weird problem now where if you use LanguageSystemTaxonomy to tag an XBlock, it gets a plain Anyhow, I'm not saying to do it this way or to launch a major refactor. Just keep this idea in mind while we're iterating the models in the future, in case it resonates with you or inspires something. |
@bradenmacdonald I also agree that it should be removed. @pomegranited added it to do it in a separate PR: #67 (comment)
We are currently solving this by creating subclasses for tagging content. But yes, I agree that there must be a simpler way to do it. |
@ChrisChV OK, somehow I really didn't understand that until now. So to tag content with the (built-in, system defined) language Taxonomy, we cast it to Like if you had a I have already approved this PR, but I just noticed there is no test that actually applies some system-defined tags to content objects. Shouldn't we add that in? |
@bradenmacdonald Thanks for your feedback!
Yes, it is.
Yes, sorry, I added the system defined to the api test, that includes tag objects: 034726d |
Description
Subclasses related to build System-defined taxonomies:
models.py
moved tomodels/base.py
tag_ref
setter to allow override how to set a tag on an object_tagobject_tag_class
property added to TaxonomyTesting instructions
Manual testing is possible but not very useful, since we don't yet have views for these models and APIs.
But you can check that the added tests cover the expected polymorphism behavior supported by this change, and that tests are still covering 100% of the openedx_tagging module.
Other information
Relates to openedx/modular-learning#77