System defined taxonomies #67

ChrisChV · 2023-07-21T15:32:49Z

Description

Subclasses related to build System-defined taxonomies:

Docs updated to reflect the new architecture of taxonomies.
models.py moved to models/base.py
tag_ref setter to allow override how to set a tag on an object_tag
object_tag_class property added to Taxonomy
Base system-defined subclasses added.
Language tags fixture.
Taxonomy.system_defined updated to be a hardcoded property.

Testing instructions

Manual testing is possible but not very useful, since we don't yet have views for these models and APIs.

But you can check that the added tests cover the expected polymorphism behavior supported by this change, and that tests are still covering 100% of the openedx_tagging module.

Other information

Relates to openedx/modular-learning#77

openedx-webhooks · 2023-07-21T15:32:55Z

Thanks for the pull request, @ChrisChV! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

supporting documentation
Open edX discussion forum threads
timeline information ("this must be merged by XX date", and why that is)
partner information ("this is a course on edx.org")
any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

…t tags

…and tags

* Adds Taxonomy.object_tag_class property, which can be overridden by Taxonomy subclasses to provide their own ObjectTag subclasses. * Replaces the Taxonomy._set_tag method with an ObjectTag.tag_ref property setter, so that ObjectTag subclasses can dynamically create Tags as needed during tag_object() * Moves the tag_class_model and tag_class_value properties to a new ModelObjectTag abstract class, so the ModelSystemDefinedTaxonomy can be simpler. * Updates tests for coverage

removing the need for _get_tags_query_set

- Taxonomy.system_defined updated to be a hardcoded property. - TaxonomyAdmin updated to avoid edit and delete system taxonomies. - TagAdmin updated to avoid edit and delete tags from system taxonomies. - TagAmdin updated to avoid create Tags on system taxonomies.

pomegranited

👍 This is looking great @ChrisChV !

I left some suggestions and a nits, but if you could address them, I think it's ready for upstream review. Feel free to merge open-craft#8 or close and rewrite as you see fit?

I tested this with the PR test instructions, and by ensuring code coverage for automated tests.
I read through the code and tests
~~I checked for accessibility issues~~ N/A
Includes documentation
Commit structure follows OEP-0051

openedx_tagging/core/tagging/admin.py

openedx_tagging/core/tagging/models/system_defined.py

pomegranited · 2023-07-23T23:06:21Z

openedx_tagging/core/tagging/models/system_defined.py

+        """
+        Returns True if the instance is valid
+        """
+        return super()._check_tag(object_tag) and self._check_instance(object_tag)


Something to think about post-MVP:

I totally agree that for a ModelObjectTag to be fully valid, the instance it refers to needs to exist.
But I'm worried about the extra database call required to check this -- e.g if we decide to hide "invalid" object tags from learners, then we could be calling this method a lot.

So I don't think this needs changing right now, but when we know more about how we want to use tags in the LMS, we might decide to change this down the track.

During day-to-day operations, does it even matter if there are ModelObjectTags that refer to non-existent objects?

Certainly when displaying tags for a particular (existing) content object, I think we want to be able to hide invalid tags whose taxonomy doesn't exist or doesn't match. But if there are a bunch of extra ObjectTags in the database pointing to deleted content, does it really matter? We can just have some kind of cleanup script that runs once in a while, to delete them.

Oh sorry, I thought this was talking about tagging non-existing content but it's talking about verifying if the Tag is valid (e.g. an author tag on an XBlock is invalid if the author has been deleted). Yeah we do want to check that.

openedx_tagging/core/tagging/models/base.py

to ensure that we're using the correct Taxonomy and ObjectTag class types

and fixes nit

pomegranited · 2023-07-24T21:54:07Z

Thanks for applying those changes @ChrisChV ! I think this is ready for @ormsbee and/or @bradenmacdonald 's eyes now.

bradenmacdonald

I probably won't get time for a full review today, but I started reading through the code.

bradenmacdonald · 2023-07-25T21:06:53Z

openedx_tagging/core/tagging/models/base.py

@@ -431,6 +447,7 @@ class ObjectTag(models.Model):
    id = models.BigAutoField(primary_key=True)
    object_id = case_insensitive_char_field(


I just noticed this, but why is object_id case insensitive? Shouldn't it be case sensitive? abc and ABC are usually different IDs.

Ah ok.. I think that came from a misunderstanding about content UsageKeys.. I thought they were case insensitive.

@ChrisChV Do you mind to fix this here? Sorry there's been so many little mistakes like this..

@bradenmacdonald Are you ok with the other case-insensitive char fields defined here?

Please fix that in a small, focused, separate PR. Just add it to the list you're making. I don't want to muddy the waters or delay this PR any longer than I already have, and smaller focused PRs are usually better.

I am not sure about _name but I guess it makes sense to be case-insensitive.
If _value holds IDs it should be case-sensitive I think.

cf openedx/modular-learning#84

bradenmacdonald · 2023-07-25T21:11:39Z

openedx_tagging/core/tagging/models/system_defined.py

+        return True
+
+
+class ModelObjectTag(ObjectTag):


How does this interact with ContentObjectTag?

I don't really like that sometimes the type of taxonomy (system-defined or not) determines which subclass of ObjectTag to use (ModelObjectTag), and other times the type of content (XBlock or Course) determines it (ContentObjectTag). I know that they're both driven by the taxonomy actually, since ContentTaxnonomy is a type of Taxonomy too, but it seems like there's some conceptual confusion. (I'm not asking for changes just yet, but trying to understand what the idea is here).

Hmm.. maybe ModelObjectTag's functionality should be in a mixin instead, so that we can have ModelContentObjectTags..

But that probably also means that the SystemTaxonomy / ModelSystemTaxonomy functionality should be in a mixin too, same reason.. is this getting to complicated?

I really want to do some refactoring/simplification of the Taxonomy vs ObjectTag distinctions we ended up with.. but we don't have time for it right now. :(

Cleanup task: openedx/modular-learning#85

bradenmacdonald · 2023-07-25T21:38:54Z

docs/decisions/0012-system-taxonomy-creation.rst

+#. Tags are created on the fly when new ObjectTags are added.
+#. Tag.external_id we store an identifier from the instance (eg. User.pk).
+#. Tag.value we store a human readable representation of the instance (eg. User.username).
+#. Resync the tags to re-fetch the value. 


OK, I definitely like the approach described here ^^

But what I'm still caught up on is ModelObjectTag. Do we need to subclass ObjectTag and push this logic into it? From reading the approach described here, and from what we discussed, I assumed we'd happily just use the normal ObjectTag or even ContentObjectTag. Because we're creating Tag instances as needed, and the regular ObjectTag already works correctly with Tag instances in Taxonomies.

Of course you need some logic to:

list available tags even before they've been created as Tag instances: e.g. list all known users or languages → this belongs in the Taxonomy subclass but I don't think it's even necessary for this project, because we never need to display the "available" tags of system taxonomies to users. The tags will get auto-created by the system and the system already knows what user/language is in use. There's no UI and no dropdown menu to show.

auto-create Tag instances when tagging an object, if they don't yet exist. Sure, but this logic could live anywhere - in the edx-platform code that's applying the User and Language tags, in the tag_object() API, in the Taxonomy.tag_object() API - I don't think it is necessary to put this into an ObjectTag subclass, and doing so creates a lot of complexity.

Think MVP, and what's the simplest solution? Just auto-create the tag in the same place you apply the tag to some content. No special behavior needed in this repo at all. In fact I almost wonder if the User Taxonomy is just a regular closed taxonomy that's not editable, and the platform code can create Tags in it just-in-time... What other functionality do we need here?

auto-delete Tag instances when the upstream object is deleted: we don't really need to worry about this for now, as languages won't be deleted and user deletion is very rare. In fact, I don't think user deletion even really happens; it's just retirement/anonymization that happens per GDPR. So in the future we could add a signal listener that detects "user retirement" and then deletes any Tag and ObjectTags associated with the user, but I would leave that out of scope for now. We don't even really know how/where such user tags will be used yet.

I agree with everything you've said here, except for:

Just auto-create the tag in the same place you apply the tag to some content. No special behavior needed in this repo at all.

I think providing a ModelTaxonomy (or mixin) that handles Tag auto-creation is broadly useful.

And I think it belongs in this openedx-learning library, because it's not content-specific and could be useful when tagging other types of objects, like people or forum posts. However, if it makes more sense for this to live in edx-platform for the MVP, I'm OK with that, we can always move it into openedx-learning later.

list available tags even before they've been created as Tag instances: e.g. list all known users or languages → this belongs in the Taxonomy subclass but I don't think it's even necessary for this project, because we never need to display the "available" tags of system taxonomies to users. The tags will get auto-created by the system and the system already knows what user/language is in use. There's no UI and no dropdown menu to show.

That's true except for language, it's the only system-defined taxonomy that can be "edited" in a content by the content author.

I think providing a ModelTaxonomy (or mixin) that handles Tag auto-creation is broadly useful.

I agree with Jill too, in addition to the auto creation, it checks that the instance (User, Organization) exists to see if the tag is valid.

auto-delete Tag instances when the upstream object is deleted: we don't really need to worry about this for now, as languages won't be deleted and user deletion is very rare. In fact, I don't think user deletion even really happens; it's just retirement/anonymization that happens per GDPR. So in the future we could add a signal listener that detects "user retirement" and then deletes any Tag and ObjectTags associated with the user, but I would leave that out of scope for now. We don't even really know how/where such user tags will be used yet.

I agree, the simplest thing is to check if the user or organization exists.

To clarify: the ModelTaxonomy is not used for languages, it is only used for the taxonomy of users and organizations

I think providing a ModelTaxonomy (or mixin) that handles Tag auto-creation is broadly useful.

👍🏻 Sure, good point. That's totally reasonable, and that sounds like a mixin to me. After all, in the platform if we keep the "ContentTaxonomy" vs "regular Taxonomy" distinction, it would have to be a mixin, to allow Content tagging with auto-creation of tags.

That's true except for language, it's the only system-defined taxonomy that can be "edited" in a content by the content author.

Oh OK. I've asked for clarification on that because I heard the opposite. But in any case, it seems we agree that "list available tags" should live in the Taxonomy. (I think it should be similar to what you implemented in LanguageTaxonomy.get_tags() -> List[Tag] but we'll probably eventually need an API that returns tag strings/IDs, not Tag objects because they may not exist yet. Imagine doing an auto-complete to tag users and there are 1,000,000 users in the system - we don't want to create Tag objects for each user unless they're actually used as a tag. Or maybe that would be a separate search_available_tags API. In any case, future PR.)

I agree with Jill too, in addition to the auto creation, it checks that the instance (User, Organization) exists to see if the tag is valid.

That's fine. I do think however that belongs in the Taxonomy, not the ObjectTag itself. Because it's mostly a question of validating the User/Org then creating the corresponding Tag instance. The ObjectTag can validate itself with the relation to the Tag and doesn't need any special logic to handle ongoing validation related to the User/Org. If necessary, the Taxonomy can have some code to keep the Tag instances in sync with the User/Org models, but I think we can leave that out of scope for now to keep it simple.

think providing a ModelTaxonomy (or mixin) that handles Tag auto-creation is broadly useful.
That's totally reasonable, and that sounds like a mixin to me. After all, in the platform if we keep the "ContentTaxonomy" vs "regular Taxonomy" distinction, it would have to be a mixin, to allow Content tagging with auto-creation of tags.

👍

Or maybe that would be a separate search_available_tags API. In any case, future PR.)

Yep, agreed -- and note we don't need to search model tags in MVP (User and Organization system taxonomies are our only model system taxonomies, and neither are editable by the content authors).

I do think however that belongs in the Taxonomy, not the ObjectTag itself...but I think we can leave that out of scope for now to keep it simple.

Cool, thanks for your clarifications here @bradenmacdonald . I'll note this on my "cleanup" task: openedx/modular-learning#85

openedx_tagging/core/tagging/models/system_defined.py

bradenmacdonald · 2023-07-26T17:26:30Z

tests/openedx_tagging/core/tagging/test_models.py

@@ -119,16 +138,16 @@ class TestModelTagTaxonomy(TestTagTaxonomyMixin, TestCase):

    def test_system_defined(self):
        assert not self.taxonomy.system_defined
-        assert self.system_taxonomy.system_defined
+        assert self.system_taxonomy.cast().system_defined


Needing to remember this .cast is likely to be a problem in the future.

My recommendation (for you to consider in the future!) is that you add a decorator like @auto_cast to the base class properties like .system_defined that will detect when the developer forgot to .cast() and either auto-cast or throw an exception. Probably the latter to be more predictable.

bradenmacdonald

OK, so: I don't think there's a good justification for ModelObjectTag and I'd like to see you either prove that it's important to the design despite the complexity it adds, or just remove it. I think everything else we've landed on makes sense.

You can merge this now and remove ModelObjectTag soon in a separate PR, or remove it now, or justify why we want to keep it.

bradenmacdonald · 2023-07-26T17:39:48Z

One thing to think about for the future:

I think that our conceptual model may be missing a key piece: a model to track how a Taxonomy should be used.

Right now, there's no place to say "I want to use the Language Taxonomy to tag Content Objects belonging to X organization" or "I want users from org X to use OrgXTaxonomy to tag Course objects belonging to the organization" or "I want users from org Y to use the SpokenLanguage taxonomy to tag learners".

If we had a TaxonomyUseCase (?) model, it could specify who, what, why, where -> which org/users can use which taxonomy for which purpose to tag what type of objects using which ObjectTag subclass.

That would solve the weird problem now where if you use LanguageSystemTaxonomy to tag an XBlock, it gets a plain ObjectTag, but if you use an org's closed taxonomy to tag the same XBlock, it gets a ContentObjectTag. I think both should be using ContentObjectTag if they're both tagging content objects, but currently you have to specify the ObjectTag subclass at the Taxonomy level, not as part of defining a particular use case for the taxonomy.

Anyhow, I'm not saying to do it this way or to launch a major refactor. Just keep this idea in mind while we're iterating the models in the future, in case it resonates with you or inspires something.

openedx_tagging/core/tagging/models/base.py

ChrisChV · 2023-07-27T19:05:40Z

You can merge this now and remove ModelObjectTag soon in a separate PR, or remove it now, or justify why we want to keep it.

@bradenmacdonald I also agree that it should be removed. @pomegranited added it to do it in a separate PR: #67 (comment)

That would solve the weird problem now where if you use LanguageSystemTaxonomy to tag an XBlock, it gets a plain ObjectTag, but if you use an org's closed taxonomy to tag the same XBlock, it gets a ContentObjectTag. I think both should be using ContentObjectTag if they're both tagging content objects, but currently you have to specify the ObjectTag subclass at the Taxonomy level, not as part of defining a particular use case for the taxonomy.

We are currently solving this by creating subclasses for tagging content. But yes, I agree that there must be a simpler way to do it.

bradenmacdonald · 2023-07-31T17:47:22Z

We are currently solving this by creating subclasses for tagging content. But yes, I agree that there must be a simpler way to do it.

@ChrisChV OK, somehow I really didn't understand that until now. So to tag content with the (built-in, system defined) language Taxonomy, we cast it to ContentLanguageTaxonomy which is still the same taxonomy, but we now have a different "view" of that Taxonomy which is now using ContentObjectTag to tag objects? Is that right? I guess that works, and I'm fine with that for now, but it seems like a strange use of subclassing to me, as there's nothing really different about ContentLanguageTaxonomy vs LanguageTaxonomy except for what type of ObjectTag it applies, and that only affects the add_tag action, not what the overall Taxonomy is. I thought that just using tag_content_object() would be enough to apply a ContentObjectTag instead of an ObjectTag.

Like if you had a Person class and the person is signing autographs, and now you want the person to sign autographs using blue ink instead of black ink, it makes much more sense to say Person.sign_autograph(book_to_sign, using_ink=Ink.Blue) than to say Person.cast_to(BlueInkUsingPerson).sign_autograph(book_to_sign).

I have already approved this PR, but I just noticed there is no test that actually applies some system-defined tags to content objects. Shouldn't we add that in?

ChrisChV · 2023-07-31T22:11:04Z

@bradenmacdonald Thanks for your feedback!

OK, somehow I really didn't understand that until now. So to tag content with the (built-in, system defined) language Taxonomy, we cast it to ContentLanguageTaxonomy which is still the same taxonomy, but we now have a different "view" of that Taxonomy which is now using ContentObjectTag to tag objects? Is that right?

Yes, it is.

I have already approved this PR, but I just noticed there is no test that actually applies some system-defined tags to content objects. Shouldn't we add that in?

Yes, sorry, I added the system defined to the api test, that includes tag objects: 034726d

pomegranited · 2023-08-01T08:14:41Z

Sorry @ChrisChV one last change request: could you bump the version to "0.1.1"? We'll tag it after merging to produce a new release.

openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Jul 21, 2023

ChrisChV and others added 15 commits July 21, 2023 10:38

feat: System-defined taxonomies

ac21e1c

feat: Language and Author taxonomies

55636f7

feat: Generic system defined object tags and Language object tag added

327bb58

chore: get_tags_query_set added to LanguageObjectTag

58fc1e8

chore: Adding _validate_taxonomy function to all system defined objec…

9e7ed4e

…t tags

chore: Updating system_defined_taxonomy_id

899d61f

refactor: consolidates ObjectTag validation

1e68687

feat: System defined taxonomies subclasses

9e5d854

style: linting and formatting

9b72fec

refactor: use negative numbers as primary keys for system taxonomies …

69eeece

…and tags

refactor: LanguageTaxonomy overrides get_tags

de3cc3e

removing the need for _get_tags_query_set

docs: System taxonomy creation doc updated with Dynamic tags approach

437ae98

style: Updating comments

8ca16ea

style: Separating the models into base and system defined

0c74bb4

ChrisChV force-pushed the chris/system-defined-taxonomies branch from 3d23976 to 0c74bb4 Compare July 21, 2023 15:40

ChrisChV added 3 commits July 21, 2023 10:53

fix: Update language fixture to negative pk

9a38047

feat: Instance validations on ModelSystemDefinedTaxonomy

eaaba78

pomegranited approved these changes Jul 24, 2023

View reviewed changes

This was referenced Jul 24, 2023

feat: System defined taxonomies open-craft/edx-platform#551

Closed

System defined taxonomies open-craft/openedx-learning#3

Closed

pomegranited added 3 commits July 24, 2023 12:26

feat: use Taxonomy.cast and ObjectTag.cast in rules

bfc532a

to ensure that we're using the correct Taxonomy and ObjectTag class types

fix: adds unique_together

c54cdda

and fixes nit

fix: indexes

4f51683

pomegranited requested review from ormsbee and bradenmacdonald July 24, 2023 21:53

fix: Model pk validation on Model taxonomy

a29d227

bradenmacdonald reviewed Jul 25, 2023

View reviewed changes

bradenmacdonald reviewed Jul 26, 2023

View reviewed changes

openedx_tagging/core/tagging/models/system_defined.py Show resolved Hide resolved

bradenmacdonald reviewed Jul 26, 2023

View reviewed changes

bradenmacdonald approved these changes Jul 26, 2023

View reviewed changes

pomegranited reviewed Jul 26, 2023

View reviewed changes

openedx_tagging/core/tagging/models/base.py Show resolved Hide resolved

This was referenced Jul 26, 2023

[Tagging] Make ObjectTag.object_id case-sensitive openedx/modular-learning#84

Closed

[Tagging] Clean up Tagging models openedx/modular-learning#85

Closed

style: comments and style

a34d401

ChrisChV mentioned this pull request Jul 27, 2023

System defined taxonomies openedx/edx-platform#32869

Merged

ChrisChV added 3 commits July 31, 2023 15:56

feat: Creating language taxonomy on fixtures

4b56ec8

test: Added system defined to api tests

034726d

Merge branch 'main' into chris/system-defined-taxonomies

8f1cac1

ChrisChV added 2 commits August 1, 2023 11:12

test: Fixing tests after merge with main

cde323e

chore: Package version update

8a9d241

bradenmacdonald approved these changes Aug 1, 2023

View reviewed changes

pomegranited mentioned this pull request Aug 1, 2023

feat: add taxonomies for org api openedx/edx-platform#32871

Merged

pomegranited merged commit 04225e7 into openedx:main Aug 2, 2023

pomegranited deleted the chris/system-defined-taxonomies branch August 2, 2023 00:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

System defined taxonomies #67

System defined taxonomies #67

ChrisChV commented Jul 21, 2023 •

edited

Loading

openedx-webhooks commented Jul 21, 2023

pomegranited left a comment

pomegranited Jul 23, 2023

bradenmacdonald Jul 25, 2023

bradenmacdonald Jul 25, 2023

pomegranited commented Jul 24, 2023

bradenmacdonald left a comment

bradenmacdonald Jul 25, 2023

pomegranited Jul 26, 2023

pomegranited Jul 26, 2023

bradenmacdonald Jul 26, 2023

pomegranited Jul 26, 2023

bradenmacdonald Jul 25, 2023

pomegranited Jul 26, 2023 •

edited

Loading

pomegranited Jul 26, 2023

bradenmacdonald Jul 25, 2023

pomegranited Jul 26, 2023 •

edited

Loading

ChrisChV Jul 26, 2023

bradenmacdonald Jul 26, 2023

pomegranited Jul 26, 2023 •

edited

Loading

bradenmacdonald Jul 26, 2023

bradenmacdonald left a comment

bradenmacdonald commented Jul 26, 2023 •

edited

Loading

ChrisChV commented Jul 27, 2023

bradenmacdonald commented Jul 31, 2023

ChrisChV commented Jul 31, 2023

pomegranited commented Aug 1, 2023

		@@ -431,6 +447,7 @@ class ObjectTag(models.Model):
		id = models.BigAutoField(primary_key=True)
		object_id = case_insensitive_char_field(

System defined taxonomies #67

System defined taxonomies #67

Conversation

ChrisChV commented Jul 21, 2023 • edited Loading

Description

Testing instructions

Other information

openedx-webhooks commented Jul 21, 2023

pomegranited left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pomegranited commented Jul 24, 2023

bradenmacdonald left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pomegranited Jul 26, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pomegranited Jul 26, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pomegranited Jul 26, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bradenmacdonald left a comment

Choose a reason for hiding this comment

bradenmacdonald commented Jul 26, 2023 • edited Loading

ChrisChV commented Jul 27, 2023

bradenmacdonald commented Jul 31, 2023

ChrisChV commented Jul 31, 2023

pomegranited commented Aug 1, 2023

ChrisChV commented Jul 21, 2023 •

edited

Loading

pomegranited Jul 26, 2023 •

edited

Loading

pomegranited Jul 26, 2023 •

edited

Loading

pomegranited Jul 26, 2023 •

edited

Loading

bradenmacdonald commented Jul 26, 2023 •

edited

Loading