Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify Tag Models [FC-0030] #87

Merged
merged 13 commits into from
Oct 3, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
95 changes: 51 additions & 44 deletions docs/decisions/0012-system-taxonomy-creation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,10 @@
Context
--------

System-defined taxonomies are taxonomies created by the system. Some of these are totally static (e.g Language)
and some depends on a core data model (e.g. Organizations). It is necessary to define how to create and validate
the System-defined taxonomies and their tags.
System-defined taxonomies are taxonomies created by the system. Some of these
depend on Django settings (e.g. Languages) and others depends on a core data
model (e.g. Organizations or Users). It is necessary to define how to create and
validate the System-defined taxonomies and their tags.


Decision
Expand All @@ -15,44 +16,50 @@ Decision
System Tag lists and validation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each System-defined Taxonomy will have its own ``ObjectTag`` subclass which is used for tag validation (e.g. ``LanguageObjectTag``, ``OrganizationObjectTag``).
Each subclass can overwrite ``get_tags``; to configure the valid tags, and ``is_valid``; to check if a list of tags are valid. Both functions are implemented on the ``ObjectTag`` base class, but can be overwritten to handle special cases.

We need to create an instance of each System-defined Taxonomy in a fixture. With their respective characteristics and subclasses.
The ``pk`` of these instances must be negative so as not to affect the auto-incremented ``pk`` of Taxonomies.

Later, we need to create content-side ObjectTags that live on ``openedx.features.content_tagging`` for each content and taxonomy to be used (eg. ``CourseLanguageObjectTag``, ``CourseOrganizationObjectTag``).
This new class is used to configure the automatic content tagging. You can read the `document number 0013`_ to see this configuration.

Tags creation
~~~~~~~~~~~~~~

We have two ways to handle Tags creation and validation for System-defined Taxonomies:

**Hardcoded by fixtures/migrations**

#. If the tags don't change over the time, you can create all on a fixture (e.g Languages).
The ``pk`` of these instances must be negative.
#. If the tags change over the time, you can create all on a migration. If you edit, delete, or add new tags, you should also do it in a migration.

**Dynamic tags**

Closed Taxonomies that depends on a core data model. Ex. AuthorTaxonomy with Users as Tags

#. Tags are created on the fly when new ObjectTags are added.
#. Tag.external_id we store an identifier from the instance (eg. User.pk).
#. Tag.value we store a human readable representation of the instance (eg. User.username).
#. Resync the tags to re-fetch the value.


Rejected Options
-----------------

Free-form tags
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Open Taxonomy that depends on a core data model, but simplifies the creation of Tags by allowing free-form tags,

Rejected because it has been seen that using dynamic tags provides more functionality and more advantages.

.. _document number 0013: https://github.com/openedx/openedx-learning/blob/main/docs/decisions/0013-system-taxonomy-auto-tagging.rst
Each Taxonomy has two methods for validating tags:
#. ``validate_value``
#. ``validate_external_id``

These functions will return ``True`` if a given tag is valid, based on its
external ID or value. Subclasses should override these as needed, to implement
different types of taxonomy behavior (e.g. based on a model or baed on Django
settings).

For example ``validate_value("English")`` will return ``True`` for the language
taxonomy if the English language is enabled in the Django settings. Likewise,
``validate_external_id("en")`` would return true, but
``validate_external_id("zz")`` would be ``False`` because there is no such
language. Or, for a User taxonomy, ``validate_value("username")`` would return
``True`` if a user with that username exists, or ``validate_external_id(...)``
could validate if a user with that ID exists (note that the ID must be converted
to a string).

In all of these cases, a ``Tag`` instance may or may not exist in the database.
Before saving an ``ObjectTag`` which references a tag in these taxonomies, the
tagging API will use either ``Taxonomy.tag_for_value`` or
``Taxonomy.tag_for_external_id``. These methods are responsible for both
validating the tag (like ``validate_...``) but also auto-creating the ``Tag``
instance in case it doesn't already exist. Subclasses should override these as
needed.

In this way, the system-defined taxonomies are fully dynamic and can represent
tags based on Languages, Users, or Organizations that may exist in large numbers
or be constantly created.

At present, there isn't a good way to *list* all of the [potential] tags that
exist in a system-defined Taxonomy. We may add an API for that in the future,
for example to list all of the available languages. However for other cases like
users it doesn't make sense to even try to list all of the available tags. So
for now, the assumption is that the UI will not even try to display a list of
available tags for system-defined taxonomies. After all, system-defined tags are
usually applied automatically, rather than a user manually selecting from a
list. If there is a need to show a list of tags to the user, use the API that
lists the actually applied tags - i.e. the values of the ``ObjectTag``s
currently applied to objects using the taxonomy.

Tags hard-coded by fixtures/migrations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the future there may be system-defined taxonomies that are not dynamics at
all, where the list of tags are defined by ``Tag`` instances created by a
fixture or migration. However, as of now we don't have a use case for that.
2 changes: 1 addition & 1 deletion openedx_learning/__init__.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
"""
Open edX Learning ("Learning Core").
"""
__version__ = "0.1.8"
__version__ = "0.2.0"
143 changes: 118 additions & 25 deletions openedx_tagging/core/tagging/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,13 +12,15 @@
"""
from __future__ import annotations

from typing import Iterator

from django.db.models import QuerySet
from django.db import transaction
from django.db.models import F, QuerySet
from django.utils.translation import gettext_lazy as _

from .models import ObjectTag, Tag, Taxonomy

# Export this as part of the API
TagDoesNotExist = Tag.DoesNotExist


def create_taxonomy(
name: str,
Expand Down Expand Up @@ -46,11 +48,11 @@ def create_taxonomy(
return taxonomy.cast()


def get_taxonomy(id: int) -> Taxonomy | None:
def get_taxonomy(taxonomy_id: int) -> Taxonomy | None:
"""
Returns a Taxonomy cast to the appropriate subclass which has the given ID.
"""
taxonomy = Taxonomy.objects.filter(id=id).first()
taxonomy = Taxonomy.objects.filter(pk=taxonomy_id).first()
return taxonomy.cast() if taxonomy else None


Expand Down Expand Up @@ -139,21 +141,18 @@ def resync_object_tags(object_tags: QuerySet | None = None) -> int:


def get_object_tags(
object_id: str, taxonomy_id: str | None = None
object_id: str,
taxonomy_id: int | None = None,
object_tag_class: type[ObjectTag] = ObjectTag
) -> QuerySet[ObjectTag]:
"""
Returns a Queryset of object tags for a given object.

Pass taxonomy to limit the returned object_tags to a specific taxonomy.
Pass taxonomy_id to limit the returned object_tags to a specific taxonomy.
"""
ObjectTagClass = ObjectTag
extra_filters = {}
if taxonomy_id is not None:
taxonomy = Taxonomy.objects.get(pk=taxonomy_id)
ObjectTagClass = taxonomy.object_tag_class
extra_filters["taxonomy_id"] = taxonomy_id
filters = {"taxonomy_id": taxonomy_id} if taxonomy_id else {}
tags = (
ObjectTagClass.objects.filter(object_id=object_id, **extra_filters)
object_tag_class.objects.filter(object_id=object_id, **filters)
.select_related("tag", "taxonomy")
.order_by("id")
)
Expand All @@ -164,30 +163,103 @@ def delete_object_tags(object_id: str):
"""
Delete all ObjectTag entries for a given object.
"""
tags = ObjectTag.objects.filter(
object_id=object_id,
)
tags = ObjectTag.objects.filter(object_id=object_id)

tags.delete()


# TODO: a function called "tag_object" should take "object_id" as its first parameter, not taxonomy
def tag_object(
taxonomy: Taxonomy,
tags: list[str],
object_id: str,
) -> list[ObjectTag]:
object_tag_class: type[ObjectTag] = ObjectTag,
) -> None:
"""
Replaces the existing ObjectTag entries for the given taxonomy + object_id with the given list of tags.
Replaces the existing ObjectTag entries for the given taxonomy + object_id
with the given list of tags.

tags: A list of the values of the tags from this taxonomy to apply.

If taxonomy.allows_free_text, then the list should be a list of tag values.
Otherwise, it should be a list of existing Tag IDs.
object_tag_class: Optional. Use a proxy subclass of ObjectTag for additional
validation. (e.g. only allow tagging certain types of objects.)

Raised ValueError if the proposed tags are invalid for this taxonomy.
Preserves existing (valid) tags, adds new (valid) tags, and removes omitted (or invalid) tags.
Raised Tag.DoesNotExist if the proposed tags are invalid for this taxonomy.
Preserves existing (valid) tags, adds new (valid) tags, and removes omitted
(or invalid) tags.
"""
return taxonomy.cast().tag_object(tags, object_id)

def _check_new_tag_count(new_tag_count: int) -> None:
"""
Checks if the new count of tags for the object is equal or less than 100
"""
# Exclude self.id to avoid counting the tags that are going to be updated
current_count = ObjectTag.objects.filter(object_id=object_id).exclude(taxonomy_id=taxonomy.id).count()

if current_count + new_tag_count > 100:
raise ValueError(
_(f"Cannot add more than 100 tags to ({object_id}).")
)

if not isinstance(tags, list):
raise ValueError(_(f"Tags must be a list, not {type(tags).__name__}."))

ObjectTagClass = object_tag_class
taxonomy = taxonomy.cast() # Make sure we're using the right subclass. This is a no-op if we are already.
tags = list(dict.fromkeys(tags)) # Remove duplicates preserving order

_check_new_tag_count(len(tags))

if not taxonomy.allow_multiple and len(tags) > 1:
raise ValueError(_(f"Taxonomy ({taxonomy.name}) only allows one tag per object."))

if taxonomy.required and len(tags) == 0:
raise ValueError(
_(f"Taxonomy ({taxonomy.id}) requires at least one tag per object.")
)

current_tags = list(
ObjectTagClass.objects.filter(taxonomy=taxonomy, object_id=object_id)
)
updated_tags = []
if taxonomy.allow_free_text:
for tag_value in tags:
object_tag_index = next((i for (i, t) in enumerate(current_tags) if t.value == tag_value), -1)
if object_tag_index >= 0:
# This tag is already applied.
object_tag = current_tags.pop(object_tag_index)
else:
object_tag = ObjectTagClass(taxonomy=taxonomy, object_id=object_id, _value=tag_value)
updated_tags.append(object_tag)
else:
# Handle closed taxonomies:
for tag_value in tags:
tag = taxonomy.tag_for_value(tag_value) # Will raise Tag.DoesNotExist if the value is invalid.
object_tag_index = next((i for (i, t) in enumerate(current_tags) if t.tag_id == tag.id), -1)
if object_tag_index >= 0:
# This tag is already applied.
object_tag = current_tags.pop(object_tag_index)
if object_tag._value != tag.value: # pylint: disable=protected-access
# The ObjectTag's cached '_value' is out of sync with the Tag, so update it:
object_tag._value = tag.value # pylint: disable=protected-access
updated_tags.append(object_tag)
else:
# We are newly applying this tag:
object_tag = ObjectTagClass(taxonomy=taxonomy, object_id=object_id, tag=tag)
updated_tags.append(object_tag)

# Save all updated tags at once to avoid partial updates
with transaction.atomic():
# delete any omitted existing tags. We do this first to reduce chances of UNIQUE constraint edge cases
for old_tag in current_tags:
old_tag.delete()
# add the new tags:
for object_tag in updated_tags:
object_tag.full_clean() # Run validation
object_tag.save()


# TODO: return tags from closed taxonomies as well as the count of how many times each is used.
def autocomplete_tags(
taxonomy: Taxonomy,
search: str,
Expand Down Expand Up @@ -228,4 +300,25 @@ def autocomplete_tags(
"using get_tags() and filtering them on the frontend."
)
)
return taxonomy.cast().autocomplete_tags(search, object_id)
# Fetch tags that the object already has to exclude them from the result
excluded_tags: list[str] = []
if object_id:
excluded_tags = list(
taxonomy.objecttag_set.filter(object_id=object_id).values_list(
"_value", flat=True
)
)
return (
# Fetch object tags from this taxonomy whose value contains the search
taxonomy.objecttag_set.filter(_value__icontains=search)
# omit any tags whose values match the tags on the given object
.exclude(_value__in=excluded_tags)
# alphabetical ordering
.order_by("_value")
# Alias the `_value` field to `value` to make it nicer for users
.annotate(value=F("_value"))
# obtain tag values
.values("value", "tag_id")
# remove repeats
.distinct()
)
Original file line number Diff line number Diff line change
Expand Up @@ -1296,3 +1296,4 @@
allow_multiple: false
allow_free_text: false
visible_to_authors: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The language taxonomy creates tags as needed. We don't actually need the fixture anymore, though I haven't decided yet if it makes sense to delete it or not.

I agree that we don't need the fixture to create the _tags_that are found in the language taxonomy -- so we could delete lines 1-1288 here.

However, we still need lines 1289+ in this fixture to create the Language taxonomy itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry I wasn't clear. We can delete the languages, but we still need to create the taxonomy. However, if it's just creating a single object, I would prefer to move that to be a data migration instead of a fixture, to keep things simple.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah gotcha.. data migration is fine with me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if we remove the fixture, we can also remove openedx_tagging/core/tagging/management/commands/.

_taxonomy_class: openedx_tagging.core.tagging.models.system_defined.LanguageTaxonomy
2 changes: 1 addition & 1 deletion openedx_tagging/core/tagging/import_export/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -183,7 +183,7 @@ def _import_export_validations(taxonomy: Taxonomy):
if taxonomy.allow_free_text:
raise NotImplementedError(
_(
f"Import/export for free-form taxonomies will be implemented in the future."
"Import/export for free-form taxonomies will be implemented in the future."
)
)
if taxonomy.system_defined:
Expand Down
32 changes: 32 additions & 0 deletions openedx_tagging/core/tagging/migrations/0010_cleanups.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
# Generated by Django 3.2.19 on 2023-09-29 16:59

import django.db.models.expressions
from django.db import migrations, models

import openedx_learning.lib.fields


class Migration(migrations.Migration):

dependencies = [
('oel_tagging', '0009_alter_objecttag_object_id'),
]

operations = [
migrations.DeleteModel(
name='ModelObjectTag',
),
migrations.DeleteModel(
name='UserModelObjectTag',
),
migrations.AlterUniqueTogether(
name='objecttag',
unique_together={('object_id', 'taxonomy', 'tag_id'), ('object_id', 'taxonomy', '_value')},
),
# ObjectTag.Tag can be blank
migrations.AlterField(
model_name='objecttag',
name='tag',
field=models.ForeignKey(blank=True, default=None, help_text="Tag associated with this object tag. Provides the tag's 'value' if set.", null=True, on_delete=django.db.models.deletion.SET_NULL, to='oel_tagging.tag'),
),
]
5 changes: 4 additions & 1 deletion openedx_tagging/core/tagging/models/__init__.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
"""
Core models for Tagging
"""
from .base import ObjectTag, Tag, Taxonomy
from .import_export import TagImportTask, TagImportTaskState
from .system_defined import LanguageTaxonomy, ModelObjectTag, ModelSystemDefinedTaxonomy, UserSystemDefinedTaxonomy
from .system_defined import LanguageTaxonomy, ModelSystemDefinedTaxonomy, UserSystemDefinedTaxonomy
Loading
Loading