-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
import/export Taxonomy API functions #58
Conversation
Thanks for the pull request, @ChrisChV! Please note that it may take us up to several weeks or months to complete a review and merge your PR. Feel free to add as much of the following information to the ticket as you can:
All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here. Please let us know once your PR is ready for our review and all tests are green. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really good @ChrisChV ! I added one comment about test coverage -- do you have time to address that as part of this task? Would be great if we could keep the coverage for this module at 100%.
But what's here LGTM: 👍
- I tested this by running the tests
- I read through the code and tests with care
-
I checked for accessibility issuesN/A -
Includes documentationN/A
@@ -205,3 +211,124 @@ def test_tag_object_invalid_tag(self): | |||
"course", | |||
) | |||
assert "Invalid object tag for taxonomy" in str(exc.exception) | |||
|
|||
def test_import_tags_csv(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some lines missing from the test coverage for import and export -- could you add tests for them too?
openedx_tagging/core/tagging/api.py 145 11 64 9 90% 185, 191, 206, 212->224, 215, 221-222, 224->exit, 287-289, 304, 310, 340->exit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, thanks!
442cb64
to
c2e5099
Compare
@ChrisChV Until @pomegranited 's PR is merged, can you please change the "merge target" of this PR from |
openedx_tagging/core/tagging/api.py
Outdated
if taxonomy.allow_free_text: | ||
raise ValueError( | ||
_( | ||
f"Invalid taxonomy ({taxonomy.id}): You can't import free-from tags taxonomies" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wording is a bit confusing. I'd suggest: "You cannot import into a free-form taxonomy."
openedx_tagging/core/tagging/api.py
Outdated
) | ||
) | ||
tags_data = list(csv_reader) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using if format not in TaxonomyDataFormat.__members__.values():
above and assuming here that there are only two valid options, it's better to use
elif format == TaxonomyDataFormat.JSON:
...
else:
raise ValueError(_(f"Invalid format: {format}"))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bradenmacdonald I think this is valid on the import. In the export I have left the verification at the beginning before loading the tags
openedx_tagging/core/tagging/api.py
Outdated
if tag_id not in new_tags and tag_id not in updated_tags: | ||
try: | ||
# Update tag | ||
tag_instance = Tag.objects.get(external_id=tag_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also need to use taxonomy=taxonomy
here, or you can have cross-taxonomy bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow.. I should have seen that!
openedx_tagging/core/tagging/api.py
Outdated
# if there is no parent in the data import | ||
tag_instance.parent = None | ||
updated_tags.append(tag_id) | ||
except ObjectDoesNotExist: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to use Tag.DoesNotExist
instead of the generic base class.
openedx_tagging/core/tagging/api.py
Outdated
return tag_instance | ||
else: | ||
# Returns the created/updated tag from history | ||
return Tag.objects.get(external_id=tag_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here, needs taxonomy=taxonomy
. The same external_id
can appear in multiple taxonomies.
BTW if you add something like this to the Taxonomy
class:
@property
def tags(self):
return Tag.objects.filter(taxonomy=self)
then this code can become return taxonomy.tags.get(external_id=tag_id)
😎
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's already taxonomy.tag_set
, so I don't think we need another property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, duh... not sure why I forgot that. But yes, use that :)
a10c6f6
to
c0e5a48
Compare
@bradenmacdonald I have not found the way to change |
@ChrisChV Oh, I see. In that case, what you'd have to do is push Jill's branch to the |
c0e5a48
to
ebda011
Compare
@ormsbee This is ready for your review |
ebda011
to
455e783
Compare
a043b41
to
e5b1698
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChrisChV I found a couple of nits, but otherwise good 👍
- I tested this by running the tests locally and ensuring 100% coverage.
- I read through the code and tests, and ensured that the tests cover our use cases.
-
I checked for accessibility issuesN/A - Includes documentation
- Commit structure follows OEP-0051
from django.utils.translation import gettext_lazy as _ | ||
|
||
from .models import ObjectTag, Tag, Taxonomy | ||
|
||
csv_fields = ['id', 'name', 'parent_id', 'parent_name'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would like to make it clear that this is a constant, and it's not part of the externally-exportable python api:
csv_fields = ['id', 'name', 'parent_id', 'parent_name'] | |
_CSV_FIELDS = ['id', 'name', 'parent_id', 'parent_name'] |
tagging_api.resync_object_tags([object_tag]) | ||
object_tag = ObjectTag.objects.get(object_id=object_id) | ||
self.assertEqual(object_tag.tag.value, 'Bacteria') | ||
self.assertEqual(object_tag._value, 'Bacteria') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: pylint doesn't like this, but I agree it's necessary for the test. So could you please add this here and in the two other mentions below?
self.assertEqual(object_tag._value, 'Bacteria') | |
self.assertEqual(object_tag._value, 'Bacteria') # pylint: disable=protected-access |
if replace: | ||
taxonomy.tag_set.exclude(external_id__in=updated_tags).delete() | ||
|
||
resync_object_tags(ObjectTag.objects.filter(taxonomy=taxonomy)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a large operation.. so I don't think it should sit under the same atomic operation. Can you bump it out a level?
resync_object_tags(ObjectTag.objects.filter(taxonomy=taxonomy)) | |
resync_object_tags(ObjectTag.objects.filter(taxonomy=taxonomy)) |
json_result = { | ||
'name': taxonomy.name, | ||
'description': taxonomy.description, | ||
'tags': result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
'tags': result | |
'tags': result, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started writing per-line reviews, but then I thought it best to step back a bit.
Import is going to be one of those places where the complexity is going to grow a lot as time goes on. We already know that we're going to have to support a variety of modifications in the future–consolidating tags, renaming tags, etc.
We're clearly not going to implement that functionality yet. But the fact that this is going to grow a lot means that we should be really disciplined and explicit about the stages of data transformation that we have here. We also have to keep the code as painfully simple and easy to test as possible.
What I expect to see as a top level import function would be something fairly short that that calls out to other classes/functions to implement something like the following steps, where the output of one step is the input into the next:
- Parse the raw input into a DSL of actions (e.g. CreateOrUpdate, Delete, Rename, etc.).
- Validate that the parsed set of actions is internally consistent (e.g. we're not adding a Tag as a parent of itself).
- Validate that the actions make sense against the Taxonomy data being operated on (requires reading data form our Taxonomy).
- Create a plan for the operations we'll need to do (because preview functionality is really nice to have when there's no revert button).
- Execute that plan.
Each of those steps should be separately testable. We should be able to have a CSVParser and a JSONParser and have the inputs and outputs of their tests only be the raw data and the actions they generate.
We should be able to have a set of actions, and test the function that validates them.
When we chain things together, we should be able to set up a database state and a list of DSL statements and test the plan that we generate.
Right now, all of these concerns are mixed together in the import_tags
. This will make it harder to debug as time goes on, and encourage the use of side-effects to effectively "message" later parts of the pipeline with dicts built earlier on.
These may not seem like immediate problems, but if we make it so that the easiest way to add new features is to add a couple dozen lines to import_tags
and make a new 50 line inner function there, things are going to get very confusing very quickly.
|
||
updated_tags = [] | ||
|
||
def create_update_tag(tag): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please try to avoid inner functions this large and complex. The create_update_tag
function has interesting logic but it's harder to test separately because it's nested in here.
Closed in favor of #64 |
@ChrisChV Even though your pull request wasn’t merged, please take a moment to answer a two question survey so we can improve your experience in the future. |
Description
Implements import_tags and export_tags functions
Supporting information
Merge after #57.
Closes openedx/modular-learning#64
Testing instructions