PP-149 Refactor Acquisition feeds and Annotators #1308
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #1308 +/- ##
==========================================
+ Coverage 90.02% 90.15% +0.13%
==========================================
Files 204 219 +15
Lines 28051 30067 +2016
Branches 6458 6959 +501
==========================================
+ Hits 25252 27107 +1855
- Misses 1821 1893 +72
- Partials 978 1067 +89
I'm just starting the code review on this, but one thing that I see immediately is it would be really nice if we had some type hints on all this new code. I know a lot of the logic is recycled from the existing classes, but since we're moving it around and cleaning it up, having type hints on it would make things a lot easier to understand. Ideally I'd like to see:
diff --git a/pyproject.toml b/pyproject.toml
index 6e503ae69..9e9d59a81 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -74,6 +74,7 @@ module = [
"api.adobe_vendor_id",
"api.circulation",
"api.integration.*",
+ "core.feed_protocol.*",
"core.integration.*",
"core.model.announcements",
"core.model.hassessioncache",
I might just be misunderstanding the design here, but why use Pydantic over normal Python dataclasses for the classes in
One reason was to have validations, which ended up not being required (yet).
@jonathangreen This has been implemented.
Thanks @RishiDiwanTT. I'll code review the rest of this today.
This looks good Rishi, it is really great work to get this code untangled between feeds, annotators and serializers. It looks like it was a huge amount of work (almost 9000 lines!!) ⛰️.
I added some comments to the code as I went through it; most of them are just minor things that would be nice to get cleaned up before merging.
Before this is merged I'd like to figure out what we do with the old annotator and feed code. I'd prefer that we remove the old feed and annotator classes (including updating the database to remove the unused cache fields) as part of this PR. I think they confuse things now that we have the new classes that are part of this PR. However, I'm open to handling this however you would like, if you want to remove them in a future ticket instead.
@jonathangreen I think I got everything, and rebased #1340 as well to allow simultaneous code reviews.
@RishiDiwanTT I went through and resolved all the comments that were addressed. Looks like there are a couple more with some work left to do on them. The other thing is this comment:
How would you like to handle this?
@jonathangreen Considering both PRs are still in progress, I would like to take this up as an additional ticket, since this may need some effort to ensure we're not breaking some dependent functionality still in use.
There are a couple of todo comments in the new code that I'd like to make sure we don't need to sort out before merging this one.
identifier = entry.identifier or work.presentation_edition.primary_identifier

permalink_uri, permalink_type = self.permalink_for(identifier)
# TODO: Do not force OPDS types
And here
Until we have all endpoints under OPDS 2.0 maybe we should keep the APIs as OPDS1.x and keep the TODO as a reminder.
I see now what the todo is mentioning, it wasn't clear to me before. Can you maybe elaborate on the comment in the todo a bit more, so our future selves know what we are talking about here?
Okay, looks like all the comments are resolved here. Since we are removing the old classes in a followup, can you make a Jira ticket for that @RishiDiwanTT?
Once #1340 gets merged into this, I think there are two things that we need to do before this one can get merged:
With an acquisition page feed
This aims to remove all XML directives from within the feed and annotator workflows, and also to simplify the workflow, making it more linear and less circular.
Only a Serializer should write XML to the feed.
A hardcoded example has been set up in the /feed route.
Along with LibraryAnnotator tests
… APIs and scripts
Now the new feed classes control the XML generation
Implemented some pydantic dict() and iteration functionality for a smooth refactor
* Added acquisition link serialization for OPDS2
* OPDS2 serialization with acquisition links
* Loan and hold specific statuses for acquisition links
* Content types come from the serializer now
OPDS2 publication feed is now the same as the OPDS1 page feed, with a different serializer
* Ensure test_url_for is only used during testing
* Mimetypes are iterated in priority order before serialization
Created a Serializer base for better typing and OOP
Allow /groups to also be OPDS2 if requested
Currently, please ignore all OPDS2 classes, they are just POCs.
Description
This aims to remove all XML directives from within the feed and annotator workflows, and also to simplify the workflow, making it more linear and less circular. Only a Serializer should write XML to the feed.
The feed workflow should always be AcquisitionFeed -> Annotator -> Serializer, as opposed to the back and forth that was occurring earlier.
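As a rough illustration of that one-way flow, here is a minimal sketch. The class bodies, method names (`annotate_work_entry`, `generate_feed`, `serialize`) and the `/works/...` URL are assumptions made for this example, not the code in this PR. The point is only that data moves forward through the three stages and markup appears in the last one.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WorkEntry:
    """Hypothetical format-neutral data for a single work."""
    title: str
    links: List[Dict[str, str]] = field(default_factory=list)


@dataclass
class FeedData:
    """Hypothetical format-neutral data for the whole feed."""
    title: str
    entries: List[WorkEntry] = field(default_factory=list)


class Annotator:
    """Stage 2: fills in per-work data. It never touches XML."""

    def annotate_work_entry(self, entry: WorkEntry) -> None:
        # The URL pattern is made up for this sketch.
        entry.links.append({"rel": "alternate", "href": f"/works/{entry.title}"})


class XmlSerializer:
    """Stage 3: the only place where markup is produced."""

    def serialize_feed(self, feed: FeedData) -> str:
        root = ET.Element("feed")
        ET.SubElement(root, "title").text = feed.title
        for entry in feed.entries:
            node = ET.SubElement(root, "entry")
            ET.SubElement(node, "title").text = entry.title
            for link in entry.links:
                ET.SubElement(node, "link", link)
        return ET.tostring(root, encoding="unicode")


class AcquisitionFeed:
    """Stage 1: builds the outer shell and hands each work to the annotator."""

    def __init__(self, title: str, work_titles: List[str], annotator: Annotator) -> None:
        self.title = title
        self.work_titles = work_titles
        self.annotator = annotator

    def generate_feed(self) -> FeedData:
        feed = FeedData(title=self.title)
        for work_title in self.work_titles:
            entry = WorkEntry(title=work_title)
            self.annotator.annotate_work_entry(entry)
            feed.entries.append(entry)
        return feed

    def serialize(self, serializer: XmlSerializer) -> str:
        return serializer.serialize_feed(self.generate_feed())


feed = AcquisitionFeed("Fiction", ["Dune"], Annotator())
xml_text = feed.serialize(XmlSerializer())
```

Nothing in the annotator or the feed refers back to the serializer, which is what keeps the flow linear.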
About 70% of the time that went into this PR was spent refactoring tests and understanding the different paradigms used in the existing code.
This is not a complete rewrite, all the logic has been left as-is without any changes.
Feed Class
Only bothers with the outer shell of the feed, which is everything except the publications in OPDS2 and the <entry>'s in OPDS1.
Eg. The facet links and other metadata.
Annotator Class
Populates the data required for each work entry and may also potentially add some additional metadata into the outer feed.
Serializer Class
Accepts pydantic models and outputs the OPDS feed in whichever format it is responsible for.
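To make the split concrete, here is a small sketch of two serializers consuming the same data, each reporting its own content type. The class and method names are assumptions for illustration, stdlib dataclasses stand in for the pydantic models so the example runs without dependencies, and the XML omits the Atom namespace for brevity.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Publication:
    """Hypothetical stand-in for an annotated work entry."""
    title: str


@dataclass
class FeedData:
    """Hypothetical stand-in for the annotated feed."""
    title: str
    publications: List[Publication] = field(default_factory=list)


class OPDS1Serializer:
    """Sketch of an Atom/XML serializer (namespace handling left out)."""

    def content_type(self) -> str:
        return "application/atom+xml;profile=opds-catalog;kind=acquisition"

    def serialize_feed(self, feed: FeedData) -> str:
        root = ET.Element("feed")
        ET.SubElement(root, "title").text = feed.title
        for publication in feed.publications:
            entry = ET.SubElement(root, "entry")
            ET.SubElement(entry, "title").text = publication.title
        return ET.tostring(root, encoding="unicode")


class OPDS2Serializer:
    """Sketch of a JSON serializer using the OPDS 2.0 top-level keys."""

    def content_type(self) -> str:
        return "application/opds+json"

    def serialize_feed(self, feed: FeedData) -> str:
        document = {
            "metadata": {"title": feed.title},
            "publications": [
                {"metadata": {"title": publication.title}}
                for publication in feed.publications
            ],
        }
        return json.dumps(document)


feed_data = FeedData("Fiction", [Publication("Dune")])
as_xml = OPDS1Serializer().serialize_feed(feed_data)
as_json = OPDS2Serializer().serialize_feed(feed_data)
```

The feed and annotator code would be identical for both outputs; only the serializer passed in at the end changes.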
Pydantic Models
We use pydantic models (FeedEntryType's) to store the annotated information for the feed as well as for each entry.
The FeedEntryType model is the base model for any nestable attribute of the OPDS feeds.
Eg. If we have an acquisition link, we know this is nestable with indirectAcquisitions, so all links inherit from the FeedEntryType model and can have arbitrary information added to them.
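A minimal sketch of that nesting idea follows. It uses stdlib dataclasses in place of pydantic so it runs on its own, and every field and method name here (`extra`, `add_attributes`, `to_dict`, `children`) is an assumption for illustration rather than the model defined in this PR.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


def _plain(value: Any) -> Any:
    """Recursively turn nested entries and lists into plain dicts and lists."""
    if isinstance(value, FeedEntryType):
        return value.to_dict()
    if isinstance(value, list):
        return [_plain(item) for item in value]
    return value


@dataclass
class FeedEntryType:
    """Hypothetical base for anything that can be nested inside a feed."""
    text: Optional[str] = None
    extra: Dict[str, Any] = field(default_factory=dict)

    def add_attributes(self, attrs: Dict[str, Any]) -> None:
        # Arbitrary extra information, which may itself be a FeedEntryType.
        self.extra.update(attrs)

    def to_dict(self) -> Dict[str, Any]:
        data = {f.name: getattr(self, f.name) for f in fields(self) if f.name != "extra"}
        data.update(self.extra)
        # Drop unset values and empty lists so the output stays compact.
        return {k: _plain(v) for k, v in data.items() if v is not None and v != []}


@dataclass
class IndirectAcquisition(FeedEntryType):
    type: Optional[str] = None
    children: List["IndirectAcquisition"] = field(default_factory=list)


@dataclass
class Acquisition(FeedEntryType):
    href: Optional[str] = None
    rel: Optional[str] = None
    indirect_acquisitions: List[IndirectAcquisition] = field(default_factory=list)


link = Acquisition(
    href="/works/1/borrow",  # made-up URL
    rel="http://opds-spec.org/acquisition/borrow",
    indirect_acquisitions=[
        IndirectAcquisition(
            type="application/vnd.adobe.adept+xml",
            children=[IndirectAcquisition(type="application/epub+zip")],
        )
    ],
)
link.add_attributes({"availability": FeedEntryType(extra={"status": "available"})})
link_dict = link.to_dict()
```

Because every level shares the same base, a serializer can walk the structure without knowing in advance which attributes an annotator chose to attach.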
The model fields Work.simple_opds_entry and Work.verbose_opds_entry are not used in this implementation because the implementation is specifically not targeting XML.
This means we no longer have an internal partial cache of XML entries per work record.
Motivation and Context
In order to allow all API feeds to be served in both OPDS1 and OPDS2 protocols, all the XML generation had to be extricated from the Feed and Annotator classes and pushed into a separate serializer class.
JIRA
How Has This Been Tested?
All the Feed and Annotator tests were copied and refactored to work with the new style classes.
All the Feeds and Annotators have unit tests that test the equivalence between the old and new style classes, thereby ensuring we are creating exactly the same feeds.
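One way such an equivalence check could be written is sketched below. This is an assumption about the approach, not the test helpers in this PR: both outputs are reduced to canonical XML with the standard library so that attribute order and indentation differences do not cause false failures. The sample documents are made up.

```python
import xml.etree.ElementTree as ET


def canonical(xml_text: str) -> str:
    """Return the C14N 2.0 form, with surrounding whitespace in text stripped."""
    return ET.canonicalize(xml_text, strip_text=True)


def assert_feeds_equivalent(old_xml: str, new_xml: str) -> None:
    """Fail if the old-style and new-style feeds differ in content."""
    assert canonical(old_xml) == canonical(new_xml)


# Same content, different attribute order and indentation.
old_feed = '<feed><entry b="2" a="1"><title>Dune</title></entry></feed>'
new_feed = '<feed>\n  <entry a="1" b="2">\n    <title>Dune</title>\n  </entry>\n</feed>'
assert_feeds_equivalent(old_feed, new_feed)
```

A real comparison would also need to decide how to treat values that legitimately vary between runs, such as timestamps.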
Manually tested whichever endpoints came to mind.
Checklist