-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: PyPDFToDocument
correctly serializes custom converters, deprecate DefaultConverter
#8430
Conversation
Pull Request Test Coverage Report for Build 11127151741Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
Thanks, @shadeMe The more detailed errors will also be a great help to the users! |
…ate `DefaultConverter`
e16058d
to
0754486
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this so quickly and thoroughly! The changes look very good to me. The naming of the util methods too. Only suggestion I have is to add tests for the new util methods auto_serialize_class_instance
and auto_deserialize_class_instance
too. What do you think @silvanocerza ?
I would suggest to also update the class docstring "If no converter is provided, uses a default text extraction converter implementation." or something like that here. Makes it more clear that there is no default converter anymore.
@shadeMe I'm probably missing some context. I understand that there are some serialization issues. But why did we decide to deprecate |
Because it honestly didn't have a good reason to exist outside the component, which was primarily the reason why the serialization bug crept in. |
@julian-risch My only concern about the methods is the
@anakin87 Mainly the assumption that converters don't need state most of the times so they can be simple functions. Though I'm unsure about that, I kinda remember that converters can be configured and if we treat them as callables we'd lose that possibility. I remember that you briefly worked on this to change the converter backend so I thought you'd know more if that's the case or not. |
Now I better understand the motivation and am OK with deprecating/removing In general, this component has always seemed a bit tricky to me from the UX point of view, and I would be happy if we improve it in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 👍
Test failure is unrelated; merging. |
…ate `DefaultConverter` (#8430) * fix: `PyPDFToDocument` correctly serializes custom converters, deprecate `DefaultConverter` * Remove `auto` prefix from serde util function names, add unit tests
Proposed Changes:
The
PyPDFToDocument
component was incorrectly serializing its default converter. This PR fixes it and deprecates the latter.Utility methods were added to aid the serde of custom classes that implement
from_dict
andto_dict
methods.How did you test it?
Unit tests
Notes for the reviewer
Checklist
fix:
,feat:
,build:
,chore:
,ci:
,docs:
,style:
,refactor:
,perf:
,test:
.