Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: improve handling of disallowed formats #429

Merged
merged 6 commits into from
Dec 3, 2024
Merged

Conversation

cau-git
Copy link
Contributor

@cau-git cau-git commented Nov 25, 2024

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

Copy link

mergify bot commented Nov 25, 2024

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?:

Copy link
Contributor

@vagenas vagenas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cau-git

Regarding convert():
I see no need to touch the return type, we can just run next() in a try catch and in case of StopIteration raise a more user-friendly error instead (ideally some DoclingError we can define).

Regarding convert_all():
I would leave this untouched.

@dolfim-ibm
Copy link
Contributor

I also think we should not touch the return type, since the function have an explicit argument raises_on_error.

But it is still unclear to me what could/should be the return value when raises_on_error=False and the document is invalid?
Would a failed ConversionResult with the proper status, and the default _EMPTY_DOCLING_DOC be a good choice?

@vagenas
Copy link
Contributor

vagenas commented Nov 26, 2024

I also think we should not touch the return type, since the function have an explicit argument raises_on_error.

But it is still unclear to me what could/should be the return value when raises_on_error=False and the document is invalid? Would a failed ConversionResult with the proper status, and the default _EMPTY_DOCLING_DOC be a good choice?

Yes and we can introduce a new ConversionStatus.INVALID for that case.

cau-git and others added 4 commits November 26, 2024 21:03
Signed-off-by: Christoph Auer <[email protected]>
- Introduced new explicit exception types instead of `RuntimeError`
- Introduced new `ConversionStatus` value for unsupported formats
- Tidied up converter member typing & removed asserts

Signed-off-by: Panos Vagenas <[email protected]>
@vagenas vagenas force-pushed the cau/fix-stop-iteration branch from 495503f to 0bb1e20 Compare November 26, 2024 20:33
@@ -22,6 +22,7 @@ class ConversionStatus(str, Enum):
FAILURE = auto()
SUCCESS = auto()
PARTIAL_SUCCESS = auto()
UNSUPPORTED = auto()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this UNSUPPORTED_FORMAT to be a bit more explicit

Copy link
Contributor

@PeterStaar-IBM PeterStaar-IBM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make UNSUPPORTED into UNSUPPORTED_FORMAT to be a bit more explicit. All the rest is good!

@cau-git
Copy link
Contributor Author

cau-git commented Dec 2, 2024

We must agree on the naming, instead of UNSUPPORTED.

DROPPED
DISCARDED
SKIPPED # favourite

and add a reason that explains properly what went wrong as an ErrorItem in the errors log of ConversionResult.

@vagenas vagenas changed the title fix: Fixes and tests for StopIteration on .convert() fix: improve handling of disallowed formats Dec 2, 2024
@cau-git cau-git merged commit 34c7c79 into main Dec 3, 2024
9 checks passed
@cau-git cau-git deleted the cau/fix-stop-iteration branch December 3, 2024 11:45
ab-shrek pushed a commit to ab-shrek/docling that referenced this pull request Dec 6, 2024
* fix: Fixes and tests for StopIteration on .convert()

Signed-off-by: Christoph Auer <[email protected]>

* fix: Remove unnecessary case handling

Signed-off-by: Christoph Auer <[email protected]>

* fix: Other test fixes

Signed-off-by: Christoph Auer <[email protected]>

* improve handling of unsupported types

- Introduced new explicit exception types instead of `RuntimeError`
- Introduced new `ConversionStatus` value for unsupported formats
- Tidied up converter member typing & removed asserts

Signed-off-by: Panos Vagenas <[email protected]>

* robustify & simplify format option resolution

Signed-off-by: Panos Vagenas <[email protected]>

* rename new status, populate ConversionResult errors

Signed-off-by: Panos Vagenas <[email protected]>

---------

Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Panos Vagenas <[email protected]>
Co-authored-by: Panos Vagenas <[email protected]>
lucas-morin pushed a commit to lucas-morin/docling that referenced this pull request Dec 10, 2024
* fix: Fixes and tests for StopIteration on .convert()

Signed-off-by: Christoph Auer <[email protected]>

* fix: Remove unnecessary case handling

Signed-off-by: Christoph Auer <[email protected]>

* fix: Other test fixes

Signed-off-by: Christoph Auer <[email protected]>

* improve handling of unsupported types

- Introduced new explicit exception types instead of `RuntimeError`
- Introduced new `ConversionStatus` value for unsupported formats
- Tidied up converter member typing & removed asserts

Signed-off-by: Panos Vagenas <[email protected]>

* robustify & simplify format option resolution

Signed-off-by: Panos Vagenas <[email protected]>

* rename new status, populate ConversionResult errors

Signed-off-by: Panos Vagenas <[email protected]>

---------

Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Panos Vagenas <[email protected]>
Co-authored-by: Panos Vagenas <[email protected]>
cau-git added a commit that referenced this pull request Dec 17, 2024
* fix: Fixes and tests for StopIteration on .convert()

Signed-off-by: Christoph Auer <[email protected]>

* fix: Remove unnecessary case handling

Signed-off-by: Christoph Auer <[email protected]>

* fix: Other test fixes

Signed-off-by: Christoph Auer <[email protected]>

* improve handling of unsupported types

- Introduced new explicit exception types instead of `RuntimeError`
- Introduced new `ConversionStatus` value for unsupported formats
- Tidied up converter member typing & removed asserts

Signed-off-by: Panos Vagenas <[email protected]>

* robustify & simplify format option resolution

Signed-off-by: Panos Vagenas <[email protected]>

* rename new status, populate ConversionResult errors

Signed-off-by: Panos Vagenas <[email protected]>

---------

Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Panos Vagenas <[email protected]>
Co-authored-by: Panos Vagenas <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants