Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: introduce lazy_import #5084

Merged
merged 17 commits into from
Jun 8, 2023
Merged

feat: introduce lazy_import #5084

merged 17 commits into from
Jun 8, 2023

Conversation

ZanSara
Copy link
Contributor

@ZanSara ZanSara commented Jun 6, 2023

Related Issues

Proposed Changes:

  • Replace generalimport with lazy-imports
  • Add a small wrapper on top of it to be able to customize error messages.

How did you test it?

  • local import of Haystack
  • CI
  • The PDFToTextConverter import mechanism was explicitly tested and seems to be working fine.

Notes for the reviewer

lazy-imports is not as "magic" as generalimport, because it required the optional imports to be checked explicitly, and does not catch the imports globally by default. On the other hand, the library is way smaller, less magic, and more stable. At need, it could be easily included into Haystack itself.

Checklist

@coveralls
Copy link
Collaborator

coveralls commented Jun 6, 2023

Pull Request Test Coverage Report for Build 5209189448

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 419 unchanged lines in 6 files lost coverage.
  • Overall coverage increased (+0.03%) to 40.768%

Files with Coverage Reduction New Missed Lines %
nodes/file_converter/init.py 2 88.89%
nodes/file_converter/pdf.py 25 70.09%
utils/squad_data.py 44 53.25%
modeling/model/feature_extraction.py 50 35.92%
schema.py 146 63.52%
document_stores/sql.py 152 26.96%
Totals Coverage Status
Change from base Build 5202291866: 0.03%
Covered Lines: 9143
Relevant Lines: 22427

💛 - Coveralls

Copy link
Contributor

@masci masci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree lazy_import's api isn't that bad after all, I'm +1 to use it instead of generalimports. Left a couple of comments but overall it's good

haystack/document_stores/sql.py Show resolved Hide resolved
pyproject.toml Show resolved Hide resolved
haystack/document_stores/sql.py Outdated Show resolved Hide resolved
@ZanSara ZanSara requested a review from masci June 6, 2023 16:20
from haystack.nodes.file_converter.base import BaseConverter
from haystack.schema import Document
from haystack.lazy_imports import LazyImport

with LazyImport(message="Run 'pip install farm-haystack[pdf]' or 'pip install pymupdf'.") as fitz_import:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this - if the execution path is here, it means import fitz is working, as we check it here in the package init https://github.com/deepset-ai/haystack/pull/5084/files#diff-768944ba2fc553e20ba101bc350057d7287308033b4637aa117dcf4f09150b84R21

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but there's still the chance someone imports this class directly as from haystack.nodes.file_converter.pdf import PDFToTextConverter. While this is a weird way to import it, why not give a meaningful error message in this corner case? It doesn't cost us much and removes some "magic" from the failure

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from haystack.nodes.file_converter.pdf import PDFToTextConverter would still trigger the code in haystack.nodes.file_converter.__init__

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so I had a better look and we're both right and wrong 😄 We most likely don't need those lines and I'll remove them, but I have to re-work the import in __init__ to try-catch the first import. I was trying to implement in basically the opposite way, but this seems to work better.

@ZanSara ZanSara requested a review from masci June 7, 2023 08:11
@julian-risch
Copy link
Member

@ZanSara I am expecting this PR to also fix #5056

@ZanSara
Copy link
Contributor Author

ZanSara commented Jun 7, 2023

Btw I would like to take half a day to work through #4836 once for all... having all those messy imports is, well, a mess 😅 There are plenty of low-hanging fruit there that can be cleaned up fast.

Copy link
Contributor

@masci masci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One leftover and we 🚢

haystack/nodes/file_converter/pdf.py Outdated Show resolved Hide resolved
@masci masci merged commit 52e7a77 into main Jun 8, 2023
@masci masci deleted the lazy_import branch June 8, 2023 10:11
@julian-risch julian-risch mentioned this pull request Jun 8, 2023
1 task
masci added a commit that referenced this pull request Jun 15, 2023
* generalimport -> lazy-imports

* remove generalimport

* fix pdftotextconverter import check

* customize error messages

* pylint

* fix sql.py

* pylint

* Update haystack/document_stores/sql.py

Co-authored-by: Massimiliano Pippi <[email protected]>

* make contextmanager less verbose

* do not catch syntax errors

* review feedback

* Update haystack/nodes/file_converter/pdf.py

---------

Co-authored-by: Massimiliano Pippi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Introduce lazy_import
4 participants