
feat: skip unknown file types [DC-1108] #7542

Merged: 5 commits, Apr 12, 2024


@faymarie (Contributor) commented on Apr 12, 2024

Related Issues

Proposed Changes:

Allow FileClassifier to skip unsupported file types via a new flag, raise_on_error. If raise_on_error is set to False, a file whose type is not in the supported types is skipped (sent to a dead end, i.e. no output edge receives it) and a warning is logged.
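The behavior described above can be illustrated with a minimal, standalone sketch. It is not the Haystack implementation: the class name SketchFileTypeClassifier, the "output_<n>" edge naming, and the message texts are assumptions chosen for illustration. Supported extensions map to numbered output edges; an unsupported extension either raises (raise_on_error=True) or logs a warning and returns (None, None), so that no edge receives the file.

```python
import logging
from pathlib import Path
from typing import Dict, List, Optional, Tuple, Union

logger = logging.getLogger(__name__)

Output = Optional[Dict[str, List[Path]]]


class SketchFileTypeClassifier:
    """Hypothetical stand-in for FileClassifier that routes files by extension."""

    def __init__(self, supported_types: List[str], raise_on_error: bool = True):
        self.supported_types = supported_types
        self.raise_on_error = raise_on_error

    def run(self, file_path: Union[str, Path]) -> Tuple[Output, Optional[str]]:
        path = Path(file_path)
        extension = path.suffix.lstrip(".").lower()
        if extension in self.supported_types:
            # Assumed 1-based edge names: output_1 for the first supported type, etc.
            index = self.supported_types.index(extension) + 1
            return {"file_paths": [path]}, f"output_{index}"
        if self.raise_on_error:
            # Strict mode: an unsupported type is an error.
            raise ValueError(
                f"Files of type '{extension}' ({path}) are not supported. "
                f"The supported types are: {self.supported_types}."
            )
        # raise_on_error=False: warn and name no output edge (a dead end).
        logger.warning(
            "Unsupported file type '%s' for file %s (supported: %s). Skipping it.",
            extension,
            path,
            self.supported_types,
        )
        return None, None


skipping = SketchFileTypeClassifier(supported_types=["txt", "pdf"], raise_on_error=False)
print(skipping.run("report.pdf")[1])  # output_2
print(skipping.run("photo.png"))      # (None, None), plus a logged warning
```

With raise_on_error left at True in this sketch, the same "photo.png" call raises a ValueError instead, which keeps strict behavior available for pipelines that should fail loudly.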

Use cases:
We want to enable deepset Cloud users:

  • to upload file types other than .txt/.pdf to their workspace and run pipelines regardless of which file types the workspace contains
  • to schedule directories containing mixed file types for indexing without first cleaning them up or matching them to specific pipelines

How did you test it?

Unit test.

Notes for the reviewer

Checklist

@github-actions bot added the type:documentation label (Improvements on the docs) on Apr 12, 2024
@faymarie changed the title from "feat: skip unknown file types" to "feat: skip unknown file types [DC-1108]" on Apr 12, 2024
@masci added this to the 1.25.3 milestone on Apr 12, 2024
Review thread on the diff (excerpt):

        ...
            paths[0],
            self.supported_types,
        )
        return None, None
@faymarie (Contributor Author) commented on these lines:

Here we want unsupported file types to run into a dead end.
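To see why returning (None, None) amounts to a dead end, here is a toy dispatcher; it is not Haystack's pipeline engine. It forwards a node's output only along the edge the node names, so a result without an edge name reaches no downstream node. The classify stub, the handlers mapping, and the edge names are hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple

Output = Optional[Dict[str, List[Path]]]


def classify(path: Path) -> Tuple[Output, Optional[str]]:
    # Hypothetical stub mirroring the skip branch: .txt -> output_1, .pdf -> output_2,
    # anything else -> (None, None) instead of an exception.
    edges = {".txt": "output_1", ".pdf": "output_2"}
    edge = edges.get(path.suffix.lower())
    if edge is None:
        return None, None
    return {"file_paths": [path]}, edge


def dispatch(paths: List[Path], handlers: Dict[str, Callable[[List[Path]], None]]) -> List[Path]:
    """Send each file to the handler for its edge; return the files that hit a dead end."""
    skipped = []
    for path in paths:
        output, edge = classify(path)
        if edge is None:
            # No edge name: nothing downstream receives this file.
            skipped.append(path)
            continue
        handlers[edge](output["file_paths"])
    return skipped


# A mixed "directory" of files, as in the use cases above.
received: Dict[str, List[Path]] = {"output_1": [], "output_2": []}
handlers = {edge: received[edge].extend for edge in received}
mixed = [Path("a.txt"), Path("b.pdf"), Path("c.png"), Path("d.docx")]
skipped = dispatch(mixed, handlers)
print([p.name for p in skipped])  # ['c.png', 'd.docx']
```

Under these assumptions, the supported files still flow to their branches while the rest are simply dropped, which is what lets a mixed batch run through one pipeline without failing.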

@faymarie marked this pull request as ready for review on April 12, 2024, 10:03
@shadeMe requested a review from @vblagoje on April 12, 2024, 10:03
@vblagoje (Member) left a comment:

LGTM, thank you @faymarie

@faymarie merged commit 8f6f4fc into v1.x on Apr 12, 2024 (93 checks passed)
@faymarie deleted the feat/skip-unknown-file-types branch on April 12, 2024, 14:33
@vblagoje pushed a commit that referenced this pull request on Apr 23, 2024
Labels: topic:tests, type:documentation
4 participants