[Epic] Improve & Modularize Preprocessing #378

Closed
tholor opened this issue Sep 16, 2020 · 1 comment
Labels: epic, type:feature (New feature or request)

tholor commented Sep 16, 2020

Let's improve the preprocessing part of Haystack, as it has a substantial impact on the final accuracy and speed ...

Main Subtasks

  1. Restructure the "indexing module" (Rename and restructure modules #379)
  2. Let file converters return dicts including metadata (Refactor FileConverters to return dicts incl metadata #380)
  3. Introduce a Preprocessor class that does splitting and cleaning (#381)
  4. Improve splitting and cleaning functionalities (e.g. splits that respect sentence boundaries ...) (Better splitting and cleaning functions #382)
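
To make the subtasks above more concrete, here is a rough sketch of how the pieces could fit together: a converter that returns a dict with text and metadata, and a preprocessing step that cleans the text and then splits it without cutting through sentences. This is not Haystack's actual API. The names `convert_text`, `clean`, `split_by_sentence`, `preprocess`, the `max_words` parameter, and the `split_id` metadata key are all made up for illustration, and the regex-based sentence splitting is a naive stand-in for a proper sentence tokenizer.

```python
import re
from typing import Dict, List


def convert_text(text: str, name: str) -> Dict:
    """Hypothetical converter: return the text together with its metadata."""
    return {"text": text, "meta": {"name": name}}


def clean(text: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def split_by_sentence(text: str, max_words: int) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept whole as its own chunk.
    Sentence detection is a naive split after '.', '!' or '?'.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def preprocess(document: Dict, max_words: int = 100) -> List[Dict]:
    """Clean a converted document, then split it into smaller documents.

    Each split keeps the original metadata plus a running split_id.
    """
    cleaned = clean(document["text"])
    return [
        {"text": chunk, "meta": {**document["meta"], "split_id": i}}
        for i, chunk in enumerate(split_by_sentence(cleaned, max_words))
    ]
```

Calling `preprocess(convert_text(raw_text, "a.txt"), max_words=100)` would then yield a list of dicts that each carry their own text and a copy of the source metadata, so cleaning and splitting stay independent of the file format.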

tholor commented Oct 19, 2020

Finished all subtasks

tholor closed this as completed on Oct 19, 2020