Change return type of File Converters to `List[Document]` #1859

bogdankostic · 2021-12-08T09:25:23Z

Currently, the file converters return `List[Dict]`. We should make use of our `Document` primitive here and return `List[Document]` instead.
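A rough sketch of what the change amounts to. The `Document` dataclass here is a simplified stand-in, and `convert_to_dicts` / `convert_to_documents` are hypothetical names used only for illustration. The dict keys `content` and `meta` are also an assumption, not taken from the actual converters.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Simplified stand-in for the Document primitive (assumption, not the real class)."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def convert_to_dicts(text: str, meta: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Current behaviour as described in the issue: plain dicts are returned."""
    return [{"content": text, "meta": meta}]


def convert_to_documents(text: str, meta: Dict[str, Any]) -> List[Document]:
    """Proposed behaviour: the same data wrapped in Document objects."""
    return [Document(content=text, meta=meta)]
```

With typed objects, downstream nodes can rely on attribute access (`doc.content`) instead of checking dict keys.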

tholor · 2021-12-16T15:48:28Z

Same refactoring is needed for the other nodes in our indexing pipeline (e.g. Preprocessor).

When working on this, we should take into account that a document's id is created at the moment the Document is instantiated. If instantiation now happens within different nodes, we probably need to add a param to those nodes to control how the ids are created (default: hash of content). For example, in DC we want to create the ID based on content + a pipeline_id. This can be done via the `id_hash_keys` parameter in `Document.__init__()`.
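To illustrate the id concern, here is a minimal sketch with a simplified stand-in `Document` class. How `id_hash_keys` is interpreted here (as names of `meta` entries mixed into the hash) is an assumption made for the example; the real parameter may behave differently.

```python
import hashlib
from typing import Any, Dict, List, Optional


class Document:
    """Simplified stand-in (assumption); the real Document class may differ."""

    def __init__(
        self,
        content: str,
        meta: Optional[Dict[str, Any]] = None,
        id_hash_keys: Optional[List[str]] = None,
    ):
        self.content = content
        self.meta = meta or {}
        # The id is fixed at instantiation time, so whichever node creates
        # the Document also decides how the id is derived.
        self.id = self._make_id(id_hash_keys)

    def _make_id(self, id_hash_keys: Optional[List[str]]) -> str:
        # Default: hash of the content only.
        parts = [self.content]
        # Hypothetical interpretation: each key names a meta entry to mix in.
        for key in id_hash_keys or []:
            parts.append(str(self.meta.get(key, "")))
        return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


# Same content, different pipeline_id: the ids differ only when the key is hashed.
a = Document("same text", meta={"pipeline_id": "p1"}, id_hash_keys=["pipeline_id"])
b = Document("same text", meta={"pipeline_id": "p2"}, id_hash_keys=["pipeline_id"])
c = Document("same text", meta={"pipeline_id": "p1"})
d = Document("same text", meta={"pipeline_id": "p2"})
```

In this sketch `a.id` and `b.id` differ, while `c.id` and `d.id` are identical, which is why a node that instantiates documents would need to pass the parameter through.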

Ping @ArzelaAscoIi when kicking the work off here.

bogdankostic added the topic:file_converter label Dec 8, 2021

julian-risch assigned bogdankostic Mar 2, 2022

bogdankostic mentioned this issue Mar 21, 2022

Change return types of indexing pipeline nodes #2342

Merged

bogdankostic closed this as completed in #2342 Mar 29, 2022
