-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Rename FileExtensionRouter
to FileTypeRouter
, handle ByteStream(s)
#5998
Conversation
cc @masci I named this component |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we change the name of the Python module to file_type_router.py
to stay consistent with the component name? (I know we have several discrepancies around, this is to not adding one more)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall very good, I left a couple of comments about improving the tests coverage
@pytest.mark.unit | ||
def test_run_with_bytestreams(self, preview_samples_path): | ||
""" | ||
Test if the component runs correctly with ByteStream inputs. | ||
""" | ||
file_paths = [ | ||
preview_samples_path / "txt" / "doc_1.txt", | ||
preview_samples_path / "txt" / "doc_2.txt", | ||
preview_samples_path / "audio" / "the context for this answer is here.wav", | ||
preview_samples_path / "images" / "apple.jpg", | ||
] | ||
mime_types = ["text/plain", "text/plain", "audio/x-wav", "image/jpeg"] | ||
# Convert file paths to ByteStream objects and set metadata | ||
byte_streams = [] | ||
for path, mime_type in zip(file_paths, mime_types): | ||
stream = ByteStream(path.read_bytes()) | ||
|
||
stream.metadata["content_type"] = mime_type | ||
|
||
byte_streams.append(stream) | ||
|
||
router = FileTypeRouter(mime_types=["text/plain", "audio/x-wav", "image/jpeg"]) | ||
output = router.run(sources=byte_streams) | ||
assert output | ||
assert len(output["text/plain"]) == 2 | ||
assert len(output["audio/x-wav"]) == 1 | ||
assert len(output["image/jpeg"]) == 1 | ||
assert not output.get("unclassified", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for completeness, let's also add a ByteStream with no content_type
and test that it goes under unclassified
assert len(output["audio/x-wav"]) == 1 | ||
assert len(output["image/jpeg"]) == 1 | ||
assert not output.get("unclassified", None) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a test that mixes Paths and ByteStreams
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a small change from my side, all good otherwise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good on my side
Why:
The main objective behind these commits was to refine the file routing mechanism by including support for
ByteStream
types. This initiative aims to broaden the spectrum of file types handled within the system, ensuring a more versatile and robust file routing solution.What:
FileExtensionRouter
toFileTypeRouter
to better reflect its functionality.ByteStream
handling within theFileTypeRouter
run
method, thereby enhancing the routing capabilities to cater to byte stream data along with other file types.How Did You Test It:
Ran a suite of unit and integration tests to validate the updated routing mechanism. Additionally, performed manual testing to verify the correct routing of
ByteStream
data and other file types through the updatedFileTypeRouter
. All tests passed successfully, affirming the reliability and efficiency of the changes made.Notes for Reviewer:
ByteStream
handling logic to ensure they align with the project's coding and naming conventions.ByteStream
is a crucial addition; any feedback on the implementation would be highly appreciated to ensure optimal performance and error handling.source
instead ofpaths
in therun
method signature. Aspaths
is kinda specific, I thought thatsources
better depicts this input parameter. We should standardize this name across all components that useUnion[str, Path, ByteStream]