enhancement proposal: refactor elif statements for partitioners in auto.py #1716

Closed
Coniferish opened this issue Oct 11, 2023 · 5 comments

Comments

@Coniferish
Collaborator

auto.py contains a lot of redundancy in the elif chain that checks the filetype and sends the document to the correct partitioner. This issue proposes collecting all the options users can define into a single object that each partitioner can unpack. Data source parameters would still be passed explicitly, but the remaining arguments would not, so partition calls could look something like this:

    if filetype == FileType.PDF:
        elements = partition_pdf(
            filename=filename,
            file=file,
            url=url,
            **kwargs,
        )
    elif filetype == FileType.HTML:
        elements = partition_html(
            filename=filename,
            file=file,
            url=url,
            **kwargs,
        )

We could create a PartitionerOptions(TypedDict) type that specifies the names and types of all the key-value pairs that can appear as options. This could help prevent typos and would provide general type safety.
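
A minimal sketch of how such a type might look, using stand-in partitioner functions. The option names (`strategy`, `include_page_breaks`, `languages`), the string filetype values, and the function bodies are hypothetical placeholders for illustration, not the library's actual signatures:

```python
from typing import IO, Any, Dict, List, Optional, TypedDict


class PartitionerOptions(TypedDict, total=False):
    """Hypothetical bundle of user-definable options (names are assumptions)."""

    strategy: str
    include_page_breaks: bool
    languages: List[str]


def partition_pdf(
    filename: Optional[str] = None,
    file: Optional[IO[bytes]] = None,
    url: Optional[str] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    # Stand-in for the real partitioner: it only reports what it received.
    return {"partitioner": "pdf", "filename": filename, "options": kwargs}


def partition_html(
    filename: Optional[str] = None,
    file: Optional[IO[bytes]] = None,
    url: Optional[str] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    # Stand-in for the real partitioner: it only reports what it received.
    return {"partitioner": "html", "filename": filename, "options": kwargs}


def partition(filetype: str, filename: str, options: PartitionerOptions) -> Dict[str, Any]:
    # Data source parameters stay explicit; everything else is unpacked.
    if filetype == "pdf":
        return partition_pdf(filename=filename, **options)
    elif filetype == "html":
        return partition_html(filename=filename, **options)
    raise ValueError(f"unsupported filetype: {filetype}")


options: PartitionerOptions = {"strategy": "fast", "include_page_breaks": True}
elements = partition("pdf", "example.pdf", options)
```

Because the TypedDict is declared with `total=False`, callers can supply any subset of the options, and a static type checker can flag a misspelled key such as `stratgy` in the `options` literal before the call runs.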

@Coniferish
Collaborator Author

@scanny Do you have any additional thoughts on what this could/should look like?

@newelh
Contributor

newelh commented Oct 11, 2023

I sketched some ideas out in a branch here. It's far from complete, and probably too much of a refactor, but it has some ideas along the same line of thinking.

@Coniferish
Collaborator Author

Oh nice. Is there an open issue that branch is being developed for? I admittedly didn't look for any open issues related to this, but did just find this one:
#601

@newelh
Contributor

newelh commented Oct 11, 2023

No, there wasn't a specific issue, mostly personal musings over a weekend. It's mildly related to #1521, e.g. reducing the number of kwargs we're passing around and, long-term, simplifying the user experience.

@scanny
Collaborator

scanny commented Dec 18, 2024

Fixed by #3806.

@scanny scanny closed this as completed Dec 18, 2024