community: better support of pathlib paths in document loaders #18396

Merged
merged 13 commits into from
Mar 26, 2024

mmajewsk
Contributor

@mmajewsk mmajewsk commented Mar 1, 2024

So this arose from the #18397 problem of document loaders not supporting pathlib.Path.

This pull request provides more uniform support for Path as an argument.
The core ideas for this upgrade:

  • if a local file path is used as an argument, it should be supported as pathlib.Path
  • if there are external calls that may or may not support pathlib, the argument is immediately converted to str
  • if self.file_path is used in a way that allows it to stay a pathlib.Path without conversion, it is only converted to str for the metadata.
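
A minimal sketch of how the three ideas above could look in a loader. The class `SketchTextLoader` and the helper `external_read` are hypothetical names made up for illustration; they are not the loaders or calls touched by this pull request, and documents are represented as plain dicts to keep the example self-contained.

```python
from pathlib import Path
from typing import Dict, List, Union


def external_read(path: str) -> str:
    """Hypothetical stand-in for a third-party call that only accepts str."""
    if not isinstance(path, str):
        raise TypeError("external_read expects a str path")
    with open(path, encoding="utf-8") as f:
        return f.read()


class SketchTextLoader:
    """Hypothetical loader used only to illustrate the three ideas."""

    def __init__(self, file_path: Union[str, Path]) -> None:
        # Idea 1: a local file path may be passed as str or pathlib.Path.
        self.file_path = Path(file_path)

    def load_via_external(self) -> List[Dict[str, object]]:
        # Idea 2: the external call may not support pathlib, so the path
        # is converted to str right at the call boundary.
        text = external_read(str(self.file_path))
        return [{"page_content": text, "metadata": {"source": str(self.file_path)}}]

    def load(self) -> List[Dict[str, object]]:
        # Idea 3: the path can stay a Path for reading; it is converted
        # to str only when it is written into the metadata.
        text = self.file_path.read_text(encoding="utf-8")
        return [{"page_content": text, "metadata": {"source": str(self.file_path)}}]
```

With this shape, `SketchTextLoader("notes.txt")` and `SketchTextLoader(Path("notes.txt"))` behave the same, and the `source` metadata is always a plain string.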

Twitter handle: https://twitter.com/mwmajewsk


@mmajewsk mmajewsk changed the title community: better description for deeplake error community: better and more uniform support of pathlib paths Mar 1, 2024
@mmajewsk mmajewsk changed the title community: better and more uniform support of pathlib paths community: better support of pathlib paths in document loaders Mar 1, 2024
@mmajewsk mmajewsk marked this pull request as ready for review March 1, 2024 18:25
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. Ɑ: doc loader Related to document loader module (not documentation) 🔌: anthropic Primarily related to Anthropic integrations 🤖:improvement Medium size change to existing code to handle new use-cases labels Mar 1, 2024
@eyurtsev
Collaborator

eyurtsev commented Mar 1, 2024

Looks good from scanning quickly

@mmajewsk
Contributor Author

mmajewsk commented Mar 2, 2024

@eyurtsev i think this may be ready to merge now

@mmajewsk
Contributor Author

mmajewsk commented Mar 7, 2024

@efriis I think this is ready to merge

@mmajewsk
Contributor Author

I'm trying to mention someone else: @baskaryan can you merge this?

@mmajewsk
Contributor Author

@eyurtsev so far I'm keeping up with the changes in master; it would be nice to get it merged

@eyurtsev eyurtsev self-requested a review March 19, 2024 02:16
@eyurtsev
Collaborator

I'll try to merge tomorrow; added myself as reviewer

@eyurtsev eyurtsev self-assigned this Mar 19, 2024
@eyurtsev eyurtsev merged commit f7a1fd9 into langchain-ai:master Mar 26, 2024
59 checks passed
@eyurtsev
Collaborator

tomorrow is relative i guess

@eyurtsev
Collaborator

sorry for the delay @mmajewsk and thanks for the contribution!

gkorland pushed a commit to FalkorDB/langchain that referenced this pull request Mar 30, 2024
…hain-ai#18396)

hinthornw pushed a commit that referenced this pull request Apr 26, 2024