community: [bugfix] fix source path for office files in O365 #28260

Open
wants to merge 1 commit into
base: master
from

Conversation

MacanPN (Contributor) commented on Nov 21, 2024

What problem are we fixing?

Currently, documents loaded with O365BaseLoader take their source from file.web_url (where file is an instance of <class 'O365.drive.File'>). This works well for .pdf documents. Unfortunately, Office documents (.xlsx, .docx, ...) report their web_url in the following format:
https://sharepoint_address/sites/path/to/library/root/Doc.aspx?sourcedoc=%XXXXXXXX-1111-1111-XXXX-XXXXXXXXXX%7D&file=filename.xlsx&action=default&mobileredirect=true

This obscures the path to the file. This PR uses the parent folder's path and the file name to reconstruct the actual location of the file. Knowing the file's location can be crucial for some RAG applications, because the path can carry information we don't want to lose.
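
To illustrate the idea (this is a minimal sketch, not the diff in this PR): assuming the parent folder's URL and the file name are already available as plain strings, the source could be rebuilt roughly as below. The helper names `is_viewer_url` and `reconstruct_source` and the `parent_web_url` parameter are hypothetical and are not part of O365 or langchain-community.

```python
from urllib.parse import quote, urlsplit


def is_viewer_url(web_url: str) -> bool:
    """Return True if web_url points at the Doc.aspx viewer page, not the file."""
    return urlsplit(web_url).path.lower().endswith("/doc.aspx")


def reconstruct_source(web_url: str, parent_web_url: str, file_name: str) -> str:
    """Hypothetical helper: rebuild a readable source path for Office files.

    Assumes parent_web_url is the URL of the folder that contains the file.
    URLs that are not viewer links (e.g. for .pdf files) are returned unchanged.
    """
    if not is_viewer_url(web_url):
        return web_url
    # Join folder URL and percent-encoded file name with exactly one slash.
    return parent_web_url.rstrip("/") + "/" + quote(file_name)


viewer_url = (
    "https://sharepoint_address/sites/path/to/library/root/Doc.aspx"
    "?sourcedoc=abc&file=filename.xlsx&action=default&mobileredirect=true"
)
folder_url = "https://sharepoint_address/sites/path/to/library/folder/"
source = reconstruct_source(viewer_url, folder_url, "filename.xlsx")
```

With the inputs above, `source` is the folder URL followed by `filename.xlsx`, so the document metadata keeps the folder hierarchy instead of the opaque viewer link.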

@vbarda Could you please look at this one? I'm @-mentioning you since we've already closed some PRs together :-)

dosubot bot added the size:S label (this PR changes 10-29 lines, ignoring generated files) on Nov 21, 2024
dosubot bot added the community, Ɑ: doc loader, and 🤖:bug labels on Nov 21, 2024
MacanPN changed the title from "cummunity: [bugfix] fix source path for office files in O365" to "community: [bugfix] fix source path for office files in O365" on Nov 22, 2024
Labels
🤖:bug (related to a bug, vulnerability, unexpected error with an existing feature)
community (related to langchain-community)
Ɑ: doc loader (related to document loader module, not documentation)
size:S (this PR changes 10-29 lines, ignoring generated files)
Projects
Status: Triage

1 participant