Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance GoogleDriveLoader to Support Spreadsheets when loading documents from ids (issue #3637) #12308

Conversation

lumenintellects
Copy link

@lumenintellects lumenintellects commented Oct 25, 2023

Issue: [#3637]

Summary:
The codebase includes a private method _load_sheet_from_id that can handle Google Sheets, which is currently only invoked when loading documents from a folder. This PR aims to unify the document loading approach, ensuring both individual IDs and folders consider the mimeType to determine the document type.

Changes:

  1. Introduced _process_file_by_mimetype function to handle the processing logic based on file mimeType, reducing code repetition.
  2. Refactored _load_documents_from_ids and _load_documents_from_folder to utilize the new _process_file_by_mimetype function, ensuring consistent behavior.

Reminder
If no one reviews your PR within a few days, please @-mention one of @baskaryan, @eyurtsev, @hwchase17.

@vercel
Copy link

vercel bot commented Oct 25, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Oct 25, 2023 11:37pm

@dosubot dosubot bot added Ɑ: doc loader Related to document loader module (not documentation) 🤖:improvement Medium size change to existing code to handle new use-cases labels Oct 25, 2023
@lumenintellects lumenintellects changed the title Enhance GoogleDriveLoader to Support Spreadsheets when loading documents from ids (issue #3637) Enhance GoogleDriveLoader to Support Spreadsheets when loading documents from ids (issue [#3637]) Oct 26, 2023
@lumenintellects lumenintellects changed the title Enhance GoogleDriveLoader to Support Spreadsheets when loading documents from ids (issue [#3637]) Enhance GoogleDriveLoader to Support Spreadsheets when loading documents from ids (issue [#3637]]) Oct 26, 2023
@lumenintellects lumenintellects changed the title Enhance GoogleDriveLoader to Support Spreadsheets when loading documents from ids (issue [#3637]]) Enhance GoogleDriveLoader to Support Spreadsheets when loading documents from ids (issue #3637) Oct 26, 2023
@hwchase17 hwchase17 closed this Jan 30, 2024
@baskaryan baskaryan reopened this Jan 30, 2024
@lumenintellects
Copy link
Author

Hello @baskaryan, thank you for reopening this. If you plan to review and merge, I can resolve the conflict by moving the code to langchain_community accordingly. Let me know please, have a nice day!

@yj-ang
Copy link

yj-ang commented May 6, 2024

Thanks for creating this MR. I'm looking into read the spreadsheet with user input spreadsheet URL.

Wonder if we could get this MR review and merge into the master anytime soon. Otherwisem I might need to redo this implementation by my own.

@ccurme ccurme added the langchain Related to the langchain package label Jun 21, 2024
@ccurme ccurme added community Related to langchain-community and removed langchain Related to the langchain package labels Jul 19, 2024
.get(fileId=doc_id, fields="id,mimeType", supportsAllDrives=True)
.execute()
)
documents.extend(self._process_document_by_mimetype(file_data))
Copy link
Collaborator

@ccurme ccurme Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we went straight to _load_document_from_id for document types. Now we have some extra overhead where we fetch the file data, pluck out the id, and pass the ID to _load_document_from_id (which appears to pull the file again).

  1. Is this overhead necessary? Will it introduce latency for users?
  2. Is the new output from _load_documents_from_ids identical to previous behavior for existing supported document types?

@ccurme
Copy link
Collaborator

ccurme commented Aug 1, 2024

Closing but will re-open if there's desire to keep iterating, let me know!

@ccurme ccurme closed this Aug 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community Related to langchain-community Ɑ: doc loader Related to document loader module (not documentation) 🤖:improvement Medium size change to existing code to handle new use-cases
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants