Enhance GoogleDriveLoader to Support Spreadsheets when loading documents from ids (issue #3637) #12308

lumenintellects · 2023-10-25T23:37:47Z

Issue: [#3637]

Summary:
The codebase includes a private method _load_sheet_from_id that can handle Google Sheets, which is currently only invoked when loading documents from a folder. This PR aims to unify the document loading approach, ensuring both individual IDs and folders consider the mimeType to determine the document type.

Changes:

Introduced _process_file_by_mimetype function to handle the processing logic based on file mimeType, reducing code repetition.
Refactored _load_documents_from_ids and _load_documents_from_folder to utilize the new _process_file_by_mimetype function, ensuring consistent behavior.
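
A minimal sketch of the mimeType dispatch described above, not the PR's actual code. The two MIME type strings are the standard Google Drive values for Docs and Sheets; `load_document`, `load_sheet`, and `HANDLERS` are hypothetical stand-ins for the loader's private methods, and silently skipping unknown types is an assumption.

```python
from typing import Callable, Dict, List

# Standard Google Drive MIME types for Docs and Sheets.
DOC_MIME = "application/vnd.google-apps.document"
SHEET_MIME = "application/vnd.google-apps.spreadsheet"


def load_document(file_id: str) -> List[str]:
    # Hypothetical stand-in for _load_document_from_id.
    return [f"document:{file_id}"]


def load_sheet(file_id: str) -> List[str]:
    # Hypothetical stand-in for _load_sheet_from_id.
    return [f"sheet:{file_id}"]


# One table shared by the id-based and folder-based code paths.
HANDLERS: Dict[str, Callable[[str], List[str]]] = {
    DOC_MIME: load_document,
    SHEET_MIME: load_sheet,
}


def process_file_by_mimetype(file_data: dict) -> List[str]:
    """Route a Drive file record to a loader based on its mimeType."""
    handler = HANDLERS.get(file_data.get("mimeType", ""))
    if handler is None:
        # Assumption: unsupported types are skipped rather than raising.
        return []
    return handler(file_data["id"])
```

With a single dispatch function, loading by id and loading from a folder cannot drift apart in which file types they support.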

Reminder
If no one reviews your PR within a few days, please @-mention one of @baskaryan, @eyurtsev, @hwchase17.

lumenintellects · 2024-01-30T20:14:34Z

Hello @baskaryan, thank you for reopening this. If you plan to review and merge, I can resolve the conflict by moving the code to langchain_community accordingly. Let me know please, have a nice day!

yj-ang · 2024-05-06T06:46:42Z

Thanks for creating this MR. I'm looking into reading a spreadsheet from a user-supplied spreadsheet URL.

I wonder if we could get this MR reviewed and merged into master anytime soon. Otherwise, I might need to redo this implementation on my own.

ccurme · 2024-07-24T19:01:31Z

libs/langchain/langchain/document_loaders/googledrive.py

+                .get(fileId=doc_id, fields="id,mimeType", supportsAllDrives=True)
+                .execute()
+            )
+            documents.extend(self._process_document_by_mimetype(file_data))


Previously we went straight to _load_document_from_id for document types. Now we have some extra overhead where we fetch the file data, pluck out the id, and pass the ID to _load_document_from_id (which appears to pull the file again).

Is this overhead necessary? Will it introduce latency for users?

Is the new output from _load_documents_from_ids identical to previous behavior for existing supported document types?
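
One way to make both questions concrete is a stub that counts requests. This is only a sketch under stated assumptions: `FakeService`, `FakeFiles`, and `FakeRequest` are hypothetical test doubles that mimic the `files().get(...).execute()` call chain, and `load_document_from_id` is assumed to issue its own metadata request, as the comment above suggests.

```python
class FakeRequest:
    """Hypothetical request object; counts each execute() call."""

    def __init__(self, counter, payload):
        self._counter = counter
        self._payload = payload

    def execute(self):
        self._counter["calls"] += 1
        return self._payload


class FakeFiles:
    def __init__(self, counter):
        self._counter = counter

    def get(self, fileId, fields=None, supportsAllDrives=False):
        payload = {
            "id": fileId,
            "mimeType": "application/vnd.google-apps.document",
            "name": "demo",
        }
        return FakeRequest(self._counter, payload)


class FakeService:
    """Hypothetical stand-in for the Drive service client."""

    def __init__(self):
        self.counter = {"calls": 0}

    def files(self):
        return FakeFiles(self.counter)


def load_document_from_id(service, doc_id):
    # Assumption: the existing loader fetches file metadata itself.
    meta = (
        service.files()
        .get(fileId=doc_id, fields="name", supportsAllDrives=True)
        .execute()
    )
    return [f"{meta['name']}:{doc_id}"]


def load_via_mimetype(service, doc_id):
    # New path: fetch id and mimeType first, then delegate by id.
    file_data = (
        service.files()
        .get(fileId=doc_id, fields="id,mimeType", supportsAllDrives=True)
        .execute()
    )
    return load_document_from_id(service, file_data["id"])


old_service, new_service = FakeService(), FakeService()
old_docs = load_document_from_id(old_service, "abc")
new_docs = load_via_mimetype(new_service, "abc")
```

In this stub the two paths return the same output, while the new path issues two requests per id where the old path issues one. Whether that extra round trip matters in practice depends on how many ids are loaded and on network latency.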

ccurme · 2024-08-01T20:13:56Z

Closing but will re-open if there's desire to keep iterating, let me know!

Usage of _load_sheet_from_id when loading documents by ids is now possible

fb72df3

dosubot bot added Ɑ: doc loader Related to document loader module (not documentation) 🤖:improvement Medium size change to existing code to handle new use-cases labels Oct 25, 2023

lumenintellects changed the title ~~Enhance GoogleDriveLoader to Support Spreadsheets when loading documents from ids (issue #3637)~~ Enhance GoogleDriveLoader to Support Spreadsheets when loading documents from ids (issue [#3637]) Oct 26, 2023

lumenintellects changed the title ~~Enhance GoogleDriveLoader to Support Spreadsheets when loading documents from ids (issue [#3637])~~ Enhance GoogleDriveLoader to Support Spreadsheets when loading documents from ids (issue [#3637]]) Oct 26, 2023

lumenintellects changed the title ~~Enhance GoogleDriveLoader to Support Spreadsheets when loading documents from ids (issue [#3637]])~~ Enhance GoogleDriveLoader to Support Spreadsheets when loading documents from ids (issue #3637) Oct 26, 2023

baskaryan assigned eyurtsev Oct 31, 2023

hwchase17 closed this Jan 30, 2024

baskaryan reopened this Jan 30, 2024

ccurme added the langchain Related to the langchain package label Jun 21, 2024

ccurme added community Related to langchain-community and removed langchain Related to the langchain package labels Jul 19, 2024

ccurme reviewed Jul 24, 2024

ccurme closed this Aug 1, 2024
