Enhance GoogleDriveLoader to Support Spreadsheets when loading documents from ids (issue #3637) #12308
Hello @baskaryan, thank you for reopening this. If you plan to review and merge, I can resolve the conflict by moving the code to langchain_community accordingly. Please let me know, and have a nice day!
Thanks for creating this MR. I'm looking into reading a spreadsheet from a user-supplied spreadsheet URL. I wonder if we could get this MR reviewed and merged into master anytime soon. Otherwise, I might need to redo this implementation on my own.
.get(fileId=doc_id, fields="id,mimeType", supportsAllDrives=True)
.execute()
)
documents.extend(self._process_document_by_mimetype(file_data))
Previously we went straight to `_load_document_from_id` for document types. Now we have some extra overhead where we fetch the file data, pluck out the `id`, and pass the ID to `_load_document_from_id` (which appears to pull the file again).
- Is this overhead necessary? Will it introduce latency for users?
- Is the new output from `_load_documents_from_ids` identical to previous behavior for existing supported document types?
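One way to put a number on the first question is to count requests against a stub client. The sketch below is not the loader's code: `CountingDrive`, `load_direct`, and `load_via_mimetype` are hypothetical names standing in for the old and new paths, and the stub assumes that each metadata lookup and each content fetch costs exactly one request.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CountingDrive:
    """Hypothetical stand-in for a Drive client that records every request."""

    mime_types: Dict[str, str]
    calls: List[str] = field(default_factory=list)

    def get_metadata(self, file_id: str) -> Dict[str, str]:
        # Assumption: one request returns only the id and mimeType.
        self.calls.append(f"metadata:{file_id}")
        return {"id": file_id, "mimeType": self.mime_types[file_id]}

    def fetch_content(self, file_id: str) -> str:
        # Assumption: one request returns the file body.
        self.calls.append(f"content:{file_id}")
        return f"content of {file_id}"


def load_direct(drive: CountingDrive, file_id: str) -> str:
    """Old path (as described in the review): go straight to the content."""
    return drive.fetch_content(file_id)


def load_via_mimetype(drive: CountingDrive, file_id: str) -> str:
    """New path (as described in the review): look up metadata, then load."""
    meta = drive.get_metadata(file_id)
    # A real implementation would branch on meta["mimeType"] here.
    return drive.fetch_content(meta["id"])


DOC_MIME = "application/vnd.google-apps.document"

old_drive = CountingDrive({"doc-1": DOC_MIME})
new_drive = CountingDrive({"doc-1": DOC_MIME})

old_output = load_direct(old_drive, "doc-1")
new_output = load_via_mimetype(new_drive, "doc-1")

print(len(old_drive.calls), len(new_drive.calls))  # 1 2
print(old_output == new_output)  # True
```

Under these assumptions the new path costs one extra metadata request per ID, and the same harness doubles as a regression check for the second question, since it compares the output of both paths for an existing document type.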
Closing but will re-open if there's desire to keep iterating, let me know!
Issue: [#3637]
Summary:
The codebase includes a private method, `_load_sheet_from_id`, that can handle Google Sheets but is currently invoked only when loading documents from a folder. This PR aims to unify the document loading approach so that both individual IDs and folders consult the `mimeType` to determine the document type.
Changes:
- Added a `_process_file_by_mimetype` function to handle the processing logic based on file `mimeType`, reducing code repetition.
- Updated `_load_documents_from_ids` and `_load_documents_from_folder` to utilize the new `_process_file_by_mimetype` function, ensuring consistent behavior.
Reminder
If no one reviews your PR within a few days, please @-mention one of @baskaryan, @eyurtsev, @hwchase17.
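The unified dispatch described in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the loader functions, the `LOADERS` table, and the string "documents" they return are hypothetical placeholders, and only the MIME type strings are the standard Google Drive and PDF values.

```python
from typing import Callable, Dict, List

# Standard MIME types for Google Docs, Google Sheets, and PDF files.
DOC_MIME = "application/vnd.google-apps.document"
SHEET_MIME = "application/vnd.google-apps.spreadsheet"
PDF_MIME = "application/pdf"


# Hypothetical loaders: each returns a list of placeholder "documents".
def load_document(file_id: str) -> List[str]:
    return [f"document:{file_id}"]


def load_sheet(file_id: str) -> List[str]:
    return [f"sheet:{file_id}"]


def load_pdf(file_id: str) -> List[str]:
    return [f"pdf:{file_id}"]


# One table maps each supported MIME type to its loader.
LOADERS: Dict[str, Callable[[str], List[str]]] = {
    DOC_MIME: load_document,
    SHEET_MIME: load_sheet,
    PDF_MIME: load_pdf,
}


def process_file_by_mimetype(file: Dict[str, str]) -> List[str]:
    """Pick a loader from the file's mimeType; skip unsupported types."""
    loader = LOADERS.get(file["mimeType"])
    if loader is None:
        return []
    return loader(file["id"])


def load_files(files: List[Dict[str, str]]) -> List[str]:
    """Shared by the by-ID and by-folder paths so both behave the same."""
    documents: List[str] = []
    for file in files:
        documents.extend(process_file_by_mimetype(file))
    return documents


files = [
    {"id": "a", "mimeType": DOC_MIME},
    {"id": "b", "mimeType": SHEET_MIME},
    {"id": "c", "mimeType": "image/png"},
]
print(load_files(files))  # ['document:a', 'sheet:b']
```

Because both entry points would go through the same function, a spreadsheet ID gets the same treatment whether it arrives directly or through a folder listing. Silently skipping unsupported types is a choice made for this sketch; the real loader may handle them differently.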