Fix file too large error #2799
PR Summary
This pull request addresses an issue in the Google Drive connector where large files (.doc, .ppt, or spreadsheets over 5MB) caused failures due to API limitations.
- Modified _fetch_docs_from_drive method in backend/danswer/connectors/google_drive/connector.py to handle export-related errors
- Implemented graceful skipping of files exceeding 5MB limit for specific file types
- Enhanced error handling to prevent connector failure on oversized files
- Improved robustness of the Google Drive connector for processing large document sets
- Resolved user-reported issue from Slack (link provided in PR description)
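The skipping behavior summarized above can be sketched roughly as follows. This is a minimal illustration, not the connector's actual code: `DriveExportError`, `fetch_docs`, and the `export_file` callback are hypothetical names introduced here, and the real connector would presumably read the reason from the Google API client's error object instead.

```python
from typing import Callable, Iterable, Iterator

# Error reasons named in this PR; a file that hits one of these is skipped
# rather than treated as a connector failure.
ERRORS_TO_CONTINUE_ON = [
    "cannotExportFile",
    "exportSizeLimitExceeded",
]


class DriveExportError(Exception):
    """Hypothetical stand-in for the API client's error type."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


def fetch_docs(
    file_ids: Iterable[str],
    export_file: Callable[[str], str],  # hypothetical export callback
) -> Iterator[tuple[str, str]]:
    """Yield (file_id, text) pairs, skipping files that cannot be exported."""
    for file_id in file_ids:
        try:
            text = export_file(file_id)
        except DriveExportError as e:
            if e.reason in ERRORS_TO_CONTINUE_ON:
                # Expected per-file condition: skip this file, keep indexing.
                continue
            # Anything else is still a real failure and should surface.
            raise
        yield file_id, text
```

Re-raising unknown reasons keeps genuine failures (for example, permission problems) visible while only the listed export errors are tolerated.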
1 file(s) reviewed, 1 comment(s)
# these errors don't represent a failure in the connector, but simply files
# that can't / shouldn't be indexed
ERRORS_TO_CONTINUE_ON = [
    "cannotExportFile",
    "exportSizeLimitExceeded",
]
style: Consider adding a constant or enum for these error types to improve maintainability
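One possible shape for that suggestion, as a sketch only: the enum name `SkippableExportError` and the helper `should_skip` are hypothetical and do not appear in the PR. The two string values are the ones listed in the diff.

```python
from enum import Enum


class SkippableExportError(str, Enum):
    """Export error reasons that mean "skip this file", not "fail the run"."""

    CANNOT_EXPORT_FILE = "cannotExportFile"
    EXPORT_SIZE_LIMIT_EXCEEDED = "exportSizeLimitExceeded"


# Precomputed set of raw reason strings for cheap membership checks.
_SKIPPABLE_REASONS = frozenset(member.value for member in SkippableExportError)


def should_skip(reason: str) -> bool:
    """Return True if the given error reason is one of the skippable ones."""
    return reason in _SKIPPABLE_REASONS
```

Mixing in `str` lets each member compare equal to its raw reason string, so existing string comparisons would keep working if the list were migrated.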
Previously, the connector would fail for .doc, .ppt, or spreadsheet files over 5MB due to an API limitation.
https://danswer.slack.com/archives/C056265VB1N/p1728895185750569