Skip to content

Commit

Permalink
Change Default GoogleDriveLoader Behavior to not Load Trashed Files (…
Browse files Browse the repository at this point in the history
…issue langchain-ai#5104) (langchain-ai#5220)

# Change Default GoogleDriveLoader Behavior to not Load Trashed Files
(issue langchain-ai#5104)

Fixes langchain-ai#5104

If the previous behavior of loading files that used to live in the
folder, but are now trashed, you can use the `load_trashed_files`
parameter:

```
loader = GoogleDriveLoader(
    folder_id="1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5",
    recursive=False,
    load_trashed_files=True
)
```

As not loading trashed files should be expected behavior, should we
1. even provide the `load_trashed_files` parameter?
2. add documentation? Feels most users will stick with default behavior

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

DataLoaders
- @eyurtsev

Twitter: [@nicholasliu77](https://twitter.com/nicholasliu77)
  • Loading branch information
NickL77 authored and Undertone0809 committed Jun 19, 2023
1 parent 2f36b0b commit 2cd21e6
Showing 1 changed file with 6 additions and 4 deletions.
10 changes: 6 additions & 4 deletions langchain/document_loaders/googledrive.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@ class GoogleDriveLoader(BaseLoader, BaseModel):
file_ids: Optional[List[str]] = None
recursive: bool = False
file_types: Optional[Sequence[str]] = None
load_trashed_files: bool = False

@root_validator
def validate_inputs(cls, values: Dict[str, Any]) -> Dict[str, Any]:
Expand Down Expand Up @@ -215,16 +216,17 @@ def _load_documents_from_folder(
_files = files

returns = []
for file in _files:
if file["mimeType"] == "application/vnd.google-apps.document":
for file in files:
if file["trashed"] and not self.load_trashed_files:
continue
elif file["mimeType"] == "application/vnd.google-apps.document":
returns.append(self._load_document_from_id(file["id"])) # type: ignore
elif file["mimeType"] == "application/vnd.google-apps.spreadsheet":
returns.extend(self._load_sheet_from_id(file["id"])) # type: ignore
elif file["mimeType"] == "application/pdf":
returns.extend(self._load_file_from_id(file["id"])) # type: ignore
else:
pass

return returns

def _fetch_files_recursive(
Expand All @@ -238,7 +240,7 @@ def _fetch_files_recursive(
pageSize=1000,
includeItemsFromAllDrives=True,
supportsAllDrives=True,
fields="nextPageToken, files(id, name, mimeType, parents)",
fields="nextPageToken, files(id, name, mimeType, parents, trashed)",
)
.execute()
)
Expand Down

0 comments on commit 2cd21e6

Please sign in to comment.