Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add owner, source and size to GoogleDriveLoaders Document's metadata. #179

Merged
merged 1 commit into from
Apr 26, 2024

Conversation

rahul-trip
Copy link
Contributor

@rahul-trip rahul-trip commented Apr 23, 2024

Description:
This PR will add 3 new fields to metadata of Documents loaded by GoogleDriveLoader.
A new argument, load_extended_matadata is also added to enable these fields.
Documentation:
Will be added in followup PR (on lagchain repo).

@rahul-trip rahul-trip force-pushed the rahul_owner_source_size branch 3 times, most recently from 1eeca09 to c6e75f3 Compare April 24, 2024 03:04
@lkuligin
Copy link
Collaborator

thanks for the contribution!
for next iterations, we should start adding unit and integration tests to this library.


def _get_owner_metadata_from_id(self, id: str) -> str:
"""Fetch the owner of the file."""
try:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: we should have a local _import_lib function to avoid copy-pasting here and there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am planning to import all the components at the top of the file, (within a try-except block).

This will improve readability and clarity of scopes. On the other hand a common internal method like _import_lib will lead to importing the modules multiple times. Also the imports will be limited to the scope of _import_lib method (which we will have to tackle by returning to calling methods or making it global).

Hope this is fine?

creds = self._load_credentials()
service = build("drive", "v3", credentials=creds)
try:
file = service.files().get(fileId=id, fields="size").execute()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: this all can be a small local function (smth like _load_attribute and it can take fields and credentials as parameters)

@@ -231,6 +337,10 @@ def _load_sheet_from_id(self, id: str) -> List[Document]:
}
if self.load_auth:
metadata["authorized_identities"] = authorized_identities
if self.load_extended_metadata:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we merge this and above section maybe?
metadata["owner"] = self._get_owner_metadata_from_id(id)

@gmogr
Copy link
Collaborator

gmogr commented Apr 24, 2024

/gcbrun

@lkuligin
Copy link
Collaborator

@rahul-trip do you think you could address these minor comments please?

@rahul-trip
Copy link
Contributor Author

@rahul-trip do you think you could address these minor comments please?

Certainly, on it.

except ImportError as exc:
raise ImportError(
"You must run "
"`pip install --upgrade "
Copy link
Collaborator

@lkuligin lkuligin Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please, note we have groups now, so the hint should look like pip install langchain-google-community[drive]

@lkuligin lkuligin merged commit c00a371 into langchain-ai:main Apr 26, 2024
14 of 15 checks passed
baskaryan added a commit to langchain-ai/langchain that referenced this pull request Apr 29, 2024
…#20950)

Document: Updated google_drive,ipynb for loading following extended
metadata.
 - full_path - Full path of the file/s in google drive.
 - owner - owner of the file/s.
 - size - size of the file/s.

Code changes:
[langchain-google/pull/179.](langchain-ai/langchain-google#179)

Signed-off-by: Rahul Tripathi <[email protected]>
Co-authored-by: Rahul Tripathi <[email protected]>
Co-authored-by: Bagatur <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants