-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add owner, source and size to GoogleDriveLoaders Document's metadata. #179
Add owner, source and size to GoogleDriveLoaders Document's metadata. #179
Conversation
1eeca09
to
c6e75f3
Compare
thanks for the contribution! |
|
||
def _get_owner_metadata_from_id(self, id: str) -> str: | ||
"""Fetch the owner of the file.""" | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits: we should have a local _import_lib
function to avoid copy-pasting here and there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am planning to import all the components at the top of the file, (within a try-except block).
This will improve readability and clarity of scopes. On the other hand a common internal method like _import_lib
will lead to importing the modules multiple times. Also the imports will be limited to the scope of _import_lib
method (which we will have to tackle by returning to calling methods or making it global).
Hope this is fine?
creds = self._load_credentials() | ||
service = build("drive", "v3", credentials=creds) | ||
try: | ||
file = service.files().get(fileId=id, fields="size").execute() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits: this all can be a small local function (smth like _load_attribute
and it can take fields
and credentials
as parameters)
@@ -231,6 +337,10 @@ def _load_sheet_from_id(self, id: str) -> List[Document]: | |||
} | |||
if self.load_auth: | |||
metadata["authorized_identities"] = authorized_identities | |||
if self.load_extended_metadata: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we merge this and above section maybe?
metadata["owner"] = self._get_owner_metadata_from_id(id)
/gcbrun |
@rahul-trip do you think you could address these minor comments please? |
cd16882
to
7cc873b
Compare
Certainly, on it. |
except ImportError as exc: | ||
raise ImportError( | ||
"You must run " | ||
"`pip install --upgrade " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please, note we have groups now, so the hint should look like pip install langchain-google-community[drive]
Signed-off-by: Rahul Tripathi <[email protected]>
7cc873b
to
6e2ee6b
Compare
…#20950) Document: Updated google_drive,ipynb for loading following extended metadata. - full_path - Full path of the file/s in google drive. - owner - owner of the file/s. - size - size of the file/s. Code changes: [langchain-google/pull/179.](langchain-ai/langchain-google#179) Signed-off-by: Rahul Tripathi <[email protected]> Co-authored-by: Rahul Tripathi <[email protected]> Co-authored-by: Bagatur <[email protected]>
Description:
This PR will add 3 new fields to metadata of Documents loaded by GoogleDriveLoader.
A new argument,
load_extended_matadata
is also added to enable these fields.Documentation:
Will be added in followup PR (on lagchain repo).