Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cache in getGoogleDriveObject, improved fixParents #10588

Merged
merged 9 commits into from
Feb 6, 2025

Conversation

tdraier
Copy link
Contributor

@tdraier tdraier commented Feb 6, 2025

Description

  • Optional cache in getGoogleDriveObject , based on timestamp
  • fixParents can compare with google and reupsert nodes if needed - rest of the method unchanged

Tests

Tested locally with small ds. Need to do some dryrun on small connectors

Risk

Messing up things in ds, would require a full sync

Deploy Plan

deploy connectors

@tdraier tdraier requested a review from philipperolet February 6, 2025 13:51
Copy link
Contributor

@philipperolet philipperolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍 left 2 coms

Comment on lines 58 to 59
connectorId?: number,
memoizationKey?: string | number
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would put that in a cache object => clearer readability, if cache / if not cache, + obvious that memoization&connectorId are here for the sole purpose of caching

Suggested change
connectorId?: number,
memoizationKey?: string | number
cache?: { connectorId: number,
memoizationKey: string | number }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely 💯

@@ -33,7 +32,9 @@ async function getFileParents(
while (currentObject.parent) {
const parent = await getGoogleDriveObject(
authCredentials,
currentObject.parent
currentObject.parent,
connectorId,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes i think with a cache object (and a global object argument) it would be more understandable

getGdriveObject({ auth, driveFileId: toto, cache: { blah}})

logger: Logger,
execute: boolean = true
) {
export async function fixParents({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this function supposed to be removed after?
I assume yes; let's add a TODO(nodes-core) clean up after project

(if not, we need to explain in detail what this does in a comment and why we had to resort to such extremities)

Copy link
Contributor

@philipperolet philipperolet Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe after we just remove the call in activities/garbage collector (but in that case would still need to explain what we do & why in a com)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep it, at least for cli. For GC if everything is ok we can at some point remove it too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched it to dry run when running from the garbage collect, this will just log if we have inconsistent parent ids (not checking from google to avoid load)
Still the cli may be necessary until we're sure everything is fixed in the code

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we keep it then we need a good comment here because the function doesn't speak for itself

Comment on lines 250 to 258
await upsertDataSourceFolder({
dataSourceConfig,
folderId: file.dustFileId,
parents: parents.map((p) => getInternalId(p)),
parentId: dustParentId,
title: file.name ?? "",
mimeType: MIME_TYPES.GOOGLE_DRIVE.FOLDER,
sourceUrl: getSourceUrlForGoogleDriveFiles(file),
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to update the table under the file, if it's a spreadsheet file

Copy link
Contributor

@philipperolet philipperolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks 👍 Pre-approving, left a nit, need to comment fix-parents too imo if we leave it in the cli

logger: Logger,
execute: boolean = true
) {
export async function fixParents({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we keep it then we need a good comment here because the function doesn't speak for itself

60 * 10 * 1000
);

export async function getGoogleDriveObject(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: here what I had suggested was to move to an object argument, so we see in the calls that we are providing cache keys

@tdraier tdraier merged commit f55079e into main Feb 6, 2025
10 checks passed
@tdraier tdraier deleted the thomas/fix-parents-2 branch February 6, 2025 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants