Allow to continue upload after browser restart for same user #7981

Merged
merged 29 commits into master from more-robust-resumable-upload on Aug 27, 2024

Conversation

@MichaelBuessemeyer (Contributor) commented Aug 7, 2024

URL of deployed dev instance (used for testing):

Steps to test:

  • Open the upload view on the dev instance: https://morerobustresumableupload.webknossos.xyz/datasets/upload?to=66b60af101000001003deac5
  • Open the dev tools. In the debugger, open the file dataset_upload_view.tsx and set a breakpoint at line 425 (before resumableUpload.upload(); is called upon the "files added" action).
  • Keep the dev tools open and upload a small dataset (to keep the traffic low).
    • Enter the dataset name, the allowed teams, and the target folder, and drag & drop a dataset zip.
  • Hit the upload button. The upload should be reserved, and then the debugger should kick in and interrupt the start of the upload process.
  • Close your browser and do not skip the breakpoint.
  • Reopen the browser and reopen the upload view.
  • Open the dev tools to track the network traffic
  • The view should now show the dataset name you just tried to upload but manually interrupted before the upload started (stopping during the upload should also work). Click the continue upload button.
  • The UI should be autofilled with the previous values and disabled.
  • Drag & drop the same file as before and hit upload
  • The upload should happily continue and succeed :)
  • Now inspect the network traffic. There should be GET requests to the /datasets route testing whether a chunk is already present. They should return 204 if no chunk was previously uploaded. If you tweaked the test such that some chunks were already uploaded, some of these requests should return 200, signaling that the chunk is present (see the sketch after these steps).
  • Test the uploaded dataset. It should be viewable as usual.

  • Alternatively, instead of interrupting via the debugger, choose a larger dataset to upload and interrupt the upload in time.
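
For reference, the sketch below shows roughly what this chunk test amounts to, written as a minimal TypeScript function. The route shape and the query parameter names (uploadId, chunkNumber) are assumptions for illustration only; in the actual upload view the requests are issued by resumable.js once chunk testing is enabled.

```typescript
// Minimal sketch of the chunk-presence test described in the steps above.
// The query parameter names and the exact sub-route are assumptions made for
// illustration; in the upload view the request is issued by resumable.js itself.
async function isChunkAlreadyUploaded(uploadId: string, chunkNumber: number): Promise<boolean> {
  const params = new URLSearchParams({
    uploadId,
    chunkNumber: String(chunkNumber),
  });
  const response = await fetch(`/datasets?${params.toString()}`, { method: "GET" });
  // 200 -> the chunk is already present on the datastore and is skipped.
  // 204 -> the chunk was not uploaded yet and will be (re)uploaded.
  return response.status === 200;
}
```

A 200 response therefore lets the continued upload skip that chunk, while a 204 response triggers a regular chunk upload.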

TODOs:

  • Test whether this works with multi-file upload (e.g. multiple tiffs at once)

Issues:


(Please delete unneeded items, merge only when none are left open)

  • Updated changelog
  • Needs datastore update after deployment

@MichaelBuessemeyer MichaelBuessemeyer changed the title implement test route checking if chunk is present Allow to continue upload after browser restart Aug 8, 2024
@MichaelBuessemeyer MichaelBuessemeyer changed the title Allow to continue upload after browser restart Allow to continue upload after browser restart for same user Aug 8, 2024
@MichaelBuessemeyer MichaelBuessemeyer marked this pull request as ready for review August 9, 2024 14:25
@MichaelBuessemeyer (Contributor, Author)

@philippotto could you please review the frontend and @normanrz could you please review the backend code?

@philippotto (Member) left a comment

cool stuff!! didn't test yet but already left some comments :)

@@ -76,7 +76,7 @@
"start": "node tools/proxy/proxy.js",
"build": "node --max-old-space-size=4096 node_modules/.bin/webpack --env production",
"@comment build-backend": "Only check for errors in the backend code like done by the CI. This command is not needed to run WEBKNOSSOS",
"build-backend": "yarn build-wk-backend && yarn build-wk-datastore && yarn build-wk-tracingstore",
"build-backend": "yarn build-wk-backend && yarn build-wk-datastore && yarn build-wk-tracingstore && rm webknossos-tracingstore/conf/messages webknossos-datastore/conf/messages",
Member

why are these removed?

Contributor Author

These message files are generated by the script as the tracingstore and datastore are built. When they are not deleted and one tries to run the script again, the compilation of the two components somehow fails. After running into this multiple times (having to manually delete these files and taking care not to push them), I simply wanted the script to also clean up the auto-generated message files :)

@@ -1307,6 +1307,35 @@ export function reserveDatasetUpload(
);
}

export type OngoingUpload = {
uploadId: string;
dataSourceId: { name: string; organizationName: string };
Member

usually we call this datasetId in the frontend (see APIDatasetId). can we use that term here too?

Contributor Author

sure, thanks. missed that.

But in the backend the name is usually called dataSourceId? So we do have a naming mismatch between the frontend and backend here 🥴?

frontend/javascripts/admin/dataset/dataset_upload_view.tsx: 4 outdated review threads (resolved)
@MichaelBuessemeyer (Contributor, Author) left a comment

Thanks for your feedback, Philipp :)

I applied or commented on everything :)

// Rename "team" to "organization" as this is the actual used current naming.
return ongoingUploads.map(({ dataSourceId: { name, team }, ...rest }) => ({
return ongoingUploads.map(({ datasetId: { name, team }, ...rest }) => ({
Member

just tested this, and I think this needs to be changed back to dataSourceId here because the back-end uses that. otherwise, I get:

TypeError: Cannot read properties of undefined (reading 'name')

Contributor Author

Oh right, thanks for spotting this. It should be better now. I will retest on Monday.

Contributor Author

Tested, and it worked :)

@MichaelBuessemeyer MichaelBuessemeyer requested review from fm3 and removed request for normanrz August 16, 2024 08:25
@MichaelBuessemeyer (Contributor, Author) left a comment

Thanks for the feedback 🙏. Everything should be applied now except for one comment which I do not understand and need clarification on. I replied to the comment in question :)

Comment on lines 138 to 143
_ <- runningUploadMetadataStore.insert(
Json.stringify(Json.toJson(DataSourceId(reserveUploadInformation.name, reserveUploadInformation.organization))),
reserveUploadInformation.uploadId
)
_ <- runningUploadMetadataStore.insert(
redisKeyForLinkedLayerIdentifier(reserveUploadInformation.uploadId),
Contributor Author

I think I don't fully understand your comment 🤔

With

Now we are inserting two different types of data into the same set.

do you mean that a stringified JSON object is now used as a key, while the other inserts always use a plain string key?

And to avoid this, you would want to add a new store to Redis just to save the necessary mapping from the datasource id back to the ongoing upload info stored in Redis? That actually seems like a little overkill. I could instead implement something like redisKeyForDatasourceId, which would transform a given datasource id into a unique string. Does that sound better?

Comment on lines 159 to 161
foundOngoingUploads = maybeOngoingUploads.filter(idBox => idBox.isDefined).flatMap(idBox => idBox.value).collect {
case Some(value) => value
}
Contributor Author

I found a more concise version:

foundOngoingUploads = maybeOngoingUploads.collect { case Full(Some(value)) => value }

The type of maybeOngoingUploads is List[Box[Option[OngoingUpload]]]

@MichaelBuessemeyer (Contributor, Author)

can the voxel size also be pre-filled? (for in needsConversion case)

This information is not present in the backend as it is only passed to the worker conversion job once the conversion job is started. Storing this in the backend would need some further implementation effort :)

would it be simple to do some sanity checks before I can hit upload again? e.g. number of files to upload is equal to that of the unfinished upload?

I think that would be possible. I would need to retrieve this information from Redis and then append it to the OngoingUploads case class objects (which will be renamed soon). I might need some time, but it is definitely doable. The question is: should this be included in the first version?

@MichaelBuessemeyer (Contributor, Author)

And now everything should be named consistently. At least I tried 🙈. I'll do retesting tomorrow :)

@MichaelBuessemeyer (Contributor, Author)

I'll do retesting tomorrow :)

Should work :)

@fm3 (Member) left a comment

I replied to the thread about redis keys. Other than that, the backend looks good to me now :)

@MichaelBuessemeyer (Contributor, Author)

And what about

can the voxel size also be pre-filled? (for in needsConversion case)

This information is not present in the backend as it is only passed to the worker conversion job once the conversion job is started. Storing this in the backend would need some further implementation effort :)

would it be simple to do some sanity checks before I can hit upload again? e.g. number of files to upload is equal to that of the unfinished upload?

I think that would be possible. I would need to retrieve this information from Redis and then append it to the OngoingUploads case class objects (which will be renamed soon). I might need some time, but it is definitely doable. The question is: should this be included in the first version?

?

@fm3 (Member) commented Aug 23, 2024

Ah, sorry, I missed that. Ok then let’s skip the pre-filling for voxel size. I think the little sanity check would be good to have already in this PR.

@fm3 (Member) left a comment

Nice, thanks for adding that! Worked for me! Backend LGTM, leaving final approval to frontend folks

@MichaelBuessemeyer (Contributor, Author)

I tweaked the frontend check a little (compared to the version from Friday):

  1. The frontend no longer checks for the exact order of the files selected for the upload. It only ensures that files with exactly the same names as in the initial upload attempt are present.
  2. The frontend now lists all files required for the continued upload in the error message. Previously, when one file of a multi-file upload was missing, it was guesswork for the user to figure out which one; now the missing files are named explicitly (see the sketch below).
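
The sketch below illustrates the described check; the function name, the input shapes, and the example file names are hypothetical and only meant to show the idea, not the actual implementation.

```typescript
// Hypothetical sketch of the file-name sanity check: the order of the selected
// files is ignored, only the exact file names of the unfinished upload must be present.
function getMissingFileNames(
  selectedFiles: Array<{ name: string }>,
  expectedFileNames: string[],
): string[] {
  const selectedNames = new Set(selectedFiles.map((file) => file.name));
  return expectedFileNames.filter((name) => !selectedNames.has(name));
}

// Example: every missing file can then be listed verbatim in the error message.
const missing = getMissingFileNames(
  [{ name: "color_layer.tif" }],
  ["color_layer.tif", "segmentation_layer.tif"],
);
// missing === ["segmentation_layer.tif"]
```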

@MichaelBuessemeyer (Contributor, Author)

Should be good now :)

@philippotto (Member) left a comment

🎉

@philippotto philippotto dismissed fm3’s stale review August 27, 2024 13:02

last review said: "Backend LGTM, leaving final approval to frontend folks"

@MichaelBuessemeyer MichaelBuessemeyer merged commit 5ace6e2 into master Aug 27, 2024
2 checks passed
@MichaelBuessemeyer MichaelBuessemeyer deleted the more-robust-resumable-upload branch August 27, 2024 13:03
Development

Successfully merging this pull request may close these issues.

Make resumable dataset uploads robust against browser restarts
3 participants