Allow renaming datasets & dataset with duplicate names #8075

Merged
merged 151 commits into from
Nov 27, 2024

Conversation

MichaelBuessemeyer
Contributor

@MichaelBuessemeyer MichaelBuessemeyer commented Sep 12, 2024

Further Notes:

  • Many of the changed lines result from moving the ObjectId class to the utils package so that all wk backend servers have access to it.

URL of deployed dev instance (used for testing):

Steps to test:

  • Give two datasets the same name and check whether annotations and so on still work
  • Test whether the task system still works with duplicate dataset names
  • check dataset upload
    • dataset upload
    • add remote
    • compose
  • check moving datasets between folders (single & multiple at once)
  • check whether the renaming works
  • when there are duplicate names, the correct dataset should open upon viewing it (not another one with the same name)
  • Test worker jobs (tested by @fm3)
  • Test legacy routes
  • And much much more 🙈
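
The duplicate-name checks above come down to one point: a name can now match several datasets, so only the id identifies exactly one. A minimal Python sketch of that lookup rule follows. The `Dataset` record, its field names, and the `resolve_dataset` helper are hypothetical and not taken from the webknossos code base.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Dataset:
    # Hypothetical record; field names are assumptions for illustration.
    id: str
    name: str
    organization_id: str


class AmbiguousDatasetName(Exception):
    """Raised when a name alone matches more than one dataset."""


def resolve_dataset(
    datasets: Iterable[Dataset],
    dataset_id: Optional[str] = None,
    name: Optional[str] = None,
) -> Dataset:
    datasets = list(datasets)
    # Prefer the id: it stays unique even when names collide.
    if dataset_id is not None:
        for ds in datasets:
            if ds.id == dataset_id:
                return ds
        raise KeyError(f"no dataset with id {dataset_id}")
    # Fall back to the name only if it happens to be unambiguous.
    matches = [ds for ds in datasets if ds.name == name]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"no dataset named {name}")
    raise AmbiguousDatasetName(f"{len(matches)} datasets are named {name}")
```

Raising on an ambiguous name, rather than picking the first match, is the design choice that keeps "another one with the same name" from opening silently.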

TODOs:

  • Add evolution and reversion
    • testing needed
  • Test uploading:
    • Report upload fails
  • Adjust worker to newest job arguments as the dataset name can no longer be used to uniquely identify a dataset
  • Rename organization_name in worker to organization_id; see "Rename organization_name to organization_id in worker args" #8038
  • Dataset Name settings field has an unwanted spinner (see upload view)
  • Check the job list
  • Properly implement legacy searching for datasets when old URI param is used
  • Adjust legacy API routes to return dataset in old format
    • It is just an additional field. Thus, I would say it should be fine.
  • datasets appear to be duplicated in the db
    • Maybe these are created by jobs with an output dataset
  • Fix dataset insert
  • Skeleton & VolumeTracings address a dataset via its name
    • Not really used; only during task / annotation creation
    • Use a heuristic upon upload and temporarily patch the Tracing case classes to carry the datasetId during the creation process once the dataset has been identified.
    • Task creation works
    • Needs testing
      • fix annotation upload
    • needs to support old nmls
  • Put datasetId into newly created nmls
  • In the backend LinkedLayerIdentifier still uses the datasetName as an identifier
    • Used in wklibs. Maybe just interpret the name as a path and work with that. If it cannot be found, the user needs to update wklibs. Add a comment for this!
  • The dataset C555_tps_demo has quite a few bucket loading errors. Unsure why some buckets do not work. The dataset seems to be broken; this could be reproduced on other branches.
  • Notion-style URLs are missing (i.e. -, but only the id part is actually used)
  • Maybe remove DatasetURIParser
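
For the Notion-style URL item, here is a small Python sketch of how the id could be recovered from such a path segment. It assumes the segment has the form readable-name, hyphen, id, and that the id is a 24-character hex string (as in the datasetId values quoted in this PR). `parse_dataset_segment` is a hypothetical helper, not the actual parser.

```python
import re
from typing import Optional, Tuple

# Assumption: the id is a 24-character hex string at the very end of the segment,
# optionally preceded by a human-readable name and a hyphen.
_SEGMENT_RE = re.compile(r"(?:(?P<name>.*)-)?(?P<id>[0-9a-fA-F]{24})")


def parse_dataset_segment(segment: str) -> Optional[Tuple[Optional[str], str]]:
    """Return (name_part, dataset_id), or None if the segment carries no id.

    The name part is decorative; only the id should be used for the lookup.
    """
    match = _SEGMENT_RE.fullmatch(segment)
    if match is None:
        return None
    return match.group("name"), match.group("id")
```

A segment without an id returns None, which is where a legacy name-based lookup would have to take over.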

I would also suggest to

  • Execute the screenshot tests once for this branch. Take care to change the branch name in the CI config (or local command), so that the tests are actually run on this branch.
  • Looks much better now :) Did you re-test that exceptions are caught as expected?
  • undo application.conf & snapshot changes -> in progress

Issues:


(Please delete unneeded items, merge only when none are left open)

Comment on lines 49 to 51
"dataSet" -> dataset.name,
"datasetName" -> dataset.name,
"datasetId" -> dataset._id, // Only used for csv serialization in frontend.
Contributor Author

here

Comment on lines 465 to 469
creationInfo: null,
dataSet: '2012-06-28_Cortex',
datasetId: '570b9f4e4bb848d0885ee711',
datasetName: '2012-06-28_Cortex',
editPosition: [
Contributor Author

@fm3 I just noticed this while checking which files were updated. I thought we wanted to avoid having datasetName and dataSet in parallel?

Although checking #8075 (comment) shows that we never clearly made this decision. Should I include a legacy route for anything that returns / receives a task? There is already a legacy route for creating and updating a task, but the publicWrites in TaskService still returns

 "dataSet" -> dataset.name,
 "datasetName" -> dataset.name,
 "datasetId" -> dataset._id, // Only used for csv serialization in frontend.

Should I add a legacy route for this to not always send dataSet and datasetName?

Contributor Author

This can also be seen in tasks.e2e.js.md

Member

I think it would be cleaner to send the old info only in legacy routes (if the frontend doesn’t use it now). However, leaving in the redundant property does not hurt much either. So I’ll leave this decision up to you. If you think it’s quick and easy enough to clean this up now, go for it. Otherwise, I’m also happy if we can merge this soon, and perhaps schedule a follow-up issue for this (then we’ll have to bump the api version again, but that should not be a problem).

Member

@daniel-wer daniel-wer left a comment

LGTM assuming the screenshot tests were successful at some point 👍

@@ -15,6 +15,7 @@ For upgrade instructions, please check the [migration guide](MIGRATIONS.released

### Changed
- Reading image files on datastore filesystem is now done asynchronously. [#8126](https://github.com/scalableminds/webknossos/pull/8126)
- Dataset can now be renamed and can have duplicate names. [#8075](https://github.com/scalableminds/webknossos/pull/8075)
Member

Suggested change
- Dataset can now be renamed and can have duplicate names. [#8075](https://github.com/scalableminds/webknossos/pull/8075)
- Datasets can now be renamed and can have duplicate names. [#8075](https://github.com/scalableminds/webknossos/pull/8075)

@MichaelBuessemeyer
Contributor Author

I have now removed the dataSet field from the jsonified task object and added the legacy adaptation.
I checked which routes needed adaptation and concluded that the following do not need it, or do they?

- /tasks/:id/assign                                     controllers.TaskController.assignOne
- /taskTypes/:id/tasks                                  controllers.TaskController.listTasksForType
- /tasks/list                                           controllers.TaskController.listTasks
- /user/tasks/peek                                      controllers.TaskController.peekNext
- /tasks/:id                                            controllers.TaskController.update
- /annotations/:id/addAnnotationLayer                   controllers.AnnotationController.addAnnotationLayerWithoutType
- /annotations/:typ/:id/addAnnotationLayer              controllers.AnnotationController.addAnnotationLayer
- /datasets/:datasetId/createExplorational              controllers.AnnotationController.createExplorational
- /annotations/:id/downsample                           controllers.AnnotationController.downsampleWithoutType
- /annotations/:typ/:id/downsample                      controllers.AnnotationController.downsample
- /annotations/:typ/:id/duplicate                       controllers.AnnotationController.duplicate
- /annotations/:typ/:id/editLockedState                 controllers.AnnotationController.editLockedState
- /annotations/:typ/:id/finish                          controllers.AnnotationController.finish
- /datasets/:datasetId/sandbox/:typ                     controllers.AnnotationController.getSandbox
- /annotations/:typ/:id/info                            controllers.AnnotationController.info
- /annotations/:typ/:id/makeHybrid                      controllers.AnnotationController.makeHybrid
- /annotations/:typ/:id/merge/:mergedTyp/:mergedId      controllers.AnnotationController.merge
- /annotations/:typ/:id/reopen                          controllers.AnnotationController.reopen
- /annotations/:typ/:id/reset                           controllers.AnnotationController.reset
- /annotations/:typ/:id/transfer                        controllers.AnnotationController.transfer

I'll do testing of the new legacy routes later

@fm3
Member

fm3 commented Nov 26, 2024

I checked which routes needed adaptation and concluded that the following do not need it, or do they?

Looks right to me!

@MichaelBuessemeyer
Contributor Author

Hi @fm3,

I added the required legacy routes to remove dataSet from the JSON-serialized task objects in the newest API version. Please find the following links & curl script to test these routes with the dev instance. All included ids & dataset names should exist on the dev instance, so just clicking the links should work. As these are legacy routes, the results should include the legacy field dataSet in each task object. Some routes return an annotation which itself has a task.
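
To make the expected behavior concrete, here is a rough Python sketch of the adaptation described above: the current task JSON carries datasetName and datasetId, and a legacy route re-adds dataSet. The function names and the nested `task` key are hypothetical, and the real backend implements this in Scala. The sketch only illustrates the shape of the transformation, including the case of an annotation that embeds a task.

```python
import copy


def add_legacy_dataset_field(task: dict) -> dict:
    """Return a copy of a task object with the legacy `dataSet` key re-added."""
    legacy = copy.deepcopy(task)
    # Assumption: the legacy field mirrors the dataset name.
    legacy["dataSet"] = legacy["datasetName"]
    return legacy


def adapt_annotation_for_legacy(annotation: dict) -> dict:
    """Apply the same adaptation to the task nested inside an annotation, if any."""
    legacy = copy.deepcopy(annotation)
    # Assumption: the embedded task lives under a `task` key and may be absent or null.
    if legacy.get("task") is not None:
        legacy["task"] = add_legacy_dataset_field(legacy["task"])
    return legacy
```

Working on deep copies keeps the object served by the newest API version free of the legacy key.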

For the curl commands, you first need to fill in the id cookie before you can run the scripts.

Before testing, there is one more important thing to double-check: the legacy routes now include both dataSet and datasetName. Can wklibs handle / ignore new unexpected json fields?
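
Whether wklibs tolerates extra fields depends on how it deserializes responses, which this thread does not settle. As a generic illustration only (not wklibs code), a Python client can ignore unknown JSON keys by filtering the payload down to the fields it declares. `LegacyTaskInfo` and `from_json_dict` are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass
class LegacyTaskInfo:
    # Hypothetical client-side model that only knows the legacy field.
    dataSet: str


def from_json_dict(cls, payload: dict):
    """Build a dataclass instance, silently dropping keys the class does not declare."""
    known = {f.name for f in fields(cls)}
    return cls(**{key: value for key, value in payload.items() if key in known})
```

A client that instead passes the whole payload as keyword arguments would raise a TypeError on the new datasetName key, which is the failure mode worth checking for.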

Checklist for testing new Legacy Routes

Testing create & update task -> :shrug: I did this manually in Firefox with the edit & resend feature on the dev instance.

In case you want to test more complicated routes:

curl command for checking task creation (setting id cookie is required)

curl 'https://allowdatasetrenaming.webknossos.xyz/api/tasks' \
  -H 'accept: application/json' \
  -H 'accept-language: en-US,en;q=0.9,de;q=0.8' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -H 'cookie: id=<fill-me-in>' \
  -H 'origin: https://allowdatasetrenaming.webknossos.xyz' \
  -H 'pragma: no-cache' \
  -H 'priority: u=1, i' \
  -H 'referer: https://allowdatasetrenaming.webknossos.xyz/tasks/create' \
  -H 'sec-ch-ua: "Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "Linux"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-origin' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36' \
  --data-raw '[{"taskTypeId":"63721e2cef0100470266c485","neededExperience":{"domain":"sampleExp","value":1},"pendingInstances":1,"projectName":"sampleProject","boundingBox":null,"dataSet":"kiwi","editPosition":[0,0,0],"editRotation":[0,0,0],"baseAnnotation":null}]'

Test task update: (setting id cookie is required)

curl 'https://allowdatasetrenaming.webknossos.xyz/api/tasks/6746ecc6010000d80015ea8c' \
  -X 'PUT' \
  -H 'accept: application/json' \
  -H 'accept-language: en-US,en;q=0.9,de;q=0.8' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -H 'cookie: id=<fill-me-in>' \
  -H 'origin: https://allowdatasetrenaming.webknossos.xyz' \
  -H 'pragma: no-cache' \
  -H 'priority: u=1, i' \
  -H 'referer: https://allowdatasetrenaming.webknossos.xyz/tasks/6746ecc6010000d80015ea8c/edit' \
  -H 'sec-ch-ua: "Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "Linux"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-origin' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36' \
  --data-raw '{"taskTypeId":"63721e2cef0100470266c485","neededExperience":{"domain":"sampleExp","value":1},"pendingInstances":2,"projectName":"sampleProject","boundingBox":null,"datasetId":"kiwi","editPosition":[0,0,0],"editRotation":[0,0,0]}'
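
Most headers in the two curl commands are copied from the browser. A pared-down Python sketch that builds (but does not send) the equivalent task-update request is given here. It assumes that the id cookie plus the accept and content-type headers are sufficient, which was not verified against the dev instance, and `build_task_update_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "https://allowdatasetrenaming.webknossos.xyz"


def build_task_update_request(task_id: str, payload: dict, id_cookie: str) -> urllib.request.Request:
    """Build a PUT request for /api/tasks/<task_id>.

    Pass the result to urllib.request.urlopen to actually send it.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/api/tasks/{task_id}",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            # The session cookie must be copied from a logged-in browser session.
            "Cookie": f"id={id_cookie}",
        },
    )
```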

Member

@fm3 fm3 left a comment

LGTM & Works for me =) Thanks! I added two more small comments. Also, have a look at the merge conflicts. None of that should be much work. Let’s ship it! :shipit:

sil: Silhouette[WkEnv])(implicit ec: ExecutionContext, bodyParsers: PlayBodyParsers)
extends Controller {

/* to provide v8, remove legacy routes */
Member

Not sure I understand this comment. What legacy routes are removed? (I mean, the PR removes some, but this comment appears to be about the following code lines). Maybe it should be something like “to provide v8, add dataSet to task json?” Or maybe remove these comments altogether.

Contributor Author

Ah damn it. This is a mixture of a todo comment which I forgot to remove and a copy & paste mistake.

I changed all such occurrences to a simple /* provide vX */

Contributor Author

Or maybe remove these comments altogether.

Comments like these were present previously (I think). Thus, I kept them to "order" the code a little :)

I can also remove them in case you prefer that :)

@@ -26,6 +27,7 @@ For upgrade instructions, please check the [migration guide](MIGRATIONS.released
- Fix a bug where dataset uploads would fail if the organization directory on disk is missing. [#8230](https://github.com/scalableminds/webknossos/pull/8230)

### Removed
- Removed legacy routes for versions 2,3 and 4. [#8075](https://github.com/scalableminds/webknossos/pull/8075)
Member

@fm3 fm3 Nov 27, 2024

Suggested change
- Removed legacy routes for versions 2,3 and 4. [#8075](https://github.com/scalableminds/webknossos/pull/8075)
- Removed support for HTTP API versions 3 and 4. [#8075](https://github.com/scalableminds/webknossos/pull/8075)

(v2 was already removed previously) Could you also mention this in the migration guide please? (If someone has old code interacting with the HTTP API, they need to update)

Michael Büßemeyer added 2 commits November 27, 2024 11:26
- fix changelog entry
- add migration entry about dropped api versions
@MichaelBuessemeyer
Contributor Author

The merge conflicts & your two comments are resolved now. So it's ready for 🚢

@fm3
Member

fm3 commented Nov 27, 2024

🎉 🤞
