Fix upload of zarr datasets with already existing datasource-properties.json file
#8268
Conversation
📝 Walkthrough
The pull request introduces several updates to the WEBKNOSSOS project, including the addition of a tooltip displaying the total volume of datasets, renaming capabilities for datasets, and improved default colors for skeleton trees. The terminology "resolution" has been replaced with "magnification." Enhancements to AI model training accommodate differently sized bounding boxes. Multiple bug fixes address issues with dataset listing, uploads, annotation layer name restrictions, and synchronization of dataset settings. Additionally, support for HTTP API versions 3 and 4 has been removed, affecting backward compatibility.
📜 Recent review details
Configuration used: CodeRabbit UI
📒 Files selected for processing (2)
- CHANGELOG.unreleased.md
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
🔇 Additional comments (2)
CHANGELOG.unreleased.md (1)
The changelog entry accurately describes the fix for uploading zarr datasets with an existing datasource-properties.json file.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
The reordering of checks prioritizes the datasource-properties.json check. Let's verify that this is the only location where we determine the data source type:
✅ Verification successful
LGTM! The data source type determination is correctly isolated. The search results confirm that this is the only place where the type of an uploaded data source is determined.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Check for other instances of data source type determination
# Expect: Only this implementation should handle the logic

# Search for similar data source type determination logic
rg -A 5 "looksLikeExploredDataSource|looksLikeZarrArray|FILENAME_DATASOURCE_PROPERTIES_JSON"

# Search for other implementations of guessTypeOfUploadedDataSource
ast-grep --pattern 'def guessTypeOfUploadedDataSource'
```
Length of output: 12855
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
469-471: Consider adding debug logging for dataset type detection.
To improve debuggability, consider adding debug-level logging that shows which conditions were checked and why a dataset was classified as a particular type. This would help diagnose similar issues in the future.
Example implementation:
```diff
 private def guessTypeOfUploadedDataSource(dataSourceDir: Path): UploadedDataSourceType.Value =
   if (looksLikeExploredDataSource(dataSourceDir).openOr(false)) {
+    logger.debug(s"Dataset at $dataSourceDir identified as already explored")
     UploadedDataSourceType.EXPLORED
   } else if (looksLikeZarrArray(dataSourceDir, maxDepth = 2).openOr(false)) {
+    logger.debug(s"Dataset at $dataSourceDir identified as zarr array")
     UploadedDataSourceType.ZARR
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1 hunks)
🔇 Additional comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
469-471: LGTM! The fix correctly prioritizes checking for explored datasets.
The reordering of checks ensures that datasets with existing datasource-properties.json
are properly identified as already explored, preventing unnecessary re-exploration attempts. This fixes the issue with zarr dataset uploads.
Let's verify the fix handles the described scenario correctly:
✅ Verification successful
Let me generate additional verification scripts to check the implementation details of these methods and their usage:
This will help us verify:
- The actual implementation of the file checking logic
- The constant definitions to ensure correct file names are being checked
- The existence of test cases validating this behavior
Let me try one more verification to check the implementation details:
This will help us verify:
- The constant definitions used for file checks
- The implementation of the file matching logic
- Any related test cases in the test directory
Based on the verification results, I can now generate the final response:
The fix correctly prioritizes checking for explored datasets and is properly implemented.
The code shows that:
- FILENAME_DATASOURCE_PROPERTIES_JSON is a well-defined constant used consistently across the codebase
- The containsMatchingFile helper method is used to check for file existence with proper depth control (see the sketch after this list)
- The order of checks ensures that explored datasets (with datasource-properties.json) are identified before attempting zarr detection
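For illustration only, a depth-limited marker-file lookup in the spirit of containsMatchingFile could look roughly like the sketch below. This is not the actual UploadService implementation; the object name, parameter choices, and depth values are assumptions:
```scala
import java.nio.file.{Files, Path, Paths}
import scala.jdk.CollectionConverters._

object MatchingFileSketch {
  // True if any entry at most maxDepth levels below dir has one of the given file names.
  def containsMatchingFile(dir: Path, fileNames: Set[String], maxDepth: Int): Boolean = {
    val stream = Files.walk(dir, maxDepth)
    try stream.iterator().asScala.exists(p => fileNames.contains(p.getFileName.toString))
    finally stream.close()
  }

  def main(args: Array[String]): Unit = {
    val dir = Paths.get(args.headOption.getOrElse("."))
    // Mirrors the two checks discussed above: properties file first, then the zarr marker.
    println("looks explored: " + containsMatchingFile(dir, Set("datasource-properties.json"), maxDepth = 1))
    println("looks like zarr: " + containsMatchingFile(dir, Set(".zarray"), maxDepth = 2))
  }
}
```
The depth limit mirrors the maxDepth = 2 used for the zarr check above and keeps the search from walking the entire uploaded dataset.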
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Verify the handling of zarr datasets with existing datasource-properties.json

# Test 1: Check if there are any zarr datasets with datasource-properties.json
rg -l --type-add 'zarr:*.zarray' --type zarr . | while read -r file; do
  dir=$(dirname "$file")
  if [ -f "$dir/datasource-properties.json" ]; then
    echo "Found zarr dataset with properties: $dir"
    cat "$dir/datasource-properties.json"
  fi
done

# Test 2: Verify the exploration logic order in other files
rg -A 5 "looksLikeExploredDataSource|looksLikeZarrArray" --type scala
```
Length of output: 4766
Script:
```shell
#!/bin/bash
# Check the implementation of containsMatchingFile and the constants used
ast-grep --pattern 'private def containsMatchingFile'
ast-grep --pattern 'val FILENAME_DATASOURCE_PROPERTIES_JSON ='
ast-grep --pattern 'val FILENAME_DOT_ZARRAY ='

# Check for any test cases related to this functionality
rg -l "guessTypeOfUploadedDataSource|looksLikeExploredDataSource|looksLikeZarrArray" --type scala "*Test.scala" "*Spec.scala"
```
Length of output: 66266
Script:
```shell
#!/bin/bash
# Search for the constant definitions and method implementations
rg -A 2 "FILENAME_DATASOURCE_PROPERTIES_JSON|FILENAME_DOT_ZARRAY|containsMatchingFile" --type scala

# Look for test files specifically in the test directory
fd -e scala -e test "UploadService" "test/"
```
Length of output: 13549
Good find!
Heap overflow doesn’t sound good. Could you create an issue with steps to reproduce (maybe link to such a json as a slack link?)
This PR fixes the upload of zarr datasets which already have a datasource-properties.json file present.

The bug originates from (I'd say) wrong prioritization while finishing a dataset upload: first, the backend checked whether the uploaded dataset is in zarr format, and only if that failed did it check whether a datasource-properties.json is present (meaning the dataset is already ready to be used). This PR moves the check for a datasource-properties.json to the first position. Thus, a complete dataset with an already existing datasource-properties.json is treated as already explored, and no additional effort is spent trying to generate a datasource-properties.json file for it.

Previously, the backend would detect such a dataset as zarr and assume that a datasource-properties.json needed to be guessed from its contents. This is currently not supported by the post-upload zarr exploration code, so the uploads failed.

I hope this is the correct way of fixing this. An alternative (which I tried first, before locating the wrong ordering of the dataset format detection) was to fix the zarr upload exploration. But this created strange datasets: all mags were interpreted as layers, which led to a wrong datasource-properties.json. Moreover: !! @fm3 The server crashed with a heap overflow when trying to view the dataset. IMO this should never happen just because a datasource-properties.json is misconfigured.
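To make the changed prioritization concrete, here is a minimal, self-contained sketch. It is not the real UploadService code (which uses looksLikeExploredDataSource and looksLikeZarrArray with depth-limited searches); the object name, the simplified file checks, and the fallback case are assumptions for illustration only:
```scala
import java.nio.file.{Files, Path, Paths}

object DetectionOrderSketch {
  sealed trait UploadedKind
  case object Explored extends UploadedKind // datasource-properties.json already present
  case object Zarr     extends UploadedKind // needs post-upload zarr exploration
  case object Unknown  extends UploadedKind // other formats, irrelevant for this sketch

  // After this PR: the properties file is checked first, so a zarr dataset that already
  // ships a datasource-properties.json never enters the zarr exploration path.
  def guessKind(dataSourceDir: Path): UploadedKind =
    if (Files.exists(dataSourceDir.resolve("datasource-properties.json"))) Explored
    else if (Files.exists(dataSourceDir.resolve(".zarray"))) Zarr // the real check searches up to depth 2
    else Unknown

  def main(args: Array[String]): Unit =
    println(guessKind(Paths.get(args.headOption.getOrElse("."))))
}
```
Before the fix, the two branches were effectively swapped, so the presence of zarr marker files won even when a valid datasource-properties.json was already in place.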
URL of deployed dev instance (used for testing):
Steps to test:
- Upload a zarr dataset that already contains a datasource-properties.json file; the upload should succeed and the dataset should be usable without re-exploration.
Issues: