
Log data loading errors in conversion / mapping application #8202

Merged

frcroth merged 10 commits into master from log-all-data-loading-errors on Nov 27, 2024

Conversation

@frcroth (Member) commented Nov 18, 2024

URL of deployed dev instance (used for testing):

  • https://___.webknossos.xyz

Steps to test:

  • Temporarily add an exception in the conversion step and verify that the error is logged

Issues:

  • Log all data loading errors #8081

(Please delete unneeded items, merge only when none are left open)

  • Updated changelog
  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases
  • Needs datastore update after deployment


coderabbitai bot commented Nov 18, 2024

📝 Walkthrough

The pull request introduces several enhancements and bug fixes to the WEBKNOSSOS project. It updates the changelog to reflect changes in asynchronous image file handling, improved error messages for job initiation, and enhanced logging for dataset loading errors. The BinaryDataService class has been modified to improve error handling, including detailed logging and a new method for centralized error logging. Additionally, performance issues related to tree deletions and NML file imports have been addressed, ensuring better functionality and data integrity.

Changes

  • CHANGELOG.unreleased.md: Updated to reflect changes in functionality, including asynchronous file handling and error messages.
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala: Enhanced error handling in convertIfNecessary and handleBucketRequest; updated a method signature for better logging.
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetErrorLoggingService.scala: Added a withErrorLogging method that centralizes error logging and handles the various result cases.
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataServiceHolder.scala: Removed the applicationHealthService parameter from the constructor, simplifying dependency management.
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingService.scala: Simplified instantiation of binaryDataService by removing an unnecessary parameter.
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala: Simplified instantiation of binaryDataService by reducing the number of parameters.

Assessment against linked issues

  • Log all data loading errors (#8081): the changes do not address logging errors occurring after bucket requests.

Possibly related PRs

Suggested labels

bug, enhancement, backend

Suggested reviewers

  • fm3
  • dieknolle3333

Poem

In the datastore's burrow, changes bloom,
Asynchronous whispers chase away the gloom.
Errors now logged with a hop and a skip,
Bugs fixed with care, on a debugging trip.
Hooray for the fixes, let the data flow free,
A brighter tomorrow for you and for me! 🐇✨



@frcroth self-assigned this on Nov 18, 2024
@frcroth requested a review from fm3 on Nov 18, 2024, 10:18
@frcroth marked this pull request as ready for review on Nov 18, 2024, 10:49
@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
CHANGELOG.unreleased.md (1)

18-18: Enhance the changelog entry to be more specific and user-focused.

While the entry correctly indicates improved logging, it could be more specific about the types of errors being logged and the user benefit. Consider expanding it to better reflect the changes:

- Improved logging for errors when loading datasets and problems arise during a conversion step. [#8202](https://github.com/scalableminds/webknossos/pull/8202)
+ Enhanced error logging for dataset loading failures, particularly for bucket requests and data conversion steps, to improve debugging capabilities. [#8202](https://github.com/scalableminds/webknossos/pull/8202)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (2)

58-78: LGTM: Comprehensive error handling with detailed logging.

The enhanced error handling with detailed logging aligns well with the PR objectives. The pattern matching is exhaustive and provides appropriate context in error messages.

Consider extracting the error message formatting to a separate private method to improve readability:

+  private def formatErrorMessage(request: DataServiceDataRequest, cuboid: String, additionalInfo: String): String =
+    s"Failed to convert data for layer ${request.dataLayer.name} of dataset ${request.dataSource.id.team}/${request.dataSource.id.name} at $cuboid: $additionalInfo"

   case Empty =>
-    logError(
-      request,
-      s"Failed to convert data for layer ${request.dataLayer.name} of dataset ${request.dataSource.id.team}/${request.dataSource.id.name} at ${request.cuboid}, result is Empty"
-    )
+    logError(request, formatErrorMessage(request, request.cuboid.toString, "result is Empty"))

220-225: Add documentation and tests for the new logError method.

While the implementation looks good, please consider:

  1. Adding ScalaDoc to document the method's purpose and parameters
  2. Adding unit tests to verify the logging behavior with different dataset configurations

Add documentation:

+  /**
+   * Logs error messages for dataset operations if logging is enabled for the dataset.
+   * 
+   * @param request The data request containing dataset and layer information
+   * @param msg The error message to log
+   */
   def logError(request: DataServiceDataRequest, msg: String): Unit =

Would you like me to help generate unit tests for this new functionality?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ec3015f and 826d938.

📒 Files selected for processing (2)
  • CHANGELOG.unreleased.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (5 hunks)
🔇 Additional comments (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (3)

13-13: LGTM: Import additions align with new error handling.

The addition of Empty and EmptyBox imports supports the enhanced error handling implementation.


121-125: LGTM: Enhanced error logging in bucket request handling.

The improved error logging provides comprehensive context about bucket loading failures, which directly addresses the PR objectives.


Line range hint 82-95: Verify error handling in handleDataRequests.

The method handles multiple data requests and includes error handling for agglomerate mapping and half-byte conversion. Let's verify that errors are properly propagated and logged.

✅ Verification successful

Error handling in handleDataRequests is properly implemented

The code inspection reveals that error handling is comprehensive and well-structured:

  • handleDataRequest has proper validation for cuboid dimensions
  • convertIfNecessary includes detailed error handling with logging
  • Errors are properly propagated through the Fox monad chain
  • The method handles both single bucket and multi-bucket scenarios safely
  • Error cases for agglomerate mapping and half-byte conversion are explicitly handled
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for error handling patterns in handleDataRequests implementations
# across the codebase to ensure consistent error handling.

# Look for similar error handling patterns
rg -A 10 "handleDataRequests.*Fox\[(Array\[Byte\]" 

# Check for tests covering error scenarios
rg -A 10 "\"handleDataRequests.*should.*error" --type=scala

Length of output: 229


Script:

#!/bin/bash
# Search for handleDataRequests method implementation
ast-grep --pattern 'def handleDataRequests'

# Search for error handling patterns in the file
rg -A 10 "handleDataRequest.*" webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala

# Look for test files related to BinaryDataService
fd "BinaryDataService.*Test.*scala"

Length of output: 1979

@fm3 (Member) left a comment

Nice, does the trick!
I wonder if we can make this a little dryer. Maybe the datasetErrorLoggingService could have a single public function, something like
withErrorLogging[T](dataSourceId: DataSourceId, label: String, resultFox: Fox[T]): Fox[T]. This could be called like
datasetErrorLoggingService.withErrorLogging(request.dataSource.id, "Converting data", conversionFunc(inputArray)). What do you think? (Note that I did not try this, so it may not work exactly in this way).

@frcroth (Member, Author) commented Nov 20, 2024


I assume this would then be used for both conversion and handleBucketRequest?
One problem is that the applicationHealthService is not available there, so should this be passed in? Should there still be a match on internalError then?
Additional notes:

  • This needs to check whether datasetErrorLoggingService exists (it is an Option)
  • T needs to be Array[Byte] so that we can still check for a zero-length array
  • convertIfNecessary needs to return a Fox

I implemented it that way (roughly along the lines sketched below). What do you think?
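
A minimal self-contained sketch of that shape (illustrative only: scala.concurrent.Future stands in for the project's Fox type, and ErrorLogging is a hypothetical stand-in for DatasetErrorLoggingService):

import scala.concurrent.Future

object ConversionSketch {

  // Hypothetical stand-in for the real DatasetErrorLoggingService
  trait ErrorLogging {
    def withErrorLogging(dataSourceId: String, label: String, result: Future[Array[Byte]]): Future[Array[Byte]]
  }

  // The conversion is only wrapped when a logging service is configured (it is an Option),
  // and the result type stays Array[Byte] so the zero-length check remains possible downstream.
  def convertIfNecessary(isNecessary: Boolean,
                         inputArray: Array[Byte],
                         conversionFunc: Array[Byte] => Future[Array[Byte]],
                         dataSourceId: String,
                         loggingService: Option[ErrorLogging]): Future[Array[Byte]] =
    if (!isNecessary) Future.successful(inputArray)
    else
      loggingService match {
        case Some(service) => service.withErrorLogging(dataSourceId, "converting bucket data", conversionFunc(inputArray))
        case None          => conversionFunc(inputArray)
      }
}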

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (1)

70-79: Consider simplifying the nested error handling structure

While the error handling improvements are good, the code could be more maintainable:

Consider this structure for better readability:

- mappedDataFox <- agglomerateServiceOpt.map { agglomerateService =>
-   convertIfNecessary(
-     request.settings.appliedAgglomerate.isDefined && request.dataLayer.category == Category.segmentation && request.cuboid.mag.maxDim <= MaxMagForAgglomerateMapping,
-     data,
-     agglomerateService.applyAgglomerate(request),
-     request
-   )
- }.fillEmpty(Fox.successful(data)) ?~> "Failed to apply agglomerate mapping"
- mappedData <- mappedDataFox
+ shouldApplyAgglomerate = request.settings.appliedAgglomerate.isDefined && 
+   request.dataLayer.category == Category.segmentation && 
+   request.cuboid.mag.maxDim <= MaxMagForAgglomerateMapping
+ mappedData <- agglomerateServiceOpt match {
+   case Some(agglomerateService) if shouldApplyAgglomerate =>
+     convertIfNecessary(true, data, agglomerateService.applyAgglomerate(request), request)
+       ?~> s"Failed to apply agglomerate mapping for dataset ${request.dataSource.id}"
+   case _ => 
+     Fox.successful(data)
+ }
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetErrorLoggingService.scala (1)

77-77: Clarify error message for empty failure

The error message "Error while $label for $dataSourceId, Empty failure" might be unclear to users or developers reading the logs. Consider rephrasing the message to provide more context or detail about the failure.

Possible improvement:

-          logger.error(s"Error while $label for $dataSourceId, Empty failure")
+          logger.error(s"Error while $label for $dataSourceId: Received an empty Failure without exception")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f7febd4 and a56da43.

📒 Files selected for processing (2)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetErrorLoggingService.scala (2 hunks)
🔇 Additional comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (3)

13-13: LGTM: Import cleanup aligns with Fox-based error handling

The removal of the Failure import while keeping Box/Full imports is consistent with the shift towards Fox for error handling.


58-63: LGTM: Enhanced error handling with centralized logging

The changes improve error tracking by:

  1. Making request context available for logging
  2. Using Fox for consistent error handling
  3. Centralizing error logging through datasetErrorLoggingService

Verify that all callers of this method have been updated to handle the Fox return type appropriately.


100-108: LGTM: Comprehensive error logging implementation

The error logging implementation is well-structured with:

  • Detailed context including layer name and bucket position
  • Proper error propagation to the application health service
  • Consistent error handling pattern
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetErrorLoggingService.scala (1)

49-82: New method withErrorLogging enhances error handling

The addition of the withErrorLogging method centralizes error handling and logging for dataset operations. This improves maintainability and ensures consistent error reporting across the codebase.
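
As a self-contained illustration of the kind of case analysis such a wrapper performs (not the project's code: scala.concurrent.Future and scala.util.Try stand in for the Fox/Box types, and logging is a plain stderr print):

import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

class ErrorLoggingSketch {

  private def logError(msg: String): Unit = System.err.println(msg)

  def withErrorLogging(dataSourceId: String, label: String, result: Future[Array[Byte]])(
      implicit ec: ExecutionContext): Future[Array[Byte]] =
    result.transformWith {
      case Success(data) if data.isEmpty =>
        // A "successful" zero-length result is still treated as an error worth logging.
        logError(s"Error while $label for $dataSourceId: received zero-length data")
        Future.failed(new IllegalStateException(s"$label for $dataSourceId returned no data"))
      case Success(data) =>
        Future.successful(data)
      case Failure(exception) =>
        // Log the failure with its stack trace, then propagate it to the caller unchanged.
        logError(s"Error while $label for $dataSourceId: ${exception.getMessage}")
        exception.printStackTrace()
        Future.failed(exception)
    }
}

The real service additionally reports internal errors to the application health service and rate-limits how many errors a single dataset may emit, as discussed further below.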

@fm3 (Member) commented Nov 20, 2024

Yes, that’s what I had in mind :)

Let’s remove applicationHealthService from the BinaryDataService and inject it into DatasetErrorLoggingService directly instead. That should make this a little simpler.

The other points are correct (if a little unfortunate). Let’s go with that :)
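
Schematically, the suggested dependency shift looks roughly like this (class shapes are illustrative stand-ins, not the exact project constructors):

// The health service moves to where internal errors are actually classified and reported...
class ApplicationHealthService {
  def pushError(e: Throwable): Unit = ()
}

class DatasetErrorLoggingService(applicationHealthService: ApplicationHealthService)

// ...so BinaryDataService only needs the (optional) centralized logging service.
class BinaryDataService(datasetErrorLoggingService: Option[DatasetErrorLoggingService])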

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetErrorLoggingService.scala (1)

50-80: LGTM: Comprehensive error handling with rate limiting.

The implementation effectively addresses the PR objectives by:

  • Handling all error cases (zero-length arrays, internal errors, exceptions, empty failures)
  • Including rate limiting to prevent log spam
  • Providing context-rich error messages
  • Properly integrating with ApplicationHealthService for internal errors

Consider these potential future improvements:

  1. Add metrics/monitoring for error rates
  2. Consider implementing a circuit breaker pattern for datasets with persistent errors
  3. Add structured logging fields for better log aggregation
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (3)

57-62: Consider enhancing error message specificity in withErrorLogging

The integration with datasetErrorLoggingService is well-implemented, but the "Conversion" label could be more descriptive about what's being converted.

-          case Some(value) => value.withErrorLogging(request.dataSource.id, "Conversion", conversionFunc(inputArray))
+          case Some(value) => value.withErrorLogging(request.dataSource.id, s"Data conversion for ${request.dataLayer.name}", conversionFunc(inputArray))

69-78: Consider flattening nested Fox operations for better error handling

The current implementation nests Fox operations, which could make error tracking more complex. Consider flattening the operations for clearer error propagation.

-          mappedDataFox <- agglomerateServiceOpt.map { agglomerateService =>
-            convertIfNecessary(
-              request.settings.appliedAgglomerate.isDefined && request.dataLayer.category == Category.segmentation && request.cuboid.mag.maxDim <= MaxMagForAgglomerateMapping,
-              data,
-              agglomerateService.applyAgglomerate(request),
-              request
-            )
-          }.fillEmpty(Fox.successful(data)) ?~> "Failed to apply agglomerate mapping"
-          mappedData <- mappedDataFox
+          shouldApplyAgglomerate = request.settings.appliedAgglomerate.isDefined && 
+                                  request.dataLayer.category == Category.segmentation && 
+                                  request.cuboid.mag.maxDim <= MaxMagForAgglomerateMapping
+          mappedData <- agglomerateServiceOpt match {
+            case Some(agglomerateService) if shouldApplyAgglomerate =>
+              convertIfNecessary(true, data, agglomerateService.applyAgglomerate(request), request) ?~> 
+                "Failed to apply agglomerate mapping"
+            case _ => Fox.successful(data)
+          }

99-106: Consider structuring error message for better log parsing

While the error message includes good context, consider structuring it in a more log-parser-friendly format.

-            s"Bucket loading for layer ${request.dataLayer.name} at ${readInstruction.bucket}, cuboid: ${request.cuboid}",
+            s"bucket_loading layer=${request.dataLayer.name} bucket=${readInstruction.bucket} cuboid=${request.cuboid}",
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a56da43 and 1d6219a.

📒 Files selected for processing (5)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataServiceHolder.scala (0 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetErrorLoggingService.scala (3 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingService.scala (1 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (1 hunks)
💤 Files with no reviewable changes (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataServiceHolder.scala
✅ Files skipped from review due to trivial changes (1)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingService.scala
🔇 Additional comments (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetErrorLoggingService.scala (3)

5-10: LGTM: Imports are well-organized and necessary.

All new imports are utilized in the implementation and properly organized.


19-19: LGTM: Appropriate dependency injection for error handling.

The ApplicationHealthService injection aligns with the PR's objective to improve error logging and is used correctly in the new implementation.


50-80: Verify consistent error handling across the codebase.

Let's ensure this error handling pattern is consistently used across related components.

✅ Verification successful

Based on the search results, I can now make a definitive assessment of the error handling patterns across the codebase:

Error handling is consistent with the reviewed code

The codebase shows a consistent pattern of error handling using the Fox monad, where:

  1. The withErrorLogging method in DatasetErrorLoggingService follows the established pattern of:

    • Rate-limited logging through shouldLog
    • Proper error context with dataSourceId and operation label
    • Handling of different error cases (empty data, internal errors, exceptions)
    • Consistent use of Fox.failure for error cases
  2. The pattern is consistently used across the codebase as evidenced by:

    • Similar error handling in other services (DataSourceService, UploadService, etc.)
    • Consistent use of Fox.failure for error cases with proper context
    • Rate-limited logging in places where repeated errors could occur
  3. The implementation properly integrates with the application's health monitoring through applicationHealthService.pushError

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for potential inconsistent error handling patterns

# Look for direct error logging without rate limiting
rg "logger\.(error|warn)" --type scala

# Look for other Fox[Array[Byte]] return types that might need similar error handling
ast-grep --pattern 'def $FUNC_NAME($$$): Fox[Array[Byte]]'

# Look for potential places where withErrorLogging should be used
rg "Fox\.(failure|successful)" --type scala

Length of output: 75542

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (2)

13-13: LGTM: Import changes align with error handling approach

The import changes appropriately support the new error handling mechanism using Fox and Box types.


Line range hint 57-106: Verify comprehensive error logging coverage

The implementation looks good but let's verify that all error cases are properly logged.

✅ Verification successful

Based on the search results, I can see that error logging is comprehensively implemented in the codebase:

  1. The DatasetErrorLoggingService is properly injected and used in BinaryDataService for both conversion and bucket loading operations.
  2. The service has proper error handling with different cases (internal errors, exceptions, empty failures) and includes stack traces when needed.
  3. The service implements rate limiting to prevent log spam (via errorCountThresholdPerDataset).
  4. Both conversion and bucket loading operations are wrapped with error logging through withErrorLogging method.

Error logging coverage is complete and properly implemented

The implementation successfully logs errors for:

  • Data conversion operations via convertIfNecessary
  • Bucket loading operations via handleBucketRequest
  • Includes proper context (dataset ID, operation type)
  • Handles different error types appropriately
  • Implements rate limiting to prevent log spam
  • Provides stack traces for debugging
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential error cases that might not be logged

# Look for Fox.failure calls that might need logging
rg "Fox\.failure" --type scala

# Look for error-related terms in method calls that might need logging
rg "(?i)(error|exception|fail|throw)" --type scala -g '!test/**'

# Look for try/catch blocks that might need logging
ast-grep --pattern 'try {
  $$$
} catch {
  $$$
}'

Length of output: 104665

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (1)

91-91: Verify error logging capabilities after removing applicationHealthService.

The removal of applicationHealthService from BinaryDataService initialization aligns with the PR's objective to improve error logging. However, we should verify that error logging is properly handled through the new centralized logging mechanism.

Consider documenting the new error logging flow in the class documentation to help maintainers understand where and how errors are now being logged.

✅ Verification successful

Error logging is properly handled through DatasetErrorLoggingService

The verification shows that error logging has been properly migrated to a centralized DatasetErrorLoggingService:

  • BinaryDataService now accepts an optional datasetErrorLoggingService parameter and uses it for error logging in critical operations:
    • During data conversion: withErrorLogging(request.dataSource.id, "Conversion", ...)
    • During bucket loading: withErrorLogging(request.dataSource.id, "Bucket loading for layer...", ...)
  • The service is properly integrated through BinaryDataServiceHolder which provides the necessary dependencies
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if error logging is properly handled in the new implementation.

# Test 1: Search for error logging implementations in BinaryDataService
ast-grep --pattern 'class BinaryDataService {
  $$$
  def handleDataRequests($$$) {
    $$$
    error($$$)
    $$$
  }
  $$$
}'

# Test 2: Search for the new error logging mechanism
rg -A 5 'datasetErrorLoggingService'

Length of output: 6086

@frcroth requested a review from fm3 on Nov 25, 2024, 10:01
@fm3 (Member) left a comment

Nice, works for me :) I just added some small suggestions for smoother grammar in the logging output.

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetErrorLoggingService.scala (2)

50-50: Add method documentation

Consider adding ScalaDoc to document the method's purpose, parameters, and behavior, especially regarding error handling and rate limiting. This will help other developers understand when and how to use this method.

Example documentation:

/** Executes the given operation with error logging and rate limiting.
  *
  * @param dataSourceId The identifier of the data source being processed
  * @param label Description of the operation for logging purposes
  * @param resultFox The operation to execute and monitor
  * @return A Fox containing either the successful result or a failure with appropriate error logging
  */

53-53: Extract zero-length check to a constant

Consider extracting the zero-length array check to a named constant for better maintainability and clarity of intent.

+ private val ValidDataMinLength = 1
  def withErrorLogging(dataSourceId: DataSourceId, label: String, resultFox: Fox[Array[Byte]]): Fox[Array[Byte]] =
    resultFox.futureBox.flatMap {
      case Full(data) =>
-       if (data.length == 0) {
+       if (data.length < ValidDataMinLength) {
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (3)

57-62: LGTM: Improved error logging with graceful fallback

The implementation properly integrates error logging while maintaining backward compatibility when the service is unavailable.

Consider extracting the error message to a constant for reusability:

+  private val ConvertingBucketDataMessage = "converting bucket data"
   def convertIfNecessary(...): Fox[Array[Byte]] =
     if (isNecessary) datasetErrorLoggingService match {
-      case Some(value) => value.withErrorLogging(request.dataSource.id, "converting bucket data", conversionFunc(inputArray))
+      case Some(value) => value.withErrorLogging(request.dataSource.id, ConvertingBucketDataMessage, conversionFunc(inputArray))

69-78: Enhance error message for agglomerate mapping failure

The error handling chain is well-structured, but the failure message could be more descriptive.

Consider enhancing the error message:

-          }.fillEmpty(Fox.successful(data)) ?~> "Failed to apply agglomerate mapping"
+          }.fillEmpty(Fox.successful(data)) ?~> s"Failed to apply agglomerate mapping for layer ${request.dataLayer.name}"

99-106: LGTM: Comprehensive error logging for bucket requests

The implementation provides detailed context for bucket loading errors, which directly addresses the PR objectives.

Consider extracting the error message formatting to a method for consistency and reusability:

+  private def formatBucketLoadingMessage(request: DataServiceDataRequest, readInstruction: DataReadInstruction): String =
+    s"loading bucket for layer ${request.dataLayer.name} at ${readInstruction.bucket}, cuboid: ${request.cuboid}"

   datasetErrorLoggingService match {
     case Some(d) =>
       d.withErrorLogging(
         request.dataSource.id,
-        s"loading bucket for layer ${request.dataLayer.name} at ${readInstruction.bucket}, cuboid: ${request.cuboid}",
+        formatBucketLoadingMessage(request, readInstruction),
         bucketProvider.load(readInstruction)
       )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1d11d3d and 921ec9c.

📒 Files selected for processing (2)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetErrorLoggingService.scala (3 hunks)
🧰 Additional context used
📓 Learnings (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetErrorLoggingService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8202
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetErrorLoggingService.scala:68-72
Timestamp: 2024-11-25T10:02:03.702Z
Learning: In `DatasetErrorLoggingService.scala`, prefer using `TextUtils.stackTraceAsString(exception)` when logging exceptions instead of passing the exception directly to `logger.error`.
🔇 Additional comments (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetErrorLoggingService.scala (3)

5-10: LGTM: Imports are well-organized and necessary

All new imports are utilized in the implementation and properly organized.


19-19: LGTM: ApplicationHealthService injection is appropriate

The addition of applicationHealthService aligns with the PR discussion about simplifying error reporting and centralizing health monitoring.


50-80: LGTM: Comprehensive error handling implementation

The implementation successfully addresses the PR objectives with:

  • Proper error categorization and handling
  • Informative error messages with stack traces
  • Rate-limited logging to prevent spam
  • Integration with health monitoring for internal errors
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (2)

13-13: LGTM: Import changes align with Fox transition

The removal of the Failure import while retaining Box is consistent with the transition to Fox for error handling.


Line range hint 91-92: Verify null dataSource handling

The code includes a comment about null dataSource for volume tracings. Let's verify this edge case.

✅ Verification successful

Based on the search results, I can see clear evidence about null dataSource handling in BinaryDataService.scala:

// dataSource is null and unused for volume tracings. Insert dummy DataSourceId (also unused in that case)
val dataSourceId = if (request.dataSource != null) request.dataSource.id else DataSourceId("", "")

This shows that null dataSource is an expected case for volume tracings and is properly handled by using a dummy DataSourceId when dataSource is null. The code has appropriate null checks and fallback behavior.

Null dataSource handling is properly implemented

The implementation correctly handles null dataSource for volume tracings by:

  • Explicitly checking for null using request.dataSource != null
  • Providing a dummy DataSourceId when null
  • Documenting the behavior in code comments
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for volume tracing related code to verify null dataSource handling
rg -l "volume.*trac" | xargs rg "dataSource"

# Search for tests related to volume tracings
fd -e scala -e java | xargs rg "test.*volume.*trac"

Length of output: 17901

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (3)

57-63: Enhance error message specificity in withErrorLogging

While the error logging integration is good, consider making the error message more specific by including the conversion type or operation being performed.

-          value.withErrorLogging(request.dataSource.id, "converting bucket data", conversionFunc(inputArray))
+          value.withErrorLogging(
+            request.dataSource.id,
+            s"converting bucket data for layer ${request.dataLayer.name} (${if (request.settings.halfByte) "halfByte" else "regular"})",
+            conversionFunc(inputArray)
+          )

70-79: Improve error message for agglomerate mapping failure

The error message for agglomerate mapping failure could be more descriptive to aid in debugging.

-          }.fillEmpty(Fox.successful(data)) ?~> "Failed to apply agglomerate mapping"
+          }.fillEmpty(Fox.successful(data)) ?~> s"Failed to apply agglomerate mapping for layer ${request.dataLayer.name} at mag ${request.cuboid.mag}"

Line range hint 57-107: Consider edge cases mentioned in PR description

The error logging improvements look good, but as mentioned in the PR description, common edge cases haven't been considered yet. Consider handling:

  1. Network timeouts during bucket loading
  2. Corrupt data during conversion
  3. Memory constraints during large data operations

Would you like help in implementing these edge case handlers?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 921ec9c and 49226a9.

📒 Files selected for processing (2)
  • CHANGELOG.unreleased.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • CHANGELOG.unreleased.md
🔇 Additional comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (2)

13-13: LGTM: Import changes align with error handling improvements

The shift from Failure to Box/Full imports reflects the centralization of error handling in DatasetErrorLoggingService.


100-107: Verify that this addresses issue #8081

The error logging integration looks good and provides detailed context. Let's verify that this change properly logs errors occurring after the initial request handling.

✅ Verification successful

Based on the implementation of DatasetErrorLoggingService and its integration in BinaryDataService, I can now generate the final response:

The error logging implementation properly addresses error handling concerns

The changes properly handle error logging by:

  • Capturing all types of failures (empty arrays, internal errors, and general exceptions)
  • Including detailed context in error messages (layer name, bucket position, cuboid info)
  • Implementing error rate limiting (at most 5 errors per dataset, with a 60-minute reset; see the sketch after this summary)
  • Maintaining proper error propagation through the Fox monad

The implementation ensures that errors occurring during bucket loading are:

  • Properly logged with context
  • Rate-limited to prevent log spam
  • Tracked for application health monitoring
  • Propagated correctly to the caller
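
A rough, self-contained sketch of the rate-limiting idea mentioned above (the threshold, names, and reset mechanism here are illustrative; the real service tracks error counts per dataset and resets them periodically):

import scala.collection.mutable

class ErrorRateLimiterSketch(errorCountThresholdPerDataset: Int = 5) {

  private val errorCounts = mutable.Map.empty[String, Int].withDefaultValue(0)

  // Returns true while the dataset is still below its error budget for the current window.
  def shouldLog(dataSourceId: String): Boolean =
    errorCounts(dataSourceId) < errorCountThresholdPerDataset

  // Called after an error has actually been written to the log.
  def registerLoggedError(dataSourceId: String): Unit =
    errorCounts(dataSourceId) += 1

  // In the real service a scheduled task clears the counters periodically (e.g. every 60 minutes),
  // so logging resumes once the window elapses.
  def reset(): Unit = errorCounts.clear()
}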
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that errors are properly logged after initial request handling

# Search for all error logging calls in the codebase
rg "withErrorLogging.*loading bucket" -A 5

# Search for any remaining direct error logging that might bypass the new mechanism
rg "logger\.(error|warn).*bucket" -A 5

Length of output: 85


Script:

#!/bin/bash
# Let's try a broader search to understand the error logging implementation

# Search for withErrorLogging implementation and usage
ast-grep --pattern 'def withErrorLogging'

# Search for BinaryDataService class implementation
ast-grep --pattern 'class BinaryDataService'

# Search for related error handling patterns
rg "datasetErrorLoggingService" -A 3

# Look for issue reference in git history
git log --grep="#8081" --oneline

Length of output: 29907


Script:

#!/bin/bash
# Let's search for issue #8081 and related error logging implementation

# Search for DatasetErrorLoggingService implementation
ast-grep --pattern 'class DatasetErrorLoggingService'

# Search for withErrorLogging method implementation
ast-grep --pattern 'def withErrorLogging'

# Look for any references to issue #8081 in commit messages or code comments
rg "#8081"

# Search for error logging related tests
rg "DatasetErrorLoggingService.*Test" -A 5

Length of output: 10261

@frcroth merged commit 5d3d66d into master (3 checks passed)
@frcroth deleted the log-all-data-loading-errors branch on Nov 27, 2024, 08:43
Successfully merging this pull request may close these issues:

  • log all data loading errors (#8081)