improve logs for spark task failure #106
Comments
Real stage failures are equally useless in the logs:
The part about org.openeo.geotrellis.geotiff.package.saveRDD is not really necessary. The real error is hidden deeper in the stack:
A Spark listener could also be an option? Or can we walk down the exception stack and print the error of the root cause?
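A minimal sketch of the "walk down to the root cause" idea, using plain Python exception chaining. The function names `root_cause` and `summarize` are hypothetical and not part of openEO; the exceptions in the demo are stand-ins. A real Spark failure would surface as a JVM exception, where the chain is followed with `getCause()` rather than `__cause__`.

```python
def root_cause(exc: BaseException) -> BaseException:
    """Follow the exception chain down to the innermost exception."""
    seen = set()
    current = exc
    while True:
        seen.add(id(current))
        # Prefer the explicit cause ("raise ... from ..."); fall back to the
        # implicit context unless it was suppressed with "from None".
        nxt = current.__cause__
        if nxt is None and not current.__suppress_context__:
            nxt = current.__context__
        if nxt is None or id(nxt) in seen:  # end of chain, or a cycle
            return current
        current = nxt


def summarize(exc: BaseException) -> str:
    """One-line summary of the root cause, suitable for a log message."""
    root = root_cause(exc)
    return f"{type(root).__name__}: {root}"


def failing_job():
    # Hypothetical stand-in for a job whose real error is buried two levels deep.
    try:
        try:
            raise OSError("Error while reading GridBounds(0,0,1023,1023)")
        except OSError as e:
            raise RuntimeError("Exception in task 0.0 in stage 45.0 (TID 503)") from e
    except RuntimeError as e:
        raise RuntimeError("Job aborted due to stage failure") from e


if __name__ == "__main__":
    try:
        failing_job()
    except Exception as e:
        print(summarize(e))
```

Logging the summary of the root cause next to the outer message keeps the stage and task context while putting the actionable error in front of the user.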
FYI I removed
|
The
|
@EmileSonneveld This is the method that improves exceptions for synchronous calls: I suggest reusing it for batch jobs, and maybe extending it if necessary.
Here's another example that we could deal with better:
done
openEO now returns logs like (j-dc9f81221c3f4b9cb3bb46bc97832213):
{
"id": "[1673357256568, 6457]",
"time": "2023-01-10T13:27:36.568Z",
"level": "error",
"message": "Exception in task 0.0 in stage 45.0 (TID 503)"
},
This is a task failure, not a stage failure, and the final job was successful.
The real error was:
Error while reading GridBounds(0,0,1023,1023) from: https://services.terrascope.be/download/WORLDCOVER/ESA_WORLDCOVER_10M_2021_V200/MAP/ESA_WorldCover_10m_2021_v200_N39W006_Map/ESA_WorldCover_10m_2021_v200_N39W006_Map.tif
Can we improve this to log the real message, and perhaps reduce task failures to 'warning' level? Users tend to think that an error is always a hard failure.
Note that the useful message is in ES, under exc_info. Maybe we can append that, or at least its first line, to the message?
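A rough sketch of what that could look like as a post-processing step on a log entry. The function `improve_entry`, the prefix check, and the sample `exc_info` text are assumptions made for illustration; only the field names `level`, `message` and `exc_info` come from the log above. Which line of `exc_info` is the informative one depends on the traceback format, so taking the first line is itself an assumption.

```python
def improve_entry(entry: dict) -> dict:
    """Return a copy of a log entry with a more useful message and level."""
    improved = dict(entry)
    message = entry.get("message", "")

    # Append the first line of exc_info, if present, to the message.
    exc_info = (entry.get("exc_info") or "").strip()
    if exc_info:
        first_line = exc_info.splitlines()[0].strip()
        if first_line and first_line not in message:
            improved["message"] = f"{message}: {first_line}"

    # Hypothetical heuristic: a failed task may be retried, so report it as a
    # warning rather than an error. Stage and job failures keep their level.
    if entry.get("level") == "error" and message.startswith("Exception in task"):
        improved["level"] = "warning"

    return improved


if __name__ == "__main__":
    entry = {
        "id": "[1673357256568, 6457]",
        "time": "2023-01-10T13:27:36.568Z",
        "level": "error",
        "message": "Exception in task 0.0 in stage 45.0 (TID 503)",
        # Made-up sample; the real exc_info content is not shown in this issue.
        "exc_info": "Error while reading GridBounds(0,0,1023,1023)\n\tat <stack frames>",
    }
    print(improve_entry(entry))
```

The entry is copied rather than modified in place, so the original record stays available for debugging.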