Improve (unified) GC error messages #8517

Open
yonipeleg33 opened this issue Jan 20, 2025 · 1 comment

Labels
good first issue Good for newcomers

Comments

@yonipeleg33
Contributor

We recently stumbled upon an error:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 88 in stage 7.0 failed 4 times, most recent failure: Lost task 88.3 in stage 7.0 (TID 42180) ([2a05:d018:179b:7f01:d8e3:50aa:4d29:3899] executor 121): io.treeverse.jpebble.BadFileFormatException: Bad magic 37 66 30 31 22 0a 7d 0a: wrong bytes

This error is missing an important piece of information: the name of the file that failed to parse. Knowing it would have helped us identify the problem faster.

[@arielshaqed please add more detail here]

@arielshaqed
Contributor

Why this is not trivial

Fun fact: the Spark code that reads the file runs on the executors, but it is the Spark driver that reports exceptions. Make sure to add the file name to the exception that the driver shows, because that is the one that is easiest to see in services such as EMR.
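A minimal sketch of the idea, written in Python for brevity (the actual reader is JVM code). Every name here is hypothetical, including the exception classes, the placeholder magic value and the example path. It only illustrates putting the file name into the message text itself, so that it survives wherever only the message is displayed.

```python
# Hypothetical stand-ins; none of these names come from the real code base.
class BadFileFormatError(Exception):
    """Low-level parse error that knows nothing about the file name."""


class FileReadError(Exception):
    """Wrapper whose message carries the file name."""


EXPECTED_MAGIC = b"EXAMPLE!"  # placeholder 8-byte trailer, NOT the real magic


def check_magic(data: bytes) -> None:
    # Compare the last 8 bytes against the expected trailer.
    tail = data[-8:]
    if tail != EXPECTED_MAGIC:
        raise BadFileFormatError(f"Bad magic {tail.hex(' ')}: wrong bytes")


def read_file(path: str, data: bytes) -> None:
    try:
        check_magic(data)
    except BadFileFormatError as e:
        # Prefix the message with the path and keep the original as the cause.
        raise FileReadError(f"{path}: {e}") from e


try:
    read_file("s3://example-bucket/some/file", b'{"id": "7f01"\n}\n')
except FileReadError as e:
    message = str(e)

print(message)
# s3://example-bucket/some/file: Bad magic 37 66 30 31 22 0a 7d 0a: wrong bytes
```

Wrapping at the point where the path is still known keeps the low-level parser unchanged. Whether the wrapper's message actually reaches the driver output in a given Spark setup would still need to be checked.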

Workarounds

  • Read logs from the executors, not only from the driver.
  • The "bad magic bytes" message includes the last 8 bytes of the file. If you can decode these, you may be able to guess what kind of file it is. (This works for formats that use trailers, but sometimes also for ASCII-based formats such as CSV and JSON!)
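
As an illustration of the second workaround, the bytes from the error message above can be decoded with a few lines of Python. The helper name `decode_magic` is made up for this sketch.

```python
def decode_magic(hex_dump: str) -> bytes:
    """Turn the space-separated hex dump from the error message into raw bytes."""
    return bytes.fromhex(hex_dump)


tail = decode_magic("37 66 30 31 22 0a 7d 0a")
print(tail)  # b'7f01"\n}\n'
```

The decoded tail is a closing double quote, a newline, a closing brace and a final newline. That looks like the end of a pretty-printed JSON object, which suggests (but does not prove) that the reader was pointed at a JSON file.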

@itaiad200 itaiad200 added the good first issue Good for newcomers label Jan 21, 2025