Corrupted status files on chrysalis (NING bug) #242
Comments
Also tagging @xylar out of pure curiosity. Have you ever seen anything like that?
Also tagging @rljacob. Do you know what was done during the Chrysalis maintenance last Monday? Any upgrades to bash or the file system? In this particular case, the bug is mostly harmless. But if it is a symptom of a larger issue, it could be quite serious (file integrity).
I would say this is almost certainly a race condition. This call: zppy/zppy/templates/mpas_analysis.bash Line 23 in 335b33a
most likely happens at the same time as this call: zppy/zppy/templates/mpas_analysis.bash Line 388 in 335b33a
But I can't immediately see why that would happen.
I don't have much experience with bash redirects like this. I use Python almost exclusively for logging.
Thanks, Xylar. In most cases, the first and last updates to status files should be many minutes apart.
Adding @amametjanov who might know more about what was done during maintenance.
Based on |
There was an update to the GPFS software. You should send details about file corruption to [email protected].
I agree a readable string at the end of a file doesn't look like corruption. File corruption usually results in unprintable characters when you try to more/cat/edit the file.
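One way a readable tail like this can arise, as a minimal sketch: if a shorter line is written over a longer one without the file being truncated first, the leftover bytes of the old line stay at the end. The status strings `RUNNING 12345` and `OK`, the file name, and the use of `1<>` to force a non-truncating write are all assumptions for illustration. None of it is taken from the zppy templates, and it does not explain why truncation would be skipped on Chrysalis.

```bash
#!/usr/bin/env bash
# Hypothetical file name and status strings; the real zppy values may differ.
workdir="$(mktemp -d)"
status_file="${workdir}/demo.status"

# First write: a longer status line, using a normal truncating redirect.
echo "RUNNING 12345" > "${status_file}"

# Second write: a shorter line written WITHOUT truncation.
# `1<>` opens the file read-write at offset 0 and leaves the old bytes in place.
echo "OK" 1<> "${status_file}"

# Show what ended up in the file.
cat "${status_file}"
```

Under these assumptions the file ends up with `OK` on the first line and `NING 12345` on the second, i.e. a readable remnant rather than unprintable characters.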
@golaz and I noticed the issue in different branches (#227 and #237) and hadn't seen it before. That leads me to believe it's a machine issue. That said, I can try running the last release to see if the error still occurs. If the last release still works fine, I can run |
@golaz @rljacob I re-ran
@amametjanov Did Chrysalis upgrade the
I'm also not seeing this bug when running on Compy |
Please see Slack channel chrysalis-users about OS upgrade around April 28: from CentOS 8 to RHEL 8.5 https://acmeclimate.slack.com/archives/C01ER9J9TEJ/p1651178895618699
The compute nodes still have the old image. Try running zppy in an interactive session. Bash is a little different: New: |
This is likely a new issue that appeared after the monthly maintenance on Chrysalis this past Monday.
For certain zppy tasks (mpas_analysis, tc_analysis), the status files written upon successful completion of a task are corrupt. The content of the status file should simply be
but instead, it looks something like
The bash line of code that updates the file looks like
I don't understand how the status file can become corrupt, and I have not been able to reproduce it in a simple test (but the full task repeatedly and consistently produces the erroneous status).
A simple workaround that appears effective is to first delete the file:
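A minimal sketch of this kind of workaround, with a hypothetical file name and hypothetical status strings (this is not the actual zppy template code). The second half shows an alternative that is an assumption on my part and was not proposed in this thread: write to a temporary file and rename it into place, so the status file is never partially overwritten.

```bash
#!/usr/bin/env bash
# Hypothetical file name and status strings; the real zppy values may differ.
workdir="$(mktemp -d)"
status_file="${workdir}/demo.status"

# Simulate the earlier, longer status line.
echo "RUNNING 12345" > "${status_file}"

# Workaround: delete the file first, so the write creates a fresh file
# instead of reusing the old one.
rm -f "${status_file}"
echo "OK" > "${status_file}"

# Alternative (assumption, not from this thread): write to a temporary file
# in the same directory, then rename it over the status file.
tmp_file="$(mktemp "${status_file}.XXXXXX")"
echo "OK" > "${tmp_file}"
mv -f "${tmp_file}" "${status_file}"

cat "${status_file}"
```

Either variant should leave a file containing only `OK`. The rename variant has the extra property that readers see either the old content or the new content, never a mix, as long as the temporary file is on the same file system.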