Bug: Windows session clean-up fails to delete session folder #410

Open
epmog opened this issue Sep 4, 2024 · 3 comments
Labels
bug (Something isn't working), needs triage (A new issue that needs a first look)

Comments

@epmog
Contributor

epmog commented Sep 4, 2024

Describe Behaviour

Worker session cleanup encounters an error on Windows when trying to delete the session directory.

Expected Behaviour

The worker session clean-up should complete with no error on removing folders. Or if it's unable to, it should be clear as to why it can't clean up those files.

Current Behaviour

INFO ==============================================
INFO --------- Session Cleanup
INFO ==============================================
INFO Deleting working directory: C:\ProgramData\Amazon\OpenJD\session-5a318d14c9f246b099284f1904b7f8f7lie11m7p
INFO Running command powershell -Command Remove-Item -Recurse -Force "C:\ProgramData\Amazon\OpenJD\session-5a318d14c9f246b099284f1904b7f8f7lie11m7p\embedded_filespx71o14w, C:\ProgramData\Amazon\OpenJD\session-5a318d14c9f246b099284f1904b7f8f7lie11m7p\tmp0m__4qks.json, C:\ProgramData\Amazon\OpenJD\session-5a318d14c9f246b099284f1904b7f8f7lie11m7p\tmp18ls93eq.json, C:\ProgramData\Amazon\OpenJD\session-5a318d14c9f246b099284f1904b7f8f7lie11m7p\tmp1mc0r0wh.json, C:\ProgramData\Amazon\OpenJD\session-5a318d14c9f246b099284f1904b7f8f7lie11m7p\tmp5fmt4tnm.json, C:\ProgramData\Amazon\OpenJD\session-5a318d14c9f246b099284f1904b7f8f7lie11m7p\tmp7lekhm28.json, C:\ProgramData\Amazon\OpenJD\session-5a318d14c9f246b099284f1904b7f8f7lie11m7p\tmp7s2752oy.json, C:\ProgramData\Amazon\OpenJD\session-5a318d14c9f246b099284f1904b7f8f7lie11m7p\tmp7yd2dznk.json, C:\ProgramData\Amazon\OpenJD\session-5a318d14c9f246b099284f1904b7f8f7lie11m7p\tmp88j9pq7r.json, C:\ProgramData\Amazon\OpenJD\session-5a318d14c9f246b099284f1904b7f8f7lie11m7p\tmphpdg2flf.json, C:\ProgramData\Amazon\OpenJD\session-5a318d14c9f246b099284f1904b7f8f7lie11m7p\tmptanj6h6p.json"
INFO Process failed to start: [WinError -2147024809] The parameter is incorrect.
ERROR Files within temporary directory C:\ProgramData\Amazon\OpenJD\session-5a318d14c9f246b099284f1904b7f8f7lie11m7p could not be deleted.
C:\ProgramData\Amazon\OpenJD\session-5a318d14c9f246b099284f1904b7f8f7lie11m7p
Traceback (most recent call last):
  File "C:\Program Files\Python312\Lib\site-packages\openjd\sessions\_session.py", line 438, in cleanup
    self._working_dir.cleanup()
  File "C:\Program Files\Python312\Lib\site-packages\openjd\sessions\_tempdir.py", line 151, in cleanup
    raise RuntimeError(
RuntimeError: Files within temporary directory C:\ProgramData\Amazon\OpenJD\session-5a318d14c9f246b099284f1904b7f8f7lie11m7p could not be deleted.
C:\ProgramData\Amazon\OpenJD\session-5a318d14c9f246b099284f1904b7f8f7lie11m7p
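
The negative WinError value in the log above is the signed rendering of a 32-bit HRESULT. As a minimal sketch for reading it, the snippet below splits the value into its standard fields. The helper name `decode_hresult` is hypothetical and is not part of openjd.

```python
def decode_hresult(value: int) -> dict:
    """Split a (possibly negative) 32-bit HRESULT into its standard fields."""
    unsigned = value & 0xFFFFFFFF  # reinterpret the signed int as unsigned 32-bit
    return {
        "hex": f"0x{unsigned:08X}",
        "failure": bool(unsigned >> 31),       # severity bit: 1 means failure
        "facility": (unsigned >> 16) & 0x7FF,  # 7 is FACILITY_WIN32
        "code": unsigned & 0xFFFF,             # 87 is ERROR_INVALID_PARAMETER
    }


print(decode_hresult(-2147024809))
# {'hex': '0x80070057', 'failure': True, 'facility': 7, 'code': 87}
```

0x80070057 is E_INVALIDARG, which wraps Win32 error 87 (ERROR_INVALID_PARAMETER), matching the "The parameter is incorrect." text. Together with the "Process failed to start" line, this suggests the process-creation call itself rejected an argument, rather than Remove-Item failing on a particular file, though the log alone does not confirm that.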

Reproduction Steps

  1. Launch a fleet with a Windows CMF worker
  2. Submit a job to a queue used by that fleet that contains embedded files/folders
  3. View session log and observe error

Possible Solution

At first we thought it was due to the way the paths were specified as a comma-separated string, but similar commands work locally. It also does not appear to be a command-line length issue. If the user doesn't have permission to clean up this folder (which shouldn't be possible in normal operations), then you DO receive an "Access to the path is denied." error from the command.
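
For reference, the command-line length hypothesis can be checked mechanically. The sketch below assumes the command is launched as an argument list through Python's `subprocess` machinery, which is not confirmed by the log. It uses `subprocess.list2cmdline`, an undocumented helper that joins arguments the way Python does on Windows, and compares the result against the 32,767-character limit documented for CreateProcess. The paths and the function names `command_line_length` and `fits_create_process` are hypothetical.

```python
import subprocess

# Documented maximum length of the command-line string accepted by
# CreateProcess (the limit includes the terminating null character).
CREATE_PROCESS_LIMIT = 32767


def command_line_length(args: list[str]) -> int:
    """Length of the single string Python would build from args on Windows."""
    return len(subprocess.list2cmdline(args))


def fits_create_process(args: list[str]) -> bool:
    # Strictly less than the limit leaves room for the terminating null.
    return command_line_length(args) < CREATE_PROCESS_LIMIT


# Hypothetical stand-ins for the session paths shown in the log.
paths = [rf"C:\Temp\session-example\tmp{i:04d}.json" for i in range(11)]
args = ["powershell", "-Command", "Remove-Item", "-Recurse", "-Force", ", ".join(paths)]
print(command_line_length(args), fits_create_process(args))
```

The command in the log is roughly a thousand characters long, far below that limit, which is consistent with length not being the cause.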

If I had to guess, there might be something around files in the session directory still being open?
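
One way to test the open-file guess, and to make the failure reason visible as the Expected Behaviour section asks, is to delete entries one at a time and record the error for each path that refuses. This is only a sketch under that assumption, not the openjd implementation; `remove_tree_reporting` is a hypothetical name, and symlinks are not given special handling.

```python
import os
import tempfile


def remove_tree_reporting(root: str) -> list[tuple[str, str]]:
    """Delete root bottom-up; return (path, reason) for every entry left behind."""
    failures: list[tuple[str, str]] = []

    def attempt(remove, path: str) -> None:
        try:
            remove(path)
        except OSError as exc:
            # Record the failure instead of aborting, so every blocker is listed.
            failures.append((path, f"{type(exc).__name__}: {exc}"))

    # topdown=False visits children before their parent directories.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            attempt(os.remove, os.path.join(dirpath, name))
        for name in dirnames:
            attempt(os.rmdir, os.path.join(dirpath, name))
    attempt(os.rmdir, root)
    return failures


# Usage on a throwaway directory containing one nested file.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "embedded_files"))
with open(os.path.join(demo_root, "embedded_files", "a.json"), "w") as handle:
    handle.write("{}")
for path, reason in remove_tree_reporting(demo_root):
    print(f"could not delete {path}: {reason}")
```

On Windows, a file that another process still holds open typically surfaces here as a PermissionError carrying WinError 32, which would distinguish the open-handle case from a permissions problem.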

Exact version of the worker is unknown.

Package Version

0.27.X?

Language Version

Python 3.12

Dependencies

No response

Operating System

Windows

Other information

No response

@epmog epmog added bug Something isn't working needs triage A new issue that needs a first look labels Sep 4, 2024
@RaiaN

RaiaN commented Feb 6, 2025

This error happens when I cancel jobs via Deadline Monitor @epmog

Any workaround?

@ttblanchard
Contributor

Hi there, we aren't aware of a workaround at this moment. To help us understand this issue further, could you provide details of where you are encountering this, i.e. Service Managed Fleet or Customer Managed? Are you using one of our first-party integrations, e.g. maya-openjd or others? Could you share the version of the worker-agent and of any other software on the Worker that would help us understand this particular case?

Also what happens after you get the error?

Thank you!

@epmog
Contributor Author

epmog commented Feb 6, 2025

From what I remember, this was more benign than anything and didn't affect the workloads. I just happened to see it in the worker agent logs and I was wondering if something was going wrong.
