While looking at the logs of a portal-backend pod in the midst of spinning up, I see the following stack trace:
```
[2024-12-06 17:36:39 +0000] [40] [ERROR] Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 693, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/local/lib/python3.9/contextlib.py", line 181, in __aenter__
    return await self.gen.__anext__()
  File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/usr/local/lib/python3.9/contextlib.py", line 181, in __aenter__
    return await self.gen.__anext__()
  File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/usr/local/lib/python3.9/contextlib.py", line 181, in __aenter__
    return await self.gen.__anext__()
  File "/app/nmdc_server/app.py", line 45, in lifespan
    generate_and_mount_static_files()
  File "/app/nmdc_server/app.py", line 40, in generate_and_mount_static_files
    generate_submission_schema_files(directory=static_path)
  File "/app/nmdc_server/static_files.py", line 43, in generate_submission_schema_files
    shutil.copyfile(str(gold_tree_path), out_dir / "GoldEcosystemTree.json")
  File "/usr/local/lib/python3.9/shutil.py", line 266, in copyfile
    with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: 'static/submission_schema/GoldEcosystemTree.json'
```
This error is also being captured by Sentry.
I believe what's happening is that several workers are each trying to generate and mount the static files into the same directory, and one of the steps in doing so is deleting the existing directory. This leads to a race condition between the workers, for example:

1. Worker A removes the existing static files directory.
2. Worker A creates a new static files directory.
3. Worker B removes the static files directory (the one Worker A just created).
4. Worker A tries to write a file into that directory, which no longer exists because Worker B has deleted it and not yet recreated it.
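To check that this interleaving really produces the error in the traceback, the four steps can be replayed deterministically in a single process. This is a minimal sketch: the directory layout and the source file are hypothetical placeholders, not the actual `nmdc_server` code, and no real workers are involved.

```python
import shutil
import tempfile
from pathlib import Path


def replay_interleaving(root: Path) -> str:
    """Replay the A/B worker interleaving step by step (hypothetical layout)."""
    static_dir = root / "static"
    out_dir = static_dir / "submission_schema"
    source = root / "GoldEcosystemTree.json"  # placeholder source file
    source.write_text("{}")

    # Step 1: Worker A removes the existing static directory (if any).
    shutil.rmtree(static_dir, ignore_errors=True)
    # Step 2: Worker A creates a fresh directory.
    out_dir.mkdir(parents=True)
    # Step 3: Worker B removes the directory Worker A just created.
    shutil.rmtree(static_dir, ignore_errors=True)
    # Step 4: Worker A copies a file into a directory that no longer exists.
    try:
        shutil.copyfile(str(source), out_dir / "GoldEcosystemTree.json")
    except FileNotFoundError as exc:
        return type(exc).__name__
    return "no error"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(replay_interleaving(Path(tmp)))  # prints "FileNotFoundError"
```

Step 4 fails inside `shutil.copyfile` when it opens the destination for writing, which matches the last frame of the traceback.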
The functions in question are located in `nmdc_server/static_files.py` and are called from `nmdc_server/app.py::create_app`.
One quick fix might be to enable creating parent directories when creating the submission schema directory, i.e.
becomes
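As an illustration of the quick fix, the sketch below uses `pathlib.Path.mkdir(parents=True, exist_ok=True)`, which creates any missing parent directories and does not fail if the directory already exists. `write_schema_files` is a hypothetical stand-in, not the real `generate_submission_schema_files`. Note that this narrows the window rather than closing it: if another worker still deletes the directory between the `mkdir` and the `copyfile`, the same `FileNotFoundError` can occur.

```python
import shutil
from pathlib import Path


def write_schema_files(out_dir: Path, gold_tree_path: Path) -> Path:
    """Hypothetical stand-in for generate_submission_schema_files."""
    # parents=True creates any missing ancestors (e.g. "static/");
    # exist_ok=True makes the call succeed if the directory already exists.
    out_dir.mkdir(parents=True, exist_ok=True)
    dst = out_dir / "GoldEcosystemTree.json"
    shutil.copyfile(str(gold_tree_path), dst)
    return dst
```

Calling it repeatedly with the same `out_dir` succeeds each time, since neither the `mkdir` nor the `copyfile` fails on an existing target.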
A better approach might be to move the creation of the static directory and the static submission schema files into a CLI subcommand and call that in `prestart.sh`. This way, the static files are generated only once, before any workers start, rather than once per worker.
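A rough sketch of what such a one-shot subcommand could look like is below, using `argparse` for illustration. The program name, the `generate-static` subcommand, the `--directory`/`--gold-tree` flags and `generate_static_files` are all hypothetical; the real project may already have its own CLI framework to hook into. Because this runs before any worker exists, wiping and recreating the directory here cannot race.

```python
import argparse
import shutil
from pathlib import Path
from typing import List, Optional


def generate_static_files(directory: Path, gold_tree_path: Path) -> None:
    """Hypothetical one-shot generation step, run once before workers start."""
    out_dir = directory / "submission_schema"
    # Safe to wipe here: no worker is running yet, so nothing else touches it.
    shutil.rmtree(out_dir, ignore_errors=True)
    out_dir.mkdir(parents=True)
    shutil.copyfile(str(gold_tree_path), out_dir / "GoldEcosystemTree.json")


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="nmdc-server-static")  # hypothetical name
    sub = parser.add_subparsers(dest="command", required=True)
    gen = sub.add_parser("generate-static", help="write static files once")
    gen.add_argument("--directory", type=Path, default=Path("static"))
    gen.add_argument("--gold-tree", type=Path, required=True)
    args = parser.parse_args(argv)
    if args.command == "generate-static":
        generate_static_files(args.directory, args.gold_tree)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

`prestart.sh` would then invoke this once (for example `python cli_sketch.py generate-static --gold-tree <path>`, with placeholder names), and the app's startup would only mount the already-generated directory instead of regenerating it in every worker.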