Garbage-collection problem #238

Open
buchmann-dhi opened this issue Dec 2, 2024 · 10 comments

@buchmann-dhi

There seems to be some kind of problem with the garbage collection / SystemExit handling of the copernicusmarine toolbox.
I have tested with different versions, including 1.3.4, and I now have a minimal program that shows the issue.
The effect is that when I call "exit(0)" after retrieving a subset (see the example code below), Windows reports exit code -1073741819 (0xC0000005 when read as an unsigned 32-bit integer).
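
For reference, the conversion between the signed value printed by cmd and the unsigned hexadecimal form can be sketched as below. The helper name is hypothetical. 0xC0000005 is the Windows NTSTATUS value STATUS_ACCESS_VIOLATION, which would point to a crash in native code rather than a Python-level error.

```python
def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code (as shown by %errorlevel%) as unsigned."""
    return exit_code & 0xFFFFFFFF


code = -1073741819
# Print the exit code in the hexadecimal form used for NTSTATUS values.
print(hex(to_unsigned32(code)))
```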

If I change "exit(0)" to "os._exit(0)" then all is well, except that it bypasses any and all garbage collection (done in copernicusmarine).

Note that I have seen this funky exit code when running the CLI tool, and I have already reported it to the operators.

EXAMPLE CODE:

import copernicusmarine
from datetime import datetime

ofil= copernicusmarine.subset(
            # Subset args:
            minimum_longitude=-13,
            maximum_longitude=5.0,
            minimum_latitude=47.0,
            maximum_latitude=64.0,
            start_datetime=datetime.strptime('2024120100', '%Y%m%d%H'),
            end_datetime=datetime.strptime('2024120300', '%Y%m%d%H'),
            minimum_depth=0.49,
            maximum_depth=5727.92,
            # Dataset args:
            dataset_id='cmems_mod_glo_phy_anfc_0.083deg_P1D-m',
            variables=['zos'],
            output_filename='zos.nc',
            # Forcing and other args:
            force_download = True,
            overwrite_output_data = True,
            disable_progress_bar = True,
            # Cache args.
            no_metadata_cache = True,
            # Credentials
            credentials_file = 'copernicusmarine-credentials.txt',
        )
print("Download completed")
exit(0)

Here is the output from a Windows CMD session using a copernicusmarine v1.3.4 virtual environment:

C:\SCRATCH>C:\PYVENV\cmems134\Scripts\python.exe "copernicus-subset-minimal.py"
WARNING - 2024-12-02T12:44:59Z - Deprecation warning for option 'no_metadata_cache'. This option will no longer be available in copernicusmarine>=2.0.0. Please refer to the documentation when the new major version is released for more information.
INFO - 2024-12-02T12:45:11Z - Dataset version was not specified, the latest one was selected: "202406"
INFO - 2024-12-02T12:45:11Z - Dataset part was not specified, the first one was selected: "default"
INFO - 2024-12-02T12:45:13Z - Service was not specified, the default one was selected: "arco-geo-series"
INFO - 2024-12-02T12:45:15Z - Downloading using service arco-geo-series...
INFO - 2024-12-02T12:45:17Z - Estimated size of the dataset file is 0.509 MB.
INFO - 2024-12-02T12:45:17Z - Writing to local storage. Please wait...
INFO - 2024-12-02T12:45:22Z - Successfully downloaded to zos.nc
Download completed

C:\SCRATCH>echo %errorlevel%
-1073741819

@buchmann-dhi
Author

BTW: This is a problem because it makes it impossible to differentiate errors from non-errors.
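
Until the cause is found, a calling script could combine the exit code with a check of the output file. The following is only a sketch of that idea: `classify_run` and its return labels are hypothetical, and a non-empty file is not proof of a complete download, so this masks the symptom rather than fixing it.

```python
from pathlib import Path

ACCESS_VIOLATION = 0xC0000005  # Windows NTSTATUS STATUS_ACCESS_VIOLATION


def classify_run(returncode: int, output_file: str) -> str:
    """Hypothetical helper: judge a run by exit code AND presence of the output file."""
    path = Path(output_file)
    has_output = path.is_file() and path.stat().st_size > 0
    if returncode == 0:
        return "ok" if has_output else "failed"
    # Compare as unsigned 32-bit, since cmd reports the code as a negative number.
    if (returncode & 0xFFFFFFFF) == ACCESS_VIOLATION and has_output:
        return "ok-but-crashed-at-exit"
    return "failed"
```

A wrapper would call this with the return code of the child process and the expected file name (for example 'zos.nc'), and log the "ok-but-crashed-at-exit" case separately from real failures.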

@alcoat

alcoat commented Dec 2, 2024

Hi, did you try "sys.exit(0)" ?

@buchmann-dhi
Author

buchmann-dhi commented Dec 3, 2024

Hi, did you try "sys.exit(0)" ?

I did now. Same result as for exit(0) directly. In fact, I believe exit() behaves the same as sys.exit() here, since both raise SystemExit.
To summarize (this is on Windows cmd, using echo %errorlevel% to capture the actual exit status):

exit(0)     # Nonzero exit code -1073741819
sys.exit(0) # Nonzero exit code -1073741819
os._exit(0) # Zero exit code 0

Note that exit() and sys.exit() in general allow a graceful termination of the program. They:

  • raise SystemExit
  • execute atexit handlers
  • flush buffers

os._exit() does none of those things: it terminates immediately and can leave data (e.g. buffers) unwritten.
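
The difference can be observed with a small sketch that runs the same child program twice, once ending in sys.exit(0) and once in os._exit(0). The template and helper names are made up for this illustration, and nothing here involves copernicusmarine.

```python
import subprocess
import sys

# Child program: registers an atexit handler, prints one line, then exits.
CHILD_TEMPLATE = """
import atexit, os, sys
atexit.register(lambda: print("atexit handler ran"))
print("buffered line")
{exit_call}
"""


def run_child(exit_call: str) -> subprocess.CompletedProcess:
    """Run the child in a fresh interpreter and capture its output."""
    code = CHILD_TEMPLATE.format(exit_call=exit_call)
    return subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )


graceful = run_child("sys.exit(0)")
abrupt = run_child("os._exit(0)")

# With sys.exit() the atexit handler runs and stdout is flushed.
print("sys.exit :", graceful.returncode, repr(graceful.stdout))
# With os._exit() the handler is skipped; buffered output is typically lost.
print("os._exit :", abrupt.returncode, repr(abrupt.stdout))
```

Both children report exit code 0, but only the sys.exit() run shows the atexit message, which is why os._exit(0) hides the problem instead of solving it.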

@uriii3
Collaborator

uriii3 commented Dec 3, 2024

We will check it! Thanks!

@uriii3
Collaborator

uriii3 commented Dec 5, 2024

Hello again @buchmann-dhi ,

I've tried to reproduce the error on one of our Windows VMs, but I wasn't able to get the same result.

In my case, I tried both the pre-release (2.0.0.a4) and the latest v1.3.5, and both worked as expected. I'm using Python 3.12.8 on Windows 11.

Could you also provide this information (Python version and Windows version) so I can try to reproduce the error?
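
A short script can collect these details in one go. This is a sketch using only the standard library; the function name and the default package selection are illustrative choices, not part of the toolbox.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("copernicusmarine", "cachier", "xarray", "zarr")):
    """Collect interpreter, OS, and selected package versions into a dict."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```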

This would be the updated code to run it on v2.0.0.a4 (and on v2 once it is released):

import copernicusmarine
from datetime import datetime

ofil= copernicusmarine.subset(
            # Subset args:
            minimum_longitude=-13,
            maximum_longitude=5.0,
            minimum_latitude=47.0,
            maximum_latitude=64.0,
            start_datetime=datetime.strptime('2024120100', '%Y%m%d%H'),
            end_datetime=datetime.strptime('2024120300', '%Y%m%d%H'),
            minimum_depth=0.49,
            maximum_depth=5727.92,
            # Dataset args:
            dataset_id='cmems_mod_glo_phy_anfc_0.083deg_P1D-m',
            variables=['zos'],
            output_filename='zos.nc',
            overwrite = True,
            # Forcing and other args:
            disable_progress_bar = True,
        )
print("Download completed")
exit(0)

@buchmann-dhi
Author

buchmann-dhi commented Dec 6, 2024

Hi @uriii3

The present system here is Windows 11 and, as I stated, tested on 1.3.4. The Python version is 3.11.7, but really that ought not to matter. This particular virtual environment contains only the packages that were necessary to install copernicusmarine (in addition to a few select base packages). Pip freeze output is attached:

C:\Users\BJB>C:\PYVENV\cmems134\Scripts\python.exe -m pip freeze

cmems134-freeze-bjb.txt

I have seen the same problematic exit code with the CLI version on several installations, spanning several versions of Copernicus Marine 1.x and different VM installations. It seems to be inconsistent: it may give a zero exit code on some installations and a non-zero one on others. I had hoped that I had found the cause, or at least a consistent, reproducible error case, but apparently not.

Please note that our use case is operational forecasting, so we will not be installing anything even remotely close to bleeding edge. Stability is king. Thus, we will not move to your v2 for the first few months after its release. In fact, I am presently considering downgrading to a version that does not write deprecation messages to STDERR, as they are interfering with our production.
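
As an alternative to downgrading, the deprecation messages could perhaps be filtered before they reach STDERR. The sketch below assumes that the toolbox emits them through Python's standard logging module on a logger named "copernicusmarine"; that name is an assumption and should be checked against the installed version. The class and function names are hypothetical.

```python
import logging


class DropDeprecationWarnings(logging.Filter):
    """Reject log records whose message mentions a deprecation warning."""

    def filter(self, record: logging.LogRecord) -> bool:
        return "Deprecation warning" not in record.getMessage()


def silence_deprecation_warnings(logger_name: str = "copernicusmarine") -> logging.Filter:
    """Attach the filter to the named logger (the default name is an assumption)."""
    log_filter = DropDeprecationWarnings()
    logging.getLogger(logger_name).addFilter(log_filter)
    return log_filter
```

The function would be called once before copernicusmarine.subset(...). A filter attached to a logger only affects records logged on that exact logger, not on its child loggers, so this may need adjusting.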

@buchmann-dhi
Author

Hi @uriii3

Just an update. I made a clean install of 1.3.5 in a venv created directly from a native Python 3.11.7. Same result. It is still quite possible that system-dependent factors are involved, so I cannot say for sure that this is reproducible. But it is as "clean" as I can make it on my dev PC.

Log below (with some output skipped):

C:\SCRATCH>C:\Users\BJB\AppData\Local\Programs\Python\Python311\python.exe -m venv C:\PYVENV\cmems135

C:\SCRATCH>C:\PYVENV\cmems135\Scripts\python.exe -m pip install --upgrade pip
<snip>
Successfully installed pip-24.3.1

C:\SCRATCH>C:\PYVENV\cmems135\Scripts\python.exe -m pip install copernicusmarine==1.3.5
Collecting copernicusmarine==1.3.5
  Downloading copernicusmarine-1.3.5-py3-none-any.whl.metadata (34 kB)
<snip>
Successfully installed aiohttp-3.9.5 aiosignal-1.3.1 asciitree-0.3.3 attrs-24.2.0 boto3-1.35.76 botocore-1.35.76 cachier-3.1.2 certifi-2024.8.30 cftime-1.6.4.post1 charset-normalizer-3.4.0 click-8.1.7 cloudpickle-3.1.0 colorama-0.4.6 copernicusmarine-1.3.5 dask-2024.12.0 fasteners-0.19 frozenlist-1.5.0 fsspec-2024.10.0 idna-3.10 importlib_metadata-8.5.0 jmespath-1.0.1 locket-1.0.0 lxml-5.3.0 multidict-6.1.0 nest-asyncio-1.6.0 netCDF4-1.7.2 numcodecs-0.14.1 numpy-1.26.4 packaging-24.2 pandas-2.2.3 partd-1.4.2 portalocker-3.0.0 propcache-0.2.1 pystac-1.11.0 python-dateutil-2.9.0.post0 pytz-2024.2 pywin32-308 pyyaml-6.0.2 requests-2.32.3 s3transfer-0.10.4 semver-3.0.2 setuptools-75.6.0 six-1.17.0 toolz-1.0.0 tqdm-4.67.1 tzdata-2024.2 urllib3-2.2.3 watchdog-6.0.0 xarray-2024.11.0 yarl-1.18.3 zarr-2.18.3 zipp-3.21.0

C:\SCRATCH>C:\PYVENV\cmems135\Scripts\copernicusmarine.exe --version
copernicusmarine, version 1.3.5

C:\SCRATCH>C:\PYVENV\cmems135\Scripts\python.exe "C:\Users\BJB\source\copernicus-model\copernicus-subset-minimal.py"
WARNING - 2024-12-06T09:48:01Z - Deprecation warning for option 'no_metadata_cache'. This option will no longer be available in copernicusmarine>=2.0.0. Please refer to the documentation when the new major version is released for more information.
WARNING - 2024-12-06T09:48:01Z - Deprecation warning for option 'overwrite_output_data'. This option will no longer be available in copernicusmarine>=2.0.0. Please refer to the documentation when the new major version is released for more information.
WARNING - 2024-12-06T09:48:01Z - Deprecation warning for option 'force_download'. This option will no longer be available in copernicusmarine>=2.0.0. Please refer to the documentation when the new major version is released for more information.
INFO - 2024-12-06T09:48:16Z - Dataset version was not specified, the latest one was selected: "202406"
INFO - 2024-12-06T09:48:16Z - Dataset part was not specified, the first one was selected: "default"
INFO - 2024-12-06T09:48:19Z - Service was not specified, the default one was selected: "arco-geo-series"
INFO - 2024-12-06T09:48:20Z - Downloading using service arco-geo-series...
INFO - 2024-12-06T09:48:22Z - Estimated size of the dataset file is 0.509 MB.
INFO - 2024-12-06T09:48:22Z - Writing to local storage. Please wait...
INFO - 2024-12-06T09:48:27Z - Successfully downloaded to zos.nc
Download completed

c:\SCRATCH>echo %errorlevel%
-1073741819

I don't know exactly what is going on, but copernicusmarine has a hefty dependency tree, which also means that we will not reuse copernicusmarine venvs for other jobs/scripts.

@uriii3
Collaborator

uriii3 commented Dec 10, 2024

Hello @buchmann-dhi ,

We were able to reproduce the error with v1 but not with v2, so it appears to be fixed in v2. We suspect a problem with the cachier dependency. This will probably not be fixed on the v1 though.

We understand about the operational uses and you can move to v2 when you find adequate. However, could you tell us if you can reproduce this error with the prerelease 2.0.0a4?

@buchmann-dhi
Author

Dear @uriii3 / Oriol,

I have now tested with a "clean" version of 2.0.0a4, and I did not get the same error on exit status (for this particular setup).
However, as I have seen this error on and off over various versions and machines, it comforts me only part-way:

We suspect a problem with the cachier dependency.

OK. But since the bug has not actually been tracked down, it could conceivably be a coincidence that we do not see the problem in this particular case.
You also tested on a system with 1.3.5 and failed to see the problem (just saying), so we should agree that this bug is problematic (inconsistent).
This said in the utmost respect: I do hope that the bug is gone.
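
One way to gather more evidence on an affected machine is Python's built-in faulthandler module, which since Python 3.7 also installs a handler for Windows exceptions such as access violations. The sketch below is a suggestion only: the function name and log file name are hypothetical, and a crash during interpreter shutdown may not yield a useful Python traceback.

```python
import faulthandler


def enable_crash_log(path: str = "copernicus-crash.log"):
    """Write tracebacks of all threads to `path` if the interpreter crashes."""
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    # The file object must stay open for as long as faulthandler is enabled.
    return log_file


# Usage, at the very top of the minimal script:
#     crash_log = enable_crash_log()
```

Running the unmodified script with `python -X faulthandler` enables the same handler and writes to stderr instead.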

This will probably not be fixed on the v1 though.

While I understand your priorities, this is a problem that I have tried to escalate for quite some time (through the mail support). I should like it resolved at least in a way, so that we (you, me, both) are confident that it does not persist into v2.0.x. I reported this issue as far back as v1.0.0 (2024-02-20) with the CLI-version, and then we (me, this end) have been plagued with it on and off through the various versions inconsistently across machines.

you can move to v2 when you find adequate

Roger that. If you leave v1 burning (no fix), then it could potentially increase my willingness to switch quickly (although not my happiness with the service as a whole). For operational forecasting/downstream services, I should really like a stable version of v1 before we make a controlled switch - with breaking changes - to a stable version 2.x.

I hope this makes sense from a user-land perspective at least.

Best,
/Bjarne

@renaudjester
Collaborator

Hi @buchmann-dhi,

As far as we could reproduce the bug, it seems to come from the cachier library, which is not used in v2.
As long as the bug cannot be reproduced with v2 (with the same actions that reproduced it with v1), we can consider it solved in v2.

We will see whether we want to solve this problem for v1. This decision will probably be taken after the release of v2.0.0. Personally, I am not able to reproduce this problem (I do not use Windows, and our Windows VM does not show it).

Have great winter holidays :D
