to_netcdf failure in echopype_tour notebook under echopype-examples #789

Closed
erikatbeof opened this issue Aug 18, 2022 · 17 comments · Fixed by #843

@erikatbeof

erikatbeof commented Aug 18, 2022

Hi,
I am running the echopype_tour example in the echopype-examples repo.

At cell 9, on the line

ed.to_netcdf(save_path=converted_dpath, overwrite=True)

I get:

08:35:29 parsing file Summer2017-D20170728-T181619.raw, time of first ping: 2017-Jul-28 18:16:19
08:36:38 overwriting exports/Summer2017-D20170728-T181619.nc
Failed to process raw file Summer2017-D20170728-T181619.raw: NetCDF: Filter error: bad id or parameters or duplicate filter ...

I'm running echopype-0.6.0.

Am I doing something wrong, or is this a bug?

/Erik

@leewujung
Member

@erikatbeof : please upgrade to >=v0.6.1 and you should be fine.

@erikatbeof
Author

@leewujung thanks for the quick reply! I tried v0.6.2 and the issue persists.

@leewujung
Member

leewujung commented Aug 18, 2022

Oh interesting. From the message it seems that it could be due to something mismatched or duplicated in the file.

Can you share a small example file with us? For parsing issues we cannot debug without accessing the content. Thanks!

@erikatbeof
Author

I'm just running the echopype_tour notebook example in the echopype-examples repo.

https://github.com/OSOceanAcoustics/echopype-examples/blob/main/notebooks/echopype_tour.ipynb

@leewujung
Member

leewujung commented Aug 18, 2022

Ok that is unexpected! On which platform are you running this?

Are you able to run the other notebooks in that repo?

We are pretty tied up this week due to hosting a week-long workshop, so it may take some time before we can look into this.

@erikatbeof
Author

Ubuntu 20.04.

@leewujung leewujung changed the title to_netcdf failure to_netcdf failure in echopype_tour notebook under echopype-examples Aug 18, 2022
@leewujung leewujung moved this to Todo in Echopype Aug 18, 2022
@leewujung leewujung added this to the 0.6.3 milestone Aug 18, 2022
@emiliom
Collaborator

emiliom commented Aug 18, 2022

I'll try to replicate the problem by early next week. I have Ubuntu 21 and 22. In the meantime, could you tell us whether you installed echopype using conda or pip?

@erikatbeof
Author

erikatbeof commented Aug 18, 2022

I installed echopype using pip.

The problem, though, seems to be related to xarray.Dataset.to_netcdf having trouble compressing unicode strings.

I modified the encoder in ~/echopype/echopype/utils/io.py save_file function to:

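    encoding = dict()
    if compression_settings is not None:
        for k, v in ds.items():
            # Copy so the dtype override below applies to this variable only
            encoding[k] = dict(compression_settings)
            if v.dtype.kind == "U":
                # Store unicode strings as bytes to avoid the filter error
                encoding[k]["dtype"] = "S"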

And that seems to solve the issue.
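
For illustration only, here is a self-contained sketch of that workaround. The helper name `build_encoding` and the stand-in variables are hypothetical and are not part of echopype or xarray. The function accepts anything whose `.items()` yields `(name, variable)` pairs with a `.dtype.kind` attribute, which an `xarray.Dataset` provides. It copies the settings for each variable so that the `dtype` override stays local to the unicode variables.

```python
from types import SimpleNamespace


def build_encoding(variables, compression_settings):
    """Return a per-variable encoding dict suitable for to_netcdf(encoding=...).

    `variables` is any mapping-like object whose .items() yields
    (name, variable) pairs, each variable exposing `.dtype.kind`.
    """
    encoding = {}
    if compression_settings is None:
        return encoding
    for name, var in variables.items():
        # Copy the settings so a per-variable tweak does not leak into
        # the entries of the other variables.
        settings = dict(compression_settings)
        if var.dtype.kind == "U":
            # Workaround reported in this thread: write unicode as bytes.
            settings["dtype"] = "S"
        encoding[name] = settings
    return encoding


# Hypothetical stand-ins for dataset variables, so the sketch runs
# without xarray installed.
fake_ds = {
    "sentence_type": SimpleNamespace(dtype=SimpleNamespace(kind="U")),
    "float_var": SimpleNamespace(dtype=SimpleNamespace(kind="f")),
}
shared = {"zlib": True, "complevel": 4}
encoding = build_encoding(fake_ds, shared)
print(encoding)
```

With a real dataset the call would presumably look like `ds.to_netcdf(path, encoding=build_encoding(ds, shared))`. Whether the `"S"` dtype is appropriate for every string variable is an assumption carried over from the workaround above.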

@emiliom
Collaborator

emiliom commented Aug 18, 2022

Thanks. One more request: which xarray version was installed when you installed echopype?

@erikatbeof
Author

I tried version 2022.3.0 and 2022.6.0.

@emiliom
Collaborator

emiliom commented Aug 19, 2022

Thanks for all the helpful background information @erikatbeof. I'll get back to you in a couple of days with the results of my diagnostics.

@emiliom
Collaborator

emiliom commented Aug 22, 2022

Hi @erikatbeof. I've run the echopype_tour.html notebook in a freshly created conda environment, and it worked without problems. This is on Ubuntu 22 (Pop!_OS 22.04, to be more precise, which is built on Ubuntu 22.04).

Here's exactly how I built the conda environment:

  • Use the provided conda environment file at https://github.com/OSOceanAcoustics/echopype-examples/blob/main/binder/environment.yml. Note: we mention this environment file in the example notebooks, but the url path listed in the notebooks is actually wrong! (note the use of echopype_paper rather than the correct main in the url path). We'll fix that incorrect url soon.
  • That environment file is pinned to echopype version 0.6.0. We pin the version to ensure that the current version of the examples works with the specified echopype version.
  • I first tried creating the environment with conda using this statement: conda env create -f binder/environment.yml (after cloning or downloading https://github.com/OSOceanAcoustics/echopype-examples). But it was taking too long and I ran out of patience, so I used mamba instead; same statement, just replace conda with mamba after installing mamba.

I don't think we've tried running these notebooks with echopype installed with pip. That seems to be the source of the problem you're seeing, though I don't know exactly why the problem is happening. We should run a test with a pip installation to help us anticipate problems such as the ones you're encountering, but we likely won't get around to it until next week or later.

I haven't tried running the notebook with echopype 0.6.1 or 0.6.2. We'll do that probably next week, as it's now time to update the notebooks to the latest echopype version.

@emiliom
Collaborator

emiliom commented Sep 2, 2022

I've successfully run the notebook with a conda environment identical to the one I used before, except with echopype 0.6.2. Also created using mamba, as before.

@erikatbeof
Author

I have not done much more on this lately. However, I see you use Ubuntu 22.04. I have only tried on 20.04 so far. I will upgrade to 22.04 and let you know how it goes.

@emiliom
Collaborator

emiliom commented Sep 2, 2022

I've run that notebook successfully with echopype 0.6.0 on Ubuntu 21 in the last 4 months (all with conda), on two different machines. I also ran it with earlier versions of echopype on Ubuntu 20, prior to that. I think others on the team have run it on other machines with different OSes. I really doubt it's an issue with the Ubuntu version.

Have you tried it on a conda environment? My best guesses at this point are that there is something going on with a pip installed echopype or there's something very quirky in your setup.

The error you're seeing ("NetCDF: Filter error: bad id or parameters or duplicate filter ... ") is generated by the netcdf library. The error is listed here. I think this suggests that your netcdf library installation may have issues; for example, its unicode string handling may behave unexpectedly (per your R&D). If so, that's the sort of problem that is minimized with conda.

You can try reducing the test to the bare minimum, something like this:

import echopype as ep

ed = ep.open_raw(
    f"s3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170728-T181619.raw",
    sonar_model='EK60',
    storage_options={'anon': True}
)

ed.to_netcdf(save_path=".", overwrite=True)

We'll try to run that minimal code in a pip-installed echopype setup to see if we can reproduce the error.

@erikatbeof
Author

Thanks for the simple example. I now work with a clean Ubuntu 22.04 installation. It is still failing in the same way. Again, it seems that xarray.Dataset.to_netcdf is having trouble compressing unicode strings.

I traced the problem down to the 'sentence_type' DataArray, which has dtype="U".

This simple freestanding code illustrates the problem:

import numpy as np
import xarray as xr

x = np.array([1,2],dtype='U') #This results in RuntimeError.
#x = np.array([1,2],dtype='f') #This is working well.
da = xr.DataArray(x,name="x")

path = "output.nc"
encoding={'x': {'zlib': True, 'complevel': 4}}
da.to_netcdf(path=path, encoding=encoding)

Toggle between lines 4 and 5.

Any ideas?

@emiliom
Collaborator

emiliom commented Sep 5, 2022

Thanks for reporting your additional test with that simple code. That was very helpful. I've tested it using a Python environment where I installed echopype using pip install, and I was finally able to reproduce your error!

Here's how I created the environment that reproduced the error using your sample code, using pip install:

conda create -n echopype_pip -c conda-forge python=3.9 ipykernel
conda activate echopype_pip
pip install echopype

In contrast, when I created a simple environment using conda as follows, the same code ran without errors:

conda create -n echopype_conda -c conda-forge python=3.9 ipykernel echopype
conda activate echopype_conda

So, it's clear there is a problem with the netcdf dependency that's installed when using pip install. We'll need to look into it. In the meantime, please try to use conda instead.

I've created a new issue to help us track this problem, #801
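
When comparing the pip and conda environments described above, it may help to record which versions of the relevant packages each one installed. The following is a minimal sketch using only the standard library. The helper name `installed_versions` and the package list are assumptions chosen for illustration, not an established diagnostic for this bug.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


# Packages assumed to be involved in the write path discussed in this thread.
report = installed_versions(["echopype", "xarray", "netCDF4", "numpy"])
for name, version in report.items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Running this in both environments and diffing the output would show whether the Python packages differ. If the netCDF4 package is importable, its `netCDF4.__netcdf4libversion__` and `netCDF4.__hdf5libversion__` attributes additionally report the underlying C library versions, which is where a pip wheel and a conda package are most likely to diverge.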

@lsetiawan lsetiawan linked a pull request Oct 11, 2022 that will close this issue
Repository owner moved this from Todo to Done in Echopype Oct 12, 2022