How do you open older .BAG files? #90

Open
heathhenley opened this issue Mar 7, 2024 · 15 comments · Fixed by #104
Labels: bug (Something isn't working)

Comments

@heathhenley
Contributor

I'm looking to read in some surveys and I'm exploring using your Python bindings to do it. I came across this XML parse error (Extra content at the end of the document) while trying to read .BAG files from H12025 and W00426.

I'm assuming there's some error in my setup or in my understanding of how to use the library, as I see the same behavior with the sample-1.5.0.bag in the repo's examples:

>>> import bagPy
>>> dataset = bagPy.Dataset.openDataset(r"C:\Users\heath\Desktop\BAGS\bag_repo_samples\sample.bag", bagPy.BAG_OPEN_READ_WRITE)
>>> dataset = bagPy.Dataset.openDataset(r"C:\Users\heath\Desktop\BAGS\bag_repo_samples\sample-2.0.1.bag", bagPy.BAG_OPEN_READ_WRITE)
>>> dataset = bagPy.Dataset.openDataset(r"C:\Users\heath\Desktop\BAGS\bag_repo_samples\sample-1.5.0.bag", bagPy.BAG_OPEN_READ_WRITE)
Entity: line 17: parser error : Extra content at the end of the document
erNote></gmd:MD_SecurityConstraints></gmd:metadataConstraints></gmi:MI_Metadata>
                                                                               ^
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\heath\miniconda3\envs\bag_test_env\Lib\site-packages\bagPy\__init__.py", line 9401, in openDataset
    return _bagPy.Dataset_openDataset(fileName, openMode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
bagPy.ErrorLoadingMetadata

If anyone can help me understand what I've missed I would appreciate it.

I'm seeing the same error with the real data, though it does not fail on the same XML tag. For example:

>>> dataset = bagPy.Dataset.openDataset(r"C:\Users\heath\Desktop\BAGS\W00426_MB_4m_MLLW_1of3.bag", bagPy.BAG_OPEN_READ_WRITE)
Entity: line 29: parser error : Extra content at the end of the document

^
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\heath\miniconda3\envs\bag_test_env\Lib\site-packages\bagPy\__init__.py", line 9401, in openDataset
    return _bagPy.Dataset_openDataset(fileName, openMode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
bagPy.ErrorLoadingMetadata

I am able to parse and manipulate both of those BAGs with the tools in https://github.com/hydroffice/hyo2_bag - so that's an option too, but this package just seemed to be more actively maintained and documented.

@selimnairb selimnairb self-assigned this Mar 8, 2024
@selimnairb selimnairb added the bug Something isn't working label Mar 8, 2024
@selimnairb
Collaborator

Hi @heathhenley, thanks for posting this. This looks like a bug. I'll have a look over the next few working days and get back to you.

@selimnairb
Collaborator

@heathhenley I finally had a chance to look at this, sorry for the delay. Can you tell me what version of Python you are using? When I open examples/sample-data/sample-1.5.0.bag using Python 3.11 on my Mac, it opens fine. I'm away from the office right now, so I don't have my Windows machine to test on, but wanted to see if this was a Python version issue in the meantime. Thanks!

@heathhenley
Contributor Author

heathhenley commented May 30, 2024

Thanks for looking into this! I'm still not convinced that I didn't just miss something. I don't usually use conda, so I'm winging it there. In my experience, Windows is always weird with open-source GIS tools (GDAL etc.) too.

I must have torched my setup from last time, so I set it up again today on Windows 11. Here's what I did:

  • run conda install command: conda install conda-forge::bagpy
  • pip installed the test deps
  • tried to run the tests, to get them running:

I can run the tests, but I get 4 failures and 1 error. I haven't dug into them at all:

================================================================= short test summary info =================================================================
FAILED python/test_compat_gdal.py::TestCompatGDAL::test_gdal_create_simple - SystemError: _PyErr_SetObject: exception <class 'bagPy.MetadataNotFound'> is not a BaseException subclass
FAILED python/test_dataset.py::TestDataset::testGetLayerTypes - bagPy.ErrorLoadingMetadata
FAILED python/test_interleavedlegacylayer.py::TestInterleavedLegacyLayer::testGetLayerAndRead - bagPy.ErrorLoadingMetadata
FAILED python/test_simplelayer.py::TestSimpleLayer::testRead - bagPy.ErrorLoadingMetadata
ERROR python/test_compat_gdal.py::TestCompatGDAL::test_gdal_create_simple - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\heath\\AppData\\Local\\Temp\...
=================================================== 4 failed, 91 passed, 20 warnings, 1 error in 1.54s ====================================================

But three of them may be related to the original problem I had. There was a problem in the GeoDjango/GDAL tests on Windows related to files not being allowed to be opened more than once without being closed first (Mac and Linux don't care). Maybe that's what's going on with the permission error; I didn't dig in.
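The Windows file-locking behavior mentioned above can be sketched with the standard library alone. This is only an illustration: the file name `scratch.bag` is made up, and whether an unclosed handle is really what caused the `PermissionError` in the test run is an assumption that nobody in this thread confirmed.

```python
import os
import tempfile


def write_then_delete(directory: str) -> bool:
    """Create a scratch file, close it, then delete it.

    On Windows, os.remove() raises PermissionError (WinError 32) while a
    handle to the file opened by Python is still open; POSIX systems allow
    the unlink. Closing the handle first behaves the same everywhere.
    """
    path = os.path.join(directory, "scratch.bag")  # hypothetical file name
    # The 'with' block guarantees the handle is closed before the delete.
    with open(path, "wb") as handle:
        handle.write(b"placeholder")
    os.remove(path)
    return not os.path.exists(path)


with tempfile.TemporaryDirectory() as tmp:
    removed = write_then_delete(tmp)
print(removed)
```

If a test deletes its temporary file while a dataset object still holds it open, the same delete would be expected to fail on Windows only, which would match the single ERROR line in the summary.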

To actually answer your question, I'm using 3.12.2:

(base) c:\dev\BAG>python
Python 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:42:31) [MSC v.1937 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import bagPy
>>> dataset = bagPy.Dataset.openDataset(r"C:\dev\BAG\examples\sample-data\sample-2.0.1.bag", bagPy.BAG_OPEN_READ_WRITE)
>>> dataset = bagPy.Dataset.openDataset(r"C:\dev\BAG\examples\sample-data\sample-1.5.0.bag", bagPy.BAG_OPEN_READ_WRITE)
Entity: line 17: parser error : Extra content at the end of the document
erNote></gmd:MD_SecurityConstraints></gmd:metadataConstraints></gmi:MI_Metadata>
                                                                               ^
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\heath\miniconda3\Lib\site-packages\bagPy\__init__.py", line 9401, in openDataset
    return _bagPy.Dataset_openDataset(fileName, openMode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
bagPy.ErrorLoadingMetadata

@stephen-patterson-noaa

Greetings,
I am seeing the same issue described in this open bug. I am using a Windows laptop and just trying to get familiar with the bagPy library by opening a file, and I receive the following error: "Entity: line 33: parser error : Extra content at the end of the document"
Some Stack Overflow posts mention that an XML document needs a single parent (root) tag and that the parser throws this error when it is missing.

Here is the code I was running:
dataset = bagPy.Dataset.openDataset(bag_file_path, bagPy.BAG_OPEN_READ_WRITE)

I tried it with Python versions 3.8, 3.9, and 3.12, and I see the same error in each of those conda environments.

I was able to open a 2022 BAG file, but nothing older than that as far as I know.
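The "parent tag" remark can be illustrated with a small sketch: anything that follows the closing root tag makes the document ill-formed. The sample documents are invented, and the sketch uses Python's built-in `xml.etree.ElementTree` (backed by expat), not libxml2, so the wording of the error differs from the "Extra content at the end of the document" message seen with bagPy.

```python
import xml.etree.ElementTree as ET


def try_parse(document: bytes) -> str:
    """Return the root tag on success, or the parser's error message."""
    try:
        return ET.fromstring(document).tag
    except ET.ParseError as exc:
        return f"ParseError: {exc}"


well_formed = b"<metadata><title>survey</title></metadata>"
# A second top-level element after the root has closed is not allowed in XML.
two_roots = well_formed + b"<extra/>"

print(try_parse(well_formed))
print(try_parse(two_roots))
```

The first call returns the root tag; the second reports a parse error because the parser meets more content after the root element has ended.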

@giumas
Member

giumas commented Jul 9, 2024

@selimnairb, I had a similar issue in Python when using lxml to parse the XML in the BAG files.
Given that lxml wraps libxml2, the cause is likely the same change in behavior that was introduced in a recent version of that library.
The simple solution was to strip the retrieved string in order to remove the trailing characters.

@selimnairb
Collaborator

selimnairb commented Jul 11, 2024

@giumas @stephen-patterson-noaa Can you confirm whether you are seeing this error with any of the sample BAGs in examples/sample-data?

@stephen-patterson-noaa

I tried to open each of the sample BAG files in examples/sample-data and view their layers.
Only 2 files (bag_georefmetadata_layer.bag, sample-2.0.1.bag) opened and printed the layer names:

bag_georefmetadata_layer.bag
     Elevation
     Uncertainty
     Elevation
example_w_qc_layers.bag
    - bagPy.ErrorLoadingMetadata
metadata_layer_example.bag
    - OSError: [Errno 0] Error
nominal_only.bag
    - bagPy.ErrorLoadingMetadata
sample-1.5.0.bag
    - bagPy.ErrorLoadingMetadata
sample-2.0.1.bag
     Elevation
     Uncertainty
     Nominal_Elevation
     Surface_Correction
true_n_nominal.bag
    - bagPy.ErrorLoadingMetadata
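A per-file report like the one above can be produced with a small loop. The sketch below is not the script that was actually run: `survey_files` and `fake_opener` are hypothetical names, and the stand-in opener exists only so the example runs without bagPy installed. It also assumes the exceptions raised by the opener derive from `Exception`, which may not hold for every bagPy build.

```python
from typing import Callable, Dict, Iterable


def survey_files(paths: Iterable[str],
                 opener: Callable[[str], object]) -> Dict[str, str]:
    """Try to open every path; record 'ok' or the exception class name."""
    results: Dict[str, str] = {}
    for path in paths:
        try:
            opener(path)
        except Exception as exc:  # broad on purpose: we only report the type
            results[path] = type(exc).__name__
        else:
            results[path] = "ok"
    return results


# Stand-in opener (hypothetical) that fails for one file name.
def fake_opener(path: str) -> object:
    if "1.5.0" in path:
        raise ValueError("metadata could not be parsed")
    return object()


report = survey_files(["sample-2.0.1.bag", "sample-1.5.0.bag"], fake_opener)
for name, outcome in report.items():
    print(f"{name}: {outcome}")
```

With bagPy installed, the opener could wrap the call used earlier in this thread, for example `lambda p: bagPy.Dataset.openDataset(p, bagPy.BAG_OPEN_READ_WRITE)`.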

@giumas
Member

giumas commented Jul 12, 2024

@selimnairb, this minimal script replicates the issue without using BagPy:

from h5py import File
from lxml import etree, __version__ as lxml_version

bag_path = r"C:\code\cpp\BAG\examples\sample-data\sample-1.5.0.bag"

strip_x00 = False

print("lxml version: %s" % lxml_version)
print("libxml version: %s" % (etree.LIBXML_COMPILED_VERSION, ))

bag = File(bag_path, 'r')
xml = bag["BAG_root/metadata"][:].tobytes()
if strip_x00:
    xml = xml.strip(b'\x00')
xml_tree = etree.fromstring(xml)

If I run it using old versions of libxml, it works as is (strip_x00 = False). Example output:

lxml version: 4.7.1
libxml version: (2, 9, 12)

However, the same script fails when executed in a more recent version of libxml:

lxml version: 5.1.0
libxml version: (2, 12, 3)
Traceback (most recent call last):
  File "C:\code\hyo2\hyo2_bag\examples\workground\open_bag_metadata_xml.py", line 15, in <module>
    xml_tree = etree.fromstring(xml)
               ^^^^^^^^^^^^^^^^^^^^^
  File "src/lxml/etree.pyx", line 3264, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1989, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1876, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1164, in lxml.etree._BaseParser._parseDoc
  File "src/lxml/parser.pxi", line 633, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 743, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 672, in lxml.etree._raiseParseError
  File "<string>", line 17
lxml.etree.XMLSyntaxError: Extra content at the end of the document, line 17, column 6165

However, if you switch the flag (strip_x00 = True), it works again:

lxml version: 5.1.0
libxml version: (2, 12, 3)

So my suggestion is to always apply strip(b'\x00') in the bagPy library.
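The stripping idea can be tried without h5py or lxml. The sketch below is a variation, not the fix that was later merged: it uses the standard-library `xml.etree.ElementTree` parser instead of lxml, the helper name `parse_padded_xml` is hypothetical, and the padded buffer is fabricated to imitate metadata followed by NUL bytes.

```python
import xml.etree.ElementTree as ET


def parse_padded_xml(raw: bytes) -> ET.Element:
    """Parse XML bytes that may be padded with trailing NUL bytes."""
    # rstrip only touches the end of the buffer; strip(b"\x00") as in the
    # script above would also remove any leading NUL bytes.
    return ET.fromstring(raw.rstrip(b"\x00"))


# Simulated metadata buffer: a small document followed by NUL padding.
padded = b"<MI_Metadata><note>demo</note></MI_Metadata>" + b"\x00" * 8
root = parse_padded_xml(padded)
print(root.tag)
```

Removing only the trailing bytes keeps the change as small as possible; a buffer with no padding passes through unchanged.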

@selimnairb
Collaborator

Thank you @giumas for the helpful test case. @stephen-patterson-noaa @heathhenley we're working on a bug-fix release and will address this issue. This will likely happen as part of this PR. We hope to have this ready in the next few weeks. Thanks for your interest and contributions!

@selimnairb
Collaborator

@heathhenley @giumas @stephen-patterson-noaa I merged #104, which fixes the Python wheel testing on Windows and adds a test case for opening a BAG 1.5 file. Can you give the latest code in master a try and reconfirm the problems you were seeing before?

You can now relatively easily build BAG C++ and Python libraries without using Conda on Windows using the scripts documented here. Let me know if you run into any problems. Thanks!

@stephen-patterson-noaa

@selimnairb I reviewed the changes a couple of different ways. The "test_compat_bag15.py" test passes when run inside the Docker container. I also tried to update/reinstall the bagPy library v2.0.3 (build py312h4cf4972_1) through conda, and it still shows the same "extra content" error message below when using Python 3.12.2 on Windows:

Entity: line 17: parser error : Extra content at the end of the document
erNote></gmd:MD_SecurityConstraints></gmd:metadataConstraints></gmi:MI_Metadata>
                                                                               ^
______________________________________________________ TestCompatBAG15.test_open_read_write _______________________________________________________

self = <test_compat_bag15.TestCompatBAG15 testMethod=test_open_read_write>

def test_open_read_write(self):
    bag_filename = str(Path(self.datapath, 'sample-1.5.0.bag'))
    dataset = BAG.Dataset.openDataset(bag_filename, BAG.BAG_OPEN_READ_WRITE)

@selimnairb
Collaborator

Thanks @stephen-patterson-noaa. I'll test using conda. It may be related to the version(s) of libxml2 that conda includes...

@selimnairb
Collaborator

@stephen-patterson-noaa @giumas I have a PR that fixes the libxml2 issue you were running into. Once this is merged and a new release is made, the fix will make its way into the conda packages. In the meantime, I wanted you to be aware of the fix in case you need to deploy from GitHub.

@selimnairb
Collaborator

@heathhenley @stephen-patterson-noaa @giumas BAG release-2.0.4 (https://github.com/OpenNavigationSurface/BAG/releases/tag/release-2.0.4) was released yesterday, which should fix the libxml2 incompatibility. It is available for install via conda-forge. When you have a chance, please re-run the tests you were running above to verify that things are now working as expected.
