-
-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: support dataset creation options (use metadata to split dataset/layer creation options) #189
ENH: support dataset creation options (use metadata to split dataset/layer creation options) #189
Conversation
…layer creation options)
elif k in layer_option_names: | ||
layer_kwargs[k] = v | ||
else: | ||
raise ValueError(f"unrecognized option '{k}' for driver '{driver}'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we OK with raising an error here in case an option is not supported for the driver in question?
At the moment, we pass it through and the the GDAL warning message gets printed (so in effect we ignore the option).
For example:
In [18]: df = geopandas.read_file(geopandas.datasets.get_path("naturalearth_lowres"))
In [19]: pyogrio.write_dataframe(df, "test.geojson", spatial_index=False)
Warning 6: dataset test.geojson does not support layer creation option SPATIAL_INDEX
But with this change, we actually raise an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raising an error seems reasonable because users have to pass in the options intentionally, which means it is easy to fix by omitting the unsupported option and trying again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, also added a test for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @jorisvandenbossche ! A few minor comments to consider, but otherwise this is ready to merge.
We should add an example to docs/introduction.md
showcasing these options (added #193 to track that) in a separate PR.
pyogrio/raw.py
Outdated
if xml: | ||
root = ET.fromstring(xml) | ||
for option in root.iter("Option"): | ||
if option.attrib.get("scope", "vector") == "raster": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion:
if option.attrib.get("scope", "vector") != "raster":
options.append(option.attrib["name"])
pyogrio/raw.py
Outdated
to `SPATIAL_INDEX="YES"`. | ||
""" | ||
if not isinstance(options, dict): | ||
raise TypeError(f"Expected a dict as options, got {type(options)}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise TypeError(f"Expected a dict as options, got {type(options)}") | |
raise TypeError(f"Expected options to be a dict, got {type(options)}") |
elif k in layer_option_names: | ||
layer_kwargs[k] = v | ||
else: | ||
raise ValueError(f"unrecognized option '{k}' for driver '{driver}'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raising an error seems reasonable because users have to pass in the options intentionally, which means it is easy to fix by omitting the unsupported option and trying again.
pyogrio/tests/test_geopandas_io.py
Outdated
@@ -471,21 +471,64 @@ def test_write_read_empty_dataframe_unsupported(tmp_path, ext): | |||
_ = read_dataframe(filename) | |||
|
|||
|
|||
def test_write_dataframe_gdalparams(tmp_path, naturalearth_lowres): | |||
original_df = read_dataframe(naturalearth_lowres) | |||
def test_write_dataframe_gdal_options(tmp_path, naturalearth_lowres): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test could probably be parametrized instead, perhaps something like (untested)
@pytest.mark.parametrize("spatial_index", [False, True])
def test_write_dataframe_gdal_options(tmp_path, naturalearth_lowres, spatial_index):
df = read_dataframe(naturalearth_lowres)
outfilename1 = tmp_path / "test1.shp"
write_dataframe(df, outfilename1, SPATIAL_INDEX="YES" if spatial_index else "NO")
assert outfilename.exists() is True
index_filename1 = tmp_path / "test1.qix"
assert index_filename1.exists() is spatial_index
# using explicit layer_options instead
outfilename2 = tmp_path / "test2.shp"
write_dataframe(df, outfilename2, layer_options=dict(spatial_index=spatial_index))
assert outfilename.exists() is True
index_filename2 = tmp_path / "test2.qix"
assert test_noindex_index_filename.exists() is spatial_index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that looks better!
@brendan-ward thanks for the review! |
Closes #177
This PR adds a minimal implementation of querying metadata items (based on Toblerity/Fiona#950), not yet publicly exposed (we should do that as well at some point), but just enough to use internally.
We then use the metadata to check the dataset and layer creation options for the specific driver, so we can split the
**kwargs
in separate layer and dataset creation kwargs.