-
-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support writing object columns with np.nan values #118
Changes from all commits
da48362
ce1b68a
530b5d4
744a031
6c38520
cb28a2e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -704,3 +704,25 @@ def test_custom_crs_io(tmpdir, naturalearth_lowres_all_ext): | |
assert crs["lat_2"] == 51.5 | ||
assert crs["lon_0"] == 4.3 | ||
assert df.crs.equals(expected.crs) | ||
|
||
|
||
def test_write_read_null(tmp_path): | ||
from shapely.geometry import Point | ||
|
||
output_path = tmp_path / f"test_write_nan.gpkg" | ||
geom = Point(0, 0) | ||
test_data = { | ||
"geometry": [geom, geom, geom], | ||
"float64": [1.0, None, np.nan], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This doesn't have any effect in practice, as that will get converted to twice a NaN by pandas:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, that's indeed the case. Nonetheless I think it is transparant that it is explicitly in the test? But obviously if you think that's better I can put 2 times np.nan or whatever... |
||
"object_str": ["test", None, np.nan], | ||
} | ||
test_gdf = gp.GeoDataFrame(test_data, crs="epsg:31370") | ||
write_dataframe(test_gdf, output_path) | ||
result_gdf = read_dataframe(output_path) | ||
assert len(test_gdf) == len(result_gdf) | ||
assert result_gdf["float64"][0] == 1.0 | ||
assert pd.isna(result_gdf["float64"][1]) | ||
assert pd.isna(result_gdf["float64"][2]) | ||
assert result_gdf["object_str"][0] == "test" | ||
assert result_gdf["object_str"][1] is None | ||
assert result_gdf["object_str"][2] is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we combine this with the
field_value is None
check above to set FieldNull in a single place?(or move that check here, as currently the only way to get a None is through object dtype)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the check for
isnan
should be in theelif field_type == OFTReal:
block? The field type for the incoming column should befloat32
/float64
rather thanobject
(though I haven't verified this)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brendan-ward note that the specific case that is being fixed here is if you have an
object
dtype column (but which contains NaN, pandas doesn't really distinguish None vs NaN in object columns)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure... if you prefer that... I moved the "is None" check as it is redundant for column types other than string/object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brendan-ward in addition to what @jorisvandenbossche wrote: np.nan values in an OFTReal (float) column are automatically treated correctly (as null) by GDAL (OGR_F_SetFieldDouble), so no need to explicitly call OGR_F_SetFieldNull for OFTReal columns. Only for object columns this is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, that's not fully correct I think. They are written as NaN and not as Null. Since numpy/pandas only support NaN in float arrays, that's not an issue for correct rountrip for geopandas->gdal->geopandas, though.
Small illustrations:
And check both using GDAL's
ogrinfo
:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #122 to further track this