Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: expose additional kwargs in explode method to mirror GeoPandas API #246

Merged
merged 16 commits into from
May 6, 2024

Conversation

ebkurtz
Copy link
Contributor

@ebkurtz ebkurtz commented Mar 30, 2023

Expose index_parts, column, and ignore_index parameters in the .explode() method to mirror GeoPandas API. This allows supressing FutureWarning for index_parts, see #245

@ebkurtz
Copy link
Contributor Author

ebkurtz commented Mar 30, 2023

Failing on GeoSeries since the Column argument is only valid for DataFrame.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Mar 30, 2023

The CI failure looks unrelated. I'll take a look at that later.

It might be worth adding a small unit test, just to verify that things are passing through correctly (and to make sure they don't break in the future).

@ebkurtz
Copy link
Contributor Author

ebkurtz commented Mar 31, 2023

Thanks for the tip on defining them separately. I'll look at creating a unit test.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! The code looks good, would indeed be good to add a basic test using one of the keywords.

dask_geopandas/core.py Outdated Show resolved Hide resolved
dask_geopandas/core.py Outdated Show resolved Hide resolved
ebkurtz and others added 2 commits March 31, 2023 15:33
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
@ebkurtz
Copy link
Contributor Author

ebkurtz commented Apr 3, 2023

Would I add that as a new function in test_core.py are as additional code in the existing functions? I haven't done much with unit tests.

@TomAugspurger
Copy link
Contributor

@ebkurtz
Copy link
Contributor Author

ebkurtz commented Apr 6, 2023

The unit tests for the ignore_index argument are failing. With ignore_index set to True, the index is reset for each map partition so the resulting GeoDataFrame index has a non-unique index that doesn't match the vanilla GeoPandas call:

e.g. with

def test_explode_geoseries_ignore_index():
    s = geopandas.GeoSeries(
        [MultiPoint([(0, 0), (1, 1)]), MultiPoint([(2, 2), (3, 3), (4, 4)])]
    )
    # Change index from range index to test ignore index argument
    s.index = [3, 4]
    original = s.explode(ignore_index=True)
    dask_s = dask_geopandas.from_geopandas(s, npartitions=2)
    daskified = dask_s.explode(ignore_index=True)
    assert isinstance(daskified, dask_geopandas.GeoSeries)
    assert all(original == daskified.compute())

returns error
ValueError: Can only compare identically-labeled Series objects

self = 0    POINT (0.00000 0.00000)
1    POINT (1.00000 1.00000)
2    POINT (2.00000 2.00000)
3    POINT (3.00000 3.00000)
4    POINT (4.00000 4.00000)
dtype: geometry

other = 0    POINT (0.00000 0.00000)
1    POINT (1.00000 1.00000)
0    POINT (2.00000 2.00000)
1    POINT (3.00000 3.00000)
2    POINT (4.00000 4.00000)

My inclination is to reset the index after calling compute on the Dask geoseries:

 assert all(original == daskified.compute().reset_index(drop=True))

but I'm not familiar enough with unit testing and dask to know if that's an appropriate way to handle it.

@ebkurtz
Copy link
Contributor Author

ebkurtz commented Oct 13, 2023

Getting a lot of test failures after refreshing my fork. Looks like something with pyarrow strings?

@jorisvandenbossche jorisvandenbossche changed the title Expose additional kwargs in explode method to mirror GeoPandas api ENH: expose additional kwargs in explode method to mirror GeoPandas API May 6, 2024
@jorisvandenbossche
Copy link
Member

Getting a lot of test failures after refreshing my fork. Looks like something with pyarrow strings?

Apologies for the slow follow-up. Indeed those failures were cased by something else, which has been fixed in the meantime.

I also added the equivalent change to the new dask-expr based implementation.

@jorisvandenbossche
Copy link
Member

The unit tests for the ignore_index argument are failing.

Your fix to create the expected result manually with the different index was still failing for the older geopandas or dask version. But so just added a skip for those older versions, as I don't think it is worth complicating the test for that.

@jorisvandenbossche jorisvandenbossche merged commit 2607381 into geopandas:main May 6, 2024
12 checks passed
@jorisvandenbossche
Copy link
Member

@ebkurtz Thanks again for the PR and apologies it took so long to get merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants