[bug] Serialization of shapely objects in dataframes creates "intermediate" products #2289

Open
6 tasks done
dlohmeier opened this issue May 22, 2024 · 2 comments · May be fixed by #2319
@dlohmeier
Collaborator

Bug report checklist

  • Searched the issues page for similar reports

  • Read the relevant sections of the documentation

  • Browsed the tutorials and tests for useful code snippets and examples of use

  • Reproduced the issue after updating with pip install --upgrade pandapower (or git pull)

  • Tried basic troubleshooting (if a bug/error) like restarting the interpreter and checking the pythonpath

Reproducible Example

import pandas as pd
import pandapower as pp
import shapely

df = pd.DataFrame({"a": [1, 2], "b": [shapely.Point([1, 4]), shapely.LineString([[1, 2], [4, 6]])]})
json_str = pp.to_json(df)
df2 = pp.from_json_string(json_str)

print(df)
print(df2)

import geopandas as gpd

df2 = pd.DataFrame({"a": [1, 2], "b": [shapely.Point([1, 4]), shapely.LineString([[1, 2], [4, 6]])], "c": [shapely.Point([1, 9]), shapely.LineString([[1, 2], [4, 4]])]})
gdf = gpd.GeoDataFrame(df2, geometry="c")
json_str_gdf = pp.to_json(gdf)
gdf2 = pp.from_json_string(json_str_gdf)

Issue Description and Traceback

When running the above code, the shapely data is converted into the internal pandapower serialization format. Upon deserialization, this format is not converted back; each value remains a dict with several "useless" entries, such as "_module" or "_class". I assume that the reason behind this is that we pass the pandapower to_serializable handler as default_handler to pandas upon serialization, but we can't hand over a registry or decode-hook upon de-serialization. Is that correct? Do you have any idea of how to overcome this problem?
I know that serializing a dataframe is not a good use case for the pandapower.to_json function, but in some cases I do store shapely data inside my net dataframes without making them geopandas dataframes. Additionally, I sometimes use more than one column with geodata. For such cases, I added the geopandas part of the code. It is completely impossible to encode GeoDataFrames that contain geodata outside the "geometry" column, as the following error occurs:

Traceback (most recent call last):
  File "/home/daniel/workspace/pandapower/pandapower/io_utils.py", line 448, in default
    s = to_serializable(o)
	^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/functools.py", line 909, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
	   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/daniel/workspace/pandapower/pandapower/io_utils.py", line 985, in json_geodataframe
    d = with_signature(obj, obj.to_json())
	                    ^^^^^^^^^^^^^
  File "/home/daniel/.virtualenvs/retoflow/lib/python3.11/site-packages/geopandas/geodataframe.py", line 782, in to_json
    return json.dumps(
	   ^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
	   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
	     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
	   ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Point is not JSON serializable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/daniel/.virtualenvs/retoflow/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-24-0801906cf0dd>", line 1, in <module>
    gdf_str = pp.to_json({"geo": gdf})
	      ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/daniel/workspace/pandapower/pandapower/file_io.py", line 132, in to_json
    json_string = json.dumps(net, cls=io_utils.PPJSONEncoder, indent=2)
	          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
	  ^^^^^^^^^^^
  File "/usr/lib/python3.11/json/encoder.py", line 202, in encode
    chunks = list(chunks)
	     ^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/encoder.py", line 432, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.11/json/encoder.py", line 406, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.11/json/encoder.py", line 439, in _iterencode
    o = _default(o)
	^^^^^^^^^^^
  File "/home/daniel/workspace/pandapower/pandapower/io_utils.py", line 451, in default
    return json.JSONEncoder.default(self, o)
	   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type GeoDataFrame is not JSON serializable
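
The asymmetry suspected in the description (an encode handler is available, but no decode hook) can be illustrated with the standard library alone. This is only a sketch under assumptions: the `Point` class is a stand-in for a shapely geometry, the module name `"example"` and the `"_object"` payload key are hypothetical, and the helper functions only mimic the idea of a signature dict; none of this is pandapower's actual code.

```python
import json
from dataclasses import dataclass


@dataclass
class Point:
    """Stand-in for a geometry object (hypothetical, not shapely.Point)."""
    x: float
    y: float


def to_serializable(obj):
    """Encode hook: wrap an unknown object in a dict carrying a type signature."""
    if isinstance(obj, Point):
        # "_object" is an assumed name for the payload entry.
        return {"_module": "example", "_class": "Point", "_object": [obj.x, obj.y]}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def from_serializable(d):
    """Decode hook: rebuild an object from a dict that carries the signature."""
    if d.get("_module") == "example" and d.get("_class") == "Point":
        return Point(*d["_object"])
    return d


column = [Point(1, 4), Point(1, 9)]
json_str = json.dumps(column, default=to_serializable)

# Without a decode hook, the signature dicts survive as "intermediate" products.
plain = json.loads(json_str)

# With a decode hook, the original objects are restored.
restored = json.loads(json_str, object_hook=from_serializable)
```

If the reader used for dataframes offers no equivalent of `object_hook`, the cells would stay in the `plain` form, which matches the behavior reported here.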

Expected Behavior

It would be great to retrieve shapely data even from within dataframes, or from geodataframes outside the geometry column. Any ideas on that?

Installed Versions

INSTALLED VERSIONS

commit : 2e218d10984e9919f0296931d92ea851c6a6faf5
python : 3.11.9.final.0
python-bits : 64
OS : Linux
OS-release : 6.5.0-35-generic
Version : #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May 7 09:00:52 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : de_DE.UTF-8
pandas : 1.5.3
numpy : 1.23.5
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 70.0.0
pip : 24.0
Cython : 3.0.9
pytest : 8.1.1
hypothesis : 6.82.7
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.2.0
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.9
jinja2 : 3.1.3
IPython : 8.23.0
pandas_datareader: None
bs4 : 4.12.3
bottleneck : None
brotli : 1.1.0
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.3
numba : 0.59.1
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 15.0.2
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.12.0
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
zstandard : None
tzdata : 2024.1

Label

  • Relevant labels are selected
@vogt31337
Contributor

I assume that the reason behind this is that we pass the pandapower to_serializable handler as default_handler to pandas upon serialization, but we can't hand over a registry or decode-hook upon de-serialization. Is that correct? Do you have any idea of how to overcome this problem?

I think it is possible to register a new handler, but you would have to write one. I don't know what the impact on loading times etc. would be. I think the same is true for the geometry column, since this is handled by a custom loading hook.
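
One way such a handler could look is a decoder registry applied as a post-processing pass over the cells of an already loaded column. The following is a rough sketch, not pandapower's API: `DECODERS`, `register_decoder`, `restore_cell`, and `restore_column` are hypothetical names, the `"_object"` payload key is assumed, and the tuple returned by `decode_point` stands in for a real geometry.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical registry: (module, class) -> function that rebuilds the object.
DECODERS: Dict[Tuple[str, str], Callable[[Any], Any]] = {}


def register_decoder(module: str, cls: str):
    """Decorator that registers a decoder for one (module, class) signature."""
    def wrap(func):
        DECODERS[(module, cls)] = func
        return func
    return wrap


@register_decoder("example.geometry", "Point")
def decode_point(payload):
    # Stand-in: a real handler would build a geometry object here.
    return tuple(payload)


def restore_cell(cell: Any) -> Any:
    """Decode a signature dict if a decoder is registered, else return it unchanged."""
    if isinstance(cell, dict) and "_module" in cell and "_class" in cell:
        decoder = DECODERS.get((cell["_module"], cell["_class"]))
        if decoder is not None:
            return decoder(cell.get("_object"))
    return cell


def restore_column(values: List[Any]) -> List[Any]:
    """Apply restore_cell to every value of one column."""
    return [restore_cell(v) for v in values]


raw_column = [
    {"_module": "example.geometry", "_class": "Point", "_object": [1, 4]},
    {"_module": "example.unknown", "_class": "Thing", "_object": None},
    7,
]
restored_column = restore_column(raw_column)
```

Regarding loading times: this pass costs one Python-level check per cell, so it would probably make sense to run it only on columns that can hold dicts at all (for example object-dtype columns) rather than on every column.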

@vogt31337
Contributor

@dlohmeier

@dlohmeier dlohmeier linked a pull request Aug 19, 2024 that will close this issue
@vogt31337 vogt31337 self-assigned this Nov 21, 2024