[bug] Serialization of shapely objects in dataframes creates "intermediate" products #2289

dlohmeier · 2024-05-22T12:56:04Z

Bug report checklist

Searched the issues page for similar reports
Read the relevant sections of the documentation
Browsed the tutorials and tests for useful code snippets and examples of use
Reproduced the issue after updating with pip install --upgrade pandapower (or git pull)
Tried basic troubleshooting (if a bug/error) like restarting the interpreter and checking the pythonpath

Reproducible Example

import pandas as pd
import pandapower as pp
import shapely

df = pd.DataFrame({"a": [1, 2], "b": [shapely.Point([1, 4]), shapely.LineString([[1, 2], [4, 6]])]})
json_str = pp.to_json(df)
df2 = pp.from_json_string(json_str)

print(df)
print(df2)

import geopandas as gpd

df2 = pd.DataFrame({"a": [1, 2], "b": [shapely.Point([1, 4]), shapely.LineString([[1, 2], [4, 6]])], "c": [shapely.Point([1, 9]), shapely.LineString([[1, 2], [4, 4]])]})
gdf = gpd.GeoDataFrame(df2, geometry="c")
json_str_gdf = pp.to_json(gdf)
gdf2 = pp.from_json_string(json_str_gdf)

Issue Description and Traceback

When running the above code, the shapely data is converted into the internal pandapower serialization format. Upon deserialization, this format is not converted back; each object remains a dict with several "useless" entries, such as "_module" or "_class". I assume the reason is that we pass the pandapower to_serializable handler as default_handler to pandas upon serialization, but cannot hand over a registry or decode hook upon deserialization. Is that correct? Do you have any idea how to overcome this problem?
I know that serializing a dataframe is not a good use case for the pandapower.to_json function, but in some cases I store shapely data inside my net dataframes without making them geopandas dataframes. Additionally, I sometimes use more than one column with geodata; for such cases, I added the geopandas part of the code. It is impossible to encode GeoDataFrames that contain geodata outside the "geometry" column, as the following error occurs:

Traceback (most recent call last):
  File "/home/daniel/workspace/pandapower/pandapower/io_utils.py", line 448, in default
    s = to_serializable(o)
	^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/functools.py", line 909, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
	   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/daniel/workspace/pandapower/pandapower/io_utils.py", line 985, in json_geodataframe
    d = with_signature(obj, obj.to_json())
	                    ^^^^^^^^^^^^^
  File "/home/daniel/.virtualenvs/retoflow/lib/python3.11/site-packages/geopandas/geodataframe.py", line 782, in to_json
    return json.dumps(
	   ^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
	   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
	     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
	   ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Point is not JSON serializable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/daniel/.virtualenvs/retoflow/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-24-0801906cf0dd>", line 1, in <module>
    gdf_str = pp.to_json({"geo": gdf})
	      ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/daniel/workspace/pandapower/pandapower/file_io.py", line 132, in to_json
    json_string = json.dumps(net, cls=io_utils.PPJSONEncoder, indent=2)
	          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
	  ^^^^^^^^^^^
  File "/usr/lib/python3.11/json/encoder.py", line 202, in encode
    chunks = list(chunks)
	     ^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/encoder.py", line 432, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.11/json/encoder.py", line 406, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.11/json/encoder.py", line 439, in _iterencode
    o = _default(o)
	^^^^^^^^^^^
  File "/home/daniel/workspace/pandapower/pandapower/io_utils.py", line 451, in default
    return json.JSONEncoder.default(self, o)
	   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type GeoDataFrame is not JSON serializable
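
The mechanism suspected in the description above (an encoder-side default handler with no matching hook on the decoding side) can be illustrated with the standard library alone. This is only a sketch under assumptions, not pandapower's actual code: the `Point` class is a hypothetical stand-in for a shapely geometry, the `"_object"` key and the `"example"` module name are made up for illustration, and `tagging_default` / `restoring_hook` are hypothetical names.

```python
import json


class Point:
    """Hypothetical stand-in for a geometry class such as shapely.Point."""

    def __init__(self, x, y):
        self.x, self.y = x, y


def tagging_default(obj):
    # Encoder side: wrap objects that json cannot handle in a tagged dict.
    # The "_module"/"_class" keys mirror the entries mentioned in this issue;
    # "_object" is an assumed name for the payload.
    if isinstance(obj, Point):
        return {"_module": "example", "_class": "Point", "_object": [obj.x, obj.y]}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def restoring_hook(d):
    # Decoder side: turn tagged dicts back into objects, pass others through.
    if d.get("_module") == "example" and d.get("_class") == "Point":
        return Point(*d["_object"])
    return d


column = [Point(1, 4), Point(2, 5)]
json_str = json.dumps({"b": column}, default=tagging_default)

# Without a decode hook, the tagged dicts survive as "intermediate" products.
without_hook = json.loads(json_str)
# With a matching hook, the original objects are restored.
with_hook = json.loads(json_str, object_hook=restoring_hook)

print(type(without_hook["b"][0]).__name__)
print(type(with_hook["b"][0]).__name__)
```

The first result keeps the tagged dicts, which matches the symptom reported here; the second shows that the information needed for restoring is present in the JSON and only the decoding step is missing.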

Expected Behavior

It would be great to retrieve shapely data even from within dataframes, or from geodataframes outside the geometry column. Any ideas on that?

Installed Versions

INSTALLED VERSIONS

commit : 2e218d10984e9919f0296931d92ea851c6a6faf5
python : 3.11.9.final.0
python-bits : 64
OS : Linux
OS-release : 6.5.0-35-generic
Version : #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May 7 09:00:52 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : de_DE.UTF-8
pandas : 1.5.3
numpy : 1.23.5
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 70.0.0
pip : 24.0
Cython : 3.0.9
pytest : 8.1.1
hypothesis : 6.82.7
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.2.0
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.9
jinja2 : 3.1.3
IPython : 8.23.0
pandas_datareader: None
bs4 : 4.12.3
bottleneck : None
brotli : 1.1.0
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.3
numba : 0.59.1
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 15.0.2
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.12.0
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
zstandard : None
tzdata : 2024.1

Label

Relevant labels are selected

vogt31337 · 2024-06-13T07:13:35Z

I assume that the reason behind this is that we pass the pandapower to_serializable handler as default_handler to pandas upon serialization, but we can't hand over a registry or decode-hook upon de-serialization. Is that correct? Do you have any idea of how to overcome this problem?

I think it is possible to register a new handler, but you would have to write one. I don't know what the impact on loading times etc. would be. I think the same is true for the geometry column, since this is handled by a custom loading hook.
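
For illustration, registering such a handler and applying it to every column after loading could look roughly like the sketch below. Everything in it is an assumption made for the example: `DECODERS`, `register_decoder`, `decode_value` and `decode_columns` are hypothetical names, the `"_object"` payload key is assumed, and plain dicts of lists stand in for dataframe columns. It is not the pandapower API.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical registry: (module name, class name) -> decoder function.
DECODERS: Dict[Tuple[str, str], Callable[[Any], Any]] = {}


def register_decoder(module: str, cls: str):
    """Decorator that registers a decoder for one tagged type."""
    def wrap(func):
        DECODERS[(module, cls)] = func
        return func
    return wrap


def decode_value(value: Any) -> Any:
    # Only dicts carrying both tags are candidates for decoding.
    if isinstance(value, dict) and "_module" in value and "_class" in value:
        decoder = DECODERS.get((value["_module"], value["_class"]))
        if decoder is not None:
            return decoder(value.get("_object"))
    # Untagged values and unknown types are returned unchanged.
    return value


def decode_columns(columns: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    # Apply the hook to every cell of every column.
    return {name: [decode_value(v) for v in values] for name, values in columns.items()}


@register_decoder("example", "Point")
def _decode_point(payload):
    # Stand-in: a real handler would rebuild the geometry object here.
    return tuple(payload)


raw = {
    "a": [1, 2],
    "b": [
        {"_module": "example", "_class": "Point", "_object": [1, 4]},
        {"_module": "other", "_class": "Unknown", "_object": None},
    ],
}
decoded = decode_columns(raw)
print(decoded)
```

Because the hook visits every cell, its cost grows with the number of cells in the loaded tables, which is where an effect on loading times would come from.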

vogt31337 · 2024-07-26T12:55:11Z

@dlohmeier

dlohmeier added bug fileIO labels May 22, 2024

dlohmeier linked a pull request Aug 19, 2024 that will close this issue

applying hooks to all dataframe columns #2319

Open

vogt31337 self-assigned this Nov 21, 2024
