Add serialization methods for List and StructDtype #8441

Merged
merged 10 commits into from
Jun 21, 2021
56 changes: 55 additions & 1 deletion python/cudf/cudf/core/dtypes.py
@@ -2,7 +2,7 @@

import decimal
import pickle
-from typing import Any, Optional, Tuple
+from typing import Any, Dict, List, Optional, Tuple

import numpy as np
import pandas as pd
@@ -12,6 +12,7 @@

import cudf
from cudf._typing import Dtype
from cudf.core.buffer import Buffer


class _BaseDtype(ExtensionDtype):
Member Author

Should _BaseDtype extend Serializable now?

Contributor

With the changes you've just added, it should now. I'm actually not 100% sure what we should and shouldn't do with _BaseDtype. Ashwin is probably the right person to ask about this, though IMO this should be fine 😄

Contributor


Would agree. Our dtypes should probably all be Serializable.
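
To make the suggestion above concrete, here is a minimal sketch of what "all dtypes are Serializable" could look like as an abstract contract. Every name in it (`SerializableSketch`, `BaseDtypeSketch`, `UnitDtype`) is hypothetical and invented for illustration; it does not reproduce cudf's actual `Serializable` class, only the `serialize` / `deserialize` signatures used in this diff.

```python
from abc import ABC, abstractmethod
from typing import Tuple


class SerializableSketch(ABC):
    """Hypothetical contract: a (header, frames) pair in, an object out."""

    @abstractmethod
    def serialize(self) -> Tuple[dict, list]:
        """Return a metadata header and a list of data frames."""

    @classmethod
    @abstractmethod
    def deserialize(cls, header: dict, frames: list):
        """Rebuild an instance from the output of serialize()."""


class BaseDtypeSketch(SerializableSketch):
    """Hypothetical stand-in for _BaseDtype; inherits the abstract methods."""


class UnitDtype(BaseDtypeSketch):
    """Hypothetical dtype with no parameters, so header and frames are empty."""

    def serialize(self) -> Tuple[dict, list]:
        return {}, []

    @classmethod
    def deserialize(cls, header: dict, frames: list) -> "UnitDtype":
        return cls()


header, frames = UnitDtype().serialize()
restored = UnitDtype.deserialize(header, frames)
```

Under this assumption, a dtype subclass that forgets to implement either method cannot be instantiated, which is one way to enforce the convention discussed above.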

@@ -186,6 +187,29 @@ def __repr__(self):
    def __hash__(self):
        return hash(self._typ)

    def serialize(self) -> Tuple[dict, list]:
        header: Dict[str, Dtype] = {}
        frames = []
        if isinstance(self.element_type, _BaseDtype):
            header["element-type-cls"] = self.element_type.__class__
            (
                header["element-type-header"],
                frames,
            ) = self.element_type.serialize()
        else:
            header["element-type"] = self.element_type
        return header, frames

    @classmethod
    def deserialize(cls, header: dict, frames: list):
        if "element-type-cls" in header:
            element_type = header["element-type-cls"].deserialize(
                header["element-type-header"], frames
            )
        else:
            element_type = header["element-type"]
        return cls(element_type)
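
The recursive header layout used by the list dtype above can be exercised without a GPU. The sketch below copies the same logic into hypothetical stand-in classes (`BaseDtype` and a plain-Python `ListDtype` that stores its element type as a string or a nested dtype); it does not import cudf and is only meant to show what the header looks like for a list of lists.

```python
from typing import Any, Dict, List, Tuple


class BaseDtype:
    """Hypothetical stand-in for cudf's _BaseDtype."""


class ListDtype(BaseDtype):
    """Stand-in that mirrors the serialize/deserialize logic in the diff."""

    def __init__(self, element_type: Any) -> None:
        self.element_type = element_type

    def __eq__(self, other: object) -> bool:
        return (
            isinstance(other, ListDtype)
            and self.element_type == other.element_type
        )

    def serialize(self) -> Tuple[dict, list]:
        header: Dict[str, Any] = {}
        frames: List[Any] = []
        if isinstance(self.element_type, BaseDtype):
            # Nested dtype: record its class and recurse.
            header["element-type-cls"] = self.element_type.__class__
            header["element-type-header"], frames = (
                self.element_type.serialize()
            )
        else:
            # Leaf dtype: store it directly in the header.
            header["element-type"] = self.element_type
        return header, frames

    @classmethod
    def deserialize(cls, header: dict, frames: list) -> "ListDtype":
        if "element-type-cls" in header:
            element_type = header["element-type-cls"].deserialize(
                header["element-type-header"], frames
            )
        else:
            element_type = header["element-type"]
        return cls(element_type)


nested = ListDtype(ListDtype("int64"))
header, frames = nested.serialize()
restored = ListDtype.deserialize(header, frames)
```

Here the outer header holds `element-type-cls` and `element-type-header`, and the inner header holds only `element-type`. The frames list stays empty because neither level carries buffer data.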


class StructDtype(_BaseDtype):

@@ -237,6 +261,36 @@ def __repr__(self):
    def __hash__(self):
        return hash(self._typ)

    def serialize(self) -> Tuple[dict, list]:
        header = {}
        frames: List[Buffer] = []
        for k, dtype in self.fields.items():
            if isinstance(dtype, _BaseDtype):
                dtype_header, dtype_frames = dtype.serialize()
                header[k] = {
                    "cls": dtype.__class__,
                    "header": dtype_header,
                    "frames": (len(frames), len(frames) + len(dtype_frames)),
                }
                frames.extend(dtype_frames)
            else:
                header[k] = dtype

        return header, frames

    @classmethod
    def deserialize(cls, header: dict, frames: list):
        fields = {}
        for k, dtype in header.items():
            if isinstance(dtype, dict):
                fields[k] = dtype["cls"].deserialize(
                    dtype["header"],
                    frames[dtype["frames"][0] : dtype["frames"][1]],
                )
            else:
                fields[k] = dtype
        return cls(fields)
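
The struct dtype above flattens the frames of all its fields into one list and records a `(start, stop)` slice per field. A small sketch of that bookkeeping follows. It uses hypothetical stand-ins: `BaseDtype`, a plain-Python `StructDtype`, and an invented `BlobDtype` whose only purpose is to produce a non-empty frames list. None of this is cudf code.

```python
from typing import Any, Dict, List, Tuple


class BaseDtype:
    """Hypothetical stand-in for cudf's _BaseDtype."""


class BlobDtype(BaseDtype):
    """Invented dtype that ships its payload as frames, for illustration."""

    def __init__(self, chunks: List[bytes]) -> None:
        self.chunks = list(chunks)

    def serialize(self) -> Tuple[dict, list]:
        return {"n": len(self.chunks)}, list(self.chunks)

    @classmethod
    def deserialize(cls, header: dict, frames: list) -> "BlobDtype":
        # Each field must receive exactly the frames it produced.
        assert len(frames) == header["n"]
        return cls(frames)


class StructDtype(BaseDtype):
    """Stand-in that mirrors the frame-slicing logic in the diff."""

    def __init__(self, fields: Dict[str, Any]) -> None:
        self.fields = dict(fields)

    def serialize(self) -> Tuple[dict, list]:
        header: Dict[str, Any] = {}
        frames: List[Any] = []
        for k, dtype in self.fields.items():
            if isinstance(dtype, BaseDtype):
                dtype_header, dtype_frames = dtype.serialize()
                header[k] = {
                    "cls": dtype.__class__,
                    "header": dtype_header,
                    # Half-open slice into the shared frames list.
                    "frames": (len(frames), len(frames) + len(dtype_frames)),
                }
                frames.extend(dtype_frames)
            else:
                header[k] = dtype
        return header, frames

    @classmethod
    def deserialize(cls, header: dict, frames: list) -> "StructDtype":
        fields = {}
        for k, dtype in header.items():
            if isinstance(dtype, dict):
                start, stop = dtype["frames"]
                fields[k] = dtype["cls"].deserialize(
                    dtype["header"], frames[start:stop]
                )
            else:
                fields[k] = dtype
        return cls(fields)


struct = StructDtype(
    {"a": BlobDtype([b"x"]), "b": "int32", "c": BlobDtype([b"y", b"z"])}
)
header, frames = struct.serialize()
restored = StructDtype.deserialize(header, frames)
```

In this run field `a` owns slice `(0, 1)` and field `c` owns `(1, 3)`, while the leaf field `b` is stored directly in the header. Note that `deserialize` tells nested fields from leaf fields with `isinstance(dtype, dict)`, so the scheme assumes a leaf dtype is never itself a `dict`.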


class Decimal64Dtype(_BaseDtype):
