Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for pathlib.Path type #97

Closed
odelalleau opened this issue Dec 12, 2019 · 12 comments · Fixed by #873
Closed

Add support for pathlib.Path type #97

odelalleau opened this issue Dec 12, 2019 · 12 comments · Fixed by #873
Labels
Milestone

Comments

@odelalleau
Copy link
Collaborator

Once stronger typing is implemented in the library, consider adding support for the pathlib.Path type.

Motivation: paths are often found in configs and pathlib.Path usage is spreading across the Python ecosystem.

@soupault
Copy link

soupault commented Mar 6, 2020

I am considering switching from click to hydra. In my work I have a lot of code fragments similar to this:

@click.command()
@click.option('--path_experiment_root', type=click.Path())
def main(**config):
    config['path_weights'] = Path(config['path_experiment_root'], 'weights')
    config['path_logs'] = Path(config['path_experiment_root'], 'logs')

The direct and convenient translation to hydra would be:

@hydra.main(config_path="config.yaml")
def main(config):
    config.path_weights = Path(config.path_experiment_root, "weights")
    config.path_logs = Path(config.path_experiment_root, "logs")

However, I am getting the following error:

ValueError: key path_weights: PosixPath is not a primitive type

Would you consider allowing pathlib.Path et al types for hydra records? Pathlib is a pretty common these days, and the serialization of Path-object can be simply done via str(path_obj). I think it would be great to allow Path-objects so that the usercode is not polluted with constant conversion to string.

@omry
Copy link
Owner

omry commented Mar 6, 2020

Hi @soupault.
It does seem to make sense, which is why this issue is here.
This is something I will look into for the next version of OmegaConf (after 2.0.0).
The current one is already a very large change (essentially doubling the code size) and I am not planning to add any new features to it at this point with the exception of maybe #152.

Thanks for your patience.

@omry omry added this to the OmegaConf 2.1 milestone Mar 6, 2020
@omry omry added the enhancement New feature or request label Mar 24, 2020
@odelalleau
Copy link
Collaborator Author

Out of curiosity, what would be easier: adding pathlib.Path as a new base type, or adding a more general mechanism to allow users to register their own custom types?

@omry
Copy link
Owner

omry commented Jul 15, 2020

Gut feeling is that adding support for custom types would be harder.

@omry
Copy link
Owner

omry commented Feb 3, 2021

Clearing the milestone, 2.1 is already too much. we can revisit later.

@omry omry removed this from the OmegaConf 2.1 milestone Feb 3, 2021
@anordin95
Copy link

Taking advantage of dataclass's __post_init__() method I found a way to convert types from an OmegaConf object to any dataclass class and back. In my example I used some nested objects along with uuid.UUID and pathlib.Path.

It's not really a true solution, but it can provide a convenient workaround.

btp_config.yaml

uploader:
  queue_path: /Users/anordin/.test_upload_queue
  cache_path: /Users/anordin/.test_upload_cache
  dead_letter_path: /Users/anordin/.dead_leter_queue
test_ids:
  test_run_id: "???"
  test_iteration_id: "???"
test_run_type: FT1

conf.py

import uuid
import dataclasses
from omegaconf import OmegaConf
from enum import Enum

from pathlib import Path

class TestRunType(Enum):
  FT1 = 1
  FT2 = 2
  ATP = 3
  CP = 4
  TransducerRDT = 5

@dataclasses.dataclass
class Uploader:
  queue_path: Path
  cache_path: Path
  dead_letter_path: Path

  def __post_init__(self):
    print("Uploader post_init...")
    for field in dataclasses.fields(self):
      value = getattr(self, field.name)
      cls = field.type
      if not isinstance(value, cls):
        setattr(self, field.name, cls(value))

@dataclasses.dataclass
class TestIDs:
  test_run_id: uuid.UUID
  test_iteration_id: uuid.UUID

  def __post_init__(self):
    print("TestIDs post_init...")
    for field in dataclasses.fields(self):
      value = getattr(self, field.name)
      cls = field.type
      if not isinstance(value, cls):
        setattr(self, field.name, cls(value))

@dataclasses.dataclass
class TestUploaderConfiguration:
  test_ids: TestIDs = None
  uploader: Uploader = None
  test_run_type: TestRunType = None

  def __post_init__(self):
    print("TestUploaderConfiguration post_init...")

    for field in dataclasses.fields(self):
      cls = field.type 
      val = getattr(self, field.name)

      if isinstance(val, cls):
        # no need for type-conversion
        continue
      if issubclass(cls, Enum):
        # enum's have a different init procedure
        obj = cls[val]
      else:
        obj = cls(**val)

      setattr(self, field.name, obj)

def class_to_dict(obj):
  primitives = (int, str, bool, float, dict)
  serializable_objs = (Path, uuid.UUID, Enum)

  dict_class = {}

  for k, v in vars(obj).items():
    cls = type(v)
    
    if cls in primitives:
      continue
    if any([issubclass(cls, obj) for obj in serializable_objs]):
      if issubclass(cls, Enum):
        # special-case handling for Enum to exclude Enum class-name in str
        dict_class[k] = v.name
      else:
        dict_class[k] = str(v)
    else:
      dict_class[k] = class_to_dict(v)

  return dict_class

def obj_to_omega_conf(obj):
  dict_from_obj = class_to_dict(obj)
  config_from_dict = OmegaConf.create(dict_from_obj)

  return config_from_dict

def omega_conf_to_obj(omega_conf, cls):
  dict_from_config = OmegaConf.to_container(omega_conf)
  obj = cls(**dict_from_config)

  return obj

if __name__ == '__main__':
  config = OmegaConf.load("btp_config.yaml")

  config.test_ids.test_run_id = str(uuid.uuid4())
  config.test_ids.test_iteration_id = str(uuid.uuid4())

  obj = omega_conf_to_obj(config, TestUploaderConfiguration)
  config_clone = obj_to_omega_conf(obj)

  print(config_clone == config)

You can test it out yourself with python conf.py.

Also, P.S. I'm new to this library. I wanted to say thank you for the very clean API and easily understandable docs!

@odelalleau
Copy link
Collaborator Author

Taking advantage of dataclass's __post_init__() method I found a way to convert types from an OmegaConf object to any dataclass class and back.

Interesting, there is a related issue #472 and PR #502

@anordin95
Copy link

Just updating with a much simplified code-block that captures the essential functionality.

def post_init_type_cast(dataclass):
    """
    recursive invocation to transform all 
    fields of a dataclass, including 
    nested dataclasses, into their typehinted types.
    """
    if not dataclasses.is_dataclass(dataclass):
        raise Exception("Can only type-cast dataclass classes.")

    for field in dataclasses.fields(dataclass):
        value = getattr(dataclass, field.name)
        typehint_cls = field.type

        if value is None:
            # no value specified to type-convert
            continue

        elif isinstance(value, typehint_cls):
            # no need for type-conversion
            continue

        elif isinstance(value, dict):
            """
            if execution gets here, we know 
            value is not an instance of typehinted-type but
            is a dictionary. It contains the contents
            of a nested dataclass
            """
            obj = typehint_cls(**value)

            # recursively perform type casting
            post_init_type_cast(obj)

        elif issubclass(typehint_cls, Enum):
            # enum's have a different init procedure
            obj = typehint_cls[value]

        else:
            # simply type-cast the object
            obj = typehint_cls(value)

        setattr(dataclass, field.name, obj)

@lebrice
Copy link

lebrice commented Feb 6, 2022

Any updates on this?

@odelalleau
Copy link
Collaborator Author

odelalleau commented Feb 7, 2022

Any updates on this?

Currently you can use the following resolver-based workaround:

from dataclasses import dataclass
from pathlib import Path
from typing import Any

from omegaconf import OmegaConf

@dataclass
class Config:
    model: str = "/path/to/model.pkl"
    model_path: Any = "${path:${.model}}"

OmegaConf.register_new_resolver("path", Path, replace=True)
cfg = OmegaConf.structured(Config)
assert isinstance(cfg.model_path, Path)

Edit: note that you could use a single variable, but it would be less convenient to override on the command line

@rsokl
Copy link

rsokl commented Apr 13, 2022

@odelalleau you might be interested in mit-ll-responsible-ai/hydra-zen#257 -- we are looking to let users add support for custom types in Hydra, but via hydra-zen.

@odelalleau
Copy link
Collaborator Author

odelalleau commented Apr 13, 2022

@odelalleau you might be interested in mit-ll-responsible-ai/hydra-zen#257 -- we are looking to let users add support for custom types in Hydra, but via hydra-zen.

Thanks @rsokl, hydra-zen seems interesting!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants