Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the way YAMLLoader resolves some properties #572

Merged
merged 4 commits into from
May 26, 2016

Conversation

ftsamis
Copy link
Member

@ftsamis ftsamis commented May 25, 2016

When specifying an implicit resolver for a YAML property, a regular expression to match this property type is required. That's what I did in #515 for parsing quantities as objects.

The problem with this is that we need a fairly complex regular expression to match all cases, and in my regex I had indeed missed one: infinity valued quantities weren't parsed as quantities. Also, quantity units weren't checked if they were valid, since this would lead to a huge regex (for all units), and reinventing the wheel.

In a discussion with @wkerzendorf we both agreed that just correcting the regex to recognize infinity as a valid quantity value might not completely solve the issue, since it is possible that my regex can still miss something.

So, I implemented his proposal which was a minimal mock class for re patterns which instead of checking if a string could be quantity based on a regular expression it would actually try to convert it to a Quantity.

It needs to be mentioned that this is quite hackish, and while it is much more reliable in telling if a string can be a quantity, it is also 2-3 times slower than regex matching.

ftsamis added 3 commits May 25, 2016 17:58
Instead of using a regular expression to implicitly resolve
quantities, actually try to convert the property to
astropy.units.Quantity and see if that succeeds.

While this is more reliable, it is also much slower.
Instead of using yaml's faulty regular expression to implicitly
resolve floats, actually try to convert the property to
float and see if that succeeds.

With this change values like 5e3 will successfully get
parsed as floats, while pyyaml required a format of 5.e+3

While this is more reliable, it is also much slower.
@wkerzendorf
Copy link
Member

@ftsamis travis didn't run on that. I wonder what is wrong.

@ftsamis
Copy link
Member Author

ftsamis commented May 25, 2016

I wonder if a couple of quick rebases I did messed it up. I'll try and get Travis run on this.

@ftsamis
Copy link
Member Author

ftsamis commented May 25, 2016

@wkerzendorf Got it!

data = self.construct_scalar(node)
return quantity_from_str(data)

YAMLLoader.add_constructor(u'!quantity', YAMLLoader.construct_quantity)
Copy link
Contributor

@yeganer yeganer May 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a dump question but can't we do this in YAMLLoader.__init__?
I'm talking about all the 3 lines, sry if the location of the comment is bad.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly because these class methods (add_constructor, add_implicit_resolver) alter the class itself, not its instances. They change YAMLLoader.yaml_constructors YAMLLoader.yaml_implicit_resolvers which are attributes of the class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then one would need to hook into __new__ or something?

Copy link
Member Author

@ftsamis ftsamis May 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, this is the proper way to do it. __new__ is used to create a new instance (and __init__ to initialize it). A senseful alternative to how I currently do it would be something like:

class YAMLLoader(yaml.Loader):
    yaml_implicit_resolvers = yaml.Loader.yaml_implicit_resolvers + new_implicit_resolvers # Pseudocode
    yaml_constructors = yaml.Loader.yaml_constructors + new_constructors # Pseudocode

but this is not what I chose to do because there are some extra things that add_implicit_resolver takes care of and because this is why it is there for.

This is a common thing when you need to change a class. If you choose to call these classmethods on instance creation/initialization you'll find yourself adding the same stuff multiple times since they actually change the class attributes, which are shared between all instances. Then you'd have to do a check like if my_new_resolver in YAMLLoader.yaml_implicit_resolvers then don't_add_it_again which doesn't make any sense to do.

This is also the reason I created a separate class. I could simply do yaml.Loader.add_constructor/resolver() but then anything that would use yaml.load() from this point and on would be forced to have my customizations.

If pyyaml had the resolvers/constructors as instance attributes instead of class attributes there would be no problem in doing what you said, or even

myloader = yaml.Loader(...)
myloader.add_constructor/resolver(...)

without ever needing to subclass yaml.Loader

Lastly, this is how even pyyaml itself does it for its default Resolver class:

class BaseResolver(object):

    DEFAULT_SCALAR_TAG = u'tag:yaml.org,2002:str'
    DEFAULT_SEQUENCE_TAG = u'tag:yaml.org,2002:seq'
    DEFAULT_MAPPING_TAG = u'tag:yaml.org,2002:map'

    yaml_implicit_resolvers = {}
    yaml_path_resolvers = {}

    def __init__(self):
        self.resolver_exact_paths = []
        self.resolver_prefix_paths = []
(snip)
class Resolver(BaseResolver):
    pass

Resolver.add_implicit_resolver(
        u'tag:yaml.org,2002:bool',
        re.compile(ur'''^(?:yes|Yes|YES|no|No|NO
                    |true|True|TRUE|false|False|FALSE
                    |on|On|ON|off|Off|OFF)$''', re.X),
        list(u'yYnNtTfFoO'))

Resolver.add_implicit_resolver(
        u'tag:yaml.org,2002:float',
        re.compile(ur'''^(?:[-+]?(?:[0-9][0-9_]*)\.[0-9_]*(?:[eE][-+][0-9]+)?
                    |\.[0-9_]+(?:[eE][-+][0-9]+)?
                    |[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+\.[0-9_]*
                    |[-+]?\.(?:inf|Inf|INF)
                    |\.(?:nan|NaN|NAN))$''', re.X),
        list(u'-+0123456789.'))
(snip)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, luck brought it and I pasted pyyaml's faulty regex for floats in the example above (the last part), which this PR replaces with MockRegexPattern(float)

@wkerzendorf wkerzendorf merged commit 681bf59 into tardis-sn:master May 26, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants