-
-
Notifications
You must be signed in to change notification settings - Fork 409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change the way YAMLLoader resolves some properties #572
Conversation
Instead of using a regular expression to implicitly resolve quantities, actually try to convert the property to astropy.units.Quantity and see if that succeeds. While this is more reliable, it is also much slower.
Instead of using yaml's faulty regular expression to implicitly resolve floats, actually try to convert the property to float and see if that succeeds. With this change values like 5e3 will successfully get parsed as floats, while pyyaml required a format of 5.e+3 While this is more reliable, it is also much slower.
@ftsamis travis didn't run on that. I wonder what is wrong. |
I wonder if a couple of quick rebases I did messed it up. I'll try and get Travis run on this. |
@wkerzendorf Got it! |
data = self.construct_scalar(node) | ||
return quantity_from_str(data) | ||
|
||
YAMLLoader.add_constructor(u'!quantity', YAMLLoader.construct_quantity) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a dump question but can't we do this in YAMLLoader.__init__
?
I'm talking about all the 3 lines, sry if the location of the comment is bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly because these class methods (add_constructor
, add_implicit_resolver
) alter the class itself, not its instances. They change YAMLLoader.yaml_constructors
YAMLLoader.yaml_implicit_resolvers
which are attributes of the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then one would need to hook into __new__
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, this is the proper way to do it. __new__
is used to create a new instance (and __init__
to initialize it). A senseful alternative to how I currently do it would be something like:
class YAMLLoader(yaml.Loader):
yaml_implicit_resolvers = yaml.Loader.yaml_implicit_resolvers + new_implicit_resolvers # Pseudocode
yaml_constructors = yaml.Loader.yaml_constructors + new_constructors # Pseudocode
but this is not what I chose to do because there are some extra things that add_implicit_resolver
takes care of and because this is why it is there for.
This is a common thing when you need to change a class. If you choose to call these classmethods on instance creation/initialization you'll find yourself adding the same stuff multiple times since they actually change the class attributes, which are shared between all instances. Then you'd have to do a check like if my_new_resolver in YAMLLoader.yaml_implicit_resolvers then don't_add_it_again
which doesn't make any sense to do.
This is also the reason I created a separate class. I could simply do yaml.Loader.add_constructor/resolver()
but then anything that would use yaml.load()
from this point and on would be forced to have my customizations.
If pyyaml had the resolvers/constructors as instance attributes instead of class attributes there would be no problem in doing what you said, or even
myloader = yaml.Loader(...)
myloader.add_constructor/resolver(...)
without ever needing to subclass yaml.Loader
Lastly, this is how even pyyaml itself does it for its default Resolver class:
class BaseResolver(object):
DEFAULT_SCALAR_TAG = u'tag:yaml.org,2002:str'
DEFAULT_SEQUENCE_TAG = u'tag:yaml.org,2002:seq'
DEFAULT_MAPPING_TAG = u'tag:yaml.org,2002:map'
yaml_implicit_resolvers = {}
yaml_path_resolvers = {}
def __init__(self):
self.resolver_exact_paths = []
self.resolver_prefix_paths = []
(snip)
class Resolver(BaseResolver):
pass
Resolver.add_implicit_resolver(
u'tag:yaml.org,2002:bool',
re.compile(ur'''^(?:yes|Yes|YES|no|No|NO
|true|True|TRUE|false|False|FALSE
|on|On|ON|off|Off|OFF)$''', re.X),
list(u'yYnNtTfFoO'))
Resolver.add_implicit_resolver(
u'tag:yaml.org,2002:float',
re.compile(ur'''^(?:[-+]?(?:[0-9][0-9_]*)\.[0-9_]*(?:[eE][-+][0-9]+)?
|\.[0-9_]+(?:[eE][-+][0-9]+)?
|[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+\.[0-9_]*
|[-+]?\.(?:inf|Inf|INF)
|\.(?:nan|NaN|NAN))$''', re.X),
list(u'-+0123456789.'))
(snip)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, luck brought it and I pasted pyyaml's faulty regex for floats in the example above (the last part), which this PR replaces with MockRegexPattern(float)
☺
When specifying an implicit resolver for a YAML property, a regular expression to match this property type is required. That's what I did in #515 for parsing quantities as objects.
The problem with this is that we need a fairly complex regular expression to match all cases, and in my regex I had indeed missed one: infinity valued quantities weren't parsed as quantities. Also, quantity units weren't checked if they were valid, since this would lead to a huge regex (for all units), and reinventing the wheel.
In a discussion with @wkerzendorf we both agreed that just correcting the regex to recognize infinity as a valid quantity value might not completely solve the issue, since it is possible that my regex can still miss something.
So, I implemented his proposal which was a minimal mock class for re patterns which instead of checking if a string could be quantity based on a regular expression it would actually try to convert it to a Quantity.
It needs to be mentioned that this is quite hackish, and while it is much more reliable in telling if a string can be a quantity, it is also 2-3 times slower than regex matching.