YAML: let String handle numbers too #7809
I’ve been using String | Int64 on mappings and then a to_s manually. So 👍 on my side
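For readers unfamiliar with the workaround mentioned above, here is a minimal sketch of it. The `ShardInfo` class and its `version` field are hypothetical names chosen for illustration, and the sketch uses `YAML::Serializable` rather than whatever mapping code the commenter actually had.

```crystal
require "yaml"

# Hypothetical type for illustration; not taken from the PR.
class ShardInfo
  include YAML::Serializable

  # Accept either a YAML string or a YAML integer for this key.
  property version : String | Int64
end

info = ShardInfo.from_yaml("version: 123")

# Normalize by hand, whichever union member the parser picked.
version = info.version.to_s
puts version # prints 123
```

The manual `to_s` is the step this PR aims to make unnecessary when the field is declared as a plain `String`.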
👍
This change is unacceptable. It makes it impossible to use type-safe YAML parsing. The previous behaviour is not a bug. It doesn't need to be fixed.
@straight-shoota Could you provide a use case where this behavior is not desired?
I'm also having a hard time understanding why this is something desirable:
Or another question: how would you map the
There is https://crystal-lang.org/api/master/String/RawConverter.html, but it is currently only for JSON. Having that support YAML as well would allow for this. They would just have to specify that converter.
Right, but in JSON every type has a different representation. In YAML that's not exactly like that because you have schemas, and also a proof of this is that using
Specifying the type should be enough to resolve ambiguities, if there are any. There's nothing wrong with having a user-friendly, loose (in terms of the YAML spec) parser, which works "just right" in such cases, with no negative side-effects (perhaps I'm wrong here, but I don't see any...).
Now if you parse with
But I think I'm just repeating what I said in the other thread...
libyaml emits a scalar, but this is very low level. YAML is based on elaborate schemas which describe how to interpret scalars. While having the same string representation, The "misinterpretation" described in #5798 is actually a mismatch between data format and schema.
Every time someone intends to use YAML as a type safe data format. Even YAML's null value is suddenly interpreted as a string. Mapping
(String | Time).from_yaml("2019-05-27 16:38:42.536517000").class # => String
(String | UInt32).from_yaml("1").class # => String
Crystal uses the Core schema as default and according to that schema, these values should be interpreted as
For the use case this PR intends to solve, using the Failsafe schema would be an appropriate solution. This schema only contains the most basic data types sequence, mapping and scalar. Any value that is not a sequence or mapping is interpreted as a string. This is somewhat similar to
Currently, the Failsafe schema is only supported for parsing a
YAML::Schema::FailSafe.parse("2.8").as_s # => "2.8"
A logical expansion would be to provide a way to use the Failsafe schema with
This PR should be reverted because it introduces incorrect behaviour.
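To make the schema distinction in the comment above concrete, the following sketch parses the same one-line document under both schemas. It only uses `YAML.parse` and `YAML::Schema::FailSafe.parse`; the document content is a made-up example.

```crystal
require "yaml"

doc = "version: 2.8"

# Core schema (the default for YAML.parse): the plain scalar
# is resolved to a float.
core = YAML.parse(doc)
puts core["version"].raw.class # Float64

# Failsafe schema: every scalar stays a string.
failsafe = YAML::Schema::FailSafe.parse(doc)
puts failsafe["version"].as_s # 2.8
```

Both calls return a `YAML::Any`; the difference is only in how untagged scalars are resolved.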
Exactly! There's the core schema, there's the fail-safe schema. And in my mind,
Yes, in the Core schema and when you do
I think
I will say that union types in YAML are pretty rare. I wouldn't expect, ever, to have a value that can either be a String or a Time. It's always one value. When it's a union it's usually a composite type. But that type is then usually distinguished by another key. And we can always fix this by considering
I think the problem of wanting to read something as a String but not being able to do that because it looks like a number is not specific to a particular key-value pair. It's a general problem. Finally: why do C#, Java and Go behave like this PR?
Until this PR, you could only map matching types. And that's a great default because it fails when there is an unexpected value instead of simply digesting it, which might result in unexpected behaviour later on.
I don't follow. What does
Yes, you wouldn't use the Failsafe schema for mapping a specific key, but for parsing the entire YAML document.
Do you have examples for that?
I don't have time to set up a Java project right now, but if you can try it I'm pretty sure mapping a String works like this PR.
I just tried it in Java:
import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.constructor.Constructor;

public class Main {
    public static class Customer {
        public String firstName;
    }

    public static void main(String[] args) throws Exception {
        new Main().main();
    }

    public void main() throws Exception {
        Yaml yaml = new Yaml(new Constructor(Customer.class));
        Customer customer = (Customer) yaml.load("firstName: 123");
        System.out.print(customer.firstName);
    }
}
Works just fine :-)
Ooops, sry. I couldn't recall these examples. Thanks for pointing it out.
At least YAMLDotNet and snakeyaml only implement YAML 1.1, which means untagged nodes are not resolved and their interpretation is completely application-defined. This mostly resembles the fail safe schema of YAML 1.2
For example, an explicitly tagged string can be mapped to a time instance:
time := time.Time{}
yaml.Unmarshal([]byte("!!str 2006-01-02T15:04:05Z"), &time)
In Crystal, this does not work, because
Time.from_yaml("!!str 2006-01-02T15:04:05Z") # Error: Expected Time, not 2006-01-02T15:04:05Z
But
String.from_yaml("!!float not even a valid float") # => "not even a valid float"
This should definitely be an error. I suppose this could be fixed, though. But what about this:
String.from_yaml("!!float 1.00") # => "1.00"
String.from_yaml("!!float 1") # => "1"
According to the YAML specification, YAMLDotNet for example treats them equally:
deserializer.Deserialize<String>(new StringReader("!!float 1.00")) // => "1"
deserializer.Deserialize<String>(new StringReader("!!float 1")) // => "1"
This means it correctly resolves the value according to the specified tag and then maps that value (a YAML float) to a .NET string.
The approach of using the fail safe schema + custom type mappings can certainly be useful. But Crystal's entire YAML library is based on the core schema which is recommended as default schema. It provides tag resolution for the most common data types. The current state after this PR is an inconclusive mixture between failsafe and core schema, which is not a desirable state. It should be either one, or even better, optional support for both of them (plus eventually JSON schema).
Maybe we should fail on explicitly tagged values. But I disagree about everything else. We should take pragmatism into account. For me that @bcardiff had to use a union and then call to_s is not acceptable, and that people keep bumping into this issue isn't acceptable either. Let's keep this PR merged for a couple of releases. I'm pretty sure nobody will complain about this anymore, practically (theoretically we could continue discussing what's right or wrong).
I'm not at all against a pragmatic solution. And I totally agree that union + to_s is not practical, it can even be incorrect (when a value is normalized). But please let's introduce a standard-conforming solution, not a weird hack that essentially creates a hybrid between fail safe and core schema, specific to the
A proper solution is to fully support YAML parsing using the fail safe schema. This would fix #5798 and work similarly to the implementations in other languages as mentioned above, being fully compliant with the YAML specification.
@straight-shoota But doesn't the failsafe schema just support strings? Not even floats or ints. I really don't think there's any conflict here. They are two different schemas with no interaction. I also want to add the note that tagging isn't common at all and I've never seen it used (mainly because YAML is mainly for humans, and humans don't worry/think about tags). So yes, we could theoretically worry about it and try to make a best effort, but maybe it's not worth it.
No. The failsafe schema supports any data type in YAML. It just doesn't provide any default tag resolution for untagged scalars and leaves that to the application. So, for example when the scalar value That's how go-yaml, YAMLDotNet and snakeyaml work and the proper implementation for the problem described in #5798. In the core schema on the other hand,
For example:
Exactly. But this PR combines behaviour of the fail safe schema (all plain scalars map to string) with the core schema (plain scalars resolve to a specific type depending on their format). |
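As an illustration of what "leaving tag resolution to the application" could look like in Crystal today, here is a small sketch built on `YAML::Schema::FailSafe.parse`. The document and key names are hypothetical, and this is not the schema-aware `from_yaml` support proposed in the thread, only a manual approximation of it.

```crystal
require "yaml"

# Hypothetical document: both values are plain (untagged) scalars.
doc = YAML::Schema::FailSafe.parse("version: 1.10\nport: 8080")

# Under the failsafe schema nothing is resolved automatically,
# so the application decides how each scalar is interpreted.
version = doc["version"].as_s # kept as text, trailing zero preserved
port = doc["port"].as_s.to_i  # explicitly converted to Int32

puts version # 1.10
puts port    # 8080
```

With the core schema the first value would instead be resolved to a float, losing the distinction between `1.10` and `1.1`.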
However, when I parse with
But maybe I shouldn't base my thoughts on how it should work based on other implementations. I guess... I'll stop commenting here. Feel free to come up with a good implementation and submit a PR or an RFC. Thank you!
Fixes #5798
I already gave a lot of reasons why this is a good/correct change (every other statically typed language does this), plus we recently bumped into this with something like:
If I say I want to read the version as a string, there's no reason why I have to go and modify the yaml file to add an annotation like !string or put it in quotes. People don't do that, at all. And there's no point forcing people to do that (and sometimes it's impossible because the yaml is not in their control).
Or imagine you have some kind of code:
That looks like a String. Now this (imagine someone is editing it manually, typing it):
Oops! You just input something that looks like a floating point number. You should be more careful when that happens and put quotes around it. Right? No... people don't do that, they don't care about how the machine will work in this case, they are just editing a string in their mind. And I think that's fine.
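The scenario described in this PR description can be sketched as follows. The `Item` class, the `code` key and the value `1.50` are hypothetical stand-ins, not the examples from the original description. The sketch assumes a Crystal version that includes this change, where the field should receive the scalar text as written; on a version without it, the same call is expected to raise `YAML::ParseException`, which the sketch catches so both outcomes are visible.

```crystal
require "yaml"

# Hypothetical type for illustration; not taken from the PR.
class Item
  include YAML::Serializable

  property code : String
end

# The value is unquoted and looks like a floating point number.
result = begin
  Item.from_yaml("code: 1.50").code
rescue ex : YAML::ParseException
  "rejected: #{ex.message}"
end

puts result
```

Which branch runs depends on the Crystal version in use, so no output is stated here.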