-
Notifications
You must be signed in to change notification settings - Fork 520
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Duplicate keys are not handled properly #165
Comments
some discussion of workarounds here: https://gist.github.com/pypt/94d747fe5180851196eb |
I am also getting same issue but its slight different click here: gunthercox/ChatterBot#1640 could any one find the solution |
* Update placement from branch 'master' - Merge "Fix a bad granular gabbi test" - Fix a bad granular gabbi test A gabbi test for multiple member_of<N> was using the `query_parameters` keyword to construct the querystring; but it was trying to test what happens when member_of<N> is specified multiple times. In this test, N was the same for both, but the query parameters were being entered as separate dict keys; so the querystring was only being constructed with the latter value, because yaml [1]. The test was passing spuriously because it happens to be the case that the result would be the same if only the latter value is specified. (If you reversed the order of the two member_ofZ, the test failed.) This change re-YAMLs the query_parameters to use list syntax for the values of the member_of key, from which gabbi will dtrt in constructing the querystring. NB: It might be nice to find a test scenario where the false positive wouldn't have been possible; but that would be a bigger effort that could possibly entail reswizzling the GranularFixture and therefore the whole gabbit. Done this way, we're at least sure that both values are making it to the handler; and switching the order in the querystring has no effect (though the order is apparently not guaranteed/deterministic anyway [2]). [1] yaml/pyyaml#165 [2] https://gabbi.readthedocs.io/en/latest/example.html (search for query_parameters) Change-Id: I10f28d8c21643be69b67f25dcc043cd9640eac42
A gabbi test for multiple member_of<N> was using the `query_parameters` keyword to construct the querystring; but it was trying to test what happens when member_of<N> is specified multiple times. In this test, N was the same for both, but the query parameters were being entered as separate dict keys; so the querystring was only being constructed with the latter value, because yaml [1]. The test was passing spuriously because it happens to be the case that the result would be the same if only the latter value is specified. (If you reversed the order of the two member_ofZ, the test failed.) This change re-YAMLs the query_parameters to use list syntax for the values of the member_of key, from which gabbi will dtrt in constructing the querystring. NB: It might be nice to find a test scenario where the false positive wouldn't have been possible; but that would be a bigger effort that could possibly entail reswizzling the GranularFixture and therefore the whole gabbit. Done this way, we're at least sure that both values are making it to the handler; and switching the order in the querystring has no effect (though the order is apparently not guaranteed/deterministic anyway [2]). [1] yaml/pyyaml#165 [2] https://gabbi.readthedocs.io/en/latest/example.html (search for query_parameters) Change-Id: I10f28d8c21643be69b67f25dcc043cd9640eac42
I just wanted to comment that in my case, the overwrite is a feature and not a bug. |
This seems to duplicate #41. |
@parrenin I think this is misguided. Since this violates the spec, you don't know whether the original or the overridden value would be active (without looking at the implementation, which could change at any time, since it is not part of the spec.) Simply reading the "base" and "overriding" settings as two separate files, and combining as you see fit would be more correct. |
The documentation, specifically "Deviations from the specification" in https://pyyaml.org/wiki/PyYAMLDocumentation should be updated. |
I looked in the code, and fixing this is pretty simple (famous last words). I was going to submit a pull request, but there are actually tests validating that duplicate keys ARE ALLOWED, so obviously the maintainers have some different opinions about this. The patch at https://gist.github.com/marvingreenberg/ba2f67f740400f722680fc7c538e94a0 does prevent duplicate keys, and seems correct. But, there are then 4 test failures. Three of them seem simple to simply disable (or change to verify that an exception is raised). The fourth I don't quite understand, and may have something to do with places where YAML becomes "not actually so simple" like references and other subtle things. If pyyaml simply intends to keep this behavior (since 2006) I'd say closing this case and updating the documentation is warranted. |
And not surprisingly, the patch I include is incomplete since there is a separate constructor for a yaml "ordered mapping" which specifically has a comment that "# Note: we do not check for duplicate keys, because it's too CPU-expensive." (which again seems misguided to me). |
One problem is that this logic would break the functionality of merge keys: To fix this, we would need a seperate handling of the merge keys and the actual keys in the mapping. I don't know which tests were failing (haven't tested your patch yet), but I guess the ones with merge keys were among them. |
If you want I can take the trouble of making a pull request, and try my best at updating the tests to be complete. I am a little unclear about what the goals, according to the folks in a better position to decide (the maintainers), are with respect to this particular issue. I think it's important to follow the spec -- silently and without documentation ignoring the spec seems wrong to me. But, YAML is definitely more complicated than "oh, its just a simple little thing". |
@marvingreenberg yeah, I think forbidding duplicate keys should be the default. But merge key behaviour should stay the same. @ingydotnet is the one to decide about this. |
I think this sounds like a use case for merge keys, actually. |
Part of the fix for: pyvec/naucse#2 Workaround for: yaml/pyyaml#165 The workaround is incomplete, but should be OK for our uses.
The YAML 1.1 spec also says that keys should be unique: https://yaml.org/spec/1.1/#id861060
|
Part of the fix for: pyvec/naucse#2 Workaround for: yaml/pyyaml#165 The workaround is incomplete, but should be OK for our uses.
@perlpunk Does that include the merge keys themselves, or only the keys after a resolved merge? e.g.
|
@wimglenn the correct way to use multiple merge keys is actually:
That way duplicate merge keys can be avoided. Other than that, the spec is actually not very specific about equality of nodes. |
Then what about
I know merge spec says that splat0 wins in case of duplicate keys in splat0 & splat1, but what does it say about dupes outside of the merge caused by resolving the merge? |
@wimglenn Duplicates should only be considered in the actual mapping definition itself. Merge keys are explicitly there for merging, so they should override keys. By the way, the spec https://yaml.org/type/merge.html is not very detailed about this, but from how I read the spec and the example, it does not matter where the merge key appers - as the first key, in the middle or at the end. The order is always the same:
|
The codegen should error if it sees two yaml entries with the same key. The default behavior of python's yaml loader is to overwrite duplicate keys with the new value (see yaml/pyyaml#165). This would have caught a nasty bug that showed up in https://github.com/pytorch/pytorch/pull/66225/files#r723796194. I tested it on that linked PR, to confirm that it errors correctly (and gives the line number containing the duplicate). Differential Revision: [D31464585](https://our.internmc.facebook.com/intern/diff/D31464585) [ghstack-poisoned]
The codegen should error if it sees two yaml entries with the same key. The default behavior of python's yaml loader is to overwrite duplicate keys with the new value (see yaml/pyyaml#165). This would have caught a nasty bug that showed up in https://github.com/pytorch/pytorch/pull/66225/files#r723796194. I tested it on that linked PR, to confirm that it errors correctly (and gives the line number containing the duplicate). Differential Revision: [D31464585](https://our.internmc.facebook.com/intern/diff/D31464585) [ghstack-poisoned]
The codegen should error if it sees two yaml entries with the same key. The default behavior of python's yaml loader is to overwrite duplicate keys with the new value (see yaml/pyyaml#165). This would have caught a nasty bug that showed up in https://github.com/pytorch/pytorch/pull/66225/files#r723796194. I tested it on that linked PR, to confirm that it errors correctly (and gives the line number containing the duplicate). Differential Revision: [D31464585](https://our.internmc.facebook.com/intern/diff/D31464585) [ghstack-poisoned]
The codegen should error if it sees two yaml entries with the same key. The default behavior of python's yaml loader is to overwrite duplicate keys with the new value (see yaml/pyyaml#165). This would have caught a nasty bug that showed up in https://github.com/pytorch/pytorch/pull/66225/files#r723796194. I tested it on that linked PR, to confirm that it errors correctly (and gives the line number containing the duplicate). Differential Revision: [D31464585](https://our.internmc.facebook.com/intern/diff/D31464585) [ghstack-poisoned]
The codegen should error if it sees two yaml entries with the same key. The default behavior of python's yaml loader is to overwrite duplicate keys with the new value (see yaml/pyyaml#165). This would have caught a nasty bug that showed up in https://github.com/pytorch/pytorch/pull/66225/files#r723796194. I tested it on that linked PR, to confirm that it errors correctly (and gives the line number containing the duplicate). Differential Revision: [D31464585](https://our.internmc.facebook.com/intern/diff/D31464585) [ghstack-poisoned]
The codegen should error if it sees two yaml entries with the same key. The default behavior of python's yaml loader is to overwrite duplicate keys with the new value (see yaml/pyyaml#165). This would have caught a nasty bug that showed up in https://github.com/pytorch/pytorch/pull/66225/files#r723796194. I tested it on that linked PR, to confirm that it errors correctly (and gives the line number containing the duplicate). Differential Revision: [D31464585](https://our.internmc.facebook.com/intern/diff/D31464585) [ghstack-poisoned]
YAML spec in version 1.2 says that in a valid YAML file, mapping keys are unique.
This is not the case in pyyaml, as can be seen by loading this sample file:
The correct result from loading this file is an error. pyyaml instead naively overwrites results with the last key, resulting in this dict:
{'a': {'q': 'b'}}
.The text was updated successfully, but these errors were encountered: