order in dict is not preserved #110
Comments
This is a property of Python, not PyYAML. Python does not preserve the order of dictionaries and so we cannot either. To do so, you'd have to |
@sigmavirus24, that's not completely true.
import yaml
document = """
b:
  c: 3
  d: 4
a: 1
"""
dictionary = yaml.safe_load(document)
print(dictionary)
print(yaml.dump(dictionary))
So it would seem that Python does preserve the order of the dictionary. In fact, the sorting is done by PyYAML during dumping. A whole bunch of people have created forks/extensions to PyYAML specifically to get around this issue, so it would be nice if it was fixed in PyYAML itself. |
It seems Python has only guaranteed that dicts keep insertion order since 3.7: |
An option to not sort would be great, thanks, and fine for my intended use case, but if Python doesn't sort by default then perhaps PyYAML shouldn't sort by default either? |
@shoogle I guess both defaults can make sense. Important for me would be backwards compatibility. Pull requests are welcome. I don't know when I can implement that, I'm busy for a couple of weeks, and additionally I only just started to learn Python ;-) |
I created a PR #143 |
I think a fundamental problem is that the YAML specs do not guarantee an order. As PyYAML is a YAML parser, guaranteeing order seems like a slight breach of the YAML specs. That's why there is Phynix/yamlloader, which is based on PyYAML but extends the functionality by explicitly keeping the order of OrderedDicts (and dicts for Python 3.7+). I want to stress, though, that this actually breaks the YAML specifications! But it is still useful... My proposition would be not to guarantee that behavior directly in Any thoughts on that? (This of course is not a vote against the |
I wrote a drop-in replacement to address this problem: https://github.com/wimglenn/oyaml |
Correct.
Wrong. Since the spec doesn't guarantee an order, that means any order is valid. PyYAML could return dict keys in any arbitrary order (alphabetical, reverse-alphabetical, shortest first, random, order of creation, etc.) and it would still be perfectly consistent with the YAML specification. In practice, the only ordering that makes any sense is the order in which they were created, because if they are returned in a different order then the information about which was created first is lost forever. If the user requires any other form of ordering (alphabetical, etc.), then they can sort the dict themselves after it has been returned in creation order. However, if the dict is not returned in creation order then the user can never put it back in creation order (except by a lucky guess). It is for this very reason that, since Python 3.7, dictionaries are ordered by default as a feature of the language (and not just as an implementation detail as they were in 3.6). This is why I think returning in creation order should be the default in PyYAML (at least for Python >= 3.7) and not just an option, though I understand the desire to ensure backwards compatibility. (It should be noted, however, that nobody complained when Python dicts became ordered by default, even though it could be seen as a backwards-incompatible change.) |
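The asymmetry described here can be illustrated with plain Python dicts, without PyYAML. This is only a minimal sketch, assuming Python 3.7+ where insertion order is a language guarantee; the variable names are made up for illustration.

```python
# Assumes Python 3.7+, where dicts keep insertion order by language guarantee.
created = {"b": 2, "c": 3, "a": 1}  # keys in creation order: b, c, a

# Going from creation order to sorted order is always possible...
alphabetical = dict(sorted(created.items()))
print(list(alphabetical))  # ['a', 'b', 'c']

# ...but the sorted result carries no trace of the original order:
# a dict created in a different order sorts to exactly the same key list.
other = {"c": 3, "a": 1, "b": 2}
same_after_sorting = list(dict(sorted(other.items()))) == list(alphabetical)
print(same_after_sorting)  # True
```

Once the keys have been sorted, nothing distinguishes `created` from `other`, which is the information loss the comment refers to.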
I agree with @shoogle that, while the Spec does not guarantee order, it's not a requirement to return keys in random order. Regarding backwards compatibility, people might rely on the current behaviour that keys are sorted. |
@shoogle you are right, I've formulated things wrong: guaranteeing an order does not break the yaml specs of course, but extends them. And while there is definitely more use in returning insertion-ordered dicts, the question is whether this should be guaranteed inside the basic converter PyYAML or be "sold as an extension in an extension". I think the question really boils down to the "problem" of backwards compatibility: I could not find it, but does PyYAML guarantee somewhere that the dumping will be sorted? Otherwise: Python never guaranteed an order/sorting of the dicts (up to 3.7) and neither does yaml (or PyYAML, if the sorting was not a guaranteed feature). So no one actually could have relied on any kind of sorting. I guess: who really relied on the insertion-order or any other kind of ordering used So: yes, I am in favor of it, let's extend the yaml specs in order to stick closely to the python specs and guarantee the order preserving behavior in PyYAML for Python 3.7+. For < 3.7, the order should not matter (assuming it was never guaranteed to be sorted). |
@mayou36, if insertion-ordering is optional, as it is in PR #143, then there is no problem with backwards compatibility. Furthermore, as you say, PyYAML made no guarantees about ordering anyway, so it would not be breaking the API to change to insertion-ordering by default. I'm not saying it should happen right away, but maybe after one or two releases where it was provided as an option. |
@shoogle I fully agree, insertion-order could even be the default and sorting has to be set. What I meant was: if you want to write 3.x (and not just 3.7 (3.6) + ) compatible code relying on any kind of dict ordering, |
@shoogle I also agree. Over time I think the path of least resistance would be to adhere as closely as possible to how Python does it. That would mean ordered by creation by default for Python >=3.6. |
I would actually stick to > 3.6, not >=. It is mentioned as an implementation detail in CPython and not as a language feature. It doesn't matter a lot for someone if it is not yet available in 3.6, but if someone uses it with 3.6 in an implementation where the insertion order is not kept (although being a rare case probably), I think this is the bigger issue. |
YAML dumpers can (and probably should) dump mappings with their keys sorted (by default) in environments where insertion order is not preserved. PyYAML sorts keys and doesn't have an option not to. Having keys in a deterministic order is generally more useful than not. The most correct and useful thing to do here is to provide a
It looks like @perlpunk++'s #143 does this. I'll try to get it released soon. |
I made a different thing for myself:
calling with
Admittedly ugly, but allows whatever sort order is desired |
@wimglenn Thanks! Worked great! |
If it is any consolation, Python's JSON module preserves order when dumping. Since oyaml now exists, it's not that big of an issue but I just thought I'd throw it out there. |
I battled this last year and 'solved' it as indicated above. So, a very hackish 'solution' is to just comment out the try block in representer.py:
try:
    mapping = sorted(mapping)
except TypeError:
    pass
This should be a feature of yaml.dump, same as json.dumps(foo, sort_keys=False) |
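For comparison, this is how the standard library's json module treats key order. It is a small sketch using only `json.dumps` and its `sort_keys` parameter; the sample dict is invented for illustration and Python 3.7+ is assumed.

```python
import json

data = {"b": {"c": 3, "d": 4}, "a": 1}  # sample data, insertion order: b, a

# Without sort_keys, json.dumps writes keys in the dict's own order.
as_inserted = json.dumps(data)
print(as_inserted)  # {"b": {"c": 3, "d": 4}, "a": 1}

# sort_keys=True sorts the keys at every nesting level.
as_sorted = json.dumps(data, sort_keys=True)
print(as_sorted)  # {"a": 1, "b": {"c": 3, "d": 4}}
```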
|
@feluxe if the only alternative is a random key order (like for python < 3.7 and many other languages), sorted keys sounds like a pretty useful default ;-) |
@perlpunk Before |
Agree with both of you. Like I mentioned above, json.dumps uses sort keys and defaults to true but can be set to false. That functionality should be added to PyYaml for sure. Simply need a parameter based conditional around the try block I posted above. |
@perlpunk, the original order is not "random" order, please stop calling it that. |
@perlpunk Just as a quip and curiosity note, the order is not random. It's "arbitrary", which means it's consistent but not to be relied upon before 3.7. You can however make it truly random by either specifying PYTHONHASHSEED=random or -R when you invoke python. |
@jasweet, you are actually wrong, |
So, I have to use yet another package (oyaml), or is this going to be fixed anytime soon? :-) |
Fixed by #254 |
The referenced fix is for the dumper. Has the loader also been fixed? |
@orodbhen, I believe it was only the dumper that was broken. As long as you are using a version of Python >= 3.6, if you print a YAML dictionary (rather than dump it) then it prints in insertion order regardless of PyYAML version. |
I just installed 5.1.1 using pip, and it does look like this has all been fixed for both the loader and the dumper. The loader preserves the order now by default, whereas the dumper requires setting |
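The behaviour reported here can be checked with a short script. This is a sketch only, assuming PyYAML 5.1 or later (where `yaml.dump` accepts a `sort_keys` keyword) and Python 3.7+; the nesting of the sample document is an assumption based on the example used earlier in the thread.

```python
import yaml  # assumes PyYAML >= 5.1 is installed

document = """
b:
  c: 3
  d: 4
a: 1
"""

# Loading returns a plain dict, which keeps the document order on Python 3.7+.
data = yaml.safe_load(document)
print(list(data))  # ['b', 'a']

# By default the dumper still sorts keys alphabetically.
default_dump = yaml.dump(data, default_flow_style=False)
print(default_dump)

# sort_keys=False makes the dumper follow the dict's own order instead.
unsorted_dump = yaml.dump(data, default_flow_style=False, sort_keys=False)
print(unsorted_dump)
```

With these assumptions the first dump starts with `a: 1` and the second with `b:`, so a load followed by a dump with `sort_keys=False` round-trips the key order.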
Perhaps this should also be featured in the documentation, ideally with an example; the length of the above discussion may seem daunting to the casual user (aka myself). |
btw: pprint is sorting too! -___-
|
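The pprint sorting can be switched off as well. This is a minimal sketch, assuming Python 3.8+ where `pprint.pformat` and `pprint.pprint` accept `sort_dicts`; the sample dict is invented for illustration.

```python
from pprint import pformat

data = {"b": {"d": 4, "c": 3}, "a": 1}  # sample data, deliberately unsorted

# Default behaviour: dict keys are sorted at every level before printing.
sorted_repr = pformat(data)
print(sorted_repr)  # {'a': 1, 'b': {'c': 3, 'd': 4}}

# sort_dicts=False keeps the insertion order instead.
insertion_repr = pformat(data, sort_dicts=False)
print(insertion_repr)  # {'b': {'d': 4, 'c': 3}, 'a': 1}
```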
Python 3.6.3
import yaml
document = """
b:
  c: 3
  d: 4
a: 1
"""
print(yaml.dump(yaml.load(document), default_flow_style=False))
a: 1
b:
  c: 3
  d: 4