Rework the safety related API code #189
The main change is that 'Danger' has been renamed to 'Python' and that the default `dump()` and `dump_all()` functions use the 'Python' schema to be able to dump any Python data structure.

NOTE: In YAML, 'Schema' is used to mean all the semantics and rules of what a YAML document means and how it is processed.

The `load()` and `load_all()` functions continue to use the Safe schema. The `dump()` and `load()` sugar functions should be similar in that they both do the most useful and safe operations.

There are top level functions for each schema (Safe and Python) and those functions should be used when feeding data from one system to the other and expecting the same semantics (schema):

* safe_dump safe_dump_all
* safe_load safe_load_all
* python_dump python_dump_all
* python_load python_load_all

When we have a schema language for YAML, the generic methods will be:

* yaml.dump(node, Schema='foo.schema')
* yaml.load(yaml, Schema='foo.schema')

A loader class like SafeLoader is a loader with a hardcoded schema. Right now pyyaml has 2 schemas:

* Python - serialize any python data
* Safe - only serialize in a way that won't trigger code

'Danger' was used in response to a situation where people were caught unaware that something bad could happen in a seemingly normal, default situation. Now we've fixed the default to be safe, and Safe is an OK name for a schema, but Danger really is not. It's not the purpose of the schema to be dangerous. The purpose is to serialize Python data structures.

The danger_ API functions can be removed because they have only been released for a couple of days and they aren't documented anywhere.

----

This also fixes a bug in that safe_load() and load() were aliases. They shouldn't be, because load() accepts a Loader kwarg, and safe_load() should not, i.e. safe_load(yaml, Loader=PythonLoader) shouldn't be allowed.
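The alias problem described in the last paragraph can be illustrated with a minimal sketch. This is not the PR's code and does not import PyYAML: `SafeLoader`, `PythonLoader`, `load` and `safe_load` below are hypothetical stand-ins that only record which schema would be used, so that the difference in function signatures is visible.

```python
class SafeLoader:
    """Hypothetical stand-in for a loader hardcoded to the Safe schema."""
    schema = "safe"


class PythonLoader:
    """Hypothetical stand-in for a loader hardcoded to the Python schema."""
    schema = "python"


def load(stream, Loader=SafeLoader):
    # Generic entry point: the caller may pick the loader, and thus the schema.
    # A real loader would parse `stream`; here we only report the schema used.
    return (Loader.schema, stream)


def safe_load(stream):
    # Deliberately has no Loader parameter, so the schema cannot be overridden.
    return load(stream, Loader=SafeLoader)


# The arrangement described as a bug: an alias inherits load()'s signature.
aliased_safe_load = load

# The alias silently accepts a different loader.
overridden = aliased_safe_load("a: 1", Loader=PythonLoader)

# The dedicated function rejects the keyword with a TypeError.
try:
    safe_load("a: 1", Loader=PythonLoader)
    rejected = False
except TypeError:
    rejected = True
```

With the alias, a call that looks safe by name can end up using the Python schema; with a separate function, the same call fails immediately.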
I do not believe the name python_load adequately conveys that using it on
untrusted input is the moral equivalent of calling eval on it.
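As an analogy only, the `eval` comparison can be sketched with the Python standard library rather than PyYAML: `ast.literal_eval` accepts plain data literals and refuses anything that would run code, while `eval` runs whatever expression it is given. The example strings are made up for illustration, and the expression used is harmless.

```python
import ast

literal_text = "[1, 2, {'a': 3}]"
expression_text = "__import__('os').getcwd()"  # harmless, but it is a function call

# Plain data literals are accepted.
parsed = ast.literal_eval(literal_text)

# A function call is not a literal, so literal_eval refuses it.
try:
    ast.literal_eval(expression_text)
    literal_eval_rejected = False
except ValueError:
    literal_eval_rejected = True

# eval() runs the expression; with untrusted input this could be any code.
evaluated = eval(expression_text)
```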
…On Thu, Jun 28, 2018 at 6:02 PM Ingy döt Net ***@***.***> wrote:
Commit Summary
- Rework the safety related API code
File Changes
- *M* lib/yaml/__init__.py <https://github.com/yaml/pyyaml/pull/189/files#diff-0> (89)
- *M* lib/yaml/cyaml.py <https://github.com/yaml/pyyaml/pull/189/files#diff-1> (20)
- *M* lib/yaml/dumper.py <https://github.com/yaml/pyyaml/pull/189/files#diff-2> (10)
- *M* lib/yaml/loader.py <https://github.com/yaml/pyyaml/pull/189/files#diff-3> (10)
- *M* lib3/yaml/__init__.py <https://github.com/yaml/pyyaml/pull/189/files#diff-4> (89)
- *M* lib3/yaml/cyaml.py <https://github.com/yaml/pyyaml/pull/189/files#diff-5> (20)
- *M* lib3/yaml/dumper.py <https://github.com/yaml/pyyaml/pull/189/files#diff-6> (10)
- *M* lib3/yaml/loader.py <https://github.com/yaml/pyyaml/pull/189/files#diff-7> (10)
- *M* tests/lib/test_constructor.py <https://github.com/yaml/pyyaml/pull/189/files#diff-8> (4)
- *M* tests/lib/test_recursive.py <https://github.com/yaml/pyyaml/pull/189/files#diff-9> (6)
- *M* tests/lib3/test_constructor.py <https://github.com/yaml/pyyaml/pull/189/files#diff-10> (4)
- *M* tests/lib3/test_recursive.py <https://github.com/yaml/pyyaml/pull/189/files#diff-11> (6)
Patch Links:
- https://github.com/yaml/pyyaml/pull/189.patch
- https://github.com/yaml/pyyaml/pull/189.diff
This relates to #187.
@alex I think this is an issue to be taken care of in the docs. Adding a 'danger_' prefix to things doesn't get across the purpose of the function. The purpose is not to do something dangerous, it's to load Python data, which btw is only potentially dangerous (if you are using untrusted input). The 'safe_' prefix, otoh, does indicate the purpose: to load data that matches the Safe schema and thus is believed not to trigger code, even when the data is untrusted.
Looks good to me. I also suggested `python_load`. I think it's a good name because it enables `!!python/...` tags.
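The idea that a schema decides which tags are accepted can be sketched as follows. This is an illustration under stated assumptions, not PyYAML's implementation: `expand`, `allowed` and `SAFE_TAGS` are hypothetical names, and `SAFE_TAGS` is only an illustrative subset of the core types. The one fact relied on is that the `!!` shorthand expands to the `tag:yaml.org,2002:` prefix.

```python
YAML_NS = "tag:yaml.org,2002:"

# Illustrative subset of tags that a Safe schema would accept (assumption).
SAFE_TAGS = {
    YAML_NS + name
    for name in ("str", "int", "float", "bool", "null", "seq", "map")
}


def expand(tag):
    """Expand the `!!` shorthand to the full tag URI."""
    return YAML_NS + tag[2:] if tag.startswith("!!") else tag


def allowed(tag, schema):
    """Return True if `tag` is accepted under `schema` ("safe" or "python")."""
    full = expand(tag)
    if full in SAFE_TAGS:
        return True
    # Only the Python schema additionally accepts the python/ tag family.
    return schema == "python" and full.startswith(YAML_NS + "python/")


safe_accepts_str = allowed("!!str", "safe")
safe_accepts_python_tag = allowed("!!python/object/apply:os.system", "safe")
python_accepts_python_tag = allowed("!!python/object/apply:os.system", "python")
```

In this sketch the name `python` describes what the schema adds (the `!!python/...` tags) rather than how risky it is, which is the naming argument made above.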
@perlpunk interesting mnemonic 👍 Kirill gave me the pyyaml.org content, so we can update the docs there soon. He also suggested putting it all on https://readthedocs.org/ which I think is a great idea. |
I was very excited to see the renaming to
The danger is multi-faceted:
In my considered opinion, there are exactly 2 times when it is valid to serialize arbitrary objects in this way:
These are both fairly niche applications, neither of which should be the primary use of pyyaml, which is to produce and consume vaguely human-readable configuration documents. So the fact that |
There are a lot of dangerous operations in computing but I've never seen the prefix Any operation that you don't understand is potentially dangerous. The solution is to help people understand what is dangerous (and why) and what is not. We can do that by explaining things well in the docs. I might be OK with adding a The other option is to get rid of both ( Regarding the Regarding the naming of the Loader/Dumper classes, these really need to be named after the schemas they are enforcing. Overall I think naming things 'danger' is both unprecedented and it's going to feel silly to be forced to use it after you understand how things work. I think it's a short-sighted solution, just as thinking of the YAML Data Language as a configuration language is a short-sighted point of view. |
I don't really have much time, but let me provide some things that are separate from the discussion of naming. Let's say I have something I want to serialize with PyYAML. Let's say I've sanitized most of it but accidentally missed a random object and now I use I don't particularly care about the |
@sigmavirus24 perhaps the best thing for now is to not pollute the import namespace with new functions that are not great. Since we can get what we want with explicit calls let's just use those for now. People who want to serialize outside the Safe schema can do so without the sugar. If they don't like it, we will get real world feedback instead of subjective opinions. Note: The |
@sigmavirus24 after thinking more on this I believe it would be wise to revert the #74 merge and let it sit out the 4.2 release. There is a ton going on already, and this commit is the most contentious and breaks backward compatibility for almost all usage. I am sorry that I wasn't paying attention when this got merged last August, but then again something of this magnitude shouldn't have gotten in without my sign-off. I'm quite sure I would have had very similar reactions back then. I agree that this issue is very important, but I think it deserves a release of its own, and should not be lumped in with an already heavily overloaded 4.2 release.
If you're going to reverse course on this, I think this CVE needs to be updated to indicate that subsequent versions are still vulnerable? http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-18342 |
@glyph there is nothing inherently dangerous about the re the CVE, technically the course is already reversed as 4.1 was retracted from PyPI earlier today. |
From my standpoint, I think the danger* terms are potentially useful for sugar for the exposed interface that people commonly use as they signal "dragons!". I think this option
is also a good one. The critical aspect of it is that you're doing something specific to overcome safe-by-default. This might also be reasonable for With regard to where to go from here: safe-by-default seems to me like the keystone feature of a big new PyYAML 4, so releasing something without it doesn't seem like the way to go. (And to be self-serving, but the sugar aspect of things is less critical to me than getting a clear and consistent definition of what "safe" will mean: #187.) |
Just throwing in some thoughts. If the majority thinks there has to be a shortcut for I'm coming from perl and did not know what kinds of exploits are possible with In pyyaml, the situation seems to be different, and it's very easy to inject any code. So I'm in favor of dropping I still don't like the name |
@perlpunk The old |
obsolete, see #257 |