Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack Overflow (72997432, 73012922) #107

Closed
Google-Autofuzz opened this issue Feb 9, 2018 · 3 comments · Fixed by #127 or tarantool/libyaml#1
Closed

Stack Overflow (72997432, 73012922) #107

Google-Autofuzz opened this issue Feb 9, 2018 · 3 comments · Fixed by #127 or tarantool/libyaml#1

Comments

@Google-Autofuzz
Copy link

Hello YAML team,

As part of our fuzzing efforts at Google, we have identified an issue affecting
YAML (tested with revision * master 01f3a87).

To reproduce, we are attaching Dockerfiles which compile the project with
LLVM, taking advantage of the sanitizers that it offers. More information about
how to use the attached Dockerfile can be found here:
https://docs.docker.com/engine/reference/builder/

TL;DR instructions:

  • mkdir project
  • cp Dockerfile.YAML /path/to/project/Dockerfile
  • docker build --no-cache /path/to/project
  • docker run -it image_id_from_docker_build

From another terminal, outside the container:
docker cp /path/to/attached/reproducer running_container_hostname:/fuzzing/reproducer
(reference: https://docs.docker.com/engine/reference/commandline/cp/)

And, back inside the container:
/fuzzing/repro.sh /fuzzing/reproducer

Alternatively, and depending on the bug, you could use gcc, valgrind or other
instrumentation tools to aid in the investigation. The sanitizer error that we
encountered is here:

=================================================================
==11==ERROR: AddressSanitizer: stack-overflow on address 0x7ffc65f40ef8 (pc 0x0000004b7057 bp 0x7ffc65f41770 sp 0x7ffc65f40f00 T0)
    #0 0x4b7056 in __asan_memcpy (/fuzzing/yaml_fuzzer+0x4b7056)
    #1 0x7fd318ef049d in yaml_parser_save_simple_key /fuzzing/libyaml/src/scanner.c:1119:35
    #2 0x7fd318ee734f in yaml_parser_fetch_flow_collection_start /fuzzing/libyaml/src/scanner.c:1456:10
    #3 0x7fd318ee3257 in yaml_parser_fetch_more_tokens /fuzzing/libyaml/src/scanner.c:846:14
    #4 0x7fd318f06a86 in yaml_parser_parse_flow_sequence_entry /fuzzing/libyaml/src/parser.c:961:13
    #5 0x7fd318f0b4f1 in yaml_parser_load_sequence /fuzzing/libyaml/src/loader.c:355:10
    #6 0x7fd318f0b5cb in yaml_parser_load_sequence /fuzzing/libyaml/src/loader.c:361:22
[...]
    #251 0x7fd318f0b5cb in yaml_parser_load_sequence /fuzzing/libyaml/src/loader.c:361:22

SUMMARY: AddressSanitizer: stack-overflow (/fuzzing/yaml_fuzzer+0x4b7056) in __asan_memcpy
==11==ABORTING

And:

=================================================================
==11==ERROR: AddressSanitizer: stack-overflow on address 0x7ffe61b66f18 (pc 0x0000004b7057 bp 0x7ffe61b67790 sp 0x7ffe61b66f20 T0)
    #0 0x4b7056 in __asan_memcpy (/fuzzing/yaml_fuzzer+0x4b7056)
    #1 0x7fb88e1ad49d in yaml_parser_save_simple_key /fuzzing/libyaml/src/scanner.c:1119:35
    #2 0x7fb88e1a434f in yaml_parser_fetch_flow_collection_start /fuzzing/libyaml/src/scanner.c:1456:10
    #3 0x7fb88e1a0257 in yaml_parser_fetch_more_tokens /fuzzing/libyaml/src/scanner.c:846:14
    #4 0x7fb88e1c4e2e in yaml_parser_parse_flow_mapping_key /fuzzing/libyaml/src/parser.c:1112:13
    #5 0x7fb88e1c8f42 in yaml_parser_load_mapping /fuzzing/libyaml/src/loader.c:418:10
    #6 0x7fb88e1c902e in yaml_parser_load_mapping /fuzzing/libyaml/src/loader.c:424:20
[...]
    #251 0x7f718697402e in yaml_parser_load_mapping /fuzzing/libyaml/src/loader.c:424:20
SUMMARY: AddressSanitizer: stack-overflow (/fuzzing/yaml_fuzzer+0x4b7056) in __asan_memcpy
==11==ABORTING

Without some kind of depth limit (even if the default behavior is "unbounded"),
library users parsing untrusted YAML files need to use workarounds such as
preprocessors that will reject invalid payloads, use separate processes for
parsing to compartmentalize impact of a crash, or prevent them from parsing
non-trusted payloads at all.

We will gladly work with you so you can successfully confirm and reproduce this
issue. Do let us know if you have any feedback surrounding the documentation.

Once you have reproduced the issue, we'd appreciate to learn your expected
timeline for an update to be released. With any fix, please attribute the report
to "Google Autofuzz project".

We are also pleased to inform you that your project is eligible for inclusion to
the OSS-Fuzz project, which can provide additional continuous fuzzing, and
encourage you to investigate integration options.

Don't hesitate to let us know if you have any questions!

Google AutoFuzz Team
artifacts_72997432.zip
artifacts_73012922.zip

@perlpunk
Copy link
Member

perlpunk commented Mar 2, 2018

Thanks!
I was able to reproduce it and reduced it to the smallest possible input which is causing it.

I think it should be noted that only the Loader API is affected, because it uses recursion. The Event API seems to be ok.
That means, for example PyYAML and Perl's YAML::XS are not affected by this.

Personally I'm not yet familiar enough with C and stack overflows, so I'm not sure if this happens simply because the recursion is too deep, or the recursive functions accidentally use too much memory and could be improved.
I hope we find someone who can help explaining/fixing this.

@Google-Autofuzz
Copy link
Author

Thanks for looking at this issue!

There's a discussion about a similar issue in a JSON parser here, which has some possible solutions (and discussion of drawbacks of them) for recursive based parsers: nlohmann/json#832

As discussed there, a value configurable by library users (ideally with a sane default) that limits how much recursion can occur (and includes an option to disable this limit for those who prefer the old behavior) seems to strike a nice balance between security and functionality.

tlsa added a commit to tlsa/libyaml that referenced this issue Jul 25, 2018
The document loading API (yaml_parser_load) was susseptable to a
stack overflow issue when given input data which opened many
mappings and/or sequences without closing them.

This was due to the use of recurion in the implementation.

With this change, we avoid recursion, and maintain our own loader
context stack on the heap.

The loader context contains a stack of document node indexes.
Each time a sequence or mapping start event is encountered,
the node index corrasponding to the event is pushed to the
stack.  Each time a sequence or mapping end event is encountered,
the corrasponding node's index is popped from the stack.

The yaml_parser_load_nodes() function sits on the event stream,
issuing events to the appropriate handlers by type.

When an event handler function constructs a node, it needs to
connect the new node to its parent (unless it's the root node).
This is where the loader context stack it used to find the
parent node.  The way that the new node is added to the tree
depends on whether the parent node is a mapping (with a
yaml_node_pair_t to fill), or a sequence (with a yaml_node_item_t).

Fixes: yaml#107
tlsa added a commit to tlsa/libyaml that referenced this issue Jul 26, 2018
The document loading API (yaml_parser_load) was susseptable to a
stack overflow issue when given input data which opened many
mappings and/or sequences without closing them.

This was due to the use of recurion in the implementation.

With this change, we avoid recursion, and maintain our own loader
context stack on the heap.

The loader context contains a stack of document node indexes.
Each time a sequence or mapping start event is encountered,
the node index corrasponding to the event is pushed to the
stack.  Each time a sequence or mapping end event is encountered,
the corrasponding node's index is popped from the stack.

The yaml_parser_load_nodes() function sits on the event stream,
issuing events to the appropriate handlers by type.

When an event handler function constructs a node, it needs to
connect the new node to its parent (unless it's the root node).
This is where the loader context stack is used to find the
parent node.  The way that the new node is added to the tree
depends on whether the parent node is a mapping (with a
yaml_node_pair_t to fill), or a sequence (with a yaml_node_item_t).

Fixes: yaml#107
@perlpunk
Copy link
Member

Thanks! This was fixed by #127 and released in 0.2.3

ligurio pushed a commit to tarantool/libyaml that referenced this issue Jul 18, 2022
The document loading API (yaml_parser_load) was susseptable to a
stack overflow issue when given input data which opened many
mappings and/or sequences without closing them.

This was due to the use of recurion in the implementation.

With this change, we avoid recursion, and maintain our own loader
context stack on the heap.

The loader context contains a stack of document node indexes.
Each time a sequence or mapping start event is encountered,
the node index corrasponding to the event is pushed to the
stack.  Each time a sequence or mapping end event is encountered,
the corrasponding node's index is popped from the stack.

The yaml_parser_load_nodes() function sits on the event stream,
issuing events to the appropriate handlers by type.

When an event handler function constructs a node, it needs to
connect the new node to its parent (unless it's the root node).
This is where the loader context stack is used to find the
parent node.  The way that the new node is added to the tree
depends on whether the parent node is a mapping (with a
yaml_node_pair_t to fill), or a sequence (with a yaml_node_item_t).

Fixes: yaml#107
igormunkin pushed a commit to tarantool/libyaml that referenced this issue Aug 2, 2022
The document loading API (yaml_parser_load) was susseptable to a
stack overflow issue when given input data which opened many
mappings and/or sequences without closing them.

This was due to the use of recurion in the implementation.

With this change, we avoid recursion, and maintain our own loader
context stack on the heap.

The loader context contains a stack of document node indexes.
Each time a sequence or mapping start event is encountered,
the node index corrasponding to the event is pushed to the
stack.  Each time a sequence or mapping end event is encountered,
the corrasponding node's index is popped from the stack.

The yaml_parser_load_nodes() function sits on the event stream,
issuing events to the appropriate handlers by type.

When an event handler function constructs a node, it needs to
connect the new node to its parent (unless it's the root node).
This is where the loader context stack is used to find the
parent node.  The way that the new node is added to the tree
depends on whether the parent node is a mapping (with a
yaml_node_pair_t to fill), or a sequence (with a yaml_node_item_t).

Fixes: yaml#107
ligurio added a commit to ligurio/tarantool that referenced this issue Aug 2, 2022
Patch updates libyaml to the version that contains fixes of stack
overflows [1]. PR in upstream with original patch: "Avoid recursion in
the document loader" [2].

1. yaml/libyaml#107
2. yaml/libyaml#127

NO_DOC=internal
NO_TEST=internal
ligurio added a commit to ligurio/tarantool that referenced this issue Aug 2, 2022
Patch updates libyaml to the version that contains fixes of stack
overflows [1]. PR in upstream with original patch: "Avoid recursion in
the document loader" [2].

1. yaml/libyaml#107
2. yaml/libyaml#127

NO_DOC=internal
NO_TEST=internal
igormunkin pushed a commit to tarantool/tarantool that referenced this issue Aug 2, 2022
Patch updates libyaml to the version that contains fixes of stack
overflows [1]. PR in upstream with original patch: "Avoid recursion in
the document loader" [2].

1. yaml/libyaml#107
2. yaml/libyaml#127

NO_DOC=internal
NO_TEST=internal
igormunkin pushed a commit to tarantool/tarantool that referenced this issue Aug 2, 2022
Patch updates libyaml to the version that contains fixes of stack
overflows [1]. PR in upstream with original patch: "Avoid recursion in
the document loader" [2].

1. yaml/libyaml#107
2. yaml/libyaml#127

NO_DOC=internal
NO_TEST=internal

(cherry picked from commit 6aa30d0)
igormunkin pushed a commit to tarantool/tarantool that referenced this issue Aug 2, 2022
Patch updates libyaml to the version that contains fixes of stack
overflows [1]. PR in upstream with original patch: "Avoid recursion in
the document loader" [2].

1. yaml/libyaml#107
2. yaml/libyaml#127

NO_DOC=internal
NO_TEST=internal

(cherry picked from commit 6aa30d0)
p7nov pushed a commit to tarantool/tarantool that referenced this issue Aug 3, 2022
Patch updates libyaml to the version that contains fixes of stack
overflows [1]. PR in upstream with original patch: "Avoid recursion in
the document loader" [2].

1. yaml/libyaml#107
2. yaml/libyaml#127

NO_DOC=internal
NO_TEST=internal

(cherry picked from commit 6aa30d0)
mkokryashkin pushed a commit to mkokryashkin/tarantool that referenced this issue Sep 9, 2022
Patch updates libyaml to the version that contains fixes of stack
overflows [1]. PR in upstream with original patch: "Avoid recursion in
the document loader" [2].

1. yaml/libyaml#107
2. yaml/libyaml#127

NO_DOC=internal
NO_TEST=internal
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants