Out of bounds read in utf8proc #78

Merged
maxbrunsfeld merged 3 commits into tree-sitter:master from the update-utf8proc branch on Jun 23, 2017


philipturnbull
Contributor

I stumbled upon some out-of-bounds reads when fuzzing with AddressSanitizer enabled. There are two bugs, one of which was fixed upstream in JuliaStrings/utf8proc#66.

I think the other bug is the + 1 in the lexer, which calculates the wrong bounds:

uint32_t size = self->chunk_size - position_in_chunk + 1;
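
To make the arithmetic concrete, here is a minimal sketch comparing the two candidate bounds. It assumes `position_in_chunk` is a zero-based offset into a chunk of `chunk_size` bytes. The helper names `remaining_bytes` and `size_with_plus_one` are hypothetical and are not part of tree-sitter; only the expression itself comes from the line above.

```c
#include <stdint.h>

/* Hypothetical helper: bytes that can be read from a chunk of
 * `chunk_size` bytes starting at the zero-based `position_in_chunk`. */
static uint32_t remaining_bytes(uint32_t chunk_size, uint32_t position_in_chunk) {
  return chunk_size - position_in_chunk;
}

/* The expression quoted above. Under the zero-based assumption it
 * reports one byte more than the chunk actually holds. */
static uint32_t size_with_plus_one(uint32_t chunk_size, uint32_t position_in_chunk) {
  return chunk_size - position_in_chunk + 1;
}
```

For a one-byte chunk read from offset 0, the first helper yields 1 and the second yields 2.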

When parsing a single-byte file containing \xdf with tree-sitter-go, and with some printfs added, I can see that the size is calculated as two:

ts_lexer__get_lookahead: size = 0x2 = 0x1 - 0x0 + 1
=================================================================
==7==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000b1 at pc 0x00000058c1ef bp 0x7ffe57cd32b0 sp 0x7ffe57cd32a8
READ of size 1 at 0x6020000000b1 thread T0
SCARINESS: 12 (1-byte-read-heap-buffer-overflow)
    #0 0x58c1ee in utf8proc_iterate /src/tree-sitter/externals/utf8proc/utf8proc.c:131:25
    #1 0x581c92 in ts_lexer__get_lookahead /src/tree-sitter/src/runtime/lexer.c:44:30
    #2 0x5736d6 in parser__lex /src/tree-sitter/src/runtime/parser.c:260:5
    #3 0x568d32 in parser__advance /src/tree-sitter/src/runtime/parser.c:1058:21
    #4 0x567c89 in parser_parse /src/tree-sitter/src/runtime/parser.c:1256:9
    #5 0x56103b in ts_document_parse_with_options /src/tree-sitter/src/runtime/document.c:137:16
    #6 0x5152ae in LLVMFuzzerTestOneInput /src/tree-sitter/../fuzzer.cc:21:3
...

0x6020000000b1 is located 0 bytes to the right of 1-byte region [0x6020000000b0,0x6020000000b1)
allocated by thread T0 here:
    #0 0x511070 in operator new[](unsigned long) /src/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:84
    #1 0x5b9201 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long)
...
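
The byte 0xDF is the lead byte of a two-byte UTF-8 sequence, so a decoder that is told two bytes are available will look for a continuation byte at offset 1, which is just past the 1-byte allocation in the report above. Below is a minimal sketch of a length check that would refuse such a read. `utf8_expected_length` and `sequence_fits` are hypothetical helpers written for illustration; they are not utf8proc's or tree-sitter's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sequence length implied by a UTF-8 lead byte,
 * or 0 if the byte cannot start a sequence. It does not reject
 * overlong or out-of-range lead bytes; it only classifies the bit pattern. */
static size_t utf8_expected_length(uint8_t lead) {
  if (lead < 0x80) return 1;            /* 0xxxxxxx: ASCII */
  if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
  if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
  if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
  return 0;                             /* continuation or invalid byte */
}

/* Returns 1 only if the whole sequence starting at buf[0] fits
 * within `available` bytes, so no read goes past the buffer. */
static int sequence_fits(const uint8_t *buf, size_t available) {
  if (available == 0) return 0;
  size_t needed = utf8_expected_length(buf[0]);
  return needed != 0 && needed <= available;
}
```

With the single byte 0xDF and one byte available, `sequence_fits` returns 0, because the sequence needs two bytes.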

However, removing the + 1 causes an infinite loop because the main loop in parser__lex never advances. So, I think another change is needed somewhere in the lexer or parser to completely fix the issue 🤔
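
One plausible reading of the infinite loop: if the decoder reports failure for a truncated sequence and the caller advances only by the number of bytes decoded, the position never moves. The sketch below shows a loop that still makes progress on malformed input. Everything in it is hypothetical (`decode_one`, `count_steps`, and the choice to skip a single byte on error); it illustrates the forward-progress requirement and is not the change that was merged here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder: returns the number of bytes consumed, or -1 on
 * malformed or truncated input. Only 1- and 2-byte sequences are
 * handled, to keep the sketch short. */
static int decode_one(const uint8_t *buf, size_t available) {
  if (available == 0) return -1;
  if (buf[0] < 0x80) return 1;
  if ((buf[0] & 0xE0) == 0xC0) {
    /* buf[1] is read only when a second byte is actually available. */
    if (available < 2 || (buf[1] & 0xC0) != 0x80) return -1;
    return 2;
  }
  return -1;
}

/* Walks the buffer, advancing at least one byte per iteration so that
 * malformed input cannot stall the loop. Returns the iteration count. */
static size_t count_steps(const uint8_t *buf, size_t size) {
  size_t position = 0, steps = 0;
  while (position < size) {
    int consumed = decode_one(buf + position, size - position);
    position += (consumed > 0) ? (size_t)consumed : 1; /* skip one bad byte */
    steps++;
  }
  return steps;
}
```

Skipping one byte on error is only one possible policy; a real lexer might instead emit an error token or a replacement character before moving on.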

philipturnbull and others added 3 commits June 21, 2017 09:55
This includes JuliaStrings/utf8proc#66, which fixes an out-of-bounds read when parsing
malformed UTF-8 characters.
Signed-off-by: Philip Turnbull <[email protected]>
@maxbrunsfeld maxbrunsfeld merged commit 076002a into tree-sitter:master Jun 23, 2017
@maxbrunsfeld maxbrunsfeld deleted the update-utf8proc branch June 23, 2017 19:18
BekaValentine pushed a commit that referenced this pull request Feb 23, 2021
* Insert automatic semicolons at included range boundaries
* ⬆️ tree-sitter-cli