-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory leak / double free in address_parse_normalize_string #160
Comments
Thanks for reporting. This version frees up the memory without making an additional copy: 38c6c26. The use of realloc in utf8proc_map itself is not a problem per se because the function takes a double-pointer |
My first patch was similar in logic. Here is the issue I believe you still have...
In the case of have_utf8proc_options, it does indeed return the realloc'd pointer which is changed or not. So in one case it returns the pointer to the original address owned by the caller of parse_address. In the other case, it freed that pointer and returns a new one. I believe the code you added will still allow utf8proc_map to free the client's buffer... potentially causing heap corruption. If it doesn't do that then it is still possible that pointer you are deleting is the client's buffer. Or there is a third option, that I'm missing something but valgrind seems to agree. |
Ah... nvm. I understand now. So the issue still occurs with this option. The code modifies the passed in string (the client buffer) and then returns a pointer to it. That is the pointer I believe you are deleting.
|
I didn't want to do it this way but it may be the best option... diff --git a/src/address_parser.c b/src/address_parser.c
|
With the current implementation of To reiterate re: Also, note that In my tests the pushed commits fixed all valgrind errors in the command-line client for a variety of inputs. Is this hypotethical or is there some input string that's causing this error for you? |
Ok. Great. I believe you do need to add a free before the return in the nested if right above it too.
On Feb 9, 2017, 11:59 PM -0500, Al Barrentine <[email protected]>, wrote:
With the current implementation of string_trim, that would free the caller's buffer if there are leading spaces in the input (thinking more of the expand_address calls, as the options used by the parser<https://github.com/openvenues/libpostal/blob/ea168279bd5a0040e10fcbe3bf854d9a113690e7/src/address_parser.h#L61> are more limited). The string_trim implementation, as well as that of string_replace (also currently modifies the pointer in-place, although it does not change the pointer's location) make copies in the upcoming parser-data<https://github.com/openvenues/libpostal/blob/parser-data/src/string_utils.c> release, and as such the functions in normalize.c<https://github.com/openvenues/libpostal/blob/parser-data/src/normalize.c> clean up after all of their potential internal copies.
To reiterate re: utf8proc_map, it does not write to the input buffer. The pointer it's realloc-ing is not str (which is only read from), but uint8_t *utf8proc_normalized which is initialized as NULL and passed in as a double pointer so utf8proc_map can set/reset what it points to (if the user passed in a pre-allocated, non-NULL buffer, which we do not, and as noted before realloc-ing NULL is essentially just malloc).
Also, note that str in normalize_string_utf8 is not a double-pointer, so the effect of setting it is limited to the function's scope. The goal here is to chain several function calls depending on the options passed in, so we set str to the output of the last function for the next function to work on.
In my tests the pushed commits fixed all valgrind errors in the command-line client for a variety of inputs. Is this hypotethical or is there some input string that's causing this error for you?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub<#160 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AXzs1LStmEx-ieQYehLfoWdT6cVZYQv_ks5ra-6ugaJpZM4L8KjA>.
|
Only need to free memory if more than one allocation takes place internally. If the options were simply NORMALIZE_STRING_TRIM, only string_trim gets called and allocates memory, so only one copy would be made in normalize_string_utf8, and that gets returned to be freed by the caller. |
==26728== 1,077 bytes in 32 blocks are definitely lost in loss record 1,344 of 2,063
==26728== at 0x4C2BB4A: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==26728== by 0x4FCCE938: utf8proc_map (utf8proc.c:610)
==26728== by 0x4FCD35F1: normalize_string_utf8 (normalize.c:35)
==26728== by 0x4FCD366E: normalize_string_latin (normalize.c:57)
==26728== by 0x4FCDAF1C: address_parser_normalize_string (address_parser.c:153)
==26728== by 0x4FCDAF1C: address_parser_parse (address_parser.c:812)
==26728== by 0x4FCB6D8B: parse_address (libpostal.c:1043)
utf8proc_map uses realloc and parse_address has no way to deal with the pointer changing and the original being freed. If the pointer does change then parse_address has freed the address passed in without the client knowing and the new pointer is never freed.
Below is a potential fix.
diff --git a/src/address_parser.c b/src/address_parser.c
index 187bada..253b182 100644
--- a/src/address_parser.c
+++ b/src/address_parser.c
@@ -809,10 +809,12 @@ address_parser_response_t *address_parser_parse(char *address, char *language, c
return NULL;
}
bool is_normalized = normalized != NULL;
if (!is_normalized) {
}
averaged_perceptron_t *model = parser->model;
@@ -887,7 +889,7 @@ address_parser_response_t *address_parser_parse(char *address, char *language, c
char **single_label = malloc(sizeof(char *));
single_label[0] = label;
char **single_component = malloc(sizeof(char *));
@@ -899,6 +901,8 @@ address_parser_response_t *address_parser_parse(char *address, char *language, c
}
}
free(normalized);
cstring_array *token_labels = cstring_array_new_size(tokens->n);
char *prev_label = NULL;
The text was updated successfully, but these errors were encountered: