-
-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Strings without null-termination #97
Conversation
Can this please be merged? I'm running into problems with this, as I would like to use string with embedded NUL chars in them. When copying a String with a NUL char in it, the memory after the NUL bytes is not being copied, so the rest of that String is uninitialised memory. Here's sample code showing the error:
Output of this sample:
|
@matthijskooijman How can I build this and run it on my own Arduino's? As you might recall, I added #112 and would like to create a patch to fix it. But then I need to compile this code myself in order to fix it. How does one do that? |
@cvwillegen You should be able to clone this repository (and then the right branch) into Then, it is probably useful to change the If you then restart the Arduino IDE, you should the AVR boards twice in the boards menu. If you select the right one, it will use the code from git rather than the official release. Is that enough info? |
@matthijskooijman can you please rebase? I'd like to see the CI unit tests run, probably the tests need some fixing. |
3209e07
to
721ed20
Compare
721ed20
to
9a604e7
Compare
Codecov Report
@@ Coverage Diff @@
## master #97 +/- ##
==========================================
- Coverage 96.41% 96.07% -0.35%
==========================================
Files 14 14
Lines 837 840 +3
==========================================
Hits 807 807
- Misses 30 33 +3
Continue to review full report at Codecov.
|
@aentinger I finally got around to having another look at this, I see you already did a rebase. I just did another force push, which:
With that, the testcases succeed again. I think this could be merged as-is (after squashing the fixup), but I also want to look at:
Ideally, we'd do 1. and 2. before merging this, 3. is probably more tricky and might be left for later. |
There are still strcmp() calls in the source, instead of memcmp(), so it's not going to work as-is... |
Yeah, that's point 2. on my list above. But for some purposes (such as passing around buffers with embedded nuls) the PR is already useful as it stands (as long as you do not need comparison or some other operations). But as I said, ideally we'd fix at least the comparisons before merging this. |
I actually ran into this when doing an == on 2 NUL-containing strings :-) Indeed, I created #112 and you pointed me to this PR... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @matthijskooijman 👋 and a happy new year 🎆
Thank you very much for all your good work 👍! I agree that both
- Adding some testcases with embedded nul bytes to verify that those actually work and stay working in the future (and increase coverage).
and
- Fixing String::equals (and related methods) with embedded nul characters (probably required to write proper testcases). This is also String == String does not work with embedded NUL characters #112.
will be necessary for merging this PR. Maybe if you could provide test cases in the appropriate files for String::equals
and String::compareTo
? I don't see a need to provide embedded-nul-string test code for operators ==, !=, <, >, <=, >= as they are basically wrappers around the two functioned mentioned before.
I assume a rebase is in order @matthijskooijman after my merging of #137? |
Instead of calling strlen in a dozen places, instead just call String::concat(s), which will then call strlen. This shrinks the code size of these calls significantly, the StringAppendOperator example on the Arduino Uno shrinks by 72 bytes. This change does incur a slight runtime cost, because there is one extra function call.
Previously, these methods took a nul-terminated string and appended it to the current buffer. The length argument (or length of the passed String object) was used to allocate memory, but was not used when actually copying the string. This meant that: - If the passed length was too short, or the string passed in was not nul-terminated, the buffer would overflow. - If the string passed in contained embedded nul characters (i.e, among the first length characters), any characters after the embedded nul would not be copied (but len would be updated to indicate they were). In practice, neither of the above would occur, since the length passed is always the known length of the string, usually as returned by strlen. However, to make this method public, and to allow using this concat method to pass in strings that are not nul-terminated, it should be changed to be more robust. This commit changes the method to use memcpy instead of strcpy, copying exactly the number of bytes passed in. For the current calls to this method, which pass a nul-terminated string, without embedded nul characters and a correct length, this behaviour should not change. However, this concat method can now also be used in the two cases mentioned above. Non-nul-terminated strings now work as expected and for strings with embedded newlines the entire string is copied as-is, instead of leaving uninitialized memory after the embedded nul byte. Note that a lot of operations will still only see the string up to the embedded nul byte, but that's not really fixable unless we reimplement functions like strcmp.
Now concat(const char*, unsigned int) no longer requires a nul-terminated string, we can simplify the concat(char) method to just pass the address of the single character instead of having to create buffer with nul-termination.
This method is useful when receiving data from external sources that pass an explicit length instead of a NUL-terminated string.
This constructor allows converting a buffer containing a non-nul-terminated string to a String object, by explicitely passing the length.
Before, substring would (temporarily) add a \0 in the original string at the end of the substring, so the substring could be copied into a new String object. However, now that the String::copy() method can work with non-nul-terminated strings (by passing an explicit length), this trick is not needed and we can just call the copy method instead.
This just calls the char* version, but allows calling the method with a uint8_t* as well (which is not uncommon for buffers).
This allows creating a String from a uint8_t[] or uint8_t* as well, without having to add explicit casts.
9a604e7
to
d6fd84f
Compare
Yup, done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍 Thank you @matthijskooijman 🚀
Yay for merging this, but I didn't make the changes mentioned in #97 (review) yet... And there was still a fixup commit in here that should have been squashed before merging... Regardless, I believe that the changes in this PR are already useful and worth including, though support for strings with NUL is now not entirely complete and lacks testcases, so that still needs to be fixed (no time for that right now, though). |
cherry-pick from: arduino/ArduinoCore-API#97
When working with the Arduino String class, I've found that I couldn't efficiently combine it with some external libraries that explicitely pass char* and length around, without nul-terminating their strings. This prompted me to modify and expose the concat (const char* cstr, unsigned int length) method, add a new String(const char* cstr, unsigned int length) constructor. While I was going over the string class, I found some other minor improvements, which are included here. This is a port of arduino/Arduino#1936. The commits are identical, except for some improved commit messages and one commit was dropped since that change was already made by someone else in the meantime. I've provided some testcases by updating the string examples: arduino/Arduino#9239 If this is ok to merge, I'll also provide a PR for the reference documentation. cherry-pick from: arduino/ArduinoCore-API#97
When working with the Arduino String class, I've found that I couldn't efficiently combine it with some external libraries that explicitely pass char* and length around, without nul-terminating their strings. This prompted me to modify and expose the concat (const char* cstr, unsigned int length) method, add a new String(const char* cstr, unsigned int length) constructor. While I was going over the string class, I found some other minor improvements, which are included here. This is a port of arduino/Arduino#1936. The commits are identical, except for some improved commit messages and one commit was dropped since that change was already made by someone else in the meantime. I've provided some testcases by updating the string examples: arduino/Arduino#9239 If this is ok to merge, I'll also provide a PR for the reference documentation. cherry-pick from: arduino/ArduinoCore-API#97
When working with the Arduino String class, I've found that I couldn't efficiently combine it with some external libraries that explicitely pass char* and length around, without nul-terminating their strings. This prompted me to modify and expose the concat (const char* cstr, unsigned int length) method, add a new String(const char* cstr, unsigned int length) constructor. While I was going over the string class, I found some other minor improvements, which are included here.
This is a port of arduino/Arduino#1936. The commits are identical, except for some improved commit messages and one commit was dropped since that change was already made by someone else in the meantime.
I've provided some testcases by updating the string examples: arduino/Arduino#9239
If this is ok to merge, I'll also provide a PR for the reference documentation.