-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit scope of libc usage in note-c. #126
Conversation
43d59bc
to
42f86cb
Compare
42f86cb
to
5578119
Compare
n_cjson.c
Outdated
// Greater than z -> not a valid letter. | ||
// Between a and Z -> not a valid letter. | ||
// Between a and z (inclusive) -> already lower. | ||
if (c < 'A' || c > 'z' || (c < 'a' && c > 'Z') || (c >= 'a' && c <= 'z')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems overly complicated.
It becomes more obvious if you swap in numbers.
Return the letter if:
x < 65 OR x > 122 OR (x < 97 AND x > 90) OR (x >= 97 AND x <= 122)
-128 >= x < 65,
x > 122,
90 < x < 97,
97 <= x <= 122
-128 <------ 65 ______ 91 --97--122--> 127
if (c < 'A' || c > 'z' || (c < 'a' && c > 'Z') || (c >= 'a' && c <= 'z')) { | |
if (c < 'A' || c > 'Z') { |
Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah I think your change works.
"memset" | ||
# These string functions are ok. | ||
"strchr" | ||
"strcmp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good opportunity to not allow these unsafe functions, especially when the safer version is listed below (strncmp
).
Obviously there may be a good reason, but if we can get away from it we probably should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I don't think this should prevent this commit from happening. If it's a chip shot then get it, but otherwise we should add it to the backlog - especially now that there is a mechanism for protecting against this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have to re-immerse myself in the discourse on this age-old argument, but my rough understanding is that you're not that much better off using strncmp over strcmp: https://stackoverflow.com/questions/24353504/whats-wrong-with-strcmp
If your string isn't NULL-terminated, all bets are off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, the two most compelling arguments from the post you linked are:
FOR:
Reading, writing, executing... it doesn't matter. Any memory reference to an unintended address is undefined behavior. In the most apparent scenario, you attempt to access a page that isn't mapped into your process's address space, causing a page fault, and subsequent SIGSEGV. In the worst case, you sometimes run into a \0 byte, but othertimes you run into some other buffer, causing inconstant program behavior. –
Jonathon Reinhart
AGAINST:
And if you're trying to call strcmp or strncmp with an array that you thought contained a null-terminated string but actually doesn't, then your code already has a bug. Using strncmp() might help you avoid the immediate symptom of that bug, but it won't fix it. - Keith Thompson
To my way of thinking, strncmp()
is more intentional than strcmp()
, because it requires the programmer to consider/know the size of the string they want to test. Yes, bugs can exist with both, but the scope of the damage can be lessened with strncmp()
. As a counter-point, it's usually easier to find a smoking gun when the offense is greater, but that damage is inflicted upon the user (so that's not a great experience for them).
All in all, I would go ahead and merge the PR today, but in my opinion, I think we would be better served to reduce our libc
footprint and move toward intentional coding.
n_cjson.c
Outdated
// Greater than z -> not a valid letter. | ||
// Between a and Z -> not a valid letter. | ||
// Between a and z (inclusive) -> already lower. | ||
if (c < 'A' || c > 'z' || (c < 'a' && c > 'Z') || (c >= 'a' && c <= 'z')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah I think your change works.
"memset" | ||
# These string functions are ok. | ||
"strchr" | ||
"strcmp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have to re-immerse myself in the discourse on this age-old argument, but my rough understanding is that you're not that much better off using strncmp over strcmp: https://stackoverflow.com/questions/24353504/whats-wrong-with-strcmp
If your string isn't NULL-terminated, all bets are off.
After some discussion with Ray, we settled on certain libc functions that we're ok using in note-c. Everything else from libc should be excluded. This commit adds a CMake option to build the note-c library without libc. It also tells the linker to generate errors for undefined references. The result is that we can run this build to see what libc functions we're using in note-c. A new Bash script, check_libc_dependencies.sh, processes the output of this build. If any non-permitted functions are found in the undefined reference errors, the script fails. This protects us against the introduction of non-permitted libc functions. A new job in ci.yml runs this script on every PR and push to master.
5578119
to
ec5653b
Compare
"memset" | ||
# These string functions are ok. | ||
"strchr" | ||
"strcmp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, the two most compelling arguments from the post you linked are:
FOR:
Reading, writing, executing... it doesn't matter. Any memory reference to an unintended address is undefined behavior. In the most apparent scenario, you attempt to access a page that isn't mapped into your process's address space, causing a page fault, and subsequent SIGSEGV. In the worst case, you sometimes run into a \0 byte, but othertimes you run into some other buffer, causing inconstant program behavior. –
Jonathon Reinhart
AGAINST:
And if you're trying to call strcmp or strncmp with an array that you thought contained a null-terminated string but actually doesn't, then your code already has a bug. Using strncmp() might help you avoid the immediate symptom of that bug, but it won't fix it. - Keith Thompson
To my way of thinking, strncmp()
is more intentional than strcmp()
, because it requires the programmer to consider/know the size of the string they want to test. Yes, bugs can exist with both, but the scope of the damage can be lessened with strncmp()
. As a counter-point, it's usually easier to find a smoking gun when the offense is greater, but that damage is inflicted upon the user (so that's not a great experience for them).
All in all, I would go ahead and merge the PR today, but in my opinion, I think we would be better served to reduce our libc
footprint and move toward intentional coding.
See commit messages for more details.