Rustify DOMXPath::quote #13545
base: master
@@ -0,0 +1,4 @@
#ifndef DOM_XPATH_RUST_H
#define DOM_XPATH_RUST_H
extern char* domxpath_quote_literal(const char *const input, uintptr_t *const len);
#endif
@@ -0,0 +1,56 @@
use std::ffi::{CString, c_char};


#[no_mangle]
pub extern "C" fn domxpath_quote_literal(input: *const c_char, len: *mut usize) -> *mut c_char {
    let slice = unsafe { std::slice::from_raw_parts(input as *const u8, *len as usize) };
Review comment: This should preferably contain a "safety requirement" comment, i.e. that it is expected that the caller does not hold any mutable references to the input. I know it's a bit trivial in this case, but when using unsafe I'd rather see it documented, and the caller must know about it too to avoid soundness issues.

    let single_quote_absent = !slice.contains(&b'\'');
    let double_quote_absent = !slice.contains(&b'"');

    let result = if single_quote_absent {
        let mut res = Vec::with_capacity(slice.len() + 2);
Review comment: Without additional infrastructure this bypasses PHP's memory manager and thus PHP's memory limit.
Reply: Good point. I wonder how difficult it is to give Rust access to PHP's memory manager 🤔
Reply: The easiest way I see is to implement GlobalAlloc to use emalloc/efree. I can submit a separate PR for this, and possibly an RFC, if anyone's interested.
Reply: Using GlobalAlloc makes sense, request-based allocations are the most common case anyway.
Reply: @danog please do. I won't have time/energy to contribute for some time.
        res.push(b'\'');
        res.extend_from_slice(slice);
        res.push(b'\'');
        res
    } else if double_quote_absent {
        let mut res = Vec::with_capacity(slice.len() + 2);
        res.push(b'"');
        res.extend_from_slice(slice);
        res.push(b'"');
        res
    } else {
        let mut res = Vec::from("concat(".as_bytes());
        let mut temp_slice = slice;

        while !temp_slice.is_empty() {
            let bytes_until_single_quote = temp_slice.iter().position(|&x| x == b'\'').unwrap_or(temp_slice.len());
            let bytes_until_double_quote = temp_slice.iter().position(|&x| x == b'"').unwrap_or(temp_slice.len());

            let (quote_method, bytes_until_quote) = if bytes_until_single_quote > bytes_until_double_quote {
                (b'\'', bytes_until_single_quote)
            } else {
                (b'"', bytes_until_double_quote)
            };

            res.push(quote_method);
            res.extend_from_slice(&temp_slice[..bytes_until_quote]);
            res.push(quote_method);
            res.push(b',');
            temp_slice = &temp_slice[bytes_until_quote..];
        }
        let res_len = res.len();
        res[res_len - 1] = b')';
        res
    };

    // Update length
    unsafe {
Review comment: You should at least be able to get rid of this unsafety here.
        *len = result.len() as usize;
    }

    // Convert Vec<u8> to *mut c_char
    let c_str = CString::new(result).expect("CString::new failed");
Review comment: Note that this would leak memory, as you're not freeing the (persistently-allocated) memory of the string at the call site. Using
Reply: Quick question: would it be better to handle this more "smoothly", i.e. using a match pattern?
Reply: Yeah, this can actually fail because the input may contain \0 bytes. So we must not panic by using
    c_str.into_raw()
}
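Several of the review comments above (document the safety contract, avoid the panic on \0 bytes, free the returned memory at the call site) can be illustrated together. The following is only a sketch under stated assumptions, not the PR's code: the names `quote_literal`, `domxpath_quote_bytes` and `domxpath_quote_free` are hypothetical, the result is returned as a pointer plus length instead of a `CString`, and the export attribute used in the PR is left off so the snippet compiles standalone. It still allocates through Rust's default allocator, so the memory-limit concern is not addressed here.

```rust
/// Safe core: builds an XPath literal for `input`, which may hold any bytes,
/// including `\0`. Same quoting strategy as the function in the diff.
pub fn quote_literal(input: &[u8]) -> Vec<u8> {
    if !input.contains(&b'\'') {
        return wrap(input, b'\'');
    }
    if !input.contains(&b'"') {
        return wrap(input, b'"');
    }
    // Both quote kinds occur: emit concat("...",'...',...).
    let mut out = b"concat(".to_vec();
    let mut rest = input;
    while !rest.is_empty() {
        let next_single = rest.iter().position(|&b| b == b'\'').unwrap_or(rest.len());
        let next_double = rest.iter().position(|&b| b == b'"').unwrap_or(rest.len());
        // Quote each piece with the quote character that appears later.
        let (quote, end) = if next_single > next_double {
            (b'\'', next_single)
        } else {
            (b'"', next_double)
        };
        out.push(quote);
        out.extend_from_slice(&rest[..end]);
        out.push(quote);
        out.push(b',');
        rest = &rest[end..];
    }
    // `input` is non-empty in this branch, so the last byte is a comma.
    if let Some(last) = out.last_mut() {
        *last = b')';
    }
    out
}

fn wrap(input: &[u8], quote: u8) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len() + 2);
    out.push(quote);
    out.extend_from_slice(input);
    out.push(quote);
    out
}

/// Hypothetical FFI entry point (a real export would also carry the
/// no_mangle attribute, as in the PR).
///
/// # Safety
/// `input` must be non-null and point to `input_len` readable bytes that are
/// not mutated during the call. `out_len` must be valid for writes. The
/// returned buffer must be released with `domxpath_quote_free`.
pub unsafe extern "C" fn domxpath_quote_bytes(
    input: *const u8,
    input_len: usize,
    out_len: *mut usize,
) -> *mut u8 {
    let slice = unsafe { std::slice::from_raw_parts(input, input_len) };
    let boxed = quote_literal(slice).into_boxed_slice();
    unsafe { *out_len = boxed.len() };
    Box::into_raw(boxed) as *mut u8
}

/// # Safety
/// `ptr` and `len` must come from one earlier `domxpath_quote_bytes` call,
/// and the buffer must not be used afterwards.
pub unsafe extern "C" fn domxpath_quote_free(ptr: *mut u8, len: usize) {
    drop(unsafe { Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, len)) });
}
```

Keeping the quoting logic in a safe function confines `unsafe` to the two thin wrappers, and returning an explicit length means embedded \0 bytes need no special handling.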
Review comment: Typo: ouput. This construct will also require a full allocation + copy into a new zend_string().
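The GlobalAlloc idea raised in the review thread on `Vec::with_capacity` can be sketched as follows. This is not PHP's allocator and not a proposed implementation: `std::alloc::System` stands in for emalloc/efree so the snippet runs on its own, and the byte budget in the hypothetical `BudgetAlloc` type is only an illustrative stand-in for a memory limit. How a real binding would behave when the limit is hit is left open here.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator that tracks usage against a fixed byte budget.
/// In an actual binding, the two `System` calls below are where the
/// request-scoped allocator (emalloc/efree) would be invoked instead.
pub struct BudgetAlloc {
    limit: usize,
    used: AtomicUsize,
}

impl BudgetAlloc {
    pub const fn new(limit: usize) -> Self {
        BudgetAlloc { limit, used: AtomicUsize::new(0) }
    }

    /// Bytes currently handed out through this allocator.
    pub fn used(&self) -> usize {
        self.used.load(Ordering::Relaxed)
    }
}

unsafe impl GlobalAlloc for BudgetAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let size = layout.size();
        // Reserve the bytes first, and undo the reservation on failure.
        let prev = self.used.fetch_add(size, Ordering::Relaxed);
        if prev.saturating_add(size) > self.limit {
            self.used.fetch_sub(size, Ordering::Relaxed);
            return std::ptr::null_mut();
        }
        let ptr = unsafe { System.alloc(layout) }; // stand-in for emalloc
        if ptr.is_null() {
            self.used.fetch_sub(size, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }; // stand-in for efree
        self.used.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}
```

Registering such a type with `#[global_allocator]` on a `static` would route every `Vec`, `Box` and `String` allocation in the crate through it, which is why the thread considers it the least invasive way to make Rust code respect the host's memory accounting.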