-
Notifications
You must be signed in to change notification settings - Fork 7.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rustify DOMXPath::quote #13545
base: master
Are you sure you want to change the base?
Rustify DOMXPath::quote #13545
Conversation
Adding a new build-time dependency most certainly requires an RFC, because it will exclude platforms where a rust compiler is not available or not officially supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a big fan of Rust, and have used it in my own projects.
Adopting this in PHP may be beneficial, but requires the right abstractions to be put in place, and it's not a call I can make on my own. Tim is right that the introduction of any other language into PHP needs an RFC.
}; | ||
|
||
// Update length | ||
unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should at least be able to get rid of this unsafety here.
I suggest returning a ffi-compatible struct that contains both the string as raw parts, and the length.
} | ||
|
||
// Convert Vec<u8> to *mut c_char | ||
let c_str = CString::new(result).expect("CString::new failed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this would leak memory, as you're not freeing the (persistently-allocated) memory of the string at the call site. Using RETURN_STRINGL
also makes an extra allocation + copy which is unfortunate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick question, would it be better to handle this more "smoothly" ? ie using match pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this can actually fail because the input may contain \0 bytes. So we must not panic by using expect
. And this should probably use a Zend API to avoid the reallocation too, will be much cleaner in the end with the right access and abstractions to Zend APIs.
|
||
#[no_mangle] | ||
pub extern "C" fn domxpath_quote_literal(input: *const c_char, len: *mut usize) -> *mut c_char { | ||
let slice = unsafe { std::slice::from_raw_parts(input as *const u8, *len as usize) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should preferably contain a "safety requirement" comment, i.e. that it is expected that the caller does not hold any mutable references to the input. I know it's a bit trivial in this case, but when using unsafe I'd rather see it documented, and the caller must know about it too to avoid soundness issues.
let double_quote_absent = !slice.contains(&b'"'); | ||
|
||
let result = if single_quote_absent { | ||
let mut res = Vec::with_capacity(slice.len() + 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without additional infrastructure this bypasses PHP's memory manager and thus PHP's memory limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I wonder how difficult it is to give rust access to php's memory manager 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The easiest way I see is to implement GlobalAlloc to use emalloc/efree.
I can submit a separate PR for this, and possibly an RFC, if anyone's interested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using GlobalAlloc makes sense, request-based allocations are the most common case anyway.
If you go for an RFC then we'd need more than just the allocation interface, but we also need proper access to the Zend data structures. So this is more work than we might expect at first, but my personal opinion is that it's a good idea to allow memory safe languages inside PHP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danog please do. I won't have time/energy to contribute for some time.
ext/dom/xpath_rust.h
Outdated
@@ -0,0 +1 @@ | |||
extern char* domxpath_quote_literal(const char *const input, uintptr_t *const len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The choice of uintptr_t
looks odd for a size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
php-src's
uintptr_t
should be equivalent to rust'susize
Nope. size_t
is the type to use for sizes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, apparently Rust's usize
maps to uintptr_t
indeed, but this isn't really accurate from a C perspective 🤔 I expect both types to be the same on modern platforms, but it bothers me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh there actually is a core::ffi::c_size_t
in Rust-nightly, but it's not released/stable yet, quoting https://doc.rust-lang.org/core/ffi/type.c_size_t.html :
🔬This is a nightly-only experimental API. (c_size_t #88345)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would assume we'd depend on libc
at some point anyway, so libc::size_t
can probably be used.
const char *ouput = domxpath_quote_literal(input, &input_len); | ||
RETURN_STRINGL(ouput, input_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: ouput
. This construct will also require a full allocation + copy into a new zend_string()
.
I know :) but I had some questions, like
and this is a cool, functional proof of concept :) (at least it works locally on my machine. The CI's need rustc) |
For Makefiles, it's just a matter of invoking rustc. For buildconf and configure I'm not too sure, better ask the build system gurus :-)
Ideally, we'd be able to use zend_string both inside Rust and C. |
Yeah that would be ideal
Nope! Neat! Wonder if i could get those guys' input here, ill try contacting them |
Are there Rust-unsupported platforms that PHP for-reals supports? (Not just claimed to 20 years ago and forgot to check.) This would definitely require an RFC as it's a big policy change, though I'd be supportive, depending on the specifics. |
As per https://doc.rust-lang.org/nightly/rustc/platform-support.html: OpenBSD is in Tier 3 support for Rust, which means:
Solaris / illumos is in Tier 2 without Host Tools which means:
I'm fairly confident that PHP works fine on both and I expect both of them to be in production use. |
I am fairly certain DOMXPath::quote is bug-free due to the extensive testing done by @nielsdos , quote:
That is impressive, and made me think, what if we take it 1 step further and re-write it in Rust, with it's memory safety properties?
For those not in the know, Rust is a memory-safe high-performance language.
The Linux Kernel and the Windows kernel and Mozilla Firefox has adopted Rust for it's performance and safety, and the Chromium browser developers is looking into doing the same for Chromium,
Maybe PHP can adopt it too? 😄