Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RawVersion::to_string with CStr, enable WIndows version check #1341

Closed

Conversation

MartinKavik
Copy link
Contributor

Fixes #1307

Linux and Windows seem to handle strings differently when both expect C-like nul-terminated strings but raw Rust strings are provided.

The code in the PR tested with cargo run on Windows 10 and Kubuntu 22.04.

I was trying to do as few changes as possible but there are some potential improvements like:

pub fn to_str(self) -> &'static str {
     unsafe {
        CStr::from_ptr(self.0)
            .to_str()
            .expect("Failed to convert RawVersion into str")
    }
}

or to create C_VERSION_PKG lazily from VERSION_PKG.

Copy link
Owner

@hannobraun hannobraun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this pull request, @MartinKavik! It's great to see some attention on Windows support.

Linux and Windows seem to handle strings differently when both expect C-like nul-terminated strings but raw Rust strings are provided.

I don't understand that. What did expect a nul-terminated string before? As far as I can tell, the current code on main both produces and expects raw Rust strings.


While testing this, I ran into segmentation faults, both on main and your branch. Here's what I did:

  1. Install the latest fj-app release: cargo install fj-app`
  2. Go to the repository root.
  3. Run the test model: cargo run

Here's what happens on the main branch:

  2022-11-13T18:31:49.110410Z  WARN fj_host::model: Host version (0.23.0 (unreleased)) and model version (0.23.0 (35a02544a, unreleased)) do not match
    at /home/hanno/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/fj-host-0.23.0/src/model.rs:139

Segmentation fault (core dumped)

And this happens, if I do the same with your branch:

memory allocation of 140097384091217 bytes failed
Aborted (core dumped)

This is happening on my laptop, and this is the first time since any version checking code was added that I did any testing of this kind there. The version checking code was originally developed on my office PC, and maybe something works differently there. (cargo run works on my laptop like normal, on both branches.)

I'd like to look into this tomorrow. I'll hold off on making any decisions about this pull request, until we better understand what's going on.

@MartinKavik MartinKavik force-pushed the C-compatible-RawVersion branch from e05972f to 77df0cc Compare November 13, 2022 21:11
@MartinKavik
Copy link
Contributor Author

MartinKavik commented Nov 13, 2022

I've created another version you can try to run: main...MartinKavik:Fornjot:C-compatible-RawVersion-v2 (tested on Windows 10 & Kubuntu)


I don't understand that. What did expect a nul-terminated string before? As far as I can tell, the current code on main both produces and expects raw Rust strings.

I don't really know - I just guess there is a problem when we try to pass the Rust version string through C-compatible API even though it's from Rust to Rust in practice. There is a chance Windows is trying to encode literals in UTF-16 or the old code is counting bytes instead of chars or there is missing \0 at the end of the string or extra \0 in the middle of the string or there are some magic compiler optimizations that could mess it up (e.g. I saw some LLVM-related code and comments in Rust CString source code). I have zero experience with Rust <-> C FFI so I would need to jump to a deep rabbit hole to find out. Also I've tried to run the code with miri but it found nothing.

FFI patterns for passing/accepting Strings: https://rust-unofficial.github.io/patterns/idioms/ffi/intro.html


Install the latest fj-app release: cargo install fj-app
(cargo run works on my laptop like normal, on both branches.)

Perhaps you already know but for sure: cargo install xx is my favorite footgun - cargo install ignores Cargo.lock but cargo run respects it. So there is a big chance cargo install and cargo run use different dependency versions. I had problems with it in MoonZoon when for instance Actix released incompatible crate versions or when somebody yanked a crate version => therefore I recommend to use the cargo install parameter --locked in my READMEs.


memory allocation of 140097384091217 bytes failed

I've seen it once on my Kubuntu while the Fornjot was starting. It feels a bit random. There is a chance it isn't directly related to Fornjot code, see for instance rust-lang/rust-clippy#8558.


Segmentation fault (core dumped)

It's a quite popular topic among the MacOS users - see vectordotdev/vector#4196 (comment)
I don't have Mac so I can't really help here.

@hannobraun
Copy link
Owner

I've created another version you can try to run: main...MartinKavik:Fornjot:C-compatible-RawVersion-v2 (tested on Windows 10 & Kubuntu)

Thank you! I'll take a look later.

I don't really know - I just guess there is a problem when we try to pass the Rust version string through C-compatible API even though it's from Rust to Rust in practice. There is a chance Windows is trying to encode literals in UTF-16 or the old code is counting bytes instead of chars or there is missing \0 at the end of the string or extra \0 in the middle of the string or there are some magic compiler optimizations that could mess it up (e.g. I saw some LLVM-related code and comments in Rust CString source code)

Hmm, I don't know. None of that rings true to me.

  • I've considered that Windows is somehow inserting UTF-16 in there, but I don't see how. All we're doing is passing u8 pointers through an FFI boundary, so there's really no opportunity for Windows-specific encoding to creep in. I don't know how the environment variables are encoded, but those are read using the env! macro, which returns a &'static str. If that were wrong, somehow, that would be a pretty blatant bug hiding in the middle of the standard library.
  • The old code is counting bytes. See str::len. str::as_ptr returns a *const u8, so that matches. And I don't see any inconsistencies between that and the receiving side.
  • I don't see why a \0 at the end of the string would be expected, nor where extra \0s should come from. As I said, it's just passing pointers through an FFI boundary. Nothing really Windows-specific about that.

Then again, it's obviously not working, so all my reasoning might be wrong. And I certainly appreciate the discussion, so keep the theories coming, if you have more.

I do suspect though that the string handling is not the reason for the problem, and that whether it's done using the raw representation of a Rust string, or CStr/CString is orthogonal to the problem. My main theory right now is that something is fundamentally wrong with how we use the static, and that it just happens to work on some systems.

FFI patterns for passing/accepting Strings: https://rust-unofficial.github.io/patterns/idioms/ffi/intro.html

Thanks, I'll look into it later! At a first glance, I'm not sure how relevant it is though. It talks about FFI between Rust and C(-compatible) APIs, while here we're only dealing with quasi-FFI between two Rust sides. As I expressed above, there really should be no C-style strings there, at all, unless we explicitly introduce them.

Perhaps you already know but for sure: cargo install xx is my favorite footgun - cargo install ignores Cargo.lock but cargo run respects it. So there is a big chance cargo install and cargo run use different dependency versions. I had problems with it in MoonZoon when for instance Actix released incompatible crate versions or when somebody yanked a crate version => therefore I recommend to use the cargo install parameter --locked in my READMEs.

Interesting! I already kinda knew this, but it certainly wasn't at the top of my mind. I'll re-test with --locked when I look into this again later.

memory allocation of 140097384091217 bytes failed

I've seen it once on my Kubuntu while the Fornjot was starting. It feels a bit random. There is a chance it isn't directly related to Fornjot code, see for instance rust-lang/rust-clippy#8558.

I'm pretty confident it's related to this issue, for the following reasons:

  • While I was testing this, I switched between main and your branch multiple times. The output was always as in my previous comment, so it doesn't seem random. (Could be a coincidence, of course. We'll see how it holds up when I look into it again.)
  • Given that something seems to be fundamentally wrong, I can see why your branch would result in such a large memory allocation, while main doesn't: If the CStr is pointing to garbage, and that garbage doesn't have many \0s in it, then trying to allocate the CString could easily result in this error.

Segmentation fault (core dumped)

It's a quite popular topic among the MacOS users - see vectordotdev/vector#4196 (comment) I don't have Mac so I can't really help here.

Interesting. I saw the segmentation fault on Linux though (all my machines run Linux), and haven't heard any reports about segmentation faults on macOS.

@MartinKavik
Copy link
Contributor Author

Thank you! I'll take a look later.

Ok, let me know how it went, that version should be the safest one I hope. If it doesn't work, please write me what Linux distribution + version + architecture you use so I can reproduce the problem.

@hannobraun
Copy link
Owner

Ok, let me know how it went, that version should be the safest one I hope. If it doesn't work, please write me what Linux distribution + version + architecture you use so I can reproduce the problem.

Will do 👍

Note: "later" had been delayed by a bit. I'm having trouble with the weekly release (due to some updates to the release automation) and will likely not get to anything else today. I should have time tomorrow.

@hannobraun
Copy link
Owner

I wasn't able to reproduce the segmentation faults I was seeing yesterday, but I got this (on my office PC running Arch Linux):
Screenshot from 2022-11-15 13-50-17

This isn't an error message that got mangled into the output somehow. It's the actual contents of the version string. I suspect that the concat! call that adds the \0 isn't working, and that this panic message was the string stored right after to the version in the binary. So the CStr doesn't actually end where it should, and includes the next string, after which there's a \0 for some reason. If that were the case, that would match the allocation failure I saw yesterday, only then there didn't happen to be a \0 in memory soon after.

About the concat! not working, I don't understand exactly why or how, but it does seem very fishy. We probably can't rely on concat! to preserve the \0 byte. Maybe it does some trimming, maybe it determines that the second string has nothing useful in it and doesn't append at all, or whatever. In addition, since \0 is a valid character in UTF-8, this character could theoretically be in the middle of the first string, which would not lead to the desired outcome. So yeah, not exactly sure what specifically is going on, but I strongly suspect that this method of converting a &str into a CStr is not valid.

Regardless of this finding, I have decided not to merge this pull request without strong justification. Using CStr seems to do nothing but add more moving parts, for no reason that I can see.


While doing my testing and research, I came across a potential soundness hole which I've fixed in #1358. That may have been the problem we've been seeing on Windows, but I don't know. The pull request is also reducing the number of moving parts overall, so if you have some time, I'd appreciate if you could try it.


Closing, as I don't believe this is the right approach, for the aforementioned reasons. But still, thanks a lot for looking into this, @MartinKavik! At the very least, you provided the motivation for me to look into this again. I doubt I would have seen this soundness hole, if I your code hadn't made me think about this in new ways.

@hannobraun hannobraun closed this Nov 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Model version check is disabled on Windows
2 participants