OsStrExt3 transmutes from an &[u8] to a OsStr #1524
Comments
It's still present in v3-master. It's used to handle things like these, which need redoing to use encode_wide()/decode_wide(), I guess. Out of interest, I just threw together this, which still assumes UTF/WTF-8 but at least tries to uphold the invariants. My editor crashed three times writing it, but I'm sure it's fi... OH GOD RAPTORS.
Yeah, I just don't think transmuting like this is a good idea. WTF-8 is an internal detail, and having an important crate in the ecosystem rely on it is just not a good idea. The simplest way around this isn't to use
Without encode/decode_wide I don't see any way to handle it both correctly and safely. Lossy decoding should be right out, because it implies corrupting some valid program arguments, so we're just left with panicking on invalid UTF-8 on Windows.
I am operating at a loss here, because I don't understand the context in which these routines are being used. Certainly, what you're saying is not generally true. Taking a step back and re-reading your comment above, maybe now I'm starting to see the issue here. I think I now see the predicament, and I think you're right. I see three possible choices:
The latter two imply quite a bit of work, and at least some additional performance overhead. However, the performance overhead would only occur when the

cc @SimonSapin: as the architect of WTF-8, what do you think the suggested path here should be? (I still think the internal representation should just be exposed, despite the strong principles against doing so. The fact that we have people transmuting to the internal representation is going to make it de facto exposed eventually anyway.)
As an exercise I tried implementing the trait methods safely and came up with this. On Unix it just deals with byte slices; on Windows it tries converting to a
That's a lot more code, but on an initial skim, it looks good? It does seem unfortunate that everyone has to pay for the
We have an RFC that changes the memory representation. Officially exposing the byte representation would likely make this kind of change a breaking change. I would much prefer adding string-like methods to
Had a go at integrating OsStrOps and came up with that. It can surely be improved, but it passes the test suite on Windows and FreeBSD.
Sorry for the force-pushes; I was clearing a few leftovers I missed.
If you're interested, I created OsStr Bytes to solve this problem. It allows accessing the bytes of
Closing in favor of #1594 |
In this code:
clap/src/osstringext.rs
Lines 23 to 32 in 784524f
the transmute casts the `&[u8]` to an `&OsStr`. There are a couple of problems with this:

1. A `&[u8]` can be an arbitrary sequence of bytes, whereas an `&OsStr` cannot be on Windows. On Windows, it internally is WTF-8, and it's not clear what, if anything, goes wrong when it isn't WTF-8. (But if it isn't WTF-8, then it could very well break a perfectly valid internal invariant that leads to UB.) A plausible alternative is to make `from_bytes` unsafe.
2. That an `&OsStr` is internally a `&[u8]` on Windows holding WTF-8 is an implementation detail, and could actually change, leading to an incorrect `transmute`.

Is this code still present in clap 3? If so, could someone explain the motivation for it? I'd be happy to try to help brainstorm ways of removing it.