Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uefi-raw: Add Char8/Char16 #809

Merged
merged 1 commit into from
May 20, 2023

Conversation

nicholasbishop
Copy link
Member

These definitions are just type aliases, unlike the more restrictive newtypes in the uefi package. In particular, the uefi package does not allow Char16 to be created for surrogate pair characters. This is more restrictive than what UEFI actually specifies though; it says that Char16 is usually UCS-2, but particular uses may allow other encodings. So for uefi-raw, just treat Char16 as numbers that can hold any u16 value.

Checklist

  • Sensible git history (for example, squash "typo" or "fix" commits). See the Rewriting History guide for help.
  • Update the changelog (if necessary)

@phip1611
Copy link
Contributor

phip1611 commented May 17, 2023

Hm. I thought we eventually move the definitions for chars to the ucs-2 crate. I'd rather expect uefi-raw to use ucs-2. What do you think?

@nicholasbishop
Copy link
Member Author

nicholasbishop commented May 17, 2023

I was originally thinking that as well, but now I think enforcing UCS-2 in uefi-raw for all places where the spec uses Char16 is too strict. In fact, I think it might be too strict for uefi as well.

For example, consider reading FAT filesystems through the Simple File System Protocol. The UEFI Spec says "FAT stores file names in two formats. [...] FAT 8.3 file names are always stored as uppercase ASCII characters. LFN can either be stored as ASCII or UCS-2." Then in a note, it says "Although the FAT32 specification allows file names to be encoded using UTF-16, this specification only recognizes the UCS-2 subset for the purposes of sorting or collation."

To me, that text is quite ambiguous as to what the proper behavior should be if the firmware exposes the SFS protocol for a FAT filesystem that happens to contain filenames with surrogate pairs. But logically it seems likely that the firmware isn't going to check for that case, and will just expose the filenames as a Char16 pointer that ends at the null. I confirmed that edk2 does exactly that by creating a FAT filesystem with a file named test😀.txt. And on our end, I wouldn't want the crate to reject such filenames; it's perfectly reasonable for people to create UTF-16 filenames since that's allowed by FAT, it's just kinda-sorta discouraged in the variation of FAT described in the UEFI Spec.

(Another example where we might want lenient behavior is UEFI Variable names.)

I think the ideal behavior may require some more types. For reading data from UEFI APIs, we should allow any u16 chars. For writing data, we might choose to be more restrictive and enforce UCS-2 in most places.

This is probably a larger discussion as far as what to do for uefi, but for uefi-raw I think simple type aliases are the way to go due to the issue of validations. If you call a function that returns a string, such as GetNextVariableName(), the restrictive uefi::Char16 type can't really be used as the string type, because the firmware might provide a string that isn't valid UCS-2.

@phip1611
Copy link
Contributor

Okay, I see your point. Fine with me

These definitions are just type aliases, unlike the more restrictive newtypes in
the `uefi` package. In particular, the `uefi` package does not allow `Char16` to
be created for surrogate pair characters. This is more restrictive than what
UEFI actually specifies though; it says that `Char16` is _usually_ UCS-2, but
particular uses may allow other encodings. So for `uefi-raw`, just treat
`Char16` as numbers that can hold any `u16` value.
@phip1611 phip1611 force-pushed the bishop-raw-chars branch from 7f0a00a to 6780061 Compare May 20, 2023 08:16
@nicholasbishop nicholasbishop merged commit d9cca99 into rust-osdev:main May 20, 2023
@nicholasbishop nicholasbishop deleted the bishop-raw-chars branch May 20, 2023 15:55
@nicholasbishop nicholasbishop mentioned this pull request Sep 4, 2023
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants