-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gdt: Add remaining GDT flags #181
Conversation
Thanks a lot! Some notes:
Unrelated to this PR: I saw that you did some issue triage and reviewed other PRs as well. Thanks a lot for that! If you like to help us maintain and extend this crate, I can add you to our |
Ya the fact that the
Looking over it, I'm inclined to agree in the case of the 64-bit Code segments, we should just set the minimal set. The data segments are more complex, as they can be run in 32 or 64-bit mode, so some fields are only somtimes ignored (this is the Linux use-case).
I think setting these follows the same logic as setting the Limit to be the maximum value for the 32-bit segments. Users should be using (and on 64-bit must use) paging to handle access controls and tracking, not segmentation. I can document this more. |
@phil-opp I reworked the PR to:
|
No problem. I would be happy to help maintain. This crate is a core dependency for several of our projects, so it's certainly in our interest to make sure everything is looking good here. This is the first PR of a patch-set that lets us construct fully static (i.e. linker determined location, read-only at runtime) GDT/IDT/PageTables/etc... |
Thanks for the update, looks good now! I'll try to do a full review soon, but it might take a few days until I get to it. (One thing that we need to figure out is whether this is a breaking change or not.)
Great, thanks! I sent you an invite.
Sounds useful, looking forward to your patches!
Good points! Then let's set these flags by default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks a lot!
A left a few minor suggestions inline, but these are things that we can also address in follow-up PRs if you prefer.
Regarding breakage: While we change the observable return value of the kernel_code_segment
etc. functions, they should behave exactly as before. For this reason I'm inclined to define this change as semver-compatile and release only a new patch version (v0.12.2) for it. Or do you think we should do a minor release (v0.13.0)?
pub const fn kernel_code_segment() -> Descriptor { | ||
Descriptor::UserSegment(DescriptorFlags::KERNEL_CODE64.bits()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably also add a kernel_data_segment
function to resolve #160.
Also, a general from_flags(flags: DescriptorFlags) -> Self
function be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch I forgot about syscall/sysenter/sysret/sysexit, this might actually impact what we do for defaults here.
I'll do some more reading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it looks like we should set all of these "ignored" flags. Per the Intel manual, syscall/sysenter/sysret/sysexit all load a specific selector from the GDT, but they don't actually read the GDT entry, they instead load fixed data in the descriptor caches. The manual then states:
It is the responsibility of OS software to ensure that the descriptors (in GDT or LDT) referenced by those selector values correspond to the fixed values loaded into the descriptor caches; the SYSCALL instruction does not ensure this
correspondence.
So we should just have our defaults match those fixed values the processor loads automatically. These are the values I had initially (i.e. exactly the values Linux uses). It now also makes sense why Linux was setting ignored bits, accessed bits, etc...
Specific details of what flags are set by the processor
Ring 3 -> Ring 0
CS
SYSCALL: SEL = IA32_START[47:32]
SYSENTER: SEL = IA32_SYSENTER_CS[15:0]
- Base = 0
- Limit = 0xFFFFF
- Type = 0x1011 (execute, read, accessed)
- S = 1
- DPL = 0
- P = 1
- L = 1 (0 32-bit)
- D = 0 (1 32-bit)
- G = 1
SS
SYSCALL: SEL = IA32_START[47:32] + 8
SYSENTER: SEL = IA32_SYSENTER_CS[15:0] + 8
- Base = 0
- Limit = 0xFFFFF
- Type = 0x0011 (data, write, accessed)
- S = 1
- DPL = 0
- P = 1
- B = 1
- G = 1
Ring 0 -> Ring 3
CS
SYSRET: SEL = IA32_STAR[63:48] + 16
SYSRET (32-bit): SEL = IA32_STAR[63:48]
SYSEXIT: SEL = IA32_SYSENTER_CS[15:0] + 32
SYSEXIT (32-bit): SEL = IA32_SYSENTER_CS[15:0] + 16
- Base = 0
- Limit = 0xFFFFF
- Type = 0x1011 (execute, read, accessed)
- S = 1
- DPL = 3
- P = 1
- L = 1 (0 32-bit)
- D = 0 (1 32-bit)
- G = 1
SS
SYSRET: SEL = IA32_STAR[63:48]+8
SYSEXIT: SEL = IA32_SYSENTER_CS[15:0] + 40
SYSEXIT (32-bit): SEL = IA32_SYSENTER_CS[15:0] + 24
- Base = 0
- Limit = 0xFFFFF
- Type = 0x0011 (data, write, accessed)
- S = 1
- DPL = 3
- P = 1
- B = 1
- G = 1
Interestingly, due to the constraints of these commands, if you wanted to support both, you would need a GDT that had the following segments ordered like:
- KERNEL_CODE64
- KERNEL_DATA
- USER_CODE32
- USER_DATA
- USER_CODE64
- USER_DATA (only need for 64-bit sysenter/sysexit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, a general
from_flags(flags: DescriptorFlags) -> Self
function be useful.
A user can already do Descriptor::UserSegment(flags.bits())
do you think adding another method is useful? Should it do some sort of basic checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, a general
from_flags(flags: DescriptorFlags) -> Self
function be useful.A user can already do
Descriptor::UserSegment(flags.bits())
do you think adding another method is useful? Should it do some sort of basic checks?
I was mostly thinking about improving the discoverability of the DescriptorFlags
structure. Right now, the relationship between the DescriptorFlags
and GlobalDescriptorTable
types is only implicit. You have to know the GDT details to be sure that Descriptor::UserSegment(flags.bits())
is the correct implementation (as you could put any u64
in there). An additional from_flags
method would properly encode this at the type system level, which would make it less dangerous to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we should just have our defaults match those fixed values the processor loads automatically. These are the values I had initially (i.e. exactly the values Linux uses). It now also makes sense why Linux was setting ignored bits, accessed bits, etc...
Thanks a lot for investigating this! I agree that it makes sense to use the sames values at Linux then.
Interestingly, due to the constraints of these commands, if you wanted to support both, you would need a GDT that had the following segments ordered like:
* KERNEL_CODE64 * KERNEL_DATA * USER_CODE32 * USER_DATA * USER_CODE64 * USER_DATA (only need for 64-bit sysenter/sysexit)
Unrelated to this PR, but maybe it makes sense to add a new constructor function to GlobalDescriptorTable
with exactly these entries? It seems like this is the canonical GDT layout that most users will want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An additional
from_flags
method would properly encode this at the type system level, which would make it less dangerous to use.
Agreed here, but to make this const_fn
we will have to add additional features. I'm going to merge this as-is and add any additional constructors in a follow-up PR (which will also add new GDT constructor).
I agree, this isn't a breaking change, as we don't make a public-facing guarantee as to the exact values here. Note that after this change we will (as the docs will specifically guarantee compatibility with syscall/sysret), so changing them again in the future would be a breaking change. |
@phil-opp I updated the PR to:
PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for looking into this in such detail! Compatibilty with syscall
/sysenter
is very useful.
This is ready to be merged from my side. So feel free to merge it when you think it's ready. Perhaps you could add a short sentence about the new syscall/sysenter compatibilty to the PR description before merging. Also, could you add this change to the "Unreleased" section in the Changelog.md
? See the changelog update commit for v0.12.1
for the changelog entry format we use.
After merging the PR and updating the changelog, you can publish a new crates.io release by running cargo release patch
. You need to install the cargo-release
tool for that. This will automatically increase the version number, publish to crates.io, and create and push a release tag. After publishing, I typically post a short "Published as v0.12.2" comment under the released PR.
Let me know if you have any questions about these steps! I'm also happy to do the release myself if you're not comfortable with it.
Add the remaining GDT flags. This now allows users to create 32-bit and 64-bit GDTs. We also add flag aliases for common cases. We then verify that these cases match what the Linux kernel uses by default. This also allows us to make additional methods `const fn`. I used [AMD64 Architecture Programmer’s Manual, Volume 2: System Programming](https://www.amd.com/system/files/TechDocs/24593.pdf) as the main reference for this PR. Two remaining open questions: - Should we set `WRITABLE` (aka Readable for code-segments) by default? - This bit is ignored in Long Mode (for code and data) - Linux always sets it by default - This is probably what users want for legacy mode segments. - Should we set `ACCESSED` by default? - Linux [sets it by default](https://github.com/torvalds/linux/blob/00e4db51259a5f936fec1424b884f029479d3981/arch/x86/kernel/cpu/common.c#L124-L129) - Except [sometimes it doesn't](https://github.com/torvalds/linux/blob/ab851d49f6bfc781edd8bd44c72ec1e49211670b/arch/x86/platform/pvh/head.S#L160-L166) - AMD claims this is ignored, but Intel doesn't make the same claim Signed-off-by: Joe Richey <[email protected]>
Signed-off-by: Joe Richey <[email protected]>
Also update the documentation to note that fs/gs don't ignored the BASE value. Signed-off-by: Joe Richey <[email protected]>
Signed-off-by: Joe Richey <[email protected]>
@josephlr Sorry for posting on a closed PR, but you have several good questions which appear to have gone unanswered.
There are some other errors/confusions with flag names. EXECUTABLE is the bit which distinguishes a code segment from a data segment. USER_SEGMENT is very poorly named (in the spec as well), and distinguishes "code/data segment" from "system segments". Another bit, CONFORMING also has a different meaning for data segments, and is EXPAND_DOWN in that case. This flips the permitted range from [base, limit) to [limit, base) and is useful in 32bit systems to enforce separation between the heap and the stack while keeping them in the same single logical address space from the compilers point of view. The DPL field isn't ignored under any circumstance, and it is critically important to be correct, as When the manuals say "ignored it 64bit mode", it means "the value doesn't have any impact when executing in a 64bit code segment". The bits are still loaded into the segment cache, because they are meaningful in 32bit code segments, and the data segments don't get reloaded when
Noone has used segmented memory management (citation probably needed) since the 386 replaced the 286, and the perf hit from locked operations isn't free. You've already spotted the read-only aspect, and that really is worth the effort, but there really is no point leaving the ACCESSED bit clear if you're not using segmented memory management. ~Andrew [1] Also don't ask what used to happen on Pentium era CPUs where the GDT was in ROM and the processor tried setting the ACCESSED bit. Architecturally, the processor may not continue executing until the A bit has been set in memory. |
@andyhhp thanks for the detailed response. It looks like we did the right thing in which flags we set, but feel free to be take a look at what we set. Also, let us know if the docs for the flags are incorrect or unclear.
This is actually why I initially realized something was wrong and opened this issue. For https://github.com/cloud-hypervisor/rust-hypervisor-firmware/, we were putting the GDT in ROM, and everything kept triple-faulting. |
:) So in that case, the fault is taken as #PF, doubled up on the second access, turns into #DF and escalates to SHUTDOWN, which is intercepted, hence the triple fault. On old processor (and particularly with Option ROMs running in real mode with no paging), the pipeline would read back to confirm that the A bit had been set. Nowadays, they are rather more clever. If this sounds suspiciously similar to the #DB / #AC infinite loops which plagued hypervisors a few years back, that's because it is. The CPU never reaches the next instruction boundary, so doesn't respond to NMIs or INIT IPIs. |
Add the remaining GDT flags. This now allows users to create 32-bit
and 64-bit GDTs. I also added more explanations about what each flag
does, and in which modes it is ignored.
We also add flag aliases for common cases. We then verify that these
cases match what the Linux kernel uses by default. Note that these defaults
are in some sense "mandatory" as the
syscall
/sysenter
instructions loadconstants into the various GDT registers. This also allows us
to make additional methods
const fn
.I used AMD64 Architecture Programmer’s Manual, Volume 2: System Programming
as the main reference for this PR.
Two remaining open questions:
WRITABLE
(aka Readable for code-segments) by default?ACCESSED
by default?Signed-off-by: Joe Richey [email protected]