gdt: Add remaining GDT flags #181

josephlr · 2020-09-23T04:45:50Z

Add the remaining GDT flags. This now allows users to create 32-bit
and 64-bit GDTs. I also added more explanations about what each flag
does, and in which modes it is ignored.

We also add flag aliases for common cases. We then verify that these
cases match what the Linux kernel uses by default. Note that these defaults
are in some sense "mandatory" as the syscall/sysenter instructions load
constants into the various GDT registers. This also allows us
to make additional methods const fn.

I used AMD64 Architecture Programmer’s Manual, Volume 2: System Programming
as the main reference for this PR.

Two remaining open questions:

Should we set WRITABLE (aka Readable for code-segments) by default?
- This bit is ignored in 64-bit Mode (code and data)
- Linux always sets it by default
- This is probably what users want for 32-bit mode segments.
Should we set ACCESSED by default?
- Linux sets it by default
- Except sometimes it doesn't
- AMD claims this is ignored, but Intel doesn't make the same claim.
- Setting this prevents a write on first access (useful when your GDT is mapped read-only).

Signed-off-by: Joe Richey [email protected]

phil-opp · 2020-09-23T08:46:37Z

Thanks a lot!

Some notes:

I'm not a fan of "combined" flags like FLAT_COMMON because it is not clear how they relate to the other flags. For example, it is not apparent whether FLAT_COMMON includes the PRESENT bit from the documentation. Even worse, clicking the [src] in the documentation takes you to the bitflags source, not the definition of the FLAT_COMMON flag. This is mostly a problem with the bitflags macro, so maybe it makes sense to define these combined flags outside of the macro in a separate impl blog.
Comparing the values with the Linux kernel is a good idea, but I'm not sure whether I like that we set so many ignored values for 64-bit segments. To me it now looks more complex with the extra flags, but I'm happy to discuss this.
I think my preference is to not set the WRITABLE and ACCESSED flags by default because it is less surprising. However, I don't have a strong opinion on this.

Unrelated to this PR: I saw that you did some issue triage and reviewed other PRs as well. Thanks a lot for that! If you like to help us maintain and extend this crate, I can add you to our x86_64 team, which would give you write access to this repo.

josephlr · 2020-09-23T09:30:39Z

I'm not a fan of "combined" flags like FLAT_COMMON because it is not clear how they relate to the other flags. For example, it is not apparent whether FLAT_COMMON includes the PRESENT bit from the documentation. Even worse, clicking the [src] in the documentation takes you to the bitflags source, not the definition of the FLAT_COMMON flag. This is mostly a problem with the bitflags macro, so maybe it makes sense to define these combined flags outside of the macro in a separate impl blog.

Ya the fact that the COMMON flags are public (and that they still look like "normal" flags) makes the docs confusing, I'll see how it looks moved in the impl DescriptorFlags block.

To me it now looks more complex with the extra flags, but I'm happy to discuss this.

Looking over it, I'm inclined to agree in the case of the 64-bit Code segments, we should just set the minimal set. The data segments are more complex, as they can be run in 32 or 64-bit mode, so some fields are only somtimes ignored (this is the Linux use-case).

I think my preference is to not set the WRITABLE and ACCESSED flags by default because it is less surprising. However, I don't have a strong opinion on this.

I think setting these follows the same logic as setting the Limit to be the maximum value for the 32-bit segments. Users should be using (and on 64-bit must use) paging to handle access controls and tracking, not segmentation. I can document this more.

josephlr · 2020-09-23T09:57:21Z

@phil-opp I reworked the PR to:

Not set ignored flags for 64-bit segments
Move the combined flags to their own section
In that separate section, document what the values are and why we choose them (i.e. "use these values if you do all the interesting stuff with paging anyway")
Updated the tests

josephlr · 2020-09-23T10:03:41Z

Thanks a lot for that! If you like to help us maintain and extend this crate, I can add you to our x86_64 team, which would give you write access to this repo.

No problem. I would be happy to help maintain. This crate is a core dependency for several of our projects, so it's certainly in our interest to make sure everything is looking good here. This is the first PR of a patch-set that lets us construct fully static (i.e. linker determined location, read-only at runtime) GDT/IDT/PageTables/etc...

phil-opp · 2020-09-23T11:19:44Z

Thanks for the update, looks good now! I'll try to do a full review soon, but it might take a few days until I get to it.

(One thing that we need to figure out is whether this is a breaking change or not.)

No problem. I would be happy to help maintain. This crate is a core dependency for several of our projects, so it's certainly in our interest to make sure everything is looking good here.

Great, thanks! I sent you an invite.

This is the first PR of a patch-set that lets us construct fully static (i.e. linker determined location, read-only at runtime) GDT/IDT/PageTables/etc...

Sounds useful, looking forward to your patches!

I think my preference is to not set the WRITABLE and ACCESSED flags by default because it is less surprising. However, I don't have a strong opinion on this.

I think setting these follows the same logic as setting the Limit to be the maximum value for the 32-bit segments. Users should be using (and on 64-bit must use) paging to handle access controls and tracking, not segmentation. I can document this more.

Good points! Then let's set these flags by default.

phil-opp

Looks good to me, thanks a lot!

A left a few minor suggestions inline, but these are things that we can also address in follow-up PRs if you prefer.

Regarding breakage: While we change the observable return value of the kernel_code_segment etc. functions, they should behave exactly as before. For this reason I'm inclined to define this change as semver-compatile and release only a new patch version (v0.12.2) for it. Or do you think we should do a minor release (v0.13.0)?

src/structures/gdt.rs

phil-opp · 2020-09-26T11:00:47Z

src/structures/gdt.rs

+    pub const fn kernel_code_segment() -> Descriptor {
+        Descriptor::UserSegment(DescriptorFlags::KERNEL_CODE64.bits())
    }


We should probably also add a kernel_data_segment function to resolve #160.

Also, a general from_flags(flags: DescriptorFlags) -> Self function be useful.

Good catch I forgot about syscall/sysenter/sysret/sysexit, this might actually impact what we do for defaults here.

I'll do some more reading

So it looks like we should set all of these "ignored" flags. Per the Intel manual, syscall/sysenter/sysret/sysexit all load a specific selector from the GDT, but they don't actually read the GDT entry, they instead load fixed data in the descriptor caches. The manual then states:

It is the responsibility of OS software to ensure that the descriptors (in GDT or LDT) referenced by those selector values correspond to the fixed values loaded into the descriptor caches; the SYSCALL instruction does not ensure this
correspondence.

So we should just have our defaults match those fixed values the processor loads automatically. These are the values I had initially (i.e. exactly the values Linux uses). It now also makes sense why Linux was setting ignored bits, accessed bits, etc...

Specific details of what flags are set by the processor

Ring 3 -> Ring 0

CS

SYSCALL: SEL = IA32_START[47:32]
SYSENTER: SEL = IA32_SYSENTER_CS[15:0]

Base = 0

Limit = 0xFFFFF

Type = 0x1011 (execute, read, accessed)

S = 1

DPL = 0

P = 1

L = 1 (0 32-bit)

D = 0 (1 32-bit)

G = 1

SS

SYSCALL: SEL = IA32_START[47:32] + 8
SYSENTER: SEL = IA32_SYSENTER_CS[15:0] + 8

Base = 0

Limit = 0xFFFFF

Type = 0x0011 (data, write, accessed)

S = 1

DPL = 0

P = 1

B = 1

G = 1

Ring 0 -> Ring 3

CS

SYSRET: SEL = IA32_STAR[63:48] + 16
SYSRET (32-bit): SEL = IA32_STAR[63:48]
SYSEXIT: SEL = IA32_SYSENTER_CS[15:0] + 32
SYSEXIT (32-bit): SEL = IA32_SYSENTER_CS[15:0] + 16

Base = 0

Limit = 0xFFFFF

Type = 0x1011 (execute, read, accessed)

S = 1

DPL = 3

P = 1

L = 1 (0 32-bit)

D = 0 (1 32-bit)

G = 1

SS

SYSRET: SEL = IA32_STAR[63:48]+8
SYSEXIT: SEL = IA32_SYSENTER_CS[15:0] + 40
SYSEXIT (32-bit): SEL = IA32_SYSENTER_CS[15:0] + 24

Base = 0

Limit = 0xFFFFF

Type = 0x0011 (data, write, accessed)

S = 1

DPL = 3

P = 1

B = 1

G = 1

Interestingly, due to the constraints of these commands, if you wanted to support both, you would need a GDT that had the following segments ordered like:

KERNEL_CODE64

KERNEL_DATA

USER_CODE32

USER_DATA

USER_CODE64

USER_DATA (only need for 64-bit sysenter/sysexit)

Also, a general from_flags(flags: DescriptorFlags) -> Self function be useful.

A user can already do Descriptor::UserSegment(flags.bits()) do you think adding another method is useful? Should it do some sort of basic checks?

Also, a general from_flags(flags: DescriptorFlags) -> Self function be useful.

A user can already do Descriptor::UserSegment(flags.bits()) do you think adding another method is useful? Should it do some sort of basic checks?

I was mostly thinking about improving the discoverability of the DescriptorFlags structure. Right now, the relationship between the DescriptorFlags and GlobalDescriptorTable types is only implicit. You have to know the GDT details to be sure that Descriptor::UserSegment(flags.bits()) is the correct implementation (as you could put any u64 in there). An additional from_flags method would properly encode this at the type system level, which would make it less dangerous to use.

So we should just have our defaults match those fixed values the processor loads automatically. These are the values I had initially (i.e. exactly the values Linux uses). It now also makes sense why Linux was setting ignored bits, accessed bits, etc...

Thanks a lot for investigating this! I agree that it makes sense to use the sames values at Linux then.

Interestingly, due to the constraints of these commands, if you wanted to support both, you would need a GDT that had the following segments ordered like:

* KERNEL_CODE64 * KERNEL_DATA * USER_CODE32 * USER_DATA * USER_CODE64 * USER_DATA (only need for 64-bit sysenter/sysexit)

Unrelated to this PR, but maybe it makes sense to add a new constructor function to GlobalDescriptorTable with exactly these entries? It seems like this is the canonical GDT layout that most users will want.

An additional from_flags method would properly encode this at the type system level, which would make it less dangerous to use.

Agreed here, but to make this const_fn we will have to add additional features. I'm going to merge this as-is and add any additional constructors in a follow-up PR (which will also add new GDT constructor).

josephlr · 2020-09-27T07:39:07Z

For this reason I'm inclined to define this change as semver-compatile and release only a new patch version (v0.12.2) for it.

I agree, this isn't a breaking change, as we don't make a public-facing guarantee as to the exact values here. Note that after this change we will (as the docs will specifically guarantee compatibility with syscall/sysret), so changing them again in the future would be a breaking change.

josephlr · 2020-09-27T08:03:36Z

@phil-opp I updated the PR to:

Have our default flags be the values loaded by syscall/sysret
Added user_data_segment
Updated docs to talk about syscall/sysret/sysenter/sysexit compat:
Updated docs to note that BASE isn't ignored for fs and gs in 64-bit mode

PTAL

phil-opp

Thanks a lot for looking into this in such detail! Compatibilty with syscall/sysenter is very useful.

This is ready to be merged from my side. So feel free to merge it when you think it's ready. Perhaps you could add a short sentence about the new syscall/sysenter compatibilty to the PR description before merging. Also, could you add this change to the "Unreleased" section in the Changelog.md? See the changelog update commit for v0.12.1 for the changelog entry format we use.

After merging the PR and updating the changelog, you can publish a new crates.io release by running cargo release patch. You need to install the cargo-release tool for that. This will automatically increase the version number, publish to crates.io, and create and push a release tag. After publishing, I typically post a short "Published as v0.12.2" comment under the released PR.

Let me know if you have any questions about these steps! I'm also happy to do the release myself if you're not comfortable with it.

Add the remaining GDT flags. This now allows users to create 32-bit and 64-bit GDTs. We also add flag aliases for common cases. We then verify that these cases match what the Linux kernel uses by default. This also allows us to make additional methods `const fn`. I used [AMD64 Architecture Programmer’s Manual, Volume 2: System Programming](https://www.amd.com/system/files/TechDocs/24593.pdf) as the main reference for this PR. Two remaining open questions: - Should we set `WRITABLE` (aka Readable for code-segments) by default? - This bit is ignored in Long Mode (for code and data) - Linux always sets it by default - This is probably what users want for legacy mode segments. - Should we set `ACCESSED` by default? - Linux [sets it by default](https://github.com/torvalds/linux/blob/00e4db51259a5f936fec1424b884f029479d3981/arch/x86/kernel/cpu/common.c#L124-L129) - Except [sometimes it doesn't](https://github.com/torvalds/linux/blob/ab851d49f6bfc781edd8bd44c72ec1e49211670b/arch/x86/platform/pvh/head.S#L160-L166) - AMD claims this is ignored, but Intel doesn't make the same claim Signed-off-by: Joe Richey <[email protected]>

Signed-off-by: Joe Richey <[email protected]>

Also update the documentation to note that fs/gs don't ignored the BASE value. Signed-off-by: Joe Richey <[email protected]>

Signed-off-by: Joe Richey <[email protected]>

josephlr · 2020-09-29T07:15:53Z

@phil-opp similar to #186 this will have to be force-merged (once th 0.12.2 realease goes though, we should be good).

andyhhp · 2020-10-09T02:20:38Z

@josephlr Sorry for posting on a closed PR, but you have several good questions which appear to have gone unanswered.

Code segments don't have a WRITEABLE bit. The bit in the same position is READABLE for code segments, and having it clear means that mov %cs:(%eax), %ecx will take a #GP fault.

There are some other errors/confusions with flag names. EXECUTABLE is the bit which distinguishes a code segment from a data segment. USER_SEGMENT is very poorly named (in the spec as well), and distinguishes "code/data segment" from "system segments".

Another bit, CONFORMING also has a different meaning for data segments, and is EXPAND_DOWN in that case. This flips the permitted range from [base, limit) to [limit, base) and is useful in 32bit systems to enforce separation between the heap and the stack while keeping them in the same single logical address space from the compilers point of view.

The DPL field isn't ignored under any circumstance, and it is critically important to be correct, as %ss.dpl is state more commonly called CPL in the manuals.

When the manuals say "ignored it 64bit mode", it means "the value doesn't have any impact when executing in a 64bit code segment". The bits are still loaded into the segment cache, because they are meaningful in 32bit code segments, and the data segments don't get reloaded when %cs gets reloaded.

Always set ACCESSED bits[1]. It has identical semantics to the ACCESSED bit in a pagetable entry, and is used by the OS to track the segments which have been used recently. It also has the same downside as pagetables, in that loading the segment causes a locked memory operation by the CPU to set the bit before it is loaded into the segment cache.

Noone has used segmented memory management (citation probably needed) since the 386 replaced the 286, and the perf hit from locked operations isn't free. You've already spotted the read-only aspect, and that really is worth the effort, but there really is no point leaving the ACCESSED bit clear if you're not using segmented memory management.

~Andrew

[1] Also don't ask what used to happen on Pentium era CPUs where the GDT was in ROM and the processor tried setting the ACCESSED bit. Architecturally, the processor may not continue executing until the A bit has been set in memory.

josephlr · 2020-10-09T03:39:46Z

@andyhhp thanks for the detailed response. It looks like we did the right thing in which flags we set, but feel free to be take a look at what we set. Also, let us know if the docs for the flags are incorrect or unclear.

[1] Also don't ask what used to happen on Pentium era CPUs where the GDT was in ROM and the processor tried setting the ACCESSED bit. Architecturally, the processor may not continue executing until the A bit has been set in memory.

This is actually why I initially realized something was wrong and opened this issue. For https://github.com/cloud-hypervisor/rust-hypervisor-firmware/, we were putting the GDT in ROM, and everything kept triple-faulting.

andyhhp · 2020-10-09T17:39:38Z

This is actually why I initially realized something was wrong and opened this issue. For https://github.com/cloud-hypervisor/rust-hypervisor-firmware/, we were putting the GDT in ROM, and everything kept triple-faulting.

:) So in that case, the fault is taken as #PF, doubled up on the second access, turns into #DF and escalates to SHUTDOWN, which is intercepted, hence the triple fault.

On old processor (and particularly with Option ROMs running in real mode with no paging), the pipeline would read back to confirm that the A bit had been set. Nowadays, they are rather more clever.

If this sounds suspiciously similar to the #DB / #AC infinite loops which plagued hypervisors a few years back, that's because it is. The CPU never reaches the next instruction boundary, so doesn't respond to NMIs or INIT IPIs.

josephlr force-pushed the gdt branch from e9c080e to be43444 Compare September 23, 2020 04:51

This was referenced Sep 23, 2020

Add VirtAddr::is_null #180

Merged

Descriptor:user_data_segment() does not have Flags::LONG_MODE set #171

Closed

josephlr force-pushed the gdt branch from aaa0b5d to 9a0524c Compare September 24, 2020 08:03

phil-opp approved these changes Sep 26, 2020

View reviewed changes

josephlr requested a review from phil-opp September 27, 2020 08:31

phil-opp approved these changes Sep 27, 2020

View reviewed changes

josephlr added 4 commits September 29, 2020 00:13

gdt: Clarify documentation and move flags

e265b0e

Signed-off-by: Joe Richey <[email protected]>

gdt: Set unused bits for syscall/sysret compatiblity

5c11d0d

Also update the documentation to note that fs/gs don't ignored the BASE value. Signed-off-by: Joe Richey <[email protected]>

Update Changelog

d6a45d7

Signed-off-by: Joe Richey <[email protected]>

josephlr force-pushed the gdt branch from 0abcda8 to d6a45d7 Compare September 29, 2020 07:15

josephlr merged commit d2f3145 into rust-osdev:master Sep 29, 2020

josephlr deleted the gdt branch September 29, 2020 07:21

phil-opp mentioned this pull request Oct 4, 2020

No kernel data segment? #160

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gdt: Add remaining GDT flags #181

gdt: Add remaining GDT flags #181

josephlr commented Sep 23, 2020 •

edited

Loading

phil-opp commented Sep 23, 2020

josephlr commented Sep 23, 2020

josephlr commented Sep 23, 2020

josephlr commented Sep 23, 2020

phil-opp commented Sep 23, 2020

phil-opp left a comment

phil-opp Sep 26, 2020

josephlr Sep 26, 2020

josephlr Sep 27, 2020 •

edited

Loading

josephlr Sep 27, 2020

phil-opp Sep 27, 2020

phil-opp Sep 27, 2020

josephlr Sep 29, 2020

josephlr commented Sep 27, 2020 •

edited

Loading

josephlr commented Sep 27, 2020 •

edited

Loading

phil-opp left a comment

josephlr commented Sep 29, 2020

andyhhp commented Oct 9, 2020

josephlr commented Oct 9, 2020

andyhhp commented Oct 9, 2020

gdt: Add remaining GDT flags #181

gdt: Add remaining GDT flags #181

Conversation

josephlr commented Sep 23, 2020 • edited Loading

phil-opp commented Sep 23, 2020

josephlr commented Sep 23, 2020

josephlr commented Sep 23, 2020

josephlr commented Sep 23, 2020

phil-opp commented Sep 23, 2020

phil-opp left a comment

Choose a reason for hiding this comment

phil-opp Sep 26, 2020

Choose a reason for hiding this comment

josephlr Sep 26, 2020

Choose a reason for hiding this comment

josephlr Sep 27, 2020 • edited Loading

Choose a reason for hiding this comment

Ring 3 -> Ring 0

CS

SS

Ring 0 -> Ring 3

CS

SS

josephlr Sep 27, 2020

Choose a reason for hiding this comment

phil-opp Sep 27, 2020

Choose a reason for hiding this comment

phil-opp Sep 27, 2020

Choose a reason for hiding this comment

josephlr Sep 29, 2020

Choose a reason for hiding this comment

josephlr commented Sep 27, 2020 • edited Loading

josephlr commented Sep 27, 2020 • edited Loading

phil-opp left a comment

Choose a reason for hiding this comment

josephlr commented Sep 29, 2020

andyhhp commented Oct 9, 2020

josephlr commented Oct 9, 2020

andyhhp commented Oct 9, 2020

josephlr commented Sep 23, 2020 •

edited

Loading

josephlr Sep 27, 2020 •

edited

Loading

josephlr commented Sep 27, 2020 •

edited

Loading

josephlr commented Sep 27, 2020 •

edited

Loading