Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gdt: Add remaining GDT flags #181

Merged
merged 4 commits into from
Sep 29, 2020
Merged

gdt: Add remaining GDT flags #181

merged 4 commits into from
Sep 29, 2020

Conversation

josephlr
Copy link
Contributor

@josephlr josephlr commented Sep 23, 2020

Add the remaining GDT flags. This now allows users to create 32-bit
and 64-bit GDTs. I also added more explanations about what each flag
does, and in which modes it is ignored.

We also add flag aliases for common cases. We then verify that these
cases match what the Linux kernel uses by default. Note that these defaults
are in some sense "mandatory" as the syscall/sysenter instructions load
constants into the various GDT registers. This also allows us
to make additional methods const fn.

I used AMD64 Architecture Programmer’s Manual, Volume 2: System Programming
as the main reference for this PR.

Two remaining open questions:

  1. Should we set WRITABLE (aka Readable for code-segments) by default?
    • This bit is ignored in 64-bit Mode (code and data)
    • Linux always sets it by default
    • This is probably what users want for 32-bit mode segments.
  2. Should we set ACCESSED by default?
    • Linux sets it by default
    • Except sometimes it doesn't
    • AMD claims this is ignored, but Intel doesn't make the same claim.
    • Setting this prevents a write on first access (useful when your GDT is mapped read-only).

Signed-off-by: Joe Richey [email protected]

@phil-opp
Copy link
Member

Thanks a lot!

Some notes:

  • I'm not a fan of "combined" flags like FLAT_COMMON because it is not clear how they relate to the other flags. For example, it is not apparent whether FLAT_COMMON includes the PRESENT bit from the documentation. Even worse, clicking the [src] in the documentation takes you to the bitflags source, not the definition of the FLAT_COMMON flag. This is mostly a problem with the bitflags macro, so maybe it makes sense to define these combined flags outside of the macro in a separate impl blog.
  • Comparing the values with the Linux kernel is a good idea, but I'm not sure whether I like that we set so many ignored values for 64-bit segments. To me it now looks more complex with the extra flags, but I'm happy to discuss this.
  • I think my preference is to not set the WRITABLE and ACCESSED flags by default because it is less surprising. However, I don't have a strong opinion on this.

Unrelated to this PR: I saw that you did some issue triage and reviewed other PRs as well. Thanks a lot for that! If you like to help us maintain and extend this crate, I can add you to our x86_64 team, which would give you write access to this repo.

@josephlr
Copy link
Contributor Author

I'm not a fan of "combined" flags like FLAT_COMMON because it is not clear how they relate to the other flags. For example, it is not apparent whether FLAT_COMMON includes the PRESENT bit from the documentation. Even worse, clicking the [src] in the documentation takes you to the bitflags source, not the definition of the FLAT_COMMON flag. This is mostly a problem with the bitflags macro, so maybe it makes sense to define these combined flags outside of the macro in a separate impl blog.

Ya the fact that the COMMON flags are public (and that they still look like "normal" flags) makes the docs confusing, I'll see how it looks moved in the impl DescriptorFlags block.

To me it now looks more complex with the extra flags, but I'm happy to discuss this.

Looking over it, I'm inclined to agree in the case of the 64-bit Code segments, we should just set the minimal set. The data segments are more complex, as they can be run in 32 or 64-bit mode, so some fields are only somtimes ignored (this is the Linux use-case).

I think my preference is to not set the WRITABLE and ACCESSED flags by default because it is less surprising. However, I don't have a strong opinion on this.

I think setting these follows the same logic as setting the Limit to be the maximum value for the 32-bit segments. Users should be using (and on 64-bit must use) paging to handle access controls and tracking, not segmentation. I can document this more.

@josephlr
Copy link
Contributor Author

@phil-opp I reworked the PR to:

  • Not set ignored flags for 64-bit segments
  • Move the combined flags to their own section
  • In that separate section, document what the values are and why we choose them (i.e. "use these values if you do all the interesting stuff with paging anyway")
  • Updated the tests

@josephlr
Copy link
Contributor Author

Thanks a lot for that! If you like to help us maintain and extend this crate, I can add you to our x86_64 team, which would give you write access to this repo.

No problem. I would be happy to help maintain. This crate is a core dependency for several of our projects, so it's certainly in our interest to make sure everything is looking good here. This is the first PR of a patch-set that lets us construct fully static (i.e. linker determined location, read-only at runtime) GDT/IDT/PageTables/etc...

@phil-opp
Copy link
Member

Thanks for the update, looks good now! I'll try to do a full review soon, but it might take a few days until I get to it.

(One thing that we need to figure out is whether this is a breaking change or not.)

No problem. I would be happy to help maintain. This crate is a core dependency for several of our projects, so it's certainly in our interest to make sure everything is looking good here.

Great, thanks! I sent you an invite.

This is the first PR of a patch-set that lets us construct fully static (i.e. linker determined location, read-only at runtime) GDT/IDT/PageTables/etc...

Sounds useful, looking forward to your patches!

I think my preference is to not set the WRITABLE and ACCESSED flags by default because it is less surprising. However, I don't have a strong opinion on this.

I think setting these follows the same logic as setting the Limit to be the maximum value for the 32-bit segments. Users should be using (and on 64-bit must use) paging to handle access controls and tracking, not segmentation. I can document this more.

Good points! Then let's set these flags by default.

Copy link
Member

@phil-opp phil-opp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks a lot!

A left a few minor suggestions inline, but these are things that we can also address in follow-up PRs if you prefer.

Regarding breakage: While we change the observable return value of the kernel_code_segment etc. functions, they should behave exactly as before. For this reason I'm inclined to define this change as semver-compatile and release only a new patch version (v0.12.2) for it. Or do you think we should do a minor release (v0.13.0)?

src/structures/gdt.rs Show resolved Hide resolved
pub const fn kernel_code_segment() -> Descriptor {
Descriptor::UserSegment(DescriptorFlags::KERNEL_CODE64.bits())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably also add a kernel_data_segment function to resolve #160.

Also, a general from_flags(flags: DescriptorFlags) -> Self function be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch I forgot about syscall/sysenter/sysret/sysexit, this might actually impact what we do for defaults here.

I'll do some more reading

Copy link
Contributor Author

@josephlr josephlr Sep 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it looks like we should set all of these "ignored" flags. Per the Intel manual, syscall/sysenter/sysret/sysexit all load a specific selector from the GDT, but they don't actually read the GDT entry, they instead load fixed data in the descriptor caches. The manual then states:

It is the responsibility of OS software to ensure that the descriptors (in GDT or LDT) referenced by those selector values correspond to the fixed values loaded into the descriptor caches; the SYSCALL instruction does not ensure this
correspondence.

So we should just have our defaults match those fixed values the processor loads automatically. These are the values I had initially (i.e. exactly the values Linux uses). It now also makes sense why Linux was setting ignored bits, accessed bits, etc...

Specific details of what flags are set by the processor

Ring 3 -> Ring 0

CS

SYSCALL: SEL = IA32_START[47:32]
SYSENTER: SEL = IA32_SYSENTER_CS[15:0]

  • Base = 0
  • Limit = 0xFFFFF
  • Type = 0x1011 (execute, read, accessed)
  • S = 1
  • DPL = 0
  • P = 1
  • L = 1 (0 32-bit)
  • D = 0 (1 32-bit)
  • G = 1

SS

SYSCALL: SEL = IA32_START[47:32] + 8
SYSENTER: SEL = IA32_SYSENTER_CS[15:0] + 8

  • Base = 0
  • Limit = 0xFFFFF
  • Type = 0x0011 (data, write, accessed)
  • S = 1
  • DPL = 0
  • P = 1
  • B = 1
  • G = 1

Ring 0 -> Ring 3

CS

SYSRET: SEL = IA32_STAR[63:48] + 16
SYSRET (32-bit): SEL = IA32_STAR[63:48]
SYSEXIT: SEL = IA32_SYSENTER_CS[15:0] + 32
SYSEXIT (32-bit): SEL = IA32_SYSENTER_CS[15:0] + 16

  • Base = 0
  • Limit = 0xFFFFF
  • Type = 0x1011 (execute, read, accessed)
  • S = 1
  • DPL = 3
  • P = 1
  • L = 1 (0 32-bit)
  • D = 0 (1 32-bit)
  • G = 1

SS

SYSRET: SEL = IA32_STAR[63:48]+8
SYSEXIT: SEL = IA32_SYSENTER_CS[15:0] + 40
SYSEXIT (32-bit): SEL = IA32_SYSENTER_CS[15:0] + 24

  • Base = 0
  • Limit = 0xFFFFF
  • Type = 0x0011 (data, write, accessed)
  • S = 1
  • DPL = 3
  • P = 1
  • B = 1
  • G = 1

Interestingly, due to the constraints of these commands, if you wanted to support both, you would need a GDT that had the following segments ordered like:

  • KERNEL_CODE64
  • KERNEL_DATA
  • USER_CODE32
  • USER_DATA
  • USER_CODE64
  • USER_DATA (only need for 64-bit sysenter/sysexit)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a general from_flags(flags: DescriptorFlags) -> Self function be useful.

A user can already do Descriptor::UserSegment(flags.bits()) do you think adding another method is useful? Should it do some sort of basic checks?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a general from_flags(flags: DescriptorFlags) -> Self function be useful.

A user can already do Descriptor::UserSegment(flags.bits()) do you think adding another method is useful? Should it do some sort of basic checks?

I was mostly thinking about improving the discoverability of the DescriptorFlags structure. Right now, the relationship between the DescriptorFlags and GlobalDescriptorTable types is only implicit. You have to know the GDT details to be sure that Descriptor::UserSegment(flags.bits()) is the correct implementation (as you could put any u64 in there). An additional from_flags method would properly encode this at the type system level, which would make it less dangerous to use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we should just have our defaults match those fixed values the processor loads automatically. These are the values I had initially (i.e. exactly the values Linux uses). It now also makes sense why Linux was setting ignored bits, accessed bits, etc...

Thanks a lot for investigating this! I agree that it makes sense to use the sames values at Linux then.

Interestingly, due to the constraints of these commands, if you wanted to support both, you would need a GDT that had the following segments ordered like:

* KERNEL_CODE64

* KERNEL_DATA

* USER_CODE32

* USER_DATA

* USER_CODE64

* USER_DATA (only need for 64-bit sysenter/sysexit)

Unrelated to this PR, but maybe it makes sense to add a new constructor function to GlobalDescriptorTable with exactly these entries? It seems like this is the canonical GDT layout that most users will want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An additional from_flags method would properly encode this at the type system level, which would make it less dangerous to use.

Agreed here, but to make this const_fn we will have to add additional features. I'm going to merge this as-is and add any additional constructors in a follow-up PR (which will also add new GDT constructor).

@josephlr
Copy link
Contributor Author

josephlr commented Sep 27, 2020

For this reason I'm inclined to define this change as semver-compatile and release only a new patch version (v0.12.2) for it.

I agree, this isn't a breaking change, as we don't make a public-facing guarantee as to the exact values here. Note that after this change we will (as the docs will specifically guarantee compatibility with syscall/sysret), so changing them again in the future would be a breaking change.

@josephlr
Copy link
Contributor Author

josephlr commented Sep 27, 2020

@phil-opp I updated the PR to:

  • Have our default flags be the values loaded by syscall/sysret
  • Added user_data_segment
  • Updated docs to talk about syscall/sysret/sysenter/sysexit compat:
  • Updated docs to note that BASE isn't ignored for fs and gs in 64-bit mode

PTAL

@josephlr josephlr requested a review from phil-opp September 27, 2020 08:31
Copy link
Member

@phil-opp phil-opp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for looking into this in such detail! Compatibilty with syscall/sysenter is very useful.

This is ready to be merged from my side. So feel free to merge it when you think it's ready. Perhaps you could add a short sentence about the new syscall/sysenter compatibilty to the PR description before merging. Also, could you add this change to the "Unreleased" section in the Changelog.md? See the changelog update commit for v0.12.1 for the changelog entry format we use.

After merging the PR and updating the changelog, you can publish a new crates.io release by running cargo release patch. You need to install the cargo-release tool for that. This will automatically increase the version number, publish to crates.io, and create and push a release tag. After publishing, I typically post a short "Published as v0.12.2" comment under the released PR.

Let me know if you have any questions about these steps! I'm also happy to do the release myself if you're not comfortable with it.

Add the remaining GDT flags. This now allows users to create 32-bit
and 64-bit GDTs.

We also add flag aliases for common cases. We then verify that these
cases match what the Linux kernel uses by default. This also allows us
to make additional methods `const fn`.

I used [AMD64 Architecture Programmer’s Manual, Volume 2: System Programming](https://www.amd.com/system/files/TechDocs/24593.pdf)
as the main reference for this PR.

Two remaining open questions:
  - Should we set `WRITABLE` (aka Readable for code-segments) by default?
    - This bit is ignored in Long Mode (for code and data)
    - Linux always sets it by default
    - This is probably what users want for legacy mode segments.
  - Should we set `ACCESSED` by default?
    - Linux [sets it by default](https://github.com/torvalds/linux/blob/00e4db51259a5f936fec1424b884f029479d3981/arch/x86/kernel/cpu/common.c#L124-L129)
    - Except [sometimes it doesn't](https://github.com/torvalds/linux/blob/ab851d49f6bfc781edd8bd44c72ec1e49211670b/arch/x86/platform/pvh/head.S#L160-L166)
    - AMD claims this is ignored, but Intel doesn't make the same claim

Signed-off-by: Joe Richey <[email protected]>
Also update the documentation to note that fs/gs don't ignored the
BASE value.

Signed-off-by: Joe Richey <[email protected]>
Signed-off-by: Joe Richey <[email protected]>
@josephlr
Copy link
Contributor Author

@phil-opp similar to #186 this will have to be force-merged (once th 0.12.2 realease goes though, we should be good).

@josephlr josephlr merged commit d2f3145 into rust-osdev:master Sep 29, 2020
@josephlr josephlr deleted the gdt branch September 29, 2020 07:21
@phil-opp phil-opp mentioned this pull request Oct 4, 2020
@andyhhp
Copy link

andyhhp commented Oct 9, 2020

@josephlr Sorry for posting on a closed PR, but you have several good questions which appear to have gone unanswered.

  1. Code segments don't have a WRITEABLE bit. The bit in the same position is READABLE for code segments, and having it clear means that mov %cs:(%eax), %ecx will take a #GP fault.

There are some other errors/confusions with flag names. EXECUTABLE is the bit which distinguishes a code segment from a data segment. USER_SEGMENT is very poorly named (in the spec as well), and distinguishes "code/data segment" from "system segments".

Another bit, CONFORMING also has a different meaning for data segments, and is EXPAND_DOWN in that case. This flips the permitted range from [base, limit) to [limit, base) and is useful in 32bit systems to enforce separation between the heap and the stack while keeping them in the same single logical address space from the compilers point of view.

The DPL field isn't ignored under any circumstance, and it is critically important to be correct, as %ss.dpl is state more commonly called CPL in the manuals.

When the manuals say "ignored it 64bit mode", it means "the value doesn't have any impact when executing in a 64bit code segment". The bits are still loaded into the segment cache, because they are meaningful in 32bit code segments, and the data segments don't get reloaded when %cs gets reloaded.

  1. Always set ACCESSED bits[1]. It has identical semantics to the ACCESSED bit in a pagetable entry, and is used by the OS to track the segments which have been used recently. It also has the same downside as pagetables, in that loading the segment causes a locked memory operation by the CPU to set the bit before it is loaded into the segment cache.

Noone has used segmented memory management (citation probably needed) since the 386 replaced the 286, and the perf hit from locked operations isn't free. You've already spotted the read-only aspect, and that really is worth the effort, but there really is no point leaving the ACCESSED bit clear if you're not using segmented memory management.

~Andrew

[1] Also don't ask what used to happen on Pentium era CPUs where the GDT was in ROM and the processor tried setting the ACCESSED bit. Architecturally, the processor may not continue executing until the A bit has been set in memory.

@josephlr
Copy link
Contributor Author

josephlr commented Oct 9, 2020

@andyhhp thanks for the detailed response. It looks like we did the right thing in which flags we set, but feel free to be take a look at what we set. Also, let us know if the docs for the flags are incorrect or unclear.

[1] Also don't ask what used to happen on Pentium era CPUs where the GDT was in ROM and the processor tried setting the ACCESSED bit. Architecturally, the processor may not continue executing until the A bit has been set in memory.

This is actually why I initially realized something was wrong and opened this issue. For https://github.com/cloud-hypervisor/rust-hypervisor-firmware/, we were putting the GDT in ROM, and everything kept triple-faulting.

@andyhhp
Copy link

andyhhp commented Oct 9, 2020

This is actually why I initially realized something was wrong and opened this issue. For https://github.com/cloud-hypervisor/rust-hypervisor-firmware/, we were putting the GDT in ROM, and everything kept triple-faulting.

:) So in that case, the fault is taken as #PF, doubled up on the second access, turns into #DF and escalates to SHUTDOWN, which is intercepted, hence the triple fault.

On old processor (and particularly with Option ROMs running in real mode with no paging), the pipeline would read back to confirm that the A bit had been set. Nowadays, they are rather more clever.

If this sounds suspiciously similar to the #DB / #AC infinite loops which plagued hypervisors a few years back, that's because it is. The CPU never reaches the next instruction boundary, so doesn't respond to NMIs or INIT IPIs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants