Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add flush_broadcast and tlbsync functions #403

Merged
merged 11 commits into from
Sep 15, 2023

Conversation

Freax13
Copy link
Member

@Freax13 Freax13 commented Feb 11, 2023

flush_broadcast can be used to broadcast TLB flushes to all CPU cores and tlbsync can be used to wait for those flushes to be done.

@Freax13 Freax13 requested a review from phil-opp February 11, 2023 14:02
Comment on lines 105 to 121

/// Invalidates TLB entry(s) with Broadcast.
///
/// # Safety
///
/// This function is unsafe as it requires CPUID.(EAX=8000_0008H, ECX=0H):EBX.INVLPGB
/// to be 1 and count to be less than or equal to CPUID.(EAX=8000_0008H, ECX=0H):EDX[0..=15].
#[inline]
pub unsafe fn flush_broadcast<S>(
va_and_count: Option<(Page<S>, u16)>,
pcid: Option<Pcid>,
asid: Option<u16>,
include_global: bool,
final_translation_only: bool,
include_nested_translations: bool,
) where
S: NotGiantPageSize,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use the builder pattern for this? There are an awful lot of parameters and this doesn't seem very future proof in case AMD adds more options.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we want some sort of builder or #[non_exhaustive] struct to be used for this.

A builder pattern would also better reflect:

  • What reasonable defaults should be
  • What parameters are optional

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also be used to make some parameters unsafe.

@Freax13 Freax13 requested a review from josephlr February 11, 2023 14:06
@tlendacky
Copy link

If count is greater than the CPUID value (0x80000008_EDX[15:0]), you can issue the instruction in a loop and then perform a single TLBSYNC.

@Freax13
Copy link
Member Author

Freax13 commented Feb 11, 2023

If count is greater than the CPUID value (0x80000008_EDX[15:0]), you can issue the instruction in a loop and then perform a single TLBSYNC.

Ah, good to know, I was wondering about that, thank you!

I don't think we want to execute cpuid inside that function to figure out how often we have to loop though: There is precedence for relying on the caller to ensure an instruction is valid to execute (e.g. see flush_pcid in this crate or _rdrand64_step inside the standard library). Also in my particular use-case, I want to execute these instructions inside an SEV-SNP VM where getting cpuid values is not trivial (the cpuid instruction still exists, but may be slow).

@tlendacky
Copy link

If count is greater than the CPUID value (0x80000008_EDX[15:0]), you can issue the instruction in a loop and then perform a single TLBSYNC.

Ah, good to know, I was wondering about that, thank you!

I don't think we want to execute cpuid inside that function to figure out how often we have to loop though: There is precedence for relying on the caller to ensure an instruction is valid to execute (e.g. see flush_pcid in this crate or _rdrand64_step inside the standard library). Also in my particular use-case, I want to execute these instructions inside an SEV-SNP VM where getting cpuid values is not trivial (the cpuid instruction still exists, but may be slow).

Yes, you definitely don't want to be executing CPUID each time. In an SEV-SNP VM it will generate a #VC exception, which will then use the CPUID page to satisfy the instruction without exiting to the hypervisor, at least.

I haven't looked closely at the crate, is there any way to set/save some kind of initialization state that can be referenced?

@Freax13
Copy link
Member Author

Freax13 commented Feb 12, 2023

If count is greater than the CPUID value (0x80000008_EDX[15:0]), you can issue the instruction in a loop and then perform a single TLBSYNC.

Ah, good to know, I was wondering about that, thank you!
I don't think we want to execute cpuid inside that function to figure out how often we have to loop though: There is precedence for relying on the caller to ensure an instruction is valid to execute (e.g. see flush_pcid in this crate or _rdrand64_step inside the standard library). Also in my particular use-case, I want to execute these instructions inside an SEV-SNP VM where getting cpuid values is not trivial (the cpuid instruction still exists, but may be slow).

Yes, you definitely don't want to be executing CPUID each time. In an SEV-SNP VM it will generate a #VC exception, which will then use the CPUID page to satisfy the instruction without exiting to the hypervisor, at least.

I haven't looked closely at the crate, is there any way to set/save some kind of initialization state that can be referenced?

We do something kind of like that for the rdrand instruction:

#[derive(Copy, Clone, Debug)]
/// Used to obtain random numbers using x86_64's RDRAND opcode
pub struct RdRand(());
impl RdRand {
/// Creates Some(RdRand) if RDRAND is supported, None otherwise
#[inline]
pub fn new() -> Option<Self> {
// RDRAND support indicated by CPUID page 01h, ecx bit 30
// https://en.wikipedia.org/wiki/RdRand#Overview
let cpuid = unsafe { core::arch::x86_64::__cpuid(0x1) };
if cpuid.ecx & (1 << 30) != 0 {
Some(RdRand(()))
} else {
None
}
}

For rdrand we cache that the processor is able to execute it by creating an instance of a type that can only be created when the processor supports rdrand, if the user keeps that instance around they don't need to check again. We could do the same for invlpgb and tlbsync.

@Freax13
Copy link
Member Author

Freax13 commented Feb 12, 2023

Implementing the loop for this turns out to be less trivial than I initially thought. Two things I'm not completely certain about:

  1. using invlpgb with a count of zero still flushes one page. Can the max count reported in the cpuid leaf also be zero? I tried looking in the PPRs, but most processor don't seem to support the instructions. EPYC Milan supports it and always has a max count of 7 according to the PPR. The PPR for Family 19h Model 11h Revision B1 also has a fixed max count of 7 (I'm not sure which processors that actually applies to, but I think something with the either Zen3+ or Zen4 microarchitecture).
  2. what happens if we flush an address right before the gap in the address space with a count of more than one i.e.0x7fff_ffff_f000 with a count of two? I assume that it only flushes the address in the lower half in the address space and doesn't automatically jump the gap in the address space. Either way we need to be careful.

@Freax13 Freax13 marked this pull request as draft February 13, 2023 09:43
@tlendacky
Copy link

Implementing the loop for this turns out to be less trivial than I initially thought. Two things I'm not completely certain about:

  1. using invlpgb with a count of zero still flushes one page. Can the max count reported in the cpuid leaf also be zero? I tried looking in the PPRs, but most processor don't seem to support the instructions. EPYC Milan supports it and always has a max count of 7 according to the PPR. The PPR for Family 19h Model 11h Revision B1 also has a fixed max count of 7 (I'm not sure which processors that actually applies to, but I think something with the either Zen3+ or Zen4 microarchitecture).

fam19h/model11h is Genoa.

Yes, theoretically, the CPUID could report 0 for the maximum count, in which case you would only be able to do a single page at a time (#GP if the ECX[15:0] > maximum page count supported).

  1. what happens if we flush an address right before the gap in the address space with a count of more than one i.e.0x7fff_ffff_f000 with a count of two? I assume that it only flushes the address in the lower half in the address space and doesn't automatically jump the gap in the address space. Either way we need to be careful.

I believe that the address would not be canonicalized, and would then just be a miss in the TLB since no addresses would match.

src/instructions/tlb.rs Outdated Show resolved Hide resolved
@josephlr
Copy link
Contributor

Also, for the ASID related instructions, it seems like we would want some sort of common Asid abstraction (like what we currently do for Pcid). This would also allow us to support something like INVLPGA in the future.

@Freax13
Copy link
Member Author

Freax13 commented Feb 14, 2023

Also, for the ASID related instructions, it seems like we would want some sort of common Asid abstraction (like what we currently do for Pcid). This would also allow us to support something like INVLPGA in the future.

AFAICT the reason that we have a Pcid type is that Pcid always has to be a 12-bit integer and Rust doesn't provide that. For ASID the limit is not a fixed constant: The maximum supported value is reported CPUID_Fn8000000A_EBX as a 32-bit integer, INVLPGA also takes a 32-bit ASID in ecx and the ASID field in the VMCB is also 32-bit. INVLPGB is the outlier here as it only takes a 16-bit ASID. In pratice a common maximum supported value seems to be 0x8000. There don't seem to be any bounds checks for INVLPGA, but INVLPGB will cause a #GP if the ASID is out of range.

With that said I'm not sure what size the ASID type should be and how we should enforce the limit given by the CPUID function.

Perhaps we should start keeping these limits in global variables so that we can properly enforce them without executing cpuid every time.

@Freax13 Freax13 force-pushed the feature/new-flush branch 3 times, most recently from 8f3a412 to 5e5e328 Compare February 14, 2023 11:08
This allows us to these functions internally without requiring the
nightly `step_trait` feature.
@Freax13 Freax13 marked this pull request as ready for review February 23, 2023 13:09
@Freax13 Freax13 requested a review from josephlr March 8, 2023 11:59
Copy link
Member

@phil-opp phil-opp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for tackling this! I left some superficial comments on the API. I didn't have time to look through the flush_broadcast implementation, but the rest looks good to me.

src/addr.rs Outdated Show resolved Hide resolved
src/structures/paging/page.rs Outdated Show resolved Hide resolved
src/instructions/tlb.rs Outdated Show resolved Hide resolved
src/instructions/tlb.rs Outdated Show resolved Hide resolved
src/instructions/tlb.rs Show resolved Hide resolved
Freax13 added 5 commits March 14, 2023 12:49
This avoids any confusion between the trait methods and inherent
methods.
This works for all methods except `pages` which changes the type.
@Freax13 Freax13 requested a review from phil-opp March 14, 2023 12:33
@Freax13 Freax13 merged commit 8f0c4c0 into rust-osdev:master Sep 15, 2023
@Freax13 Freax13 mentioned this pull request Sep 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants