minimise integer pointer mode encoding space usage #338

Open
tariqkurd-repo opened this issue Aug 6, 2024 · 8 comments
Labels
help wanted Extra attention is needed question Further information is requested

Comments

@tariqkurd-repo
Collaborator

It came up today in discussion that LC and SC as used in integer pointer mode do not need immediate offsets.

It's also true that in capability pointer mode we already reserve the right to re-encode instructions, e.g. C.FLD becoming C.LC.

We should focus on minimising the additional encoding space used in integer pointer mode, as this must be permanently allocated in the RISC-V encoding space. This will become increasingly important when we get to the point of officially encoding CHERI alongside all the other extensions that want to allocate load/store encodings, as the 12-bit offsets are very expensive.
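To put a rough number on "very expensive", the arithmetic below counts how many 32-bit encodings a single load or store consumes with and without a 12-bit immediate. It is a sketch under stated assumptions: a major opcode is one value of opcode bits [6:2] (leaving 25 free bits), the immediate form has two 5-bit register fields plus a 12-bit immediate, and the immediate-free form is a hypothetical encoding with only its two register fields free.

```python
# Rough encoding-space arithmetic for 32-bit RISC-V instructions.
# Assumption: fixing opcode[6:0] leaves 25 free bits (one "major opcode"),
# which funct3 (bits [14:12]) subdivides into 8 slots.
MAJOR_OPCODE = 2 ** 25


def encodings_used(field_widths: list[int]) -> int:
    """Number of 32-bit encodings consumed by one instruction whose
    free (operand) fields have the given widths in bits."""
    return 2 ** sum(field_widths)


# I/S-type load/store with a 12-bit immediate: rd or rs2 (5) + rs1 (5) + imm (12)
with_imm = encodings_used([5, 5, 12])
# Hypothetical immediate-free form: rd or rs2 (5) + rs1 (5), all other bits fixed
no_imm = encodings_used([5, 5])

print(f"with 12-bit imm: {with_imm} encodings, "
      f"{with_imm / MAJOR_OPCODE} of a major opcode")
print(f"without imm:     {no_imm} encodings, "
      f"{no_imm / MAJOR_OPCODE:.6f} of a major opcode")
print(f"ratio: {with_imm // no_imm}x")
```

Under these assumptions the immediate form takes a full funct3 slot (an eighth of a major opcode), while an immediate-free form is 4096 times smaller.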

So... what does integer pointer mode really need?

@tariqkurd-repo tariqkurd-repo added help wanted Extra attention is needed question Further information is requested labels Aug 6, 2024
@arichardson
Collaborator

Technically we don't need immediates in integer mode, since we could just mode-switch to get offsets if needed (assuming we have a sensible base capability), but we still need the offset version for purecap mode. Does not allocating it in integer mode really help reduce opcode allocation?
Moving things around in the purecap space when a new base ISA extension is added is not an option, since the opcodes need to be stable. I don't believe there are any instructions with large immediates that are not useful in purecap mode that we could reallocate?

@davidchisnall

To give a bit of history:

The early versions of CHERI MIPS did not have immediate offsets for the loads and stores. Most of the time, this was purely a code-size / performance pessimisation. It became a big problem when we wanted to make the stack pointer a capability. Stack spills needed a register to materialise the offset in, and freeing that register required a stack spill. Moving the stack pointer and then spilling was possible but added a lot of complexity in the worst cases. Adding register+immediate variants was one of the earliest changes I made to CHERI MIPS (preserving tags in registers was the first; I think this was the second).

In hybrid mode, the stack pointer is not a capability and so it's always possible to spill another register to build the longer sequence.
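To make the code-size cost concrete, here is a small model of the instruction sequence needed for a capability load at base+offset, with and without a register+immediate form. It assumes RISC-V-style LUI/ADDI offset materialisation, an integer base register (as in integer pointer mode), and a free scratch register `t0` (in hybrid code one can always be obtained by spilling through the integer stack pointer). The mnemonics are illustrative and `insns_for_access` is a hypothetical helper.

```python
def split_hi_lo(offset: int) -> tuple[int, int]:
    """Split an offset into LUI/ADDI parts, %hi/%lo style."""
    lo = ((offset & 0xFFF) ^ 0x800) - 0x800  # sign-extended low 12 bits
    hi = (offset - lo) >> 12
    if not -(2 ** 19) <= hi < 2 ** 19:
        raise ValueError("offset out of LUI+ADDI range")
    return hi, lo


def fits_simm12(value: int) -> bool:
    return -2048 <= value <= 2047


def insns_for_access(offset: int, has_imm_form: bool) -> list[str]:
    """Hypothetical sequence for loading a capability from base+offset.

    Assumes `base` is an integer register and `t0` is a free scratch register."""
    if has_imm_form:
        if fits_simm12(offset):
            return [f"lc rd, {offset}(base)"]
        hi, lo = split_hi_lo(offset)
        # The low 12 bits fold into the load's own immediate.
        return [f"lui t0, {hi}", "add t0, t0, base", f"lc rd, {lo}(t0)"]
    # No immediate form: the full address must be built in a register first.
    if offset == 0:
        return ["lc rd, 0(base)"]
    if fits_simm12(offset):
        return [f"addi t0, base, {offset}", "lc rd, 0(t0)"]
    hi, lo = split_hi_lo(offset)
    seq = [f"lui t0, {hi}"]
    if lo != 0:
        seq.append(f"addi t0, t0, {lo}")
    seq += ["add t0, t0, base", "lc rd, 0(t0)"]
    return seq


for off in (0, 8, 0x12000, 0x12345):
    print(off, len(insns_for_access(off, True)), len(insns_for_access(off, False)))
```

In this model small non-zero offsets cost one extra ADDI without the immediate form, and large offsets cost one extra instruction only when their low 12 bits are non-zero. The worse problem described above was needing the scratch register at all when the stack pointer was itself a capability.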

If the only places that use capability-manipulation instructions in legacy mode are stubs at the boundary around transitions into pure-capability mode, there's a strong argument for minimising the number of instructions that operate on capabilities.

The most likely counterexample would be a WAsm compiler. If you have a 64-bit WAsm (not MS-WAsm) that uses integers as pointers but where those pointers are indexes into a WAsm memory object (linear address space with fixed bounds) then you'll probably be better off using integer-pointer mode with DDC as the current WAsm memory object but a capability for the WAsm stack, which I think is hard in the current spec (did we lose the capability-based loads and stores in integer-pointer mode entirely?). We did some initial prototyping in Morello but not enough to make strong conclusions. If you can codegen the WAsm stack operations as purecap and do a single fast mode-switching-call-through-sentry to get to them, that's probably fine.
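A toy model of the arrangement described above: DDC bounds the WAsm linear memory, so integer addresses are checked against it, while the WAsm stack sits outside DDC and can only be reached through its own capability. This is a sketch, not the spec: it assumes integer-pointer-mode addresses are absolute and checked against DDC bounds, it ignores permissions, sealing and bounds compression, and the addresses and names (`Cap`, `load`, `BoundsFault`) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cap:
    """Toy capability: just a tag and [base, base + length) bounds."""
    base: int
    length: int
    tag: bool = True

    def allows(self, addr: int, size: int) -> bool:
        return self.tag and self.base <= addr and addr + size <= self.base + self.length


class BoundsFault(Exception):
    pass


def load(auth: Cap, addr: int, size: int) -> None:
    """Check an access of `size` bytes at `addr` against the authorising capability."""
    if not auth.allows(addr, size):
        raise BoundsFault(f"access {addr:#x}+{size} outside {auth}")


# Hypothetical layout: DDC covers only the WAsm linear memory, while the
# WAsm stack lives in a separate region reached through its own capability.
LINEAR_MEM = Cap(base=0x1000_0000, length=0x10_0000)  # installed as DDC
WASM_STACK = Cap(base=0x2000_0000, length=0x1_0000)
ddc = LINEAR_MEM

# Integer-pointer-mode access: WAsm index -> address, authorised by DDC.
index = 0x40
load(ddc, LINEAR_MEM.base + index, 8)

# Stack access: must be authorised by the stack capability, not by DDC.
load(WASM_STACK, WASM_STACK.base + 0x10, 16)
try:
    load(ddc, WASM_STACK.base + 0x10, 16)  # DDC does not cover the stack
except BoundsFault:
    print("stack access via DDC rejected")
```

The last access is what makes capability-based loads and stores in integer pointer mode (or a cheap switch into purecap code) necessary for this design.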

@jrtc27
Collaborator

jrtc27 commented Aug 6, 2024

Like Alex, I don't see what this actually achieves. LC needs to have a full immediate in capability mode, and almost every other extension that might want to use that encoding is likely to be one that makes just as much sense in capability mode, and so would need to find non-conflicting space anyway. I struggle to see a world where, should CHERI stick within RISC-V, there would be instructions needing large immediates that you could put where LC is today and wouldn't be needed in capability mode. And of course, should CHERI not stick, one could just ignore it anyway and declare extensions incompatible with it.

@PRugg-Cap
Contributor

From the meeting, the motivating example was load/store pair, which, if encoded as planned, removes the argument that we can just reuse the existing encodings for the width above XLEN. I think the idea would be to have load/store pair in integer mode and load/store cap in cap mode.
However, I do share the concern that if we start only reserving our encodings in capmode, we make it very impractical to ever move to "cap mode by default", and end up requiring mode switches even for pure capability code to make use of the other extensions. The encoding mode currently just changes where the bounds come from, with the exception of C where the tag-clearing address-based instructions don't make sense in cap mode.

@jrtc27
Copy link
Collaborator

jrtc27 commented Aug 6, 2024

Load/store pair of XLEN integers could still be useful in capability mode. Load/store pair of CLEN capabilities even more so. One only needs to look at Morello to see that both are useful.

@tariqkurd-repo
Collaborator Author

Yes, essentially everything is certainly useful in capability mode; we do want everything to be competitive.

If we had another version of LC and SC that didn't take an immediate and was usable in integer mode, then we'd have two versions in capability mode: LC/SC-no-imm in both modes, and normal LC/SC with an immediate in capability mode only.
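The allocation being floated here can be written out as a small table from (encoding slot, mode) to instruction. The slot names are hypothetical, and load/store pair in the integer-mode immediate slot is the motivating example from the meeting rather than an agreed encoding.

```python
from enum import Enum


class Mode(Enum):
    INTEGER = "integer pointer mode"
    CAPABILITY = "capability pointer mode"


# Hypothetical allocation; slot names are invented for illustration.
ALLOCATION = {
    ("imm_slot", Mode.CAPABILITY): "LC/SC rd, imm(rs1)",
    ("imm_slot", Mode.INTEGER): "load/store pair",
    ("noimm_slot", Mode.CAPABILITY): "LC/SC-no-imm",
    ("noimm_slot", Mode.INTEGER): "LC/SC-no-imm",
}


def available(mode: Mode) -> set[str]:
    """Instructions reachable in the given mode without a mode switch."""
    return {insn for (_, m), insn in ALLOCATION.items() if m is mode}


for mode in Mode:
    print(f"{mode.value}: {sorted(available(mode))}")
```

Printing the per-mode sets shows both costs discussed in this thread: capability mode carries two LC/SC encodings, and load/store pair is not reachable from capability mode without a mode switch.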

There's going to be a lot of tension when it comes to allocating encoding space. Having duplicate encodings would be an undesirable outcome, but it's hard to imagine that we'll get a 12-bit immediate for these in integer mode when it was clear in the meeting that the immediate isn't actually needed.

This isn't a suggestion to change the spec now, but to think about what we really need in integer mode, so that we are prepared when the encoding struggle comes.

@andresag01
Copy link
Collaborator

Do we have any data quantifying how often the LC/SC instructions are used, and how often the immediate is used, in integer mode? It would be better to consider our options with a bit more context/info.
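One cheap way to get a first answer is to scan the disassembly of integer-pointer-mode (hybrid) binaries and count LC/SC with zero versus non-zero offsets. The line format and mnemonics below are assumptions: toolchains differ (CHERI LLVM has used `clc`/`csc`, for example), so the mnemonic set is a parameter and the regular expression may need adjusting for a given objdump's output.

```python
import re
from collections import Counter

# Assumed line shape: "  10a4: <encoding>  lc ca0, 16(a1)".
# Only "reg, [imm](base)" operand forms are recognised.
MEM_OP = re.compile(r"\b(?P<mn>[a-z.]+)\s+\w+,\s*(?P<imm>-?\d+)?\((?P<base>\w+)\)")


def count_offsets(lines, mnemonics=frozenset({"lc", "sc"})) -> Counter:
    """Count capability loads/stores with zero vs non-zero immediate offsets."""
    counts = Counter()
    for line in lines:
        m = MEM_OP.search(line)
        if not m or m.group("mn") not in mnemonics:
            continue
        imm = int(m.group("imm") or 0)  # "(a0)" with no offset counts as zero
        counts["zero_imm" if imm == 0 else "nonzero_imm"] += 1
    return counts
```

Feeding it the text output of a disassembler for a representative set of hybrid binaries would give a first-order ratio; it says nothing about how often a non-zero offset could cheaply be folded into an earlier address computation.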

@PRugg-Cap
Copy link
Contributor

@andresag01 We can try to gather something, but just to reiterate: even if we were to determine that the LC/SC encodings are completely useless in integer mode, we need to also assert that whatever wants to use those encodings in integer mode is not useful in capability mode. In this case that's load/store pair, which we have strong reason to believe is useful in capability mode.

6 participants