Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pointer Masking ACT #15

Open
13 tasks
jjscheel opened this issue Mar 16, 2023 · 94 comments
Open
13 tasks

Pointer Masking ACT #15

jjscheel opened this issue Mar 16, 2023 · 94 comments
Assignees

Comments

@jjscheel
Copy link
Contributor

jjscheel commented Mar 16, 2023

Technical Group

Pointer Masking TG

ratification-pkg

Pointer Masking

Technical Liaison

Martin Maas, Adam Zabrocki

Task Category

Arch Tests

Task Sub Category

  • gcc
  • binutils
  • gdb
  • intrinsics
  • Java
  • KVM
  • ld
  • llvm
  • Linux kernel
  • QEMU
  • Spike

Ratification Target

3Q2023

Statement of Work (SOW)

SOW: link

SOW Signoffs: (delete those not needed)

  • Task group liaison sign-off date:
  • Development partner sign-off date:
  • ACT SIG sign-off date (if ACT work):

Waiver

  • Freeze
  • Ratification

Pull Request Details

  • ACT Tests and coverpointes: PR #484 - open
@jjscheel jjscheel self-assigned this Mar 16, 2023
@jjscheel jjscheel moved this from As-planned to Late in RISC-V DevPartner Work Mar 16, 2023
@jjscheel jjscheel moved this from Late to At-risk in RISC-V DevPartner Work Mar 16, 2023
@jjscheel
Copy link
Contributor Author

This work is not yet staffed.

@jjscheel
Copy link
Contributor Author

@mustafa7755, @Muhammad-Saadi-Abbasi, it sounds like you guys will be working on this. I've assigned it to both of you not knowing if you're sharing or working on separate pieces. Feel free to unassign yourselves as-appropriate based on assignments.

@jjscheel
Copy link
Contributor Author

If possible, I'd like an update in the April 25th meeting on when you think you'll have the following dates:

  • Development of code complete (ready to test)
  • Testing complete (ready to submit first PR)
  • Last PR accepted (done)

Comments in the issue with these dates would be great.

@jjscheel
Copy link
Contributor Author

BTW, I should note for this issue that it's marked "At Risk" due to the fact that we had no one assigned.

Given that we have assignees, I'd like their comfort with the SOW and have a basic plan before we move back to "As Planned" status. My hope is that we'll be able to do this in the April 25th meeting at the latest.

@jjscheel jjscheel assigned martinmaas and unassigned jjscheel Apr 12, 2023
@jjscheel
Copy link
Contributor Author

@Muhammad-Saadi-Abbasi, @mustafa7755, please review the SOW and note that there have been updates to the content based on the feedback from Architecture Review.

This is a good reminder that the contents of the specification may be in flux until we get final AR approval. So, please be advised.

@UmerShahidengr
Copy link

UmerShahidengr commented Apr 25, 2023

Update: Apr 25th, 2023 => The planning phase of this project has been completed, this is the test plan for the viewers to review: Test plan Pointer Masking Ext
Goal for next week: Develop the skeleton code (Basic R/W access to the Pointer masking CSR regs) according to the ACT template, and run it on Spike via riscof.

@UmerShahidengr UmerShahidengr moved this from At-risk to As-planned in RISC-V DevPartner Work Apr 25, 2023
@UmerShahidengr
Copy link

@martinmaas, @Adam-pi3 Sir, Please find some time to review the test plan. You are the domain experts, so your comments would be valuable.

@martinmaas
Copy link

martinmaas commented May 5, 2023

Apologies for the delayed response (I was on vacation and it took me a while to catch up on everything after coming back). Thank you so much for working on this!

My main feedback is that the latest version of the pointer masking spec has removed instruction pointer masking, which obviates the need for tests related to instructions and to the *ipmen fields.

https://github.com/riscv/riscv-j-extension/blob/master/zjpm-spec.pdf

Some additional feedback:

  • We will still need to test interrupts/exceptions and how pointer masking behaves when changing between different privilege levels.
  • All cases need to be tested for all privilege modes and address translation modes (Sv39, Sv48, Sv57 and Bare) that are supported.
  • There is an additional corner case where a misaligned address wraps around (e.g., reading 64 bit from 0x...fffffa). We should cover this in the tests.
  • We should ensure that we cover all instructions subject to pointer masking. In addition to the ones on the second page, this now includes CMOs as well.
  • We should test that the behavior for MPRV=1 is correct.
  • Pointer masking is not currently defined for RV32, so no testing is needed for that case.

Please don't hesitate to let @Adam-pi3 and me know if there are any additional questions.

@UmerShahidengr
Copy link

Update May 7th, 2023 ⇾ Not much update on this task. Possibly it will get deferred to Q3, 2023 because of upcoming new work (debug native triggers)

@jjscheel
Copy link
Contributor Author

jjscheel commented May 8, 2023

@martinmaas, thanks for the update. Would you kindly ensure that your additional comments about ACT get into the referenced SOW document (link)?

@jjscheel
Copy link
Contributor Author

jjscheel commented May 8, 2023

Thanks for the update, @UmerShahidengr. I recognize we're in a holding pattern here.

I'm going to leave on the Agenda for discussion purpsoses at this time.

@martinmaas, @Adam-pi3, you folks should be aware that given the challenges we've had with Code Size Reduction, I've got 1 resource to spend on either Pointer Masking and/or Debug. So, barring some explicit priority guidance, they will be started first-come, first serve. I'll be in touch with what and how we decide.

@jjscheel
Copy link
Contributor Author

jjscheel commented May 8, 2023

Correction, the challenges with Pointer Masking ACT have been in Zfinx, Zfh, Zdinx, and Zhinx.

@jjscheel
Copy link
Contributor Author

The discussion this week in the Pointer Masking TG was that @martinmaas would request the Architecture Review Committee to prioritize closure of the items preventing SAIL and ACT work in hopes of keeping this work moving forward.

@martinmaas, please provide an update when available.

@UmerShahidengr
Copy link

Update May 23rd, 2023 => Not much update on this task yet.

@jjscheel jjscheel moved this from As-planned to Blocked in RISC-V DevPartner Work May 23, 2023
@jjscheel
Copy link
Contributor Author

@martinmaas, any update from AR?

Marking as Blocked until we get AR resolution of issues.

@jjscheel
Copy link
Contributor Author

jjscheel commented Aug 6, 2024

Thanks, @UmerShahidengr. That's exciting.

@martinmaas, @Adam-pi3, can you please take a look at the tests to ensure they map to the proposed coverage from Adam? Please post your confirmation of completeness of the tests and/or raise any issues here.

Once we get the coverage PR, we'll need to 2 more things to proceed to Ratification:

  1. Review of the tests in the coverage reports to ensure it matches Adam's outline. This completes the Freeze waiver.
  2. Review of the information reported to confirm we are passing all tests. This completes the Ratification requirement for testing.

@Adam-pi3
Copy link

Adam-pi3 commented Aug 9, 2024

I schimmed through the tests and I've been in sync with @UmerShahidengr regarding the functional problem which @UmerShahidengr already fixed. Nevertheless, per my understanding as of now the test covers:
1.i.
2.i
3

there is missing for 1.ii/iii/iv, 2.ii/iii/iv and 4. Did I miss them?
Additionally, for the test which we have, I believe we should verify address with and without tag value. For now we only generate addresses without tag value. Example:

  • with PMM disable we test
    -> 0x0000000012345678 (we assume it points to the valid memory) - it should succeed
    -> 0xAA00000012345678 - it should fail and generate exception which we gracefully handle
  • with PMM enabled we test:
    -> 0x0000000012345678 (we assume it points to the valid memory) - it should succeed
    -> 0xAA00000012345678 - it should succeed too and point to the same address and value as previous load.

@jjscheel
Copy link
Contributor Author

@UmerShahidengr, can you respond to @Adam-pi3 review comments? I'm particularly interested in whether the mentioned gaps exist in the current ACT tests? Are they included in the cover reports?

@UmerShahidengr
Copy link

@Adam-pi3 There is the inline reply to each coverpoint required:

Standard LOAD / STORE Execution and Validation of Expected Results

  • Normal Execution

    • With "PM enabled" in all "RISC-V modes" [Should cover PM function across {all_lsu_instruction, [bare_mode, nvmpu, hash_mpn], [mmode, smode, umode]}]
    • With "PM disabled" in all "RISC-V modes"
    • Comment: Done for all cases.
  • Execution in Exception Context

    • If normal execution generates an exception and continues execution logic in exception context, verify if PM state is NOT modified.
    • With "PM enabled" in all "RISC-V modes"
    • With "PM disabled" in all "RISC-V modes"
    • Comment: It has also been done. It is not checked within the trap handler if PM state is getting modified or not, but after exiting the trap, it is checked in most of the tests if the state persists or not.
  • Execution in Interrupt Context

    • If normal execution is interrupted and continues execution logic in interrupt context, verify if PM state is NOT modified.
    • With "PM enabled" in all "RISC-V modes"
    • With "PM disabled" in all "RISC-V modes"
    • Comment: No test has been written in interrupt context yet, as riscv-arch-tests trap handler has not been tested for any interrupt, and Sail has not been tested for basic interrupts yet. So this will be on hold until basic interrupt tests are developed.
  • Execution in NMI (with Disabled IRQs)

    • If normal execution is interrupted by NMI with IRQ disabled and continues execution logic in NMI context, verify if PM state is NOT modified.
    • With "PM enabled" in all "NVRISC-V modes"
    • With "PM disabled" in all "NVRISC-V modes"
    • Comment: There is no support in Sail to generate NMI, so this cannot be verified as of now.

Execute LOAD / STORE and Validate Expected Results

  • Aligned Access

    • With "PM enabled" in all "RISC-V modes"
    • With "PM disabled" in all "RISC-V modes"
    • Comment: Done.
  • Non-aligned Access

    • With "PM enabled" in all "RISC-V modes"
    • With "PM disabled" in all "RISC-V modes"
    • Comment: Not done yet, but will be released in the next commit.
  • Edge of the Page

    • Crossing Page Boundary

      • With "PM enabled" in all "RISC-V modes"
      • With "PM disabled" in all "RISC-V modes"
      • Comment: Done.
    • Not Crossing Page Boundary

      • With "PM enabled" in all "RISC-V modes"
      • With "PM disabled" in all "RISC-V modes"
      • Comment: Done.

Initialization Values

  • CSR PM Related Registers
    • Comment: Done.

Non-FNL SW Test Plan - Context Switch

  • Validate if all PM related CSRs are preserved during context switch when:
    • All PM related CSRs are enabled in all "RISC-V modes" at the same time.
    • All PM related CSRs are enabled in just one specific "RISC-V mode" at the time.
    • Out of Scope
    • Comment: Context switching is not supported in arch-tests, so this test is out of scope based on the current state of Sail/ACTs.

@martinmaas
Copy link

Thanks @UmerShahidengr! I'll let @Adam-pi3 chime in as well, but it makes sense to me that some of the features cannot be supported in the Sail model.

However, I had the same question as Adam:

Additionally, for the test which we have, I believe we should verify address with and without tag value

The part that was unclear to me is where it checks that the address that is being accessed after the masking operation is the correct one. For example, how do we check whether the implementation masked 7 bits or 16 bits? Or is the idea that we will trap on every access and log the address, thus making the masked address part of the signature?

It is not checked within the trap handler if PM state is getting modified or not, but after exiting the trap, it is checked in most of the tests if the state persists or not.

I probably overlooked something really simple, but wasn't able to find where this check is happening. Can you please point me to the logic?

@Adam-pi3
Copy link

Look OK (especially if something is not supported). However, some questions which @martinmaas also pointed out are still open.

@jjscheel
Copy link
Contributor Author

@UmerShahidengr, I believe you have answers for some of @martinmaas questions. Can you kindly post them for all to see?

@jjscheel
Copy link
Contributor Author

Email from Umer:

  1. What's left to complete for tests? When will we have a PR for tests?

I have addressed all comments on the SoW, and in the latest commit at ACT PR, I have completed the following tests:
All sv48 tests for M/S/U mode with 4 different variants i.e, when mask bits are zero and bit[47]=0 (no trap case for PMM disabled/enabled), mask bits are non-zero and bit[47]=1 (no trap when PMM enabled but trap when disabled due to invalid Virtual Address), when mask bits are zero and bit[47]=1 (no trap when PMM enabled but trap when disabled due to invalid Virtual Address), when mask bits are non-zero and bit[47]=1 (no trap when PMM enabled, may or may not trap for PMM disabled based on if masked bit are all ones or not).
For SV57 tests, we came to know that those are not possible for as sail model does not support sv57 (and spike does not support pmm), satp is not setting its MODE bits to sv57. Although we have integrated the support of pmm in Sail model, and its implementation has been verified via debugging sail code, but its ACTs cant be written because sail is not setting up PTEs for sv57. I added the tests for sv57 in my PR but now I have removed it in my latest commit, since it is not passing the coverpoints and satp is not setting up properly.
For Interrupts and NMIs, ACTs are not possible because of the same constraints on Sail.
In the limited support, these are the only tests which could have been written, and it should be enough proof of concept for ratification

  1. Are the coverpoints complete or do we need to extend coverage based on Adam's suggestions? If so, when will that be done? When will we have a PR?

The coverpoints are complete but we have not release those coverpoints yet. The coverpoints require effective address and translated address for comparison in order to check the behavior. Both of these addresses are not provided in the Sail log, coverage tool (RISC-V ISAC) parses the Sail log to check the coverpoints. In order to tackle this issue, RISC-V ISAC tool has been updated, a new architecture state is defined as part of each instruction in the tool which is holding the effective address value and translated address value. This update has been working on our side, but for public release, we need to pass it from an internal review (a company's open source policy), we are internally reviewing those changes, it will be released as PR to RISC-V ISAC. Once RISC-V ISAC PR is added, we will add the coverpoint definitions which will be supported by the tool.

  1. What is the state of our current test execution? Have we run them successfully on any simulator?

Currently all tests are running on riscof, and showing the correct behaviour on Sail model, its signatures are not matching with Spike as spike does not support pmm. but we have verified every test's behaviour by looking at the Sail log, all coverpoints are hitting. Once the coverpoints will be released (within one week), I will let you all know to review the tests against the coverpoints, and I will release the coverage reports.

@jjscheel
Copy link
Contributor Author

Thanks, @UmerShahidengr for the update. Please post the PR for all at your earlierst convenience so that @Adam-pi3 and @martinmaas can review. I'll prepare the vote.

@martinmaas
Copy link

Fantastic – thank you so much for the detailed update! One quick comment on item (3): Pointer masking is now supported in Spike, so it should be possible to compare the two.

riscv-software-src/riscv-isa-sim#1718

@MuhammadHammad001
Copy link

The tests and covergroups are complete and are available at riscv-non-isa/riscv-arch-test#484
@jjscheel @martinmaas

@jjscheel
Copy link
Contributor Author

@martinmaas, @Adam-pi3, your review of the PR for completeness is what I need for now. Please post comments here.

@martinmaas
Copy link

Thank you for all the work on this, and for sending the PR!

The question I would like to confirm is how the following part works:

In order to tackle this issue, RISC-V ISAC tool has been updated, a new architecture state is defined as part of each instruction in the tool which is holding the effective address value and translated address value. [...] Once RISC-V ISAC PR is added, we will add the coverpoint definitions which will be supported by the tool.

I am not familiar with the coverpoint definition format, but was unable to spot where this was defined.

Assuming that these definitions are present, how are the effective and translated addresses validated? From my understanding, they would need to either be checked explicitly as part of the coverpoint definition, or compared against a run of a different simulator (likely Spike), to confirm they are the same.

Further, have the ACTs been validated against Spike as well, in addition to the Sail model?

Subject to those two points (checking of the effective/translated addresses, and comparison against Spike), the PR looks good to me.

@jjscheel
Copy link
Contributor Author

@UmerShahidengr, @MuhammadHammad001, can we please keep the questions being asked here moving? We need Adam and Martin to approve your work in the next couple days.

@MuhammadHammad001
Copy link

MuhammadHammad001 commented Sep 26, 2024

Hi, @jjscheel , @martinmaas , @Adam-pi3
I have run the tests for S and U mode cases on both the spike and the sail and few M Mode Cases as well, the test run report is available at:
https://drive.google.com/drive/folders/1kYqB4MS75zSBoKVqwf_CPiI4LXNizxWl

The spike is not working as intended for the M Mode Tests in which the virtualization is enabled but those tests are running fine on the Sail. I have created an issue for this available at riscv-software-src/riscv-isa-sim#1817

@jjscheel
Copy link
Contributor Author

@Adam-pi3, @martinmaas, with respect to the freeze waiver for Pointer Masking, are you satisfied that we've created the tests for expected coverage? We may still have debug work and stylistic things to clean up, but we're "feature complete" for testing.

@martinmaas
Copy link

Thank you for the update! This addresses my concern about comparison against Spike (I am not too concerned if there is a minor bug somewhere, as long as we understand what the bug is and that it does not point towards a problem with the spec; which does not seem to be the case here).

My remaining question is about how the masked addresses are being checked so that they are the same between Sail and Spike – there is a good chance this is somewhere in the logs, but I was not able to see it reading through them.

There is probably an argument to be made that the tests could be considered feature complete even without these comparisons (since the tests themselves already check whether or not masking is applied, just not that the exact right bits are being masked) – but independently of whether this is required for the "feature complete" milestone, it would be good to have this eventually, as it will make coverage significantly stronger.

I'd be interested in @Adam-pi3's view on this as well.

@UmerShahidengr
Copy link

The masked address in getting checked within the Sail and Spike log, and the cover points are written in such a way that in some of the tests, test is showing different behaviors when pmm is disabled vs pmm is enabled. For example, in one of the tests, we used virtual address as 0xABAB500000000000, this test will cause a trap (due to illegal virtual address) when pmm is disabled, and when pmm will be enabled, then its 0xABAB bits will be masked with zeros and it will not cause trap. Similarly for virtual addresses like 0x0000900000000000, it will cause trap when pmm is disabled, but when pmm is enabled then last bits will be sign extended and masked address will become 0xFFFF900000000000 and will not trap. All these scenerios are tested in the tests.
Also whenever there will be a trap, mtval will get masked address, which is also getting verified via Sail and Spike logs.

@martinmaas
Copy link

Thanks for the clarification! This is how I read the tests as well, and I agree this tests the most important aspects of pointer masking behavior (i.e., "is it enabled exactly when it should be?").

What the trap-based testing doesn't catch are cases where the implementation masks the wrong bits (e.g., 0x0000900000000000 turns into 0x0010900000000000 despite pointer masking being disabled) – in both cases we'd see a trap, but we don't see that the address was still incorrect. In other words, we get one bit of signal per check, when there are many more incorrect behaviors that cannot be discerned based on that one bit. Having said this, those failure modes don't seem to be like the most likely ones.

I believe there is a good case to be made that the current version should be sufficient in terms of feature-completeness. But if there is a way to test the sequence of actual addresses, I think that would represent a significant increase in coverage.

If everyone else is fine with this, I think it doesn't have to block feature completeness sign-off, though.

@Adam-pi3
Copy link

Sorry for delay in replying but I was trying to carefully analyze the tests. Per my understanding most of the cover points are covered (the one which are possible to implement taking into account some limitation which @UmerShahidengr was and @MuhammadHammad001 were describing). However, I don't really understand how exactly comparison of the expected addresses in case of fault happens. Tests expects to generate a fault (e.g., when PM is disabled or enabled - depends on the test). However, I don't see the function set-up in MTVEC which would handle this fault and do the expected logic. Is the expectation that tests don't require it and some default fault handler writes something to the log which are parsed? That's the only missing point I have.

I agree we shouldn't block feature completeness sign-off because of that. Nevertheless, my question remains and it is also connected to the @martinmaas comments too.

@jjscheel
Copy link
Contributor Author

Thanks, @Adam-pi3, @martinmaas. I will mark the Freeze Waiver complete based on your approvals.

Now, the final question becomes have we completed ACT Testing? This is a Ratification Requirement.

@martinmaas
Copy link

Thank you @jjscheel! I suggest we try to reach a conclusion on the latter question as well, to avoid delaying ratification.

I think the question that @Adam-pi3 mentioned should be the only remaining item for me as well.

@jjscheel
Copy link
Contributor Author

@UmerShahidengr, can you kindly drive closure on the state of ACT Testing ASAP?

@UmerShahidengr
Copy link

UmerShahidengr commented Oct 1, 2024

Sorry for delay in replying but I was trying to carefully analyze the tests. Per my understanding most of the cover points are covered (the one which are possible to implement taking into account some limitation which @UmerShahidengr was and @MuhammadHammad001 were describing). However, I don't really understand how exactly comparison of the expected addresses in case of fault happens. Tests expects to generate a fault (e.g., when PM is disabled or enabled - depends on the test). However, I don't see the function set-up in MTVEC which would handle this fault and do the expected logic. Is the expectation that tests don't require it and some default fault handler writes something to the log which are parsed? That's the only missing point I have.

@Adam-pi3 for the trap handler, we have made no changes in the default trap handler, the default trap handler of riscv-arch-tests (available here) is generic, and dealing with the traps caused by load/store access due to pmm/vm. Thus, there are no need to change the MTVEC register. The default trap handler deposits 4 signature items in the signature file per trap, 2 of them are mepc and mtval value, in case of pmm enabled/disabled, mtval csr will be updated with the effective address, and the same value will be shown in the trap signatures. This value is getting compared in Spike and Sail log

@UmerShahidengr
Copy link

What the trap-based testing doesn't catch are cases where the implementation masks the wrong bits (e.g., 0x0000900000000000 turns into 0x0010900000000000 despite pointer masking being disabled) – in both cases we'd see a trap, but we don't see that the address was still incorrect. In other words, we get one bit of signal per check, when there are many more incorrect behaviors that cannot be discerned based on that one bit. Having said this, those failure modes don't seem to be like the most likely ones.

@martinmaas as explained in the above comment, since mtval is getting the effective address on every trap and also getting updated into trap signature area, thus such cases have already been checked if correct bits are getting masked or not. We have made sure to check the effective address on every memory access through the log file, and also made sure to compare mtval value through signature on every trap.

@jjscheel
Copy link
Contributor Author

jjscheel commented Oct 1, 2024

@martinmaas and @Adam-pi3, does this answer your question?

@UmerShahidengr, so you believe all tests are passing and the coverage is being met? Correct?

@martinmaas
Copy link

Thank you for the clarification, Umer! To confirm my understanding: When there is a trap, the effective address will be written into the signature file, and thus compared against a reference implementation. When there is no trap, we know the address is correct, because the correct value is loaded.

Assuming the above, all sounds good to me and I think the tests meet the required coverage.

One final question (but not required for coverage): Was the checking of the effective address in the log file done manually, or can this be done automatically as part of running the test?

@UmerShahidengr
Copy link

@martinmaas in the developement phase, the effective address was checked from the log file manually, now it is the part of signature file, so one can automatically check the effective address from signature file (in case of a trap).

@UmerShahidengr
Copy link

@Adam-pi3 , @martinmaas is this PR good to go? Should we merge this one?

@UmerShahidengr
Copy link

This PR is awaiting the Sail PR to get merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: As-planned
Development

No branches or pull requests

8 participants