
Split rv32i_m/F/fnmadd_b15.S, fnmsub_b15.S, fmadd_b15.S, fmsub_b15.S into multiple smaller tests #461

Merged
11 commits merged into riscv-non-isa:main on Apr 26, 2024

Conversation

UmerShahidengr
Collaborator

Description

I have split rv32i_m/F/fnmadd_b15.S, fnmsub_b15.S, fmadd_b15.S, and fmsub_b15.S into multiple smaller tests. Each of these test files originally contained over 35,000 test cases, resulting in Sail logs exceeding 1 GB and very large signature files. I wrote a Python script to split these tests into smaller segments while maintaining full coverage. This reduces each Sail log to less than 8 KB and each signature file to about 1 KB, which simplifies debugging. I have also updated the ISA strings so that these tests can run on RV32E as well.
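
The splitting script itself is not attached in this conversation. Purely as an illustration, a minimal splitter along these lines might look like the sketch below; it assumes (as described later in this thread) that chunks can be cut at RVTEST_SIGBASE macro boundaries and that the shared prologue and data/epilogue sections can simply be copied into every chunk. The marker names, chunk size, and the simplified data handling are assumptions, not the actual implementation.

```python
# Illustrative sketch only -- not the script used in this PR.
from pathlib import Path

CASES_PER_CHUNK = 256               # chunk size used in this PR (see review comment below)
CASE_MARKER = "RVTEST_SIGBASE"      # assumed per-case split boundary
TAIL_MARKER = "RVTEST_DATA_BEGIN"   # assumed start of the shared data/epilogue

def split_test(src: Path) -> None:
    lines = src.read_text().splitlines(keepends=True)
    cuts = [i for i, line in enumerate(lines) if CASE_MARKER in line]
    if len(cuts) <= CASES_PER_CHUNK:
        return                      # already small enough, nothing to do
    tail_at = next((i for i, line in enumerate(lines) if TAIL_MARKER in line), len(lines))
    prologue, tail = lines[:cuts[0]], lines[tail_at:]
    for n, first in enumerate(range(0, len(cuts), CASES_PER_CHUNK), start=1):
        last = first + CASES_PER_CHUNK
        end = cuts[last] if last < len(cuts) else tail_at
        # NOTE: the real tool also splits the data section so each chunk only
        # carries its own operands; that bookkeeping is omitted here.
        body = prologue + lines[cuts[first]:end] + tail
        src.with_name(f"{src.stem}-{n:02d}{src.suffix}").write_text("".join(body))

if __name__ == "__main__":
    split_test(Path("fmadd_b15.S"))
```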

List Extensions

RV32IF, RV32IFD, RV32EF, RV32EFD

Reference Model Used

  • SAIL
  • Spike
  • Other - < SPECIFY HERE >

Mandatory Checklist:

  • All tests are compliant with the test-format spec present in this repo ?
  • Ran the new tests on RISCOF with SAIL/Spike as reference model successfully ?
  • Ran the new tests on RISCOF in coverage mode
  • Link to Google-Drive folder containing the new coverage reports (See this for more info): < SPECIFY HERE >
  • Link to PR in RISCV-ISAC from which the reports were generated : < SPECIFY HERE >
  • Changelog entry created with a minor patch

Optional Checklist:

  • RISCV-V CTG PR link if tests were generated using it : < SPECIFY HERE >
  • Were the tests hand-written/modified ?
  • Have you run these on any hard DUT model ? Please specify name and provide link if possible in the description
  • If you have modified arch_test.h Please provide a detailed description of the changes in the Description section above.

@allenjbaum
Collaborator

allenjbaum commented Apr 26, 2024 via email

@UmerShahidengr
Collaborator Author

> Hm, I don't think we should be lumping the FMA splitting into the same PR as PMP tests; unless there's a really good reason, those should be a different PR.

Sorry, the PMP tests were in a different branch; they got synced while I was cleaning up my branches. I have removed all PMP-related changes from this PR.

Collaborator

@allenjbaum left a comment


This reduces the size of the fm[n][add/sub] tests by splitting them into smaller test files (using a custom Python tool), each of which contains 256 test cases (roughly 12 KB of instructions and 6.4 KB of data), and by updating them to work with the RV32E/RV64E architectures. No tests were changed, so the exact same test cases are still present with the same coverage, but now spread across nearly 300 test files instead of 4 monolithic ones.

@allenjbaum merged commit a7b7cd0 into riscv-non-isa:main on Apr 26, 2024
1 check passed
@allenjbaum
Collaborator

allenjbaum commented Apr 26, 2024 via email

@Timmmm
Contributor

Timmmm commented May 1, 2024

Thanks for doing this! Unfortunately it looks like there are still some really big tests that have been missed:

  • riscv-test-suite/rv64i_m/Zfinx/src/f[n]m(add|sub)_b15-01.S (24 MB each)
  • riscv-test-suite/rv32i_m/D/src/f[n]m(add|sub).d_b15-01.S (60 MB each)
  • All of the b1 tests. They're not 60 MB, but they're still around 8 MB, which is very large.

Also, I am not sure a Python script to split up the tests like this is a good long-term solution. As soon as someone tries to regenerate them, they will have this problem again, and your script is unlikely to work at that point. I may still try my approach, which is to post-process the normalised CGF YAML file just before generation. Let me know if you are going to fix the above issues though - if so I probably won't bother!
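
For what it's worth, a rough sketch of that post-processing idea is below. It assumes the normalised CGF is a YAML mapping from covergroup names to nodes whose val_comb entry holds the bulk of the coverpoints; the key names, the chunk size, and the "_partNN" output naming are illustrative assumptions, not riscv-ctg's actual interface.

```python
# Rough sketch: split oversized covergroups in a normalised CGF file before
# handing it to the test generator. Key names and sizes are assumptions.
from itertools import islice
import yaml  # PyYAML

MAX_COVERPOINTS = 1500              # hypothetical per-covergroup budget

def chunked(mapping, size):
    it = iter(mapping.items())
    while chunk := dict(islice(it, size)):
        yield chunk

def split_cgf(in_path: str, out_path: str) -> None:
    with open(in_path) as f:
        cgf = yaml.safe_load(f)
    out = {}
    for name, node in cgf.items():
        val_comb = node.get("val_comb", {}) if isinstance(node, dict) else {}
        if len(val_comb) <= MAX_COVERPOINTS:
            out[name] = node            # small enough, keep as-is
            continue
        # Clone the covergroup once per chunk so the generator emits several
        # smaller test files instead of one monolithic one.
        for i, chunk in enumerate(chunked(val_comb, MAX_COVERPOINTS), start=1):
            clone = dict(node)
            clone["val_comb"] = chunk
            out[f"{name}_part{i:02d}"] = clone
    with open(out_path, "w") as f:
        yaml.safe_dump(out, f, sort_keys=False)
```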

Finally, I highly recommend using git rebase -i to remove unrelated commits next time! (Or squash when merging this.)

@davidharrishmc
Contributor

Separately, I had suggested splitting into 10 files, not 300. We just added thousands of test files to riscv-arch-test. There are so many that GitHub won't show the full directory. In some methodologies, such as ImperasDV, the simulator has to launch separately on each file, creating enormous overhead from starting up thousands of times.

@Timmmm
Contributor

Timmmm commented May 1, 2024

I think 10 files would still be too big. Based on the file sizes, 50 would bring them in line with the other files.

@allenjbaum
Collaborator

allenjbaum commented May 2, 2024 via email

@UmerShahidengr
Collaborator Author

@Timmmm Other tests can also be split into multiple files, but first we need to discuss the optimal file size versus the number of files. It is a good thing that Allen has added this as an agenda item for next week's meeting; we will update the script and split the test files after converging on a conclusion there.

@Timmmm
Contributor

Timmmm commented May 3, 2024

Sounds good. Which meeting is it?

@davidharrishmc
Contributor

I agree that the size of the _b15 files before the split was the bottleneck in running regressions.

However, even a 3-way split would be enough in my context to break the bottleneck. There's overhead launching each simulation job that has to be balanced against the parallelism of the jobs. A roughly 10-way split would be a reasonable sweet spot, so that the simulations run fast without greatly increasing the total number of simulations required to regress the riscv-arch-tests.

Moreover, there are 4 fma-type instructions in each of F, D, and Zfh, so the number of files is 12x the number of splits.

A large number of splits also leads to a huge number of files in the riscv-arch-test repository and in the regression logs. That can make directories cumbersome and can lead to real problems getting hidden in a sea of reports.

For example, for fma-type instructions, I counted 102,005 cases in the _b15 tests, 13,835 for _b1, 1,684 for _b8, 280 for _b5, and so forth. Integer tests have 1-819 cases each. I can see a benefit in getting down to below _b1, but unless we also split _b1, why go much lower? And what metric are we optimizing to choose the number of splits?
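
To make the 12x relationship above concrete, here is a trivial calculation over the split counts being discussed in this thread (nothing below is measured data):

```python
# 4 fma-type instructions x 3 extensions (F, D, Zfh) = 12 files per split.
FMA_INSTRUCTIONS = 4                # fmadd, fmsub, fnmadd, fnmsub
EXTENSIONS = 3                      # F, D, Zfh

for splits in (1, 3, 10, 50):       # split counts mentioned in this thread
    files = splits * FMA_INSTRUCTIONS * EXTENSIONS
    print(f"{splits:>2} splits per instruction -> {files:>3} _b15 test files")
```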

@Timmmm, what is driving the desire to bring the _b15 splits in line with the other test cases? Please help me understand why that is beneficial rather than overkill.

@allenjbaum
Collaborator

We discussed how to write a simple Python tool that will split a test up into multiple tests, based on parameters passed to it.
It makes some assumptions about the structure of a test (which matches what CTG generates) and takes the following parameters:

  • the line numbers of the actual tests,
  • the line numbers of the data area,
  • the number of data items used in each test case,
  • the number of instructions generated by each test case, and
  • the maximum size you'd want a test to be.

The test will be broken up at the next SIGBASE macro that follows that maximum size. So the question remains: what is a reasonable size for a test, or a reasonable number of tests (which can go in a subdirectory, if that helps)? You don't get to decide both....

@Timmmm
Contributor

Timmmm commented May 7, 2024

> There's overhead launching each simulation job that has to be balanced against the parallelism of the jobs.

I definitely agree! Our overhead for each job is only 15 seconds though (see below), which is very small compared to the actual runtime of the tests in our case. I haven't used ImperasDV, but I have briefly used their reference model, and we use it in a similar way to this diagram, but with a custom trace system. I don't remember it having a long startup time - just the normal time of running the RTL simulator, which is pretty fast.

> A large number of splits also leads to a huge number of files in the riscv-arch-test repository and in the regression logs. That can make directories cumbersome and can lead to real problems getting hidden in a sea of reports.

I don't really understand how this can be a real problem, to be honest. Anyone properly verifying a chip will be running tens of thousands of tests. No testing system I've ever used has required you to manually look through a list of pass/fails to find the failures. Maybe I'm misunderstanding you here, though?

> What is driving the desire to bring the _b15 splits in line with the other test cases? Please help me understand why that is beneficial rather than overkill.

There are two reasons:

  1. Total runtime. It's preferable to keep the total runtime of the tests under 10-20 minutes so we can run them in CI. We use test ranking to decide which tests to run in CI, but if they take longer than this, they aren't eligible at all. We do have tests that run longer, but given the 15-second overhead it's nice if they are around ~10 minutes.
  2. Memory use. Since we run thousands of jobs on a shared compute cluster, we reserve some memory for each job - currently 2 GB - but these jobs go over that limit during compilation (although only very slightly for the b1 tests).

I did some measurements, and the b1 tests are actually only slightly over that 2 GB limit, so maybe it's not worth splitting those; we'll just increase our limit slightly. But I definitely wouldn't want anything bigger than that.

Here are some examples:

  • Hello world: 1k instructions, 15s, max RSS 200 MB
  • rv32i_m/D/fmul.d_b9-1 (the largest of the non-problematic tests): 38k instructions, 2m40s, max RSS 230 MB
  • rv32i_m/D/fnmadd.d_b1-01: 253k instructions, 44m20s, max RSS 2.1 GB

@allenjbaum
Collaborator

allenjbaum commented May 8, 2024 via email

@Timmmm
Contributor

Timmmm commented May 8, 2024

2.1 GB is the maximum memory usage when compiling, not the size of the test.

@allenjbaum
Collaborator

allenjbaum commented May 9, 2024 via email

@Timmmm
Contributor

Timmmm commented May 9, 2024

I think that's fine. I would expect that that is probably a pretty good proxy for memory use during compilation.

@davidharrishmc
Contributor

davidharrishmc commented May 9, 2024

@Timmmm there is something funny going on if your memory requirements scale with the length of the test. Is it possible that the testbench is logging too much history of state in RAM? I agree with Allen that even the biggest tests should add less than 1 MB of memory to the model.

I accept your argument that full verification of a large system involves a large number of test files. However, splitting up riscv-arch-test in this way exploded the number of test files in this suite. I may be biased by focusing on just riscv-arch-test, but I feel it's gotten out of hand. For example, when I go to

https://github.com/riscv-non-isa/riscv-arch-test/tree/main/riscv-test-suite/rv32i_m/F/src

I get a message

Sorry, we had to truncate this directory to 1,000 files. 334 entries were omitted from the list. Latest commit info may be omitted.

There's also some baggage related to the CORE-V Wally flow. Presently, we list each test set individually in a file, so this involves adding thousands of lines to that file. We print each set when it runs successfully, which used to be dozens of lines and now becomes thousands in the logs. Debug occasionally involves scanning the logs, and this is more cumbersome when they explode in size. None of these are showstoppers, but they affect our viewpoint.

As a simple heuristic, how about splitting the _b15 tests into 10 files instead of 300, to get their runtime below that of _b1?

@allenjbaum
Collaborator

allenjbaum commented May 10, 2024 via email

@Timmmm
Contributor

Timmmm commented May 10, 2024

> Is it possible that the testbench is logging too much history of state in RAM?

The 2.1 GB of memory is used when compiling the test, i.e. when running GCC to turn the assembly into an ELF file. It seems reasonable that that would scale with the size of the test.

> Sorry, we had to truncate this directory

Yeah, that is annoying, to be fair.

@davidharrishmc
Contributor

fmadd.d_b15-01.S/ref/ref.elf is 15 MB. Something still seems funny if it takes 2 GB of memory to produce a 15 MB executable. Even the Sail log fmadd.d_b15-01.log is 365 MB.

I'm not disputing the hassle of this file being 10x longer to simulate than all the others, but I'm still postulating that a 10x split of _b15 might be sufficient, even if the tools have weird issues that consume excessive memory.

@allenjbaum
Collaborator

allenjbaum commented May 13, 2024 via email

4 participants