Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace command-line-switch model configuration with RISCV-Config #175

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

billmcspadden-riscv
Copy link
Collaborator

This Pull Request will probably not be ratified on the first go-round,
but we need to start discussing it because it introduces some new
functionality.

1.  I am changing the Sail model configuration method from
    command line arguments to using the RISCV Config YAML
    files.   This gives us a consistent method for configuring
    the ACTs (Arch Compatability Tests) along with the Sail
    model.  THE COMMAND LINE SWITCHES THAT HAVE TO DO WITH MODEL
    CONFIGURATION HAVE BEEN REMOVED.  Comments?

2.  I am adding an API such that the Sail portion of the model
    and the C portion of the model can query the settings in the
    YAML files. The API functions begin with rv_cfg_c_ (for the
    C implementation) and rv_cfg_s_ (for the Sail implementation).
    Note: the Sail implementation is just a shim to the C API.

    Not all desired functions are implemented.  They should be
    easy to add.

3.  I have perturbed the order in which things get initialized.
    Command line switches need to be processed first in order to get
    pointers to the RISCV-Config YAML files. Then I process the YAML 
    files and set the rv_* variables. Most of this work is done
    in riscv_sim.c.

3.  C Coding style:  I use Whitesmith for the new files that I've 
    added.  Comments?

4.  The API uses the libfyaml and pcre libraries to parse the
    parse the YAML files (libyaml) and to provide regular
    expression functionality (pcre). This leads to questions
    about using 2 new libraries in the build process and the
    risks involved.

5.  So far,  I have only added support for the existing command
    line switches.   All new code should utilize the API.  I
    have been directing the Vector Sail team to do so.

6.  I have begun work on configuring the CSR implementations in the
    model,  but I have not completed the work.

7.  I've been doing work at the same time on the cookbook. I have
    included it in this pull request because this is the area
    where I've been working on my simulation examples.  If you
    look at cookbook/functional_code_examples/platform_configuration/,
    you'll see a working example of a bit of assembly language code
    along with the YAML files for configuring the model.

    THE TEXT OF THE COOKBOOK IS NOT COMPLETE AND SHOULD NOT BE REVIEWED.
    Perhaps it should be removed from the pull-request.

8.  Concern: Spike is not configured with RISCV-Config YAML files.
    Also,  I have done no testing with Spike.

9.  Concern:  I've done no testing with RVFI.

@scottj97
Copy link
Contributor

  1. Git best practices. You seem to be using Git as a work log instead of as a recipe. For example, commits named "intermediate checkin" and "getting closer..." are not constructive. See Work Log vs Recipe, Understanding the Git Workflow, and Writing Reviewable Code for more.

@allenjbaum
Copy link
Collaborator

allenjbaum commented Aug 16, 2022 via email

@billmcspadden-riscv
Copy link
Collaborator Author

billmcspadden-riscv commented Aug 16, 2022 via email

@arichardson
Copy link
Collaborator

Hi Scott. Thanks for the feedback. I agree with your points about frivolous commit comments. How would you recommend that I proceed? Should I withdraw the pull request, rebase my code, and then generate another PR? Bill Mc.

Force pushing to this branch after cleaning up the history should fix that, there should be no need to create a new PR.


//This macro must be defined before including pcre2.h. For a program that uses
//only one code unit width, it makes it possible to use generic function names
//such as pcre2_compile().
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pcre2 and libfyaml are not used in this file?

//This macro must be defined before including pcre2.h. For a program that uses
//only one code unit width, it makes it possible to use generic function names
//such as pcre2_compile().
#define PCRE2_CODE_UNIT_WIDTH 8
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PCRE2 is an implementation detail so this include should move to the source file.


#include "sail.h"
#include <libfyaml.h>
#include <sys/stat.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unnecessary?

#pragma once

#include "sail.h"
#include <libfyaml.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libfyaml is an implementation detail so this include should move to the source file.


// ============================================================================
int
rv_cfg_c_ext_enable(char * isa_str, char * ext_pattern)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function only receives string constants, so there should be no need for pcre2? strstr() should be sufficient?

@github-actions
Copy link

github-actions bot commented Aug 16, 2022

Unit Test Results

712 tests  ±0       0 ✔️  - 712   0s ⏱️ ±0s
    6 suites ±0       0 💤 ±    0 
    1 files   ±0   712 +712 

For more details on these failures, see this check.

Results for commit ae4cea4. ± Comparison against base commit 72b2516.

♻️ This comment has been updated with latest results.

1)  Initial checkin of cookbook with numerous code examples.
2)  Integration of libfyaml and a regex library for use in
    reading RISCV-Config YAML configuration files (for
    configuration of the Golden Model).
@pawks
Copy link

pawks commented Sep 22, 2022

It should also be documented explicitly that the "validated" yaml generated by riscv-config should be supplied as input to the simulator in the command line. To check whether the yaml is a "validated" yaml, one can look for the "fields" node under any csr. This node is added after validation. Maybe there is a need to add an additional field at the root level of the yaml schema to indicate that it is a validated yaml or we should check for the existence of a comment in the yaml? Thoughts @neelgala?
This is how the yaml looks before and after validation.

@jrtc27
Copy link
Collaborator

jrtc27 commented Mar 15, 2023

Don’t create unnecessary merge commits, rebase your branch, otherwise the history becomes a mess

@jrtc27
Copy link
Collaborator

jrtc27 commented May 9, 2023

Merge branch 'riscv:master' into master

This is making even more of a mess of the history; please don't

@Timmmm
Copy link
Collaborator

Timmmm commented Nov 15, 2023

I don't really like the approach of letting Sail code just blindly read values from a YAML file by path. It isn't type checked and it means we're going to end up with YAML path strings scattered all over the codebase. Not a very robust approach in my experience.

I would suggest taking the riscv-config YAML schema and using it to generate a proper Sail type that represents the configuration of the model. The schema contains default values so I think we will need to use it anyway.

This probably isn't as simple as I've made it sound for a few reasons:

  1. They've gone extreme with the schema and every CSR has a different type. That would create an insane amount of Sail and C code so we probably want to process it to a more generic types.
  2. Sail doesn't support dictionaries.
  3. The actual types we want in Sail are not exactly what the YAML gives. For example the spec versions are strings rather than enums or integers.

To get the actual config value into the Sail model we can autogenerate code that calls platform callbacks similar to what you have done here manually. So the final solutions would look something like this:

# In Makefile
./generate_config --schema_isa schema_isa.yaml --output riscv_config.sail
// riscv_config.sail

struct RiscvConfig = {
   vendor: string,
   device: string,
   ISA: string,
   User_Spec_Version: string,
   Privilege_Spec_Version: string,
   supported_xlen: list(nat),
   ...
}

register config : RiscvConfig;

function init_config() = {
   config = RiscvConfig {
      vendor: rv_cfg_string("/vendor"),
      ...
}

What do you think? This is fairly high priority for us so I may have a go at prototyping it (no promises though!).

@Timmmm
Copy link
Collaborator

Timmmm commented Nov 15, 2023

Also we should definitely parse the ISA string properly rather than hacking it with a regex. Not adding another dependency would be nice too.

@Alasdair
Copy link
Collaborator

I was also thinking that the right way to go is to essentially generate Sail from the config yaml, rather than consume the yaml directly. There are a few ergonomic problems with that still, so I think we might need a few additional language features to support that nicely.

The schema is a bit more problematic. We need the schema in Sail as it defines the envelope of permitted architectural state, but its current form is an ad-hoc python specific yaml schema library which we can't easily use. When I investigated possible existing schema options none seemed like a good fit:

  • JSON-schema is widely supported but it really doesn't support the types we want, there's no easy way to have validated fixed-width bitvectors, and integers won't really work because it doesn't support arbitrary precision. Everything would end up being stringly-typed.
  • ASN.1 is being used elsewhere in RISC-V, and despite it being somewhat old and crufty it does have the ability to represent everything you'd want in precisely the form you want (you just have to look past all the ASN.1 features like support for EBCDIC strings and other relics of the past). The open-source support for ASN.1 seems overall poor to non-existent though, as it seems like it's basically been an industry thing with most of the good tooling as expensive closed source software for most of it's existence.

So yeah, I'm not sure what we can do on the schema side of things.

@billmcspadden-riscv
Copy link
Collaborator Author

billmcspadden-riscv commented Nov 15, 2023 via email

@billmcspadden-riscv
Copy link
Collaborator Author

billmcspadden-riscv commented Nov 15, 2023 via email

@billmcspadden-riscv
Copy link
Collaborator Author

billmcspadden-riscv commented Nov 15, 2023 via email

@Alasdair
Copy link
Collaborator

I don't think it's important that the schema be sucked into the
Sail model. It can be assumed that a user's config YAML file is legal
based on the tooling of RISCV-Config. There is a python script that
checks the legality of a user's config file.

In the narrow case where we just generate an emulator this is true, but for some of the other use cases we have like theorem proving/verification and symbolic execution we need to work with the model w.r.t. an arbitrary (but still legal!) configuration, so we need the Sail to have a notion of what an arbitrary legal configuration is.

@billmcspadden-riscv
Copy link
Collaborator Author

billmcspadden-riscv commented Nov 15, 2023 via email

@Alasdair
Copy link
Collaborator

There was talk of writing an ISA string parser in C for portability, was there any progress on this? Otherwise it might be a fun weekend project I could do

@billmcspadden-riscv
Copy link
Collaborator Author

billmcspadden-riscv commented Nov 15, 2023 via email

@Timmmm
Copy link
Collaborator

Timmmm commented Nov 15, 2023

@billmcspadden-riscv init_config() would read the user's YAML file via FFI callbacks (rv_cfg_string() in the example).

Only the YAML schema file would be used to generate Sail code. The user's specific isa.yaml would be read at runtime.

@Alasdair can you give an example of the way in which the Sail model needs to know that the config is legal? Do you just mean on a type level (i.e. the spec version can't be "brocolli"), or in a more constrained way ("if User or Supervisor mode are implemented than foo must be read-only" or whatever).

I wonder if it is possible to parse the ISA string in Sail.

@allenjbaum
Copy link
Collaborator

allenjbaum commented Nov 15, 2023 via email

@Timmmm
Copy link
Collaborator

Timmmm commented Nov 15, 2023

I wonder if it is possible to parse the ISA string in Sail.

Had a quick go at this. Sail's not really the right language for this but it's not too bad. It does allocate a new string for every string_drop() in the C code, so this is horribly inefficient (N^2) but that doesn't really matter for a string that's like 100 characters at most.

$include <prelude.sail>
$include <string.sail>
$include <mapping.sail>

overload operator ^ = {concat_str}

enum Arch = { RV32, RV64 }

struct ISA = {
  I : bool,
  M : bool,
  A : bool,
  F : bool,
  D : bool,
  Zicsr : bool,
  Zifencei : bool,
  Zicond : bool,
}

val zs_ext_len : string -> nat
function zs_ext_len(s) = {
  match string_take(s, 1) {
    "" => 0,
    "_" => 0,
    _ => 1 + zs_ext_len(string_drop(s, 1)),
  }
}

val accumulate_isa : (string, ISA) -> ISA
function accumulate_isa(isa, a) = {
  match string_take(isa, 1) {
    "" => a,
    "_" => accumulate_isa(string_drop(isa, 1), a),
    char => {
      let len : nat = match char {
        "Z" => zs_ext_len(isa),
        "S" => zs_ext_len(isa),
        _ => 1,
      };
      let ext = string_take(isa, len);
      let a : ISA = match ext {
        "G" => {a with I = true, M = true, A = true, F = true, D = true, Zicsr = true, Zifencei = true},
        "I" => {a with I = true},
        "M" => {a with M = true},
        "A" => {a with A = true},
        "F" => {a with F = true},
        "D" => {a with D = true},
        "Zicsr" => {a with Zicsr = true},
        "Zifencei" => {a with Zifencei = true},
        "Zicond" => {a with Zicond = true},
        _ => {
          print_endline("Warning: unrecognised extension: " ^ ext);
          a
        },
      };
      accumulate_isa(string_drop(isa, len), a)
    },
  }
}

function parse_isa(isa : string) -> (Arch, ISA) = {
  let arch : Arch = match string_take(isa, 4) {
    "RV32" => RV32,
    "RV64" => RV64,
    other => {
      assert(false, "Unrecognised arch: " ^ other);
      undefined
    }
  };

  let exts = accumulate_isa(string_drop(isa, 4), struct {
    I = false,
    M = false,
    A = false,
    F = false,
    D = false,
    Zicsr = false,
    Zifencei = false,
    Zicond = false,
  });
  (arch, exts)
}

function bool_str(b : bool) -> string = (if b then "true" else "false")

function main() -> unit = {
  let (arch, isa) = parse_isa("RV32IMZicond_Zifencei_Sfoo");
  print_endline("I: " ^ bool_str(isa.I));
  print_endline("M: " ^ bool_str(isa.M));
  print_endline("A: " ^ bool_str(isa.A));
  print_endline("F: " ^ bool_str(isa.F));
  print_endline("D: " ^ bool_str(isa.D));
  print_endline("Zicsr: " ^ bool_str(isa.Zicsr));
  print_endline("Zifencei: " ^ bool_str(isa.Zifencei));
  print_endline("Zicond: " ^ bool_str(isa.Zicond));
}

Output:

Warning: unrecognised extension: Sfoo
I: true
M: true
A: false
F: false
D: false
Zicsr: false
Zifencei: true
Zicond: true

@allenjbaum
Copy link
Collaborator

allenjbaum commented Nov 15, 2023 via email

@jrtc27
Copy link
Collaborator

jrtc27 commented Nov 15, 2023

Toolchains take -march as a lowercase string, so that should definitely be supported. Probably it should be parsed case-insensitively so people can use uppercase strings, which seem more common in the hardware world (and written prose).

@Alasdair
Copy link
Collaborator

I wrote a thing, see:

https://github.com/Alasdair/rv-isa-string-parse/blob/main/src/rv_isa_string_parser.h and https://github.com/Alasdair/rv-isa-string-parse/blob/main/src/rv_isa_string_parser.c

There's also a little program in that repository that builds a small executable you can play with on the command line:

% ./rv-isa-parse RV64IM5A_CjZtso3p2_Habc_Xfoo500
Found 8 extensions
Extension I major=2 minor=0
Extension M major=5 minor=0
Extension A major=2 minor=0
Extension C major=2 minor=0
Extension j major=2 minor=0
Extension Ztso major=3 minor=2
Extension Habc major=2 minor=0
Extension Xfoo major=500 minor=0

it should chunk the ISA string into it's various extensions, while handling version numbers and case-sensitivity correctly. Some validation is performed, but not everything. The ISA manual uses 'should' and 'must' ambiguously for some parts of the ISA string parsing, so it's unclear how strictly order should really be enforced between the components. The ambiguity around the 'p' extension and 'p' in version numbers is also handled as per the manual.

I looked at the spike source, and it's quite C++. It also seems to only support extensions spike knows about, whereas the above will happily parse anything custom you want like Habc or Xfoo500 above.

Handling the additional logic for extensions that imply other extensions is currently left as an exercise for the reader 😛

@Timmmm
Copy link
Collaborator

Timmmm commented Nov 16, 2023

Lots of good comments.

There are currently more than 100 of them, and they're mostly not a single character.

It's O(N^2) in the length of the ISA string, not the length of possible ISA strings. If the O(N^2) performance did become a problem it could be fixed in Sail e.g. by reference counting strings.

Probably it should be parsed case-insensitively so people can use uppercase strings

Yeah I omitted case insensitivity since it was just a PoC and is trivial to handle. The spec does say that it is case insensitive, however as you said toolchains require lowercase, and riscv-config requires uppercase! What a mess. Generally a mistake to unnecessarily allow case insensitivity IMO. riscv-config also seems to require the string to be in their canonical format (specific order, no extra underscores) which isn't really described in the spec.

I wrote a thing

This is great, but it means you need to also write it in OCaml and add FFI to get the information into Sail. That was mainly what I was trying to avoid by writing the parser in Sail.

Some validation is performed, but not everything.

Does validation of the format itself really need to be performed at all? I think I would leave that to riscv-config. I think the Sail model should be concerned with semantic validation (e.g. mutually exclusive extensions).

The ISA manual uses 'should' and 'must' ambiguously

Yeah very annoying - see riscv/riscv-isa-manual#1144

Anyway I think I may have derailed the discussion a bit! The main conclusion is that we should parse the ISA string in some way other than regex.

I think the core of the issue which is how do we represent the config in Sail, particularly the many many CSRs, given that the schema gives a different type for each one.

@Timmmm
Copy link
Collaborator

Timmmm commented Nov 16, 2023

I made a discussion to continue to discuss the ISA string parsing if anyone wants to: #347

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants