Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RISCV bootloader types #1935

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

RISCV bootloader types #1935

wants to merge 6 commits into from

Conversation

pacheco
Copy link
Collaborator

@pacheco pacheco commented Oct 22, 2024

Refactor to support multiple implementations of the bootloader and executor.
It introduces a BootloaderImpl, with types/functions that must be given by the concrete bootloader implementation (large/small).
It also creates a BootloaderInputs type to encapsulate all the indexing logic for manipulating bootloader inputs.

adds a BootloaderImpl trait that should be implemented by the small/large field
variants of the bootloader
@pacheco
Copy link
Collaborator Author

pacheco commented Oct 22, 2024

I'm open for suggestions on the design here.
My idea was to extract the "generic" parts into the trait, so as to avoid code duplication, since the code in these parts is already quite complex/fragile.

//!
//! TODO: perform determinism verification for each instruction independently
//! from execution.

Copy link
Collaborator Author

@pacheco pacheco Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the code here didn't change, was just moved to the large_field module.
There's one small change ill point out.

}

impl<F: FieldElement> RegisterMemory<F> {
pub fn for_bootloader(&self) -> HashMap<u32, Vec<F>> {
Copy link
Collaborator Author

@pacheco pacheco Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type here was HashMap<u32, F>. It became Vec<F> because the small_field machine needs two F for a register value.
Ideally, this would be u32 as is the MemoryState, but the large_field machine actually uses the full F for register values during its execution (e.g., for to_signed).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't be possible to use an array instead of a Vec (the length can be a generic parameter)? Feels better that way.

Copy link
Collaborator Author

@pacheco pacheco Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't use the BootloaderImpl trait because it makes the crate dependencies circular, but maybe just a const parameter might work... ill try

}

pub type MemoryState = HashMap<u32, u32>;
pub type RegisterMemoryState<F> = HashMap<u32, F>;
/// Value of registers is Vec<F> to unify the output for different field sizes
pub type RegisterMemoryState<F> = HashMap<u32, Vec<F>>;
Copy link
Collaborator Author

@pacheco pacheco Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment above

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my response above

/// This trait provides all the field specific types and implementations that the bootloader needs.
///
/// For now, this trait is implemented directly by each `FieldElement` type, because the hash functions (i.e., poseidon) are field-specific.
pub(crate) trait BootloaderImpl<F> {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main reason for this trait here are the Page and Hash types, which are just arrays of F but with different sizes for the small/large machines.
These are used in the merkle tree and bootloader code, and this is the way I found to make it generic.

@@ -197,7 +197,7 @@ machine Main with min_degree: {}, max_degree: {} {{

{}

let initial_memory: (fe, fe)[] = [
let initial_memory: (int, int)[] = [
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be (int, int) because we cant represent u32 values using fe in small field

@pacheco pacheco marked this pull request as ready for review October 22, 2024 13:57
@pacheco pacheco requested review from leonardoalt and lvella October 22, 2024 13:57
@pacheco
Copy link
Collaborator Author

pacheco commented Oct 22, 2024

@lvella

riscv/src/continuations/bootloader.rs Show resolved Hide resolved
riscv/src/continuations/bootloader.rs Show resolved Hide resolved
riscv/src/continuations/bootloader.rs Show resolved Hide resolved
riscv/src/continuations.rs Show resolved Hide resolved
riscv/src/small_field/bootloader.rs Show resolved Hide resolved

fn update_page(page: &mut Self::Page, idx: usize, word: u32) {
let (hi, lo) = split_word(word);
// TODO: check proper endianess here!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think RISCV memory is [low, high], so this should fail

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, riscv is little endian, but i was not sure here (we use hi,lo in our interface to the memory machine), so I left the todo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually yea, it's tricky. We use big-endian in the machine, but Rust uses little endian when representing u64 as 2 words in RISCV32, for example. So this is probably correct actually

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever order chosen here, it has nothing to do with RISCV (our implementation always loads 32 bits and then simulates little-endian).

It must, however, match the bootloader implementation. It seems that in gl case it does some preprocessing to fit 1 field element every two words, and it uses little-endian there. Similarly, as poseidon_bb is implemented, it expects one field element per word, so the order which the limbs are padded matter.

@leonardoalt
Copy link
Member

lgtm, but I'll let @lvella take a look too

}

impl<F: FieldElement> RegisterMemory<F> {
pub fn for_bootloader(&self) -> HashMap<u32, Vec<F>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't be possible to use an array instead of a Vec (the length can be a generic parameter)? Feels better that way.

}

pub type MemoryState = HashMap<u32, u32>;
pub type RegisterMemoryState<F> = HashMap<u32, F>;
/// Value of registers is Vec<F> to unify the output for different field sizes
pub type RegisterMemoryState<F> = HashMap<u32, Vec<F>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my response above

Comment on lines +176 to +177
match F::known_field() {
Some(KnownField::BabyBearField | KnownField::Mersenne31Field) => small_field::execute_ast(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like that the FieldElement trait should have a static method telling if it is a small or large field. This whole KnownField is kinda hackey. I think I used it before I knew of std::any::TypeId.

Suggested change
match F::known_field() {
Some(KnownField::BabyBearField | KnownField::Mersenne31Field) => small_field::execute_ast(
match F::size_category() {
FieldSize::Small => small_field::execute_ast(

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's already there as known_field().unwrap().field_size(), this PR predates that so I forgot to update here, will change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the problem is the known_field(). Field size should be a property of the FieldElement trait itself. But maybe this is out of scope for this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea there are other places that could benefit from that, but we could do it after

Comment on lines +1 to +10
//! A specialized executor for our RISC-V assembly that can speedup witgen and
//! help with making partition decisions.
//!
//! WARNING: the general witness generation/execution code over the polynomial
//! constraints try to ensure the determinism of the instructions. If we bypass
//! much of witness generation using the present module, we lose the
//! non-determinism verification.
//!
//! TODO: perform determinism verification for each instruction independently
//! from execution.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this docstring belongs in the toplevel lib.rs file.

Comment on lines +1 to +10
//! A specialized executor for our RISC-V assembly that can speedup witgen and
//! help with making partition decisions.
//!
//! WARNING: the general witness generation/execution code over the polynomial
//! constraints try to ensure the determinism of the instructions. If we bypass
//! much of witness generation using the present module, we lose the
//! non-determinism verification.
//!
//! TODO: perform determinism verification for each instruction independently
//! from execution.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this docstring belongs in the toplevel lib.rs file.

Comment on lines +47 to +48
todo!("call rust implememtation of poseidon bb")
// poseidon_bb(&buffer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or poseidon2.


fn iter_word_as_fe(v: u32) -> impl Iterator<Item = F> {
let (hi, lo) = split_word(v);
// TODO: check proper endianess here!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't matter, as long as it is consistent with the bootloader implementation.

Which I think could be improved, if the poseidon machine assumes that each memory limb can contain a full field element: it saves some preprocessing to prepare the input on the bootloader, and multiple calls to split machine on the output.

Currently, both poseidon_memory_gl and poseidon_bb assume that a memory position will never contains more than 32-bits, so they need the double of field elements (in the memory machine) to represent each field element.

use powdr_number::{FieldElement, KnownField, LargeInt};
use powdr_riscv_executor::large_field::poseidon_gl::poseidon_gl;

pub fn split_fe<F: FieldElement>(v: &F) -> [F; 2] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one places the low bits first. Like the split machines, that returns (low, high).


fn update_page(page: &mut Self::Page, idx: usize, word: u32) {
let (hi, lo) = split_word(word);
// TODO: check proper endianess here!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever order chosen here, it has nothing to do with RISCV (our implementation always loads 32 bits and then simulates little-endian).

It must, however, match the bootloader implementation. It seems that in gl case it does some preprocessing to fit 1 field element every two words, and it uses little-endian there. Similarly, as poseidon_bb is implemented, it expects one field element per word, so the order which the limbs are padded matter.

// It's only used by this project's benchmarks (`benches` folder), which are
// users of the crate and not part of it...
pub fn default_input_witness<F: FieldElement>(accessed_pages: &[u64]) -> Vec<(String, Vec<F>)> {
match F::known_field() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use std::any::TypeId instead of KnownField here. It is more standard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants