-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate to element-addressable memory #1084
Conversation
08ddb24
to
d22d20a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thanks for also updating the docs in memory.rs 🙌.
Left some optional comments.
const.INPUT_NOTE_SECTION_OFFSET=1048576 | ||
const.INPUT_NOTE_SECTION_OFFSET=4194304 | ||
|
||
# The memory address at which the input note data section begins. | ||
const.INPUT_NOTE_DATA_SECTION_OFFSET=1064960 | ||
const.INPUT_NOTE_DATA_SECTION_OFFSET=4259840 | ||
|
||
# The memory address at which the number of input notes is stored. | ||
const.NUM_INPUT_NOTES_PTR=1048576 | ||
const.NUM_INPUT_NOTES_PTR=INPUT_NOTE_SECTION_OFFSET |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be perhaps better if we had:
# The memory address at which the number of input notes is stored.
INPUT_NOTE_SECTION_NUM_PTR=4194304
# The memory address at which the first input note begins.
INPUT_NOTE_SECTION_PTR=4194308
Then we don't have to add 4
to the note section offset wherever we use it, making those procedures a bit easier to read. This might apply to more of these constants here.
In general I wonder if having pointers defined in terms of offsets is helpful. I'm wondering if we should have just pointers defined explicitly and offsets stated in comments or something like that. Otherwise we will always have code that interprets an offset as a pointer, and then we might as well have defined it as a pointer in the first place.
Any thoughts on this, @Fumuran?
(None of this has to be addressed in this PR.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is what we should have now, I think "offset" here (in most paces) is actually a pointer: I think it should be INPUT_NOTE_SECTION_PTR
and so on. The only place where we use (or at least should afaik) is offsets inside the structures which should have more than one instance, for example the data of the whole account (since we can load foreign accounts) or the values inside the node data, since we have a lot of nodes.
So, in other words, I think that it is a good idea to change the suffixes those constants from _OFFSET
to _PTR
, because they are essentially pointers.
To have a different pointer for the number of elements seems like a good idea, although I'm not sure whether it will be that practically helpful, let me look through its usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can make an issue and investigate whether we use these section pointers in such a way that the first entry should be the length and the remaining entries the elements, or if we mostly use it independently.
miden-lib/src/transaction/memory.rs
Outdated
// | Section | Start address | End address | Start address (words) | End address (words) | | ||
// | ----------------- | :-----------: | :---------: | :--------------------:| :-------------------:| | ||
// | Bookkeeping | 0 | 31 | 0 | 7 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// | Section | Start address | End address | Start address (words) | End address (words) | | |
// | ----------------- | :-----------: | :---------: | :--------------------:| :-------------------:| | |
// | Bookkeeping | 0 | 31 | 0 | 7 | | |
// | Section | Element start address (word) | Element end address (word) | | |
// | ----------------- | :-----------: | :---------: | | |
// | Bookkeeping | 0 (0) | 31 (7) | |
Could we include the word in parentheses instead of separate columns? Or alternatively explain outside the table headers that the elemnt comes first and the corresponding word in parentheses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good idea. Honestly first I left word addresses just to make sure I didn't confuse anything, but probably it will be more convenient to check addresses in both element and word forms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I'm not sure which format will be the best. In the technical literature which I read the unit of measurement in the tables is placed after the column name, separated with comma. Will it be clear for the user? Something like that:
// | Section | Start address, pointer (word pointer) | End address, pointer (word pointer) |
// | ----------------- | :-----------------------------------: | :---------------------------------: |
// | Bookkeeping | 0 (0) | 31 (7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think what you suggested is fine. We don't export this documentation anyway, so it's only for the miden-base devs.
miden-lib/src/transaction/memory.rs
Outdated
// | Block header | 800 | 835 | 200 | 208 | | ||
// | Chain MMR | 1_200 | 1_331? | 300 | 332? | | ||
// | Kernel data | 1_600 | 1_739 | 400 | 434 | | ||
// | Accounts data | 8_192 | 532_479 | 2048 | 133_119 | 64 foreign accounts max |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Should we add a proper comment table column for "64 foreign accounts max"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great Idea! I think it will be generally helpful to have something like Max amount
column for every section
Edit: taking into account your next comment, I think it will be better just to have a comment
column.
miden-lib/src/transaction/memory.rs
Outdated
// | Global inputs | 400 | 423 | 100 | 105 | | ||
// | Block header | 800 | 835 | 200 | 208 | | ||
// | Chain MMR | 1_200 | 1_331? | 300 | 332? | | ||
// | Kernel data | 1_600 | 1_739 | 400 | 434 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// | Kernel data | 1_600 | 1_739 | 400 | 434 | | |
// | Kernel data | 1_600 | 1_744 | 400 | 434 | |
Nit: Is this not 1604+(4*35) = 1744
?
I think it would be great to explain some of these calculations (like this one or the Chain MMR
one) that do not have directly corresponding constants in memory.masm. Could just be in a separate table comment column here saying something like "35 kernel procedures hashes (4 elements each)".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
End pointer here is the last memory pointer occupied by the corresponding data. So to get the last pointer we need to calculate 1600 + (34 * 4 + 3)
. It is easy to see this pattern on the simple example. Let's say we need to calculate the end pointer in case we have start and end word pointers as 100
and 101
:
┌─┬─┬─┬─┬─┬─┬─┬─┐
|0|1|2|3|4|5|6|7|
└─┴─┴─┴─┴─┴─┴─┴─┘
| 100 | 101 |
In that case it will be 1 full word (4 elements, which will get us to the first element of the last word, index 4) and also 3 elements in the last word
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. I think my main point was that the kernel procedures start at 1604
rather than 1600
, since the word at 1600..=1603
contains the number of procedures, or not?
You're right though that when writing it as an inclusive range, the last address would indeed be 1 less than what I wrote. Then we'd have 1604+(34*4+3) = 1743
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the Kernel data
here we assume both number of kernel procedures and their hashes, so this brings us back to the question of whether it is worth creating a separate pointer to the length of the data length of the section ahead.
As far as I can see, in most places we already have these constants for lengths, they just aren't shown. For example, we have this constant for kernel procedures.
Although we don't have such constants for input notes, so I think it makes sense to add a new constant for it to make a nullifier obtaining more simple (well, we are back to your initial question about a new constant for input notes :) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you. Not a full review yet (I covered like 80% of the non-test code) - but I left some comments inline.
@@ -140,11 +140,13 @@ proc.main.1 | |||
# => [] | |||
end | |||
|
|||
push.0 drop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use random numbers for push
operation - otherwise, there is a chance we'll run into collisions. Let's also make sure to do this for all trace
instructions.
Lastly, let's add comments explaining which it is needed (a single-line explanation will do).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you! I left a couple of small comments inline.
The plan for merging this is roughly as follows:
- I'll merge the element-addressable memory PR in
miden-vm
. - Then, I'll release the new version of
miden-vm
. - After this, we can update the
Cargo.toml
file in this PR to use crates.io versions ofmiden-vm
and then we merge it.
miden-lib/asm/note_scripts/SWAP.masm
Outdated
# store the note inputs to memory starting at address 12 | ||
push.12 exec.note::get_assets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a comment with the state of the stack after this line?
miden-lib/src/transaction/outputs.rs
Outdated
let nonce = elements[ACCT_ID_AND_NONCE_OFFSET as usize + ACCT_NONCE_IDX]; | ||
let vault_root = TryInto::<[Felt; 4]>::try_into( | ||
&elements[ACCT_VAULT_ROOT_OFFSET as usize..ACCT_VAULT_ROOT_OFFSET as usize + WORD_SIZE], | ||
) | ||
.unwrap() | ||
.into(); | ||
let storage_commitment = TryInto::<[Felt; 4]>::try_into( | ||
&elements[ACCT_STORAGE_COMMITMENT_OFFSET as usize | ||
..ACCT_STORAGE_COMMITMENT_OFFSET as usize + WORD_SIZE], | ||
) | ||
.unwrap() | ||
.into(); | ||
let code_commitment = TryInto::<[Felt; 4]>::try_into( | ||
&elements[ACCT_CODE_COMMITMENT_OFFSET as usize | ||
..ACCT_CODE_COMMITMENT_OFFSET as usize + WORD_SIZE], | ||
) | ||
.unwrap() | ||
.into(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a bit more complicated than ideal. Maybe create a helper function - something like get_word()
or parse_word()
, which could look like so:
fn parse_word(data: &[Felt], offset: usize) -> Word
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this would return an error or Option
if the conversion fails, so that at call sites we can do parse_word(slice[0..4], 0).expect("we should have sliced off exactly 4 bytes")
. This seems better than panicking unconditionally inside the function, because it makes the caller less aware of this issue.
miden-lib/src/transaction/memory.rs
Outdated
@@ -275,41 +279,41 @@ pub const NOTE_MEM_SIZE: MemoryAddress = 512; | |||
// │ NOTE │ SERIAL │ SCRIPT │ INPUTS │ ASSETS │ META │ NOTE │ NUM │ ASSET │ ... │ ASSET │ PADDING │ | |||
// │ ID │ NUM │ ROOT │ HASH │ HASH │ DATA │ ARGS │ ASSETS │ 0 │ │ n │ │ | |||
// ├──────┼────────┼────────┼────────┼────────┼──────┼───────┼────────┼───────┼─────┼───────┼─────────┤ | |||
// 0 1 2 3 4 5 6 7 8 + n | |||
// 0 4 8 12 16 20 24 28 32 + 4n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: previously, the addresses referred to words and so they were centered. Now, they refer to the first element of the word - so, I'd probably shift to be "left-aligned". For example:
// ┌──────┬────────┬────────┬────────┬────────┬──────┬───────┬────────┬───────┬─────┬───────┬─────────┬
// │ NOTE │ SERIAL │ SCRIPT │ INPUTS │ ASSETS │ META │ NOTE │ NUM │ ASSET │ ... │ ASSET │ PADDING │
// │ ID │ NUM │ ROOT │ HASH │ HASH │ DATA │ ARGS │ ASSETS │ 0 │ │ n │ │
// ├──────┼────────┼────────┼────────┼────────┼──────┼───────┼────────┼───────┼─────┼───────┼─────────┤
// 0 4 8 12 16 20 24 28 32 + 4n
# use `push.* drop` instructions before `trace` to make sure that MAST root will be unique | ||
push.3456069754 drop | ||
trace.EPILOGUE_END |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding was that with 0xPolygonMiden/miden-vm#1122 fixed we no longer needed these. You added explanations that this is for MAST root uniqueness, but what is the larger problem here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, we fixed emit
instruction and advice injectors, but for the trace
instruction seems like everything remained the same. I'm not sure I fully understand the case, but for now I created a template-ish issue for the trace
handling to help investigate further: #1094
The first two items from this list are now done. @Fumuran could you address remaining comments and let's merge. @PhilippGackstatter - once this is merged, could you merge #1090 and then update #1091 so that we could merge it later today as well? |
This PR changes the memory model from word-addressable to element-addressable to be compatible with a new version of the Miden VM.