Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to element-addressable memory #1084

Merged
merged 21 commits into from
Jan 22, 2025
Merged

Conversation

Fumuran
Copy link
Contributor

@Fumuran Fumuran commented Jan 17, 2025

This PR changes the memory model from word-addressable to element-addressable to be compatible with a new version of the Miden VM.

@Fumuran Fumuran force-pushed the element-addressable-memory branch from 08ddb24 to d22d20a Compare January 21, 2025 10:14
@Fumuran Fumuran marked this pull request as ready for review January 21, 2025 10:14
@Fumuran Fumuran changed the base branch from next to andrew-migrate-to-vm-next January 21, 2025 10:50
Copy link
Contributor

@PhilippGackstatter PhilippGackstatter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks for also updating the docs in memory.rs 🙌.

Left some optional comments.

Comment on lines 151 to 157
const.INPUT_NOTE_SECTION_OFFSET=1048576
const.INPUT_NOTE_SECTION_OFFSET=4194304

# The memory address at which the input note data section begins.
const.INPUT_NOTE_DATA_SECTION_OFFSET=1064960
const.INPUT_NOTE_DATA_SECTION_OFFSET=4259840

# The memory address at which the number of input notes is stored.
const.NUM_INPUT_NOTES_PTR=1048576
const.NUM_INPUT_NOTES_PTR=INPUT_NOTE_SECTION_OFFSET
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be perhaps better if we had:

# The memory address at which the number of input notes is stored.
INPUT_NOTE_SECTION_NUM_PTR=4194304
# The memory address at which the first input note begins.
INPUT_NOTE_SECTION_PTR=4194308

Then we don't have to add 4 to the note section offset wherever we use it, making those procedures a bit easier to read. This might apply to more of these constants here.

In general I wonder if having pointers defined in terms of offsets is helpful. I'm wondering if we should have just pointers defined explicitly and offsets stated in comments or something like that. Otherwise we will always have code that interprets an offset as a pointer, and then we might as well have defined it as a pointer in the first place.

Any thoughts on this, @Fumuran?

(None of this has to be addressed in this PR.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is what we should have now, I think "offset" here (in most paces) is actually a pointer: I think it should be INPUT_NOTE_SECTION_PTR and so on. The only place where we use (or at least should afaik) is offsets inside the structures which should have more than one instance, for example the data of the whole account (since we can load foreign accounts) or the values inside the node data, since we have a lot of nodes.

So, in other words, I think that it is a good idea to change the suffixes those constants from _OFFSET to _PTR, because they are essentially pointers.

To have a different pointer for the number of elements seems like a good idea, although I'm not sure whether it will be that practically helpful, let me look through its usage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can make an issue and investigate whether we use these section pointers in such a way that the first entry should be the length and the remaining entries the elements, or if we mostly use it independently.

miden-lib/asm/kernels/transaction/lib/prologue.masm Outdated Show resolved Hide resolved
miden-lib/build.rs Outdated Show resolved Hide resolved
Comment on lines 17 to 19
// | Section | Start address | End address | Start address (words) | End address (words) |
// | ----------------- | :-----------: | :---------: | :--------------------:| :-------------------:|
// | Bookkeeping | 0 | 31 | 0 | 7 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// | Section | Start address | End address | Start address (words) | End address (words) |
// | ----------------- | :-----------: | :---------: | :--------------------:| :-------------------:|
// | Bookkeeping | 0 | 31 | 0 | 7 |
// | Section | Element start address (word) | Element end address (word) |
// | ----------------- | :-----------: | :---------: |
// | Bookkeeping | 0 (0) | 31 (7) |

Could we include the word in parentheses instead of separate columns? Or alternatively explain outside the table headers that the elemnt comes first and the corresponding word in parentheses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good idea. Honestly first I left word addresses just to make sure I didn't confuse anything, but probably it will be more convenient to check addresses in both element and word forms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I'm not sure which format will be the best. In the technical literature which I read the unit of measurement in the tables is placed after the column name, separated with comma. Will it be clear for the user? Something like that:

// | Section           | Start address, pointer (word pointer) | End address, pointer (word pointer) |
// | ----------------- | :-----------------------------------: | :---------------------------------: | 
// | Bookkeeping       | 0 (0)                                 | 31 (7)                              | 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think what you suggested is fine. We don't export this documentation anyway, so it's only for the miden-base devs.

// | Block header | 800 | 835 | 200 | 208 |
// | Chain MMR | 1_200 | 1_331? | 300 | 332? |
// | Kernel data | 1_600 | 1_739 | 400 | 434 |
// | Accounts data | 8_192 | 532_479 | 2048 | 133_119 | 64 foreign accounts max
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Should we add a proper comment table column for "64 foreign accounts max"?

Copy link
Contributor Author

@Fumuran Fumuran Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great Idea! I think it will be generally helpful to have something like Max amount column for every section

Edit: taking into account your next comment, I think it will be better just to have a comment column.

// | Global inputs | 400 | 423 | 100 | 105 |
// | Block header | 800 | 835 | 200 | 208 |
// | Chain MMR | 1_200 | 1_331? | 300 | 332? |
// | Kernel data | 1_600 | 1_739 | 400 | 434 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// | Kernel data | 1_600 | 1_739 | 400 | 434 |
// | Kernel data | 1_600 | 1_744 | 400 | 434 |

Nit: Is this not 1604+(4*35) = 1744?

I think it would be great to explain some of these calculations (like this one or the Chain MMR one) that do not have directly corresponding constants in memory.masm. Could just be in a separate table comment column here saying something like "35 kernel procedures hashes (4 elements each)".

Copy link
Contributor Author

@Fumuran Fumuran Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

End pointer here is the last memory pointer occupied by the corresponding data. So to get the last pointer we need to calculate 1600 + (34 * 4 + 3). It is easy to see this pattern on the simple example. Let's say we need to calculate the end pointer in case we have start and end word pointers as 100 and 101:

 ┌─┬─┬─┬─┬─┬─┬─┬─┐
 |0|1|2|3|4|5|6|7|
 └─┴─┴─┴─┴─┴─┴─┴─┘
 |  100  |  101  | 

In that case it will be 1 full word (4 elements, which will get us to the first element of the last word, index 4) and also 3 elements in the last word

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. I think my main point was that the kernel procedures start at 1604 rather than 1600, since the word at 1600..=1603 contains the number of procedures, or not?

You're right though that when writing it as an inclusive range, the last address would indeed be 1 less than what I wrote. Then we'd have 1604+(34*4+3) = 1743.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the Kernel data here we assume both number of kernel procedures and their hashes, so this brings us back to the question of whether it is worth creating a separate pointer to the length of the data length of the section ahead.

As far as I can see, in most places we already have these constants for lengths, they just aren't shown. For example, we have this constant for kernel procedures.
Although we don't have such constants for input notes, so I think it makes sense to add a new constant for it to make a nullifier obtaining more simple (well, we are back to your initial question about a new constant for input notes :) )

miden-lib/src/transaction/memory.rs Outdated Show resolved Hide resolved
Base automatically changed from andrew-migrate-to-vm-next to next January 21, 2025 17:56
@Fumuran Fumuran linked an issue Jan 21, 2025 that may be closed by this pull request
@Fumuran Fumuran requested a review from bobbinth January 21, 2025 18:35
Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you. Not a full review yet (I covered like 80% of the non-test code) - but I left some comments inline.

miden-lib/asm/kernels/transaction/lib/tx.masm Outdated Show resolved Hide resolved
miden-lib/asm/kernels/transaction/lib/tx.masm Outdated Show resolved Hide resolved
miden-lib/asm/note_scripts/P2ID.masm Outdated Show resolved Hide resolved
miden-lib/asm/note_scripts/P2IDR.masm Outdated Show resolved Hide resolved
miden-lib/asm/note_scripts/SWAP.masm Outdated Show resolved Hide resolved
miden-tx/src/host/account_procs.rs Outdated Show resolved Hide resolved
miden-tx/src/testing/tx_context/builder.rs Outdated Show resolved Hide resolved
miden-lib/src/transaction/outputs.rs Outdated Show resolved Hide resolved
@@ -140,11 +140,13 @@ proc.main.1
# => []
end

push.0 drop
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use random numbers for push operation - otherwise, there is a chance we'll run into collisions. Let's also make sure to do this for all trace instructions.

Lastly, let's add comments explaining which it is needed (a single-line explanation will do).

Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! I left a couple of small comments inline.

The plan for merging this is roughly as follows:

  • I'll merge the element-addressable memory PR in miden-vm.
  • Then, I'll release the new version of miden-vm.
  • After this, we can update the Cargo.toml file in this PR to use crates.io versions of miden-vm and then we merge it.

Comment on lines 42 to 43
# store the note inputs to memory starting at address 12
push.12 exec.note::get_assets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a comment with the state of the stack after this line?

Comment on lines 42 to 59
let nonce = elements[ACCT_ID_AND_NONCE_OFFSET as usize + ACCT_NONCE_IDX];
let vault_root = TryInto::<[Felt; 4]>::try_into(
&elements[ACCT_VAULT_ROOT_OFFSET as usize..ACCT_VAULT_ROOT_OFFSET as usize + WORD_SIZE],
)
.unwrap()
.into();
let storage_commitment = TryInto::<[Felt; 4]>::try_into(
&elements[ACCT_STORAGE_COMMITMENT_OFFSET as usize
..ACCT_STORAGE_COMMITMENT_OFFSET as usize + WORD_SIZE],
)
.unwrap()
.into();
let code_commitment = TryInto::<[Felt; 4]>::try_into(
&elements[ACCT_CODE_COMMITMENT_OFFSET as usize
..ACCT_CODE_COMMITMENT_OFFSET as usize + WORD_SIZE],
)
.unwrap()
.into();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit more complicated than ideal. Maybe create a helper function - something like get_word() or parse_word(), which could look like so:

fn parse_word(data: &[Felt], offset: usize) -> Word

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this would return an error or Option if the conversion fails, so that at call sites we can do parse_word(slice[0..4], 0).expect("we should have sliced off exactly 4 bytes"). This seems better than panicking unconditionally inside the function, because it makes the caller less aware of this issue.

@@ -275,41 +279,41 @@ pub const NOTE_MEM_SIZE: MemoryAddress = 512;
// │ NOTE │ SERIAL │ SCRIPT │ INPUTS │ ASSETS │ META │ NOTE │ NUM │ ASSET │ ... │ ASSET │ PADDING │
// │ ID │ NUM │ ROOT │ HASH │ HASH │ DATA │ ARGS │ ASSETS │ 0 │ │ n │ │
// ├──────┼────────┼────────┼────────┼────────┼──────┼───────┼────────┼───────┼─────┼───────┼─────────┤
// 0 1 2 3 4 5 6 7 8 + n
// 0 4 8 12 16 20 24 28 32 + 4n
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: previously, the addresses referred to words and so they were centered. Now, they refer to the first element of the word - so, I'd probably shift to be "left-aligned". For example:

// ┌──────┬────────┬────────┬────────┬────────┬──────┬───────┬────────┬───────┬─────┬───────┬─────────┬
// │ NOTE │ SERIAL │ SCRIPT │ INPUTS │ ASSETS │ META │ NOTE  │   NUM  │ ASSET │ ... │ ASSET │ PADDING │
// │  ID  │  NUM   │  ROOT  │  HASH  │  HASH  │ DATA │ ARGS  │ ASSETS │   0   │     │   n   │         │
// ├──────┼────────┼────────┼────────┼────────┼──────┼───────┼────────┼───────┼─────┼───────┼─────────┤
// 0      4        8        12       16       20     24      28       32 + 4n

Comment on lines +175 to 177
# use `push.* drop` instructions before `trace` to make sure that MAST root will be unique
push.3456069754 drop
trace.EPILOGUE_END
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding was that with 0xPolygonMiden/miden-vm#1122 fixed we no longer needed these. You added explanations that this is for MAST root uniqueness, but what is the larger problem here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, we fixed emit instruction and advice injectors, but for the trace instruction seems like everything remained the same. I'm not sure I fully understand the case, but for now I created a template-ish issue for the trace handling to help investigate further: #1094

@bobbinth
Copy link
Contributor

  • I'll merge the element-addressable memory PR in miden-vm.
  • Then, I'll release the new version of miden-vm.
  • After this, we can update the Cargo.toml file in this PR to use crates.io versions of miden-vm and then we merge it.

The first two items from this list are now done. @Fumuran could you address remaining comments and let's merge.

@PhilippGackstatter - once this is merged, could you merge #1090 and then update #1091 so that we could merge it later today as well?

@Fumuran Fumuran merged commit 155e610 into next Jan 22, 2025
12 checks passed
@Fumuran Fumuran deleted the element-addressable-memory branch January 22, 2025 12:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate to element-addressable memory
3 participants