Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discontiguous UF2 files not written consistently #19

Open
cbiffle opened this issue May 13, 2022 · 4 comments
Open

Discontiguous UF2 files not written consistently #19

cbiffle opened this issue May 13, 2022 · 4 comments

Comments

@cbiffle
Copy link

cbiffle commented May 13, 2022

Hi! I am admittedly doing something slightly odd, but that odd thing may have exposed a bug in the firmware loading code.

The build system for Hubris lays out separately compiled tasks in isolated memory containers. This means that its Flash footprint consists of a series of (power-of-two-aligned) chunks, many of which are not full. For instance,

ADDRESS  END          SIZE FILE
10000100 100001a8       a8 target/demo-pi-pico/dist/kernel
100001a8 10002b20     2978 target/demo-pi-pico/dist/kernel
10002b20 10002db8      298 target/demo-pi-pico/dist/kernel
10003000 10003048       48 target/demo-pi-pico/dist/idle
10003400 10003708      308 target/demo-pi-pico/dist/sys
10003708 10003710        8 target/demo-pi-pico/dist/sys
10003800 10003aec      2ec target/demo-pi-pico/dist/user_leds
10003aec 10003b48       5c target/demo-pi-pico/dist/user_leds

For instance, in the example layout above, there's a gap between the end of the kernel text (at ...2db8, or ...2e00 when rounded up to the next Flash sector) and the beginning of the next program's text at ...3000.

I've altered our build system to produce UF2 and I've tested it in two different modes. In "contiguous" mode, it generates zero-filled records for gaps like the one described above. In "discontiguous" mode, it does not (and the block count and whatnot is reduced accordingly). The datasheet didn't talk me out of doing discontiguous UF2s; indeed, section 2.8.4.2 mentions offhand that

Note that flash is erased a 4K sector at a time, so writing to only a subset of a 4K flash sector will leave the rest of that
flash sector undefined. Beyond that there is no requirement that a binary be contiguous.

I went and studied virtual_disk.c and associated code, and it looks like it's trying to do the right thing even if blocks are discontiguous, out of order, etc. So, it sure looks like this should work!

However, when loading in discontiguous mode, the Flash is not being written correctly. Specifically, the Flash will be written up to the final sector in a contiguous region, and then the first few sectors of the region after the gap are erased but not written, so they read back as FF FF etc. Example (from a conversation through gdb+openocd):

(gdb) mon mdb 0x10002d00 16
0x10002d00: 05 00 00 00 00 00 00 00 00 07 00 20 00 01 00 00 

(gdb) mon mdb 0x10002e00 16  # expected to be FFs, this is the gap
0x10002e00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 

(gdb) mon mdb 0x10003000 16  # expected to have a program in it!
0x10003000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

The UF2 record that should have been written there reads:

00005c00  55 46 32 0a 57 51 5d 9e  00 20 00 00 00 30 00 10  |UF2.WQ].. ...0..|
00005c10  00 01 00 00 2e 00 00 00  68 00 00 00 56 ff 8b e4  |........h...V...|
00005c20  0a 48 0b 49 0b 4a 01 e0  08 c9 08 c2 82 42 fb d1  |.H.I.J.......B..|
00005c30  09 48 0a 49 00 22 00 e0  04 c1 81 42 fc d1 bf f3  |.H.I.".....B....|
00005c40  4f 8f bf f3 6f 8f 00 f0  0b f8 fe de 00 07 00 20  |O...o.......... |
00005c50  48 30 00 10 00 07 00 20  00 07 00 20 00 07 00 20  |H0..... ... ... |
00005c60  80 b5 00 af 30 bf fd e7  00 00 00 00 00 00 00 00  |....0...........|
00005c70  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00005df0  00 00 00 00 00 00 00 00  00 00 00 00 30 6f b1 0a  |............0o..|

The problem extends up until address ...3500 at which point sectors start being written correctly again. (This is a curious point for things to change, as it's not an erase sector boundary.)

So, I am either missing something, or there's a subtle bug here.

I have attached gzip'd UF2 images of the firmware (compiled for Pi Pico) in both contiguous mode (works!) and discontiguous mode (fails as described above). If you try it, the "working" case should blink the LED at about 1 Hz, while the failure case will set the LED and then hard fault when it jumps into an erased flash sector.

contiguous.uf2.gz
discontiguous.uf2.gz

@kilograham
Copy link
Contributor

kilograham commented May 14, 2022

yes this is a bug; see raspberrypi/pico-sdk#762

the next (SDK1.3.1) version of elf2uf2 will work around this for you. There will be an errata in the next docs release, and a documentation on how to work around the problem if you aren't using elf2uf2 (TLDR pad any partially filled 4K flash sector other than the last with zeroes - they are erased anyway).

see also #9

@cbiffle
Copy link
Author

cbiffle commented May 14, 2022

Ah. Yes, I missed that when reading the code: https://github.com/raspberrypi/pico-bootrom/blob/master/bootrom/virtual_disk.c#L205 - that likely needs to be derived from the transfer address rather than the block number.

If my current understanding of that code is correct, it's using the top 28 bits of the block number to select a bit in the erase bitmask. So, what it's actually tracking is whether any erases have been performed for a particular group of 16 incoming blocks. (With the side effect that it will unconditionally erase one flash sector per group of 16 blocks.) So the behavior I'm seeing results from a combination of two things:

  1. Within a group of 16 blocks that address more than one 4kiB erase sector, only the first erase sector mentioned will be erased. Addressed data sectors within other erase sectors will be written without erase.
  2. If the first block in a later group of 16 blocks happens to address one of those partially written erase sectors, the ROM will erase it, losing the writes.

So the mitigation seems to be: Ensure that each set of UF2 records that share the top 28 bits of the block number (that is, each consecutive group of 16 by block number) address the same 4kiB erase sector. (This is equivalent to your comment about padding partially filled 4K sectors, but more specific, because I'm going to go implement it.)

Are y'all open to a patch? Is this something you might consider fixing in a future mask revision?

cbiffle added a commit to cbiffle/uf2l that referenced this issue May 14, 2022
This class of bug seems like it has the potential to be pretty common,
so the mitigations are not hardcoded as rp2040-specific. However, there
is an rp2040-specific _check_ in the info command now.
@cbiffle
Copy link
Author

cbiffle commented May 14, 2022

I've added the mitigation described above to uf2l as well as added a check to the file validator that can detect files that will trip this bug. I'm happy to prepare a PR to fix the ROM if you're open to it.

@kilograham
Copy link
Contributor

Ah. Yes, I missed that when reading the code: https://github.com/raspberrypi/pico-bootrom/blob/master/bootrom/virtual_disk.c#L205 - that likely needs to be derived from the transfer address rather than the block number.

If my current understanding of that code is correct, it's using the top 28 bits of the block number to select a bit in the erase bitmask. So, what it's actually tracking is whether any erases have been performed for a particular group of 16 incoming blocks. (With the side effect that it will unconditionally erase one flash sector per group of 16 blocks.) So the behavior I'm seeing results from a combination of two things:

1. Within a group of 16 blocks that address more than one 4kiB erase sector, only the first erase sector mentioned will be erased. Addressed data sectors within _other_ erase sectors will be written without erase.

2. If the first block in a later group of 16 blocks happens to address one of those partially written erase sectors, the ROM will erase it, losing the writes.

So the mitigation seems to be: Ensure that each set of UF2 records that share the top 28 bits of the block number (that is, each consecutive group of 16 by block number) address the same 4kiB erase sector. (This is equivalent to your comment about padding partially filled 4K sectors, but more specific, because I'm going to go implement it.)

Are y'all open to a patch? Is this something you might consider fixing in a future mask revision?

Your understanding is exactly correct. uf2l; very nice.

Sure we will accept a patch, which we would include if RP2040 does ever get another mask revision

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants