Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESP32-S2/S3 support #91

Merged
merged 20 commits into from
Sep 2, 2023
Merged

ESP32-S2/S3 support #91

merged 20 commits into from
Sep 2, 2023

Conversation

wnienhaus
Copy link
Collaborator

@wnienhaus wnienhaus commented Jul 12, 2023

This PR adds support for the ULP-FSM (not ULP RISC-V) of the ESP32-S2 and ESP32-S3.

(Note, the ESP32-S2 and S3 have the same ULP-FSM, so we don't distinguish between them in the implementation).

The entire instruction set of the ULP-FSM of the ESP32-S2 is supported, including the new load and store instructions to allow loading from or storing to the upper 16-bits of memory locations.

Use the -c command line argument when assembling or disassembling. Use -c esp32 (or omit the -c option) to select the original ESP32. Use -c esp32s2 to select the ESP32-S2/S3. Refer to the documentation in docs/disassembler.rst for more detail.

Please note: the latest release of MicroPython to date is 1.20, which does not enable the ULP on ESP32-S2/S3 devices. This is fixed already, but we need to wait for the next release to have the fix in an official version. To test this on a real device, use a recent nightly build of Micropython and flash that to your device.

This PR resolves #85 .

(The easiest way to review this PR is commit-by-commit. Each commit is one independent change.)

wnienhaus added 10 commits July 12, 2023 19:00
…s_s2

Currently the logic in these copies is identical to the original.
This will makes it easier to see the changes we're making in
subsequent commits.
Select cpu with -c when running the assembler (--mcpu as used by
Espressif's esp32ulp-elf-as also works). The possible values are
'esp32' and 'esp32s2'. (Note esp32s2 also works for the ESP32-S3,
because those two MCUs share the same ULP-FSM binary format).

If no cpu is specified the original 'esp32' will be used as before.
In summary, these are the most important changes:
* the `sub_opcode` field on all opcodes except ST have been shrunk
  from 3 bits down to 2 bits
* the LD and ST opcodes support more variations (new instructions)
* branching instructions have changed. The JUMPR instruction uses
  different comparisons and the JUMPS instruction now implements all
  comparisons in hardware, without requiring multiple instructions
  to emulate any comparison.
* There is no more SLEEP instruction to select a sleep timer. Since
  Espressif chose to simply convert SLEEP instructions into WAIT
  instructions we're doing the same.

Update integration tests to run for both the ESP32 and ESP32-S2.
The disassembler is now mainly the command line tool, which deals
with interpreting user input, formatting output, etc.

This allows us to add decoding logic for new cpus (the S2) and the
disassembler can then dynamically load the correct decoding module.
Currently only the esp32 is implemented (esp32s2 soon to follow).

This commit adds the -c command line option to the disassembler, as
well as updates the integration tests to supply the cpu parameter,
and test fixtures are renamed to include the cpu type in their name,
so that we can have separate fixture files for other cpus.
To disassemble ESP32-S2 ULP binaries use the `-c esp32s2` option
when running the disassembler.

Update documentation to mention support for the ESP32-S2.
…pported by the S2

The ESP32-S2 changes the comparison operators that are natively
supported by the CPU for the JUMPR and JUMPS instruction.

For the JUMPS instruction all comparison operators are now natively
supported in hardware.
For example STL and STH can be used to store to the lower or
upper 16-bits of a memory address respectively. (The original
ST instruction could only store to the lower 16-bits of a
memory address)

The following new instructions are now supported:
* LDL, LDH
* STL, STH, ST32, STI32, STI, STO
The following new instructions are now supported:
* LDL, LDH
* STL, STH, ST32, STI32, STI, STO

Note: The disassembler will return LD instead of LDL and
ST instead of STL, because they are each synonyms of the
other. We can only pick either and so we picked the keyword
that exists across both the ESP32 and the ESP32-S2/S3.
@wnienhaus
Copy link
Collaborator Author

wnienhaus commented Jul 12, 2023

There are some parts of the implementation I am currently not 100% happy with:

  1. Having both disassemble.py (cli tool) and decode{,_s2}.py (not cli tools) under tools.
    • I did consider having those files under esp32_ulp
    • but that would mean they get included in the upip package (soon mip) by default
    • and I didnt want those files on the device by default when one installs the package (maybe that's a weird choice?).
  2. Having two intentional bugs to match what Espressif's assembler (binutils-gdb/esp32ulp-elf-as) does.
    • This makes our comparison tests pass, but is still wrong (based on my understanding).
    • I reported both issues along with fixes to Espressif (here and here), but they haven't responded at all, and I wonder if they will give any attention to the ULP-FSM at all, now that the ULP RISC-V exists.
    • I was considering making our implementation correct and adjusting the tests somehow to account for the difference, but for now it is how it is.
  3. That opcode.py and opcode_s2.py share (duplicate) a lot of non-cpu specific code, such as parsing register numbers or evaluating expressions. It was like that before, so for now it's the same, but it could divided up more nicely.

@ThomasWaldmann @dpgeorge @mattytrentini - if you have time and want to take a look at this PR, i'd appreciate it.

@wnienhaus wnienhaus mentioned this pull request Jul 12, 2023
@wnienhaus
Copy link
Collaborator Author

One thing I noticed is that the S2 and S3 have different peripheral register addresses for similar things, different from the original ESP32 and also different between each other. That means that (most likely) our blink.py example will not work on an ESP32-S2/S3 even if the assembled ULP code has the right binary format.

I will test this on my S2 and S3 and add examples for the specific versions and add a mention of this detail into the documentation.

Specifying 0 as the label is different than not specifying a
label at all. This commit corrects the behaviour when label 0
is used.

Also run the all_opcodes fixture as integration test to ensure
the same result as binutils-gdb/esp32ulp-as (which is how this
bug was found).
@ftylitak
Copy link

ftylitak commented Aug 7, 2023

Hello all,

first of all thank you @wnienhaus for this usefull PR. We were in need of a Pulse Count implementation in ULP for ESP32S3 so based on this PR we changed the required addressed and it worked. (we were not sure whether to send a PR on wnienhaus:s2s3_support or on this main repo so we decided to just write this post as a description of the solution.)

The changes can be found in the last two commits of the following repo which has this PR merged and has as addition the specifics for ESP32S3:

In a nutshell, opcodes_s3.py and soc_s3.py were added and assemble.py was updated to identify cpu=esp32s3.

Also here is the exampel code used to test the Pulse Counter on ESP32S3 on GPIO 4.

https://github.com/insighio/micropython-esp32-ulp/blob/master/examples/pulse_counter_esp32s3.py

We are based on micropython 1.19.1 so we made changes in the code of micropython to enable ULP for ESP32S3.
Though the most important note that we can make is that ESP32S3 closes down all RTC Peripherals upon deepsleep. Thus the user needs to instruct the ESP32S3 to keep the RTC IO powered on when in deepsleep and this instruction must executed every time the device wakes up from deep sleep.

For this reason, we added one more function in the ULP module which initializes the GPIO port, defines it as Input, enables Pull Down and instructs to keep RTC peripherals powered on when in deep sleep. This is called every time the device wakes up from deep sleep or else the counting of the edges stops.

The call to this function: https://github.com/insighio/micropython-esp32-ulp/blob/90fe843783ac63acf6224bd3e9e11b3522ce5469/examples/pulse_counter_esp32s3.py#L117-L124

And the function itself in custom micropython branch:
https://github.com/insighio/micropython/blob/33b18e0b67f73f27affff86acc796539a14f8b07/ports/esp32/esp32_ulp.c#L115-L131

In this way, we have a functioning Pulse Counter. Though our tests, by setting the ulp.set_wakeup_period from 20.000 to 100 we are reliably reading 100 pulses per second.

Hope all the above are helpfull. Thanks one again.

Updated as per ESP-IDF v5.0.2. Also added reference URL to those
constants in the ESP-IDF.
@wnienhaus
Copy link
Collaborator Author

@ftylitak Thank you so much for that feedback and testing this PR. Great to know that it works for you.

I have been working slowly on improving the PR to add support for the S2 and S3 peripheral register addresses. I just pushed that work to this PR.

I took a different approach than you for supporting the S3. Since the S2 and S3 have the same binary format on the ULP-FSM (in order words opcode_s2.py and opcode_s3.py would be identical in all ways except the peripheral register addresses accepted), I decided to keep having only the esp32s2 cpu for both the S2 and S3 and make this clear in the documentation. This also aligns with how Espressif does this in their binutils-gdb/esp32-ulp-as assembler, which only has cpu options esp32 and esp32s2 and they use the esp32s2 also for the S3.

That choice did lead me to something I am not 100% happy with, but thought it might be overall positive nonetheless, namely that opcode_s2 now accepts peripheral register addresses from any of the ESP32 variants. The reason is that what ends up in the actual processor instruction is a relative offset, relative to the base address of the specific variant. Thus within the instruction, it doesn't matter what the real input address was. And since especially across the S2 and S3 most of the relative offsets are actually the same, this "feature" allows assembly code with register addresses in the S2 range to work unmodified for the S3 (actually the binary would even be the same). You can see in the examples I added that I now only need 1 example for the s2/s3 combined rather than 2 examples.

I am not sure what you think of this. You took the alternative of adding a new cpu esp32s3. Perhaps this is the clearer approach? That way one doesn't need to even know that the S2 and S3 are binary compatible? On the other hand that feels useful knowledge to me, because one can reuse code and binaries.

Feel free to rebase onto what I just pushed. Note that commit 2aff8a1 contains an important fix for ST instructions with labels.


Small comment on your Micropython patch: That function is specific to your need and would not be good in MicroPython in general (e.g. it hard codes the direction to input, etc).

However, it might not even be needed, because so far I have always managed to do all that initialisation inside the ULP code, given that it also has access to the peripheral registers, which are configured by the rtc_gpio_* functions of the ESP-IDF. It requires a bit of digging in the ESP-IDF, to find which registers are set by those higher level function, but so far it worked.

What might not work (I am still busy testing), is enabling power of the RTC peripherals from ULP code (especially once deep_sleep has already been entered). So perhaps that is the "generic" function that would be useful in MicroPython, i.e. this part:

    // instruct to keep RTC peripherals powered on after when in deep sleep
    esp_sleep_pd_config(ESP_PD_DOMAIN_RTC_PERIPH, ESP_PD_OPTION_ON);

@ftylitak
Copy link

ftylitak commented Aug 8, 2023

@wnienhaus great comments.

Indeed our modification in the Micropython branch is too specific. Those parts of setting the direction of the input etc were added there in the beginning as testing because we were not getting any results (plus we did not have much knowledge on the ULP so it was a bit of testing here and there). In the end we have found the RTC peripheral power on, it all started working though we did not take a step back to use the ULP code for the rest.
Also, the initialization of the GPIO was a direct copy of the setup procedure of the ESP-IDF example that we took as reference for testing.
https://github.com/espressif/esp-idf/tree/master/examples/system/ulp/ulp_fsm/ulp

The interesting part is that the ESP-IDF example does not use esp_sleep_pd_config though it works. So, may be it is a setting of Micropython that affects the power of the RTC peripherals.

Regading to this project (micropython-esp32-ulp) we wil rebase to your changes.

@wnienhaus
Copy link
Collaborator Author

Thanks for that input. It should help me resolve my current problem: readgpio_s2.py doesnt work on either the S2 or S3 - GPIO input is always 0.

Very interesting that the ESP-IDF example does not use the esp_sleep_pd_config - I hadnt noticed yet. Because readgpio_s2.py runs without entering deepsleep, I didnt think esp_sleep_pd_config would make a different (i'd assume the RTC peripherals are on, while the entire device is on), but I was going to try that next.

So I will actually try your approach next (how you patched MicroPython), given that it works for you, and then see if that makes readgpio_s2.py work for me. Assuming that it will, I can then narrow down from there to figure out which of those rtc_gpio_* statements actually makes the difference, and then see whether doing the same from ULP code would also work.

@wnienhaus
Copy link
Collaborator Author

@ftylitak So, it turns out it's actually the "IO MUX clock gate" that needs to be enabled, which the ESP-IDF does as part of rtc_gpio_init for the S2 and S3 (not the original ESP32).

rtc_gpio_init calls rtcio_hal_function_select here, which is mapped to a variant-specific rtcio_ll_function_select implementation here, which for the s3 is this one, the IO MUX clock gate is enabled here, like this:

SENS.sar_peri_clk_gate_conf.iomux_clk_en = 1;

When I create a MicroPython function in esp32_ulp.c that executes that same 1 line (instead of the full rtc_gpio_init as in your (@ftylitak) patch) and I run that before starting my readgpio_s2.py ULP program, everything works (note I select mux RTC, input-enable and pulldown/pullups from within the ULP code, so that all works as intended from the ULP). Without running that 1 line, I cannot read GPIO input.

I tried doing the equivalent with WRITE_RTC_FIELD(SENS_SAR_PERI_CLK_GATE_CONF_REG, SENS_IOMUX_CLK_EN, 1) from within the ULP code (inspired by the ESP-IDF example here), but it doesn't have any effect. Maybe it has no effect in the ESP-IDF example either, because they already run rtc_gpio_init before starting the ULP code. But given that this example code exists, I wonder if we stumbled upon a bug in the chip. (Someone else also ran into this problem: see).

One "simple" fix could be to patch MicroPython to set this flag everytime esp32.ULP.run() is called, but I guess there is a reason this is off-by-default, and someone might want to keep it off in some applications (perhaps it saves some power).

The other, perhaps "more correct" option could be to simply expose the rtc_gpio_init function as e.g. esp32.ULP.rtc_gpio_init(pin_number) - without additionally setting direction, configuring pullups/pulldowns, hold, etc - as those things all can be done from within the ULP code.

Anyway, just reporting my findings so far. Will still think a bit more about this.

(@ftylitak - if you have some time, it would be interesting to know whether this single line is enough to make your use case work, also in deepsleep, because i have not tested with deepsleep yet.)

Peripheral registers of the ESP32-S2 and S3 are at a different
location than those of the original ESP32. The location within
the address space is mostly the same, however we need to use
the correct base address when calculating periph_sel (type of
register) used in REG_RD and REG_WR instructions.

Note 1: To avoid creating an entirely new CPU (esp32s3) just for
handling the different peripheral register addresses of the S3,
while their binary format (instruction set) is identical, our
esp32s2 CPU support will now accept both ESP32-S2 and ESP32-S3
addresses. This should make a binary for the one seamlessly
work on the other (without reassembly), given that the offsets
of different peripheral registers between the S2 and S3 are
mostly (but not entirely) identical.

Note 2: Our esp32s2 cpu support will also accept peripheral
register addresses of the original ESP32. This was originally
done because Espressif's binutils-gdb/esp32-ulp-as incorrectly
validates addresses for the esp32s2 cpu, and to make our compat
tests pass, this was needed. However this also has a nice side-
effect of allowing some assembly written for the original ESP32
to work unmodified when assembled for an S2/S3, because some of
the peripheral registers live at the same offset from the base
for all three variants.
We also now use the correct include files from the ESP-IDF
when building the defines DB, correct for the cpu type we're
testing with. (That also means the defines DB is built once
per cpu type).

That, together with some ESP32-S2 specific test cases from
Espressif's esp32s2 assembler test-suite, make those test
cases more interesting to run, compared to only assembling
ESP32 examples with the esp32s2 cpu selected.

Note: This change no longer runs the ulp_tool examples for the
esp32s2 case, because those examples use contants (from the
ESP-IDF include files), which no longer exist for the ESP32-S2,
such as `RTC_IO_TOUCH_PAD*_HOLD_S`. Since the ulp_tool examples
primarily test the preprocessor's ability to resolve constants
from include files (via the defines DB), testing those examples
only once with the ESP32 cpu should be enough.
@wnienhaus
Copy link
Collaborator Author

So, I finally have made progress. It turns out the issue with failing to write to the SENS_SAR_PERI_CLK_GATE_CONF_REG register was due to a bug in our assembler, which already existed in our original ESP32 implementation. The bug was in how we translated peripheral register addresses (see commit which fixes it: 520a7f7)

This issue also once existed in the ESP-IDF, which is where we got the incorrect translation logic from. After asking Espressif (espressif/esp-idf#12158) about not being able to write to the SENS_SAR_PERI_CLK_GATE_CONF_REG register they pointed me to their fix (espressif/esp-idf#11652), which was exactly what we needed.

I have now added the fix to this PR and added working examples (all tested on a real device) to the examples/ directory. Those examples end in _s2 or _s3 (where no _s3 example exists the s2 example also works on the s3).

So we don't need any changes to MicroPython after all! I am happy.

cc: @ftylitak

@wnienhaus wnienhaus force-pushed the s2s3_support branch 2 times, most recently from 5be3f10 to bc2ee7e Compare August 30, 2023 19:35
The ESP32-S2/S3 support a negative offset in ST/LD instructions.
Those offsets are two's-complement encoded into a field that is
11-bits wide.

This change corrects the decoding of negative offsets given the
field width of just 11-bits, rather than relying on the 32 or
64 bits of a MicroPython `int`.

Note 1: Negative offsets used in JUMP instructions are encoded
differently (sign bit + positive value), and their decoding is
already done correctly.

Note 2: The LD/ST instructions in of the ESP32 do not support
negative offsets (according to Espressif tests), so their
decoding remains as is.
This was already incorrect in the original ESP32 implementation
but was discovered while testing the new S2/S3 implementation.

This was also wrong within the ESP-IDF, that we based the translation
logic on. Espressif fixed the issue in this pull request:
espressif/esp-idf#11652

We now also have unit tests and compat (integration) tests, that
compare our binary output against that of binutils-gdb/esp32-ulp-as,
which already did this translation correctly, but we didnt have a
test for the specific cases we handled incorrectly, so we didn't
notice this bug.

This fix has also been tested on a real device, because S2/S3 devices
need the IOMUX clock enabled in order to be able to read GPIO input
from the ULP, and enabling that clock required writing to a register
in the SENS address range, which didnt work correctly before this fix.
There are now example files for the S2 and S3 (with ``_s2`` or ``_s3``
appended to their filenames).

Note: The s2 examples also work unmodified on the ESP32-S3, except the
readgpio example which needs different peripheral register addresses
on the S3.

The ``counter_s2.py`` example is unmodified compared to the original
example, except that the assembler is told to generate esp32s2 output.

The ``blink_s2.py``, ``readgpio_s2.py`` and ``readgpio_s3.py`` examples
have their rtc_io base address updated, as well as the constants
referring to the GPIO pins and channels and the peripheral register bits
used to read/write the GPIO inputs/outputs. These addresses/bits have
changed from the original ESP32. Otherwise the examples are identical to
the examples for the original ESP32.
@wnienhaus
Copy link
Collaborator Author

Now that the examples work on real devices (S2 and S3) and because I have tested this quite extensively by now, I will proceed to merge this. Any further improvements can follow in future PRs.

@wnienhaus wnienhaus merged commit 25bf34e into micropython:master Sep 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support ESP32-S2
2 participants