
Optimizing stack setup on SVC calls #230

Closed
wants to merge 1 commit

Conversation

SenRamakri

This is work in progress needing initial feedback on proposed changes.

The optimization we are proposing is to remove SVC_SETUP_PSP from the svc_indirect() calls and instead check in SVC_Handler which stack (MSP or PSP) was active when the SVC call was made, then use that stack pointer accordingly. Doing this saved around 260 bytes of code space when I built a test with mbed-os, and there should be some performance improvement as well, since the calls to intrinsics such as __get_CONTROL, __set_PSP and __get_MSP in SVC_SETUP_PSP are avoided, although I have not measured this quantitatively. Adding @c1728p9 as well.
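To illustrate the idea, here is a minimal sketch of the kind of stack detection we have in mind, assuming ARMv7-M (Cortex-M3/M4/M7) and GCC syntax. The handler body and the SVC_Handler_Main name are illustrative only, not the actual RTX5 sources:

```c
#include <stdint.h>

void SVC_Handler_Main(uint32_t *frame);   /* hypothetical C-level handler */

/* Illustrative sketch only: decide whether the SVC exception frame was
 * stacked on MSP or PSP by testing bit 2 of EXC_RETURN in LR, then pass
 * the frame pointer to the C-level handler in r0.
 */
__attribute__((naked)) void SVC_Handler(void) {
  __asm volatile (
    "tst   lr, #4            \n"   /* EXC_RETURN bit 2: 0 = MSP, 1 = PSP       */
    "ite   eq                \n"
    "mrseq r0, msp           \n"   /* frame on MSP (e.g. before osKernelStart)  */
    "mrsne r0, psp           \n"   /* frame on PSP (thread context)             */
    "b     SVC_Handler_Main  \n"   /* frame pointer handed over in r0           */
  );
}
```

This replaces the per-call-site PSP setup with a fixed test in the common handler path, which is where the few extra cycles per SVC discussed below come from.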

@JonatanAntoni
Member

Hi @SenRamakri,

Many thanks for your contribution.
Your proposal sparked some debate within the team.

The current solution was implemented with the osXxxNew calls in mind. Only those are typically usable before osKernelStart, i.e. while running on MSP. As soon as the kernel runs, we are in thread context using PSP.

The new solution would save some code space per osXxxNew call used (up to 260 bytes in total). On the other hand, it adds a common overhead of about 2 to 3 cycles to each and every SVC call (roughly 1-2% on top of the 100-200 cycles currently needed).

So it is a trade-off between code space and execution speed. What do you think? Which counts more for you: 260 bytes of flash usage, or 1-2% performance per SVC call?

Cheers,
Jonatan

@SenRamakri
Author

Hi @JonatanAntoni,

Thanks very much for your time looking into this and reviewing it.
I agree that the new changes add two more instructions to SVC_Handler. But doesn't every call into the RTX kernel go through one of the SVC0_xx macros (defined in core_cm.h)? Many of those macros called __get_CONTROL, __set_PSP and __get_MSP via the SVC_SETUP_PSP macro, so aren't we saving on those? Overall, the performance impact of the two extra instructions should be negligible, or possibly even advantageous, depending on which SVC_xx calls are used. Is that right? Let me know what you think.
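To make the per-call-site cost concrete, here is a rough sketch of the general shape of such an inlined SVC wrapper, assuming GCC syntax. The macro name, signature and body are simplified illustrations, not the actual SVC0_xx definitions from core_cm.h:

```c
#include <stdint.h>

/* Hypothetical, simplified one-argument SVC wrapper. Because it is inlined
 * at every call site, anything placed inside it (such as a SVC_SETUP_PSP
 * step with its __get_CONTROL/__set_PSP/__get_MSP calls) is duplicated per
 * call, which is where the per-call code-space cost comes from.
 */
#define SVC0_1_EXAMPLE(name, rt, t1)                                        \
__attribute__((always_inline)) static inline rt svc_##name(t1 a1) {        \
  register uint32_t r0 __asm__("r0") = (uint32_t)(a1);                     \
  /* old scheme: SVC_SETUP_PSP would expand here, before the SVC itself */ \
  __asm__ volatile ("svc 0" : "+r" (r0) : : "memory");                     \
  return (rt)r0;                                                            \
}

/* Example instantiation (hypothetical service name): */
SVC0_1_EXAMPLE(example_service, uint32_t, uint32_t)
```

With the setup removed, each expansion reduces to argument marshalling plus the svc instruction itself, while the MSP/PSP decision moves into the single shared handler sketched above.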

Regards,
Senthil

@RobertRostohar
Collaborator

Only the osXxxNew calls (and osKernelInitialize/Start) used the SVC macro with SVC_SETUP_PSP, so only those functions had a few cycles of overhead. More precisely, the osXxxNew functions (which already take quite some time) had an additional 5 cycles of overhead (when called after osKernelStart).

Summary:
2..3 additional cycles for every osXxx function (except osXxxNew), and 2..3 fewer cycles for the osXxxNew functions.

RobertRostohar added a commit that referenced this pull request Oct 25, 2017
@RobertRostohar
Collaborator

RTX5 has been updated as suggested: Stack setup for Cortex-M has been replaced with stack detection in SVC_Handler in order to save code space.

@JonatanAntoni
Member

Hi @SenRamakri,

Robi changed the code according to your suggestions. May I ask you to double-check whether the latest version meets your expectations? Please close this PR if you have no further remarks.

Thanks for contributing,
Jonatan

@SenRamakri
Author

Hi @RobertRostohar and @JonatanAntoni,

Thanks for taking the time to look into this and for adopting the enhancements I suggested for SVC_Handler.
We will now plan on syncing these changes into mbed-os (@sg-).
Closing this PR.

-Senthil

SenRamakri closed this Oct 25, 2017