Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Small String Optimisation (SSO) #1951

Merged
merged 13 commits into from
Nov 9, 2019

Conversation

mikee47
Copy link
Contributor

@mikee47 mikee47 commented Nov 6, 2019

Based on the Arduino ESP8266 core implentation.

  • An empty String object now consumes 12 bytes (from 8) but provides an SSO capacity of 10 characters (plus NUL).
    The heap will therefore not be used until capacity exceeds this figure.
  • Capacity and length types changed to size_t, thus String is no longer restricted to 64K.
    (Handy for writing test applications running on the Host Emulator.)

If anyone has superclassed String for their own needs then this PR may affect them.
However, it should be possible to rewrite using only public String methods; this has been done
with CStringArray.

Binary string support

The following methods have been updated to work with Strings containing NULs and other non-ASCII characters:

  • toLowerCase()
  • toUpperCase()
  • lastIndexOf()
  • substring()

The behaviour of lastIndexOf() has changed. Previously, if fromIndex exceeded the String length the
method failed. It now just starts the search from the end of the string.
This is more useful/logical/friendly and consistent with ArduinoCore-avr code.

The following methods have been added:

  • int compareTo(const char* cstr, size_t length)
  • bool equals(const char *cstr, size_t length)
  • bool equalsIgnoreCase(const char* cstr, size_t length)
  • int indexOf(const char* cstr, size_t cstrlen, size_t fromIndex)
  • int lastIndexOf(const char* cstr, size_t cstrlen, size_t fromIndex)
  • bool replace(const char* find_buf, size_t find_len, const char* replace_buf, size_t replace_len)

These allow operations which previously required construction of a separate String object,
so are used internally anyway.

Other fixes/changes

  • Add return bool from replace(String, String) method as allocation could fail
  • Fix concat(const char*, size_t) to handle self-append, e.g. s.concat(s.c_str(), 10), s += s, etc.
  • Fix substring() and lastIndexOf() so they don't self-modify
  • In substring(), fix an 'off-by-one' bug which produces an incorrect result if left == right == length().
  • In move(), invalidate the String if rhs is null.
  • Remove STRING_IRAM_ATTR and commented-out code (references to Print)

A number of trivial methods and operators have been moved into the header, reducing
code size and eliminating un-necessary function calls.

@mikee47
Copy link
Contributor Author

mikee47 commented Nov 6, 2019

Given that adding SSO is high-impact anyway, I've rolled up a bunch of other improvements which should now make String fully binary-compatible.

I'm happy to split this PR up but what we could do with is a comprehensive set of integration tests for this. In the meantime I'm making this available for inspection and testing.

@mikee47
Copy link
Contributor Author

mikee47 commented Nov 7, 2019

For comparison, here's what MallocCount has to say running HostTests on the emulator:

Without SSO:

  • Heap allocations: 3645, total: 166675 bytes, peak: 16617, current: 2063

With SSO:

  • Heap allocations: 2383, total: 159471 bytes, peak: 16590, current: 2065

@slaff slaff added this to the 4.0.1 milestone Nov 7, 2019

if (left > right)
{
unsigned int temp = right;
size_t temp = right;
right = left;
left = temp;
}
Copy link
Contributor Author

@mikee47 mikee47 Nov 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check (left > right) is present in both AVR and Esp8266 Arduino cores. The parameter names also differ from the Arduino docs. (The AVR source differs as well, calling them beginIndex and endIndex.) Maybe there's a historical reason for it but in my view the method should fail if the parameters have been interchanged. Here's the doc. page:
https://www.arduino.cc/reference/en/language/variables/data-types/string/functions/substring/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@slaff any thoughts on this one? Should we change it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, change it please. But put this change as a separate PR. If we break some Arduino libraries it should be easy to revert it.

@mikee47
Copy link
Contributor Author

mikee47 commented Nov 7, 2019

Once #1954 is merged I'll rebase.

@mikee47 mikee47 changed the title Implement Small String Optimisation (SSO) (WIP) Implement Small String Optimisation (SSO) Nov 8, 2019
@slaff
Copy link
Contributor

slaff commented Nov 8, 2019

@mikee47 Ping me when you are ready with this PR so that I can test it.

…at(s.c_str(), 10)`, `s += s`, etc.

Simplify concat(char) and move into header
Instead of failing if `fromIndex` is too large, start from end of string.
This is more useful/logical/friendly and consistent with ArduinoCore-avr code.

Add `lastIndexOf(const char*, size_t, size_t)` and rationalise methods

Added `memrchr.c` from glibc.
…se methods

Return `bool` as allocation could fail
@mikee47 mikee47 changed the title (WIP) Implement Small String Optimisation (SSO) Implement Small String Optimisation (SSO) Nov 8, 2019
@slaff slaff removed the 0 - Backlog label Nov 9, 2019
@slaff
Copy link
Contributor

slaff commented Nov 9, 2019

@mikee47 Is this PR ready for merging?

@mikee47
Copy link
Contributor Author

mikee47 commented Nov 9, 2019

@slaff Yes, all done thanks.

@mikee47
Copy link
Contributor Author

mikee47 commented Nov 9, 2019

@slaff I've just run HostTests on a nodeMCU and there's no change in the heap count, which is suspicious, so maybe hold off until I've found out what's happening?

@mikee47
Copy link
Contributor Author

mikee47 commented Nov 9, 2019

@slaff It's an issue with malloc_count, realloc isn't getting hooked properly. I'll investigate that separately, this PR is all done.

@mikee47
Copy link
Contributor Author

mikee47 commented Nov 9, 2019

Yep, malloc_count needs to hook both malloc and pvPortMalloc, etc. explains why allocations are so low. I'll have a PR to fix that.

@mikee47
Copy link
Contributor Author

mikee47 commented Nov 9, 2019

OK, with the fixed malloc_count here's what I get running HostTests (with SSO tests disabled) for this PR vs. current develop:
With SSO: Heap allocations: 2845, total: 170659 bytes, peak: 16426, current: 1901
Without: Heap allocations: 4107, total: 178174 bytes, peak: 16453, current: 1899

@slaff slaff merged commit 1fb600e into SmingHub:develop Nov 9, 2019
@slaff slaff removed the 0 - Backlog label Nov 9, 2019
@mikee47 mikee47 deleted the dev/String-SSO branch November 9, 2019 12:55
@mikee47
Copy link
Contributor Author

mikee47 commented Nov 9, 2019

As an addendum to this PR, I was curious to see what happens if SSO_CAPACITY is increased from the standard 10 bytes:

SSO_CAPACITY = 10: Heap allocations: 2383, total: 159467 bytes, peak: 16590, current: 2065
SSO_CAPACITY = 30: Heap allocations: 1383, total: 144372 bytes, peak: 16670, current: 2125
SSO_CAPACITY=62: Heap allocations: 768, total: 120900 bytes, peak: 16757, current: 2180
SSO_CAPACITY=126: Heap allocations: 474, total: 105155 bytes, peak: 17013, current: 2372

All tests done on the emulator.

The tradeoff is the amount of RAM used for global/static String instances, as well as intermediate stack usage although that's not really an issue with Sming.

If there's interest, this could easily be added as a tunable parameter.

@slaff
Copy link
Contributor

slaff commented Nov 9, 2019

If there's interest, this could easily be added as a tunable parameter.

This plus a small sample that based on the usage of Strings and the size of the memory recommends the optimal value for that parameter :) OR just a precalculated table with recommended values as part of the documentation.

@slaff slaff mentioned this pull request Jan 2, 2020
4 tasks
slaff pushed a commit to slaff/Sming that referenced this pull request Feb 6, 2020
New features
------------------
- No-WiFi build option SmingHub#2004 - get more resources if your application is not using WIFI.
- Multiple SSL adapters based on axTLS and BearSSL.  SmingHub#1999
- Added basic Crypto support library SmingHub#2014
- Updates framework to build using GCC 9.2.0 toolchain for C++17. SmingHub#1825
- Modbus master SmingHub#1992
- Implemented Small String Optimisation (SSO). SmingHub#1951
- Webcam stream and sample webcam web server. SmingHub#1981
- Allow HTTP connections to ignore rejected body content SmingHub#1928

Improvements
-------------------
- Some improvements to multipart parser SmingHub#2007
- Update ArduinoJson to 6.13.0 SmingHub#1979
- Added precaching from Arduino for ESP8266. SmingHub#1965
- Add support for 'Expect: 100-continue' in HTTP server. SmingHub#1931
- Upgrade to FlashString Library  SmingHub#1974, SmingHub#2013

Bug fixes
-------------
- Updated mqtt-codec to allow publish messages without payload. SmingHub#1976
- HttpConnection freed twice. SmingHub#1938
- Hangs at startup when custom heap enabled. SmingHub#1996
- Fix issues reported by valgrind SmingHub#2017

Breaking changes and Migration
-------------------------------------------
- See our [dedicated page](https://sming.readthedocs.io/en/latest/upgrading/4.0-4.1.html) for migration from 4.0.0 to 4.1.0.

All PRs scheduled for this release can be seen from [here](https://github.com/SmingHub/Sming/milestone/23)
slaff pushed a commit to slaff/Sming that referenced this pull request Feb 6, 2020
New features
------------------
- No-WiFi build option SmingHub#2004 - get more resources if your application is not using WIFI.
- Multiple SSL adapters based on axTLS and BearSSL.  SmingHub#1999
- Added basic Crypto support library SmingHub#2014
- Updates framework to build using GCC 9.2.0 toolchain for C++17. SmingHub#1825
- Modbus master SmingHub#1992
- Implemented Small String Optimisation (SSO). SmingHub#1951
- Webcam stream and sample webcam web server. SmingHub#1981
- Allow HTTP connections to ignore rejected body content SmingHub#1928

Improvements
-------------------
- Some improvements to multipart parser SmingHub#2007
- Update ArduinoJson to 6.13.0 SmingHub#1979
- Added precaching from Arduino for ESP8266. SmingHub#1965
- Add support for 'Expect: 100-continue' in HTTP server. SmingHub#1931
- Upgrade to FlashString Library  SmingHub#1974, SmingHub#2013

Bug fixes
-------------
- Updated mqtt-codec to allow publish messages without payload. SmingHub#1976
- HttpConnection freed twice. SmingHub#1938
- Hangs at startup when custom heap enabled. SmingHub#1996
- Fix issues reported by valgrind SmingHub#2017

Breaking changes and Migration
-------------------------------------------
- See our [dedicated page](https://sming.readthedocs.io/en/latest/upgrading/4.0-4.1.html) for migration from 4.0.0 to 4.1.0.

All PRs scheduled for this release can be seen from [here](https://github.com/SmingHub/Sming/milestone/23)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants