Disable P8SCII unescaping to fix mangling of emoji characters #106
TL;DR
This PR addresses an issue where any emoji symbols in the input lua script would be replaced by a garbled sequence of characters. The proposed solution is to remove picotool's current handling of P8SCII escape sequences, which does not seem to function as intended.
The Details
I encountered an issue where any use of the 🅾️ emoji in my lua script would be replaced by "ユか✽ゆヤま◆" after building a .p8 cart with picotool. The cause of this issue seems to stem from P8SCII being treated as an encoding in itself. In practice, this treatment boils down to two steps:
lua.p8scii_to_unicode, which seems meant to convert all P8SCII characters in the passed string to their utf-8 counterparts. The formatter assumes, at this point, that the lua script is P8SCII encoded. As a side note, this substitution routine runs on the entire script and not just on the string tokens that had their escape sequences converted by the lexer in step 1.
Both of the above steps have inherent issues:
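The mangling itself can be reproduced without picotool. The sketch below is only an illustration of the suspected mechanism, not picotool's actual code: it assumes the emoji's utf-8 bytes are each looked up as if they were single P8SCII characters. `PARTIAL_P8SCII` and `misread_utf8_as_p8scii` are hypothetical names, and the table covers only the seven byte values involved in this one example.

```python
# Partial, illustrative byte -> glyph table. This is NOT picotool's table;
# it lists only the seven P8SCII codes needed for this example.
PARTIAL_P8SCII = {
    0xF0: "ユ",
    0x9F: "か",
    0x85: "✽",
    0xBE: "ゆ",
    0xEF: "ヤ",
    0xB8: "ま",
    0x8F: "◆",
}


def misread_utf8_as_p8scii(text):
    """Encode text as utf-8, then treat every byte as one P8SCII character."""
    # Bytes missing from the partial table fall back to chr(), which is
    # correct for ASCII and only a placeholder for anything else.
    return "".join(PARTIAL_P8SCII.get(b, chr(b)) for b in text.encode("utf-8"))


# The emoji from the report: U+1F17E followed by variation selector U+FE0F.
emoji = "\U0001F17E\uFE0F"
raw_bytes = emoji.encode("utf-8")   # 4 bytes + 3 bytes = 7 bytes
mangled = misread_utf8_as_p8scii(emoji)

print(raw_bytes.hex(" "))  # f0 9f 85 be ef b8 8f
print(mangled)             # ユか✽ゆヤま◆
```

The seven output characters line up one-to-one with the seven utf-8 bytes of the emoji, which is consistent with the script being treated as P8SCII-encoded when it is in fact utf-8.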
Future Improvements
print function and passing them through unchanged. If pre-interpreting these escape sequences is still a desired feature, I'd suggest it be done in one go when parsing or writing the string tokens instead of passing through an intermediate format.