Search and replace UTF-8 #14
Comments
My immediate answer is that I'm embarrassed that TECO C can do something that TECO-64 can't, so I will certainly look into it. I am not sure why it is behaving this way, but it was not an intentional limitation. I am currently preparing another version for release, and I will endeavor to include a fix for you. Thanks for bringing this to my attention. |
I expect to have a new version available tomorrow, once I finish some other changes. But Unicode characters are now displayed and echoed as I think they should be:
teco -n foo
If you are curious, it wasn't so much that TECO C was doing anything special, nor that I had broken anything. Rather, I had provided backwards-compatibility for TECO-32's handling of 8-bit characters on VMS, and hadn't realized how that might affect users in other (and more modern) operating environments. (And to be honest, I hadn't anticipated that anyone might use TECO with UTF-8, so I never thought to test with it.)
Also, the way this will work is that there is a new bit for the E3 flag, which is enabled by default for non-VMS builds. Which reminds me that I should probably update the documentation for that. |
Thanks for the explanation. I’m pumped up for the next release! I don’t use TECO as my daily editor since there are many non-ASCII encoded files I need to handle. I do enjoy using TECO as a terse scripting language. Your TECO-64 has definitely made programming more convenient. |
Version 200.36.1 has been released. I will close out this issue once you have confirmed that it has been resolved. Please note that the change I made does not affect anything in display mode, which uses ncurses to handle output, and therefore would require different, and quite possibly much more extensive, modifications. |
Ah, I can confirm the visual display is now working, but the
|
Btw I did the test on OS X, I’m going to try on Linux later today. |
Strange. The FS command worked for me, as in the following macro:
Which prints out:
|
By the way, I had intended for TECO-64 to work on OS X, and had made some work toward porting it when I had access to a MacBook at my last job, but then Covid happened and my company had to downsize, so I don't presently have any way to test in that environment. |
It does work on Linux for me. It could be that I didn’t do a clean before rebuilding on OS X. I’ll retry on OS X and report back any updates. |
So I did some experiment and find First I patched
Then I run this test file both on Linux and OS X
on Linux it is:
On OSX:
Now, I don't know how to interpret the negative integer in |
I don't have a complete answer for you, but what I can say is that the -31, -76, and -121 are the result of sign-extending an 8-bit value representing the Unicode characters. In unsigned decimal, they would be 225, 180, and 135, respectively. What I'm guessing is that you've tripped across a difference in either processor architecture or compiler options between your Linux and Mac systems, such that a plain char isn't treated identically in both environments when it is negative. I thought that I had specified that char was to be unsigned by default, but I'm obviously misremembering that, or perhaps there was a good reason for it being signed by default that I've forgotten. I think anything in the edit buffer should certainly be positive, as it otherwise creates confusion when trying to debug, as we have both discovered. In any case, I will continue to investigate. |
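To illustrate the arithmetic described above, here is a minimal C sketch of how the bytes 225, 180, and 135 turn into -31, -76, and -121 when widened through a signed 8-bit type. The function names are hypothetical and are not taken from the TECO-64 source; the sketch only models the sign extension, not the actual search code.

```c
#include <assert.h>

/* Hypothetical helper: widen a byte the way a *signed* plain char would.
 * Values 128..255 come out negative (value - 256). */
int widen_as_signed_char(unsigned char byte)
{
    return (byte >= 128) ? (int)byte - 256 : (int)byte;
}

/* Hypothetical helper: widen a byte the way an unsigned char would.
 * The value is preserved as 0..255. */
int widen_as_unsigned_char(unsigned char byte)
{
    return (int)byte;
}

/* Hypothetical helper: recover the 0..255 byte value from a
 * sign-extended one, e.g. -31 -> 225. */
int to_unsigned_value(int widened)
{
    assert(widened >= -128 && widened <= 255);
    return (widened < 0) ? widened + 256 : widened;
}

/* A comparison that mixes the two conventions fails for any byte >= 128,
 * which is one plausible way a search could miss UTF-8 text. */
int mixed_compare_matches(unsigned char byte)
{
    return widen_as_signed_char(byte) == widen_as_unsigned_char(byte);
}
```

With these definitions, `widen_as_signed_char(225)` yields -31 while `widen_as_unsigned_char(225)` yields 225, so `mixed_compare_matches(225)` is false even though both sides started from the same byte. ASCII bytes below 128 compare equal under either convention.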
Okay, I have a test I'd like you to run. Please change line 46 of
This will change a plain Then rebuild on OS X, and let me know if it makes any difference to the result. You may retain the Thanks. |
This shouldn't break any existing commands, so I will probably leave it in regardless. I re-ran my entire test suite, and nothing failed. |
Yes it works now! |
You're welcome. For what it's worth, I noted that although the edit buffer has type |
I have not posted a new release yet because I had some progress in getting Unicode characters to display correctly in display mode, and I thought I'd see how far I could go with that. But since you had a workaround, I didn't think you needed anything else just yet. Feel free to let me know if that's not the case. For what it's worth, I had originally wanted to use Another reason I'm holding off on a new version, though, is to make sure there isn't any issue with the use of the In any event, I expect that I'll upload whatever I have by this weekend. |
Version 200.36.2 has been posted. I have decided not to try to fix the display of UTF-8 character sequences, as it would involve major changes to TECO, which historically always treated bytes and characters as synonymous. I'm sure it could be done, but I just don't see any reason to embark on such a huge effort right now. Thanks again for your assistance with this. |
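As a rough illustration of why treating bytes and characters as synonymous matters for display, the C sketch below counts the same UTF-8 string both ways. The function names are hypothetical, and the code assumes well-formed UTF-8 input; it is not taken from TECO-64.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes in a NUL-terminated string.
 * This is the unit a byte-oriented editor works in. */
size_t utf8_byte_count(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Hypothetical helper: number of code points in a well-formed UTF-8
 * string. Continuation bytes have the bit pattern 10xxxxxx, so every
 * byte that is NOT a continuation byte starts a new code point. */
size_t utf8_code_point_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the string `"\xE1\xB4\x87"` (the three bytes 225, 180, 135), `utf8_byte_count` returns 3 and `utf8_code_point_count` returns 1. A display that advances one column per byte would therefore disagree with one that advances one column per character.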
This issue is now closed. |
I tried to replace some UTF-8 characters in a file using the FS command (save the text below to a file and call EI):

This works with TECOC, since I guess that is 8-bit clean. However, it fails with TECO-64.

Using ^Q to quote does not work either:

I also tried to put ^Q before every byte of ᴇ but had no luck. Inserting ᴇ using 225i180i135i works, so I'm surprised that searching for UTF-8 text does not.

Is this a limitation of TECO-64, or is there a way to work around it?
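For reference, the three decimal values in 225i180i135i are the UTF-8 bytes of ᴇ (U+1D07). The following C sketch shows the standard UTF-8 encoding arithmetic that produces them; `utf8_encode` is a hypothetical helper written for this illustration and is not part of TECO-64 or TECOC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes up to 4 bytes into out and returns the byte count,
 * or 0 if cp is not a valid Unicode scalar value. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                        /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;                       /* surrogates are not encodable */
        }
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* beyond the Unicode range */
}
```

Calling `utf8_encode(0x1D07, buf)` returns 3 and fills `buf` with 0xE1, 0xB4, 0x87, which are 225, 180, and 135 in decimal. All three bytes have the high bit set, so none of them is plain 7-bit ASCII.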