Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Position is counted incorrectly when reading a file with mixed EOL #4

Closed
jfthuong opened this issue Oct 20, 2023 · 4 comments
Closed
Labels
wontfix This will not be worked on

Comments

@jfthuong
Copy link

jfthuong commented Oct 20, 2023

When a file contains mixed EOL (i.e. LF and CR.LF) with VS Code , the offset is not computed correctly.

In the file below mixing first LF and then CR + LF, in the first part you loose one offset per line with the extension

image

module2.txt

The correct position for offset 88 should be (like in Notepad++):
image

If you go to offset 88, you get this in VS Code you are 5 characters before (corresponding to 5 lines before):
image

Note: tried with VS Code 1.83.0 on Windows 11

@jfthuong jfthuong changed the title Position is counted incorrectly in Windows when reading a file with Unix EOL Position is counted incorrectly in Windows when reading a file with mixed EOL Oct 20, 2023
@jfthuong jfthuong changed the title Position is counted incorrectly in Windows when reading a file with mixed EOL Position is counted incorrectly when reading a file with mixed EOL Oct 20, 2023
@relative
Copy link

vscode automatically normalizes line endings when you open a file
microsoft/vscode#127

@soulshined
Copy link
Collaborator

@jfthuong

Thank you very much for the detailed report. This extension does absolutely nothing custom in regards of reading/parsing a file to secure offsets. It exclusively uses vscodes built-in api's. This is the entirety of the logic this extension requires in that respect:

https://github.com/soulshined/vscode-jump-to-character-position/blob/852c5094f57ca3339f9c2324d538329867ac4b35/src/extension.ts#L55-L59

The API method here:
https://code.visualstudio.com/api/references/vscode-api#TextDocument.positionAt

So I assume this is assuming you know what the index is with eol taken in consideration. So if anything, this should be passed to the vscode team to resolve, unless it is resolved in a newer api version.

For now, until I can sit down and think about what options I have to accommodate, you can go to your command palette and type:

>Change End of Line Sequence

And then select LF as your option

@soulshined
Copy link
Collaborator

@jfthuong

I have done some light review locally and here is my analysis so far:

  • I have updated the extension to use the latest api/vscode engine and it still produces the same result you defined
  • I have tested against a cleanly installed vscode version, no extensions or outside influence, and it still produces the same result you defined

So this clearly points to vscode's api as the culprit. That method I linked above should be passed to the vscode team to rectify this shortcoming. Even if this becomes a vscode out-of-box feature, I'm guessing you're going to run into the same problem with the 'built-in' version of this if it's not addressed with them.

As mentioned above, because this extension doesn't do anything extravagant with parsing files and relies exclusively on the vscode api this leaves me with few options:

  1. I can account for these scenarios where there are multiple line feed variants.
    a. I don't think it's a good idea to normalize or do anything for the user because many reasons but I do think at least maybe notifying or adding some kind of popup that disclaims multiple line feeds are present so users aren't confused why the offset is misleading
  2. I was thinking maybe adding some editor decorations that highlight 'alternative' positions when files like this are identified
  3. I can go full on custom mode and parse files manually and act accordingly. Please note this is not something I want to do, but if it must happen it must happen. This is the last resort option for me. As you can see this extension doesn't get updated too often so I would really like to keep that cadence, introducing custom parsing/reading logic add's a ton of complexities and testing and edge cases to consider. I really don't want to make this a full time hobby if that makes sense

My vote is for #1 at this time. I think that would require some user education, but less education than it requires for #2 and surely #3

Do you have any considerations or suggestions based on that feedback before I commit to something?

I would say, if #3 is your option I would at least prefer to create another issue on the vscode project explaining the situation and see what they say before going that route.

In the meantime, I will go ahead and create that bug on their project and I'll also add some remarks to your feature request about issues I've ran into they should consider

@soulshined
Copy link
Collaborator

@jfthuong

I've had time to sit down and do a thorough review.

While my initial assessment still stands, I just wanted to check and see if this possible to do from an extension perspective. Yes and no. I'll elaborate

vscodes api editor.document.getText() ref method does only pass you normalized text as mentioned by @relative

For my testing I used the following regexps:

1. /(?<!\r)\n/g 
2. /\r\n/g
3. /\f/g
4. /\v/g

Using vscodes built-in api methods:

::::::::NL:::::::

::::::::CR NL:::::::

Found CR+NL start=27 end=27.
Found CR+NL start=51 end=51.
Found CR+NL start=54 end=54.
Found CR+NL start=65 end=65.
Found CR+NL start=83 end=83.
Found CR+NL start=100 end=100.
Found CR+NL start=102 end=102.
Found CR+NL start=126 end=126.
Found CR+NL start=129 end=129.
Found CR+NL start=131 end=131.
Found CR+NL start=140 end=140.
Found CR+NL start=143 end=143.
Found CR+NL start=161 end=161.
Found CR+NL start=180 end=180.
Found CR+NL start=200 end=200.
Found CR+NL start=227 end=227.
Found CR+NL start=250 end=250.
Found CR+NL start=278 end=278.
Found CR+NL start=308 end=308.
Found CR+NL start=335 end=335.
Found CR+NL start=366 end=366.
Found CR+NL start=398 end=398.
Found CR+NL start=427 end=427.
Found CR+NL start=458 end=458.
Found CR+NL start=493 end=493.
Found CR+NL start=531 end=531.
Found CR+NL start=566 end=566.
Found CR+NL start=607 end=607.
Found CR+NL start=649 end=649.
Found CR+NL start=690 end=690.
Found CR+NL start=728 end=728.
Found CR+NL start=792 end=792.
Found CR+NL start=851 end=851.
Found CR+NL start=897 end=897.
Found CR+NL start=941 end=941.
Found CR+NL start=984 end=984.
Found CR+NL start=1016 end=1016.
Found CR+NL start=1055 end=1055.
Found CR+NL start=1101 end=1101.
Found CR+NL start=1147 end=1147.
Found CR+NL start=1178 end=1178.
Found CR+NL start=1207 end=1207.
Found CR+NL start=1234 end=1234.
Found CR+NL start=1260 end=1260.
Found CR+NL start=1302 end=1302.
Found CR+NL start=1344 end=1344.
Found CR+NL start=1385 end=1385.
Found CR+NL start=1423 end=1423.
Found CR+NL start=1487 end=1487.
Found CR+NL start=1546 end=1546.
Found CR+NL start=1592 end=1592.
Found CR+NL start=1636 end=1636.
Found CR+NL start=1679 end=1679.
Found CR+NL start=1711 end=1711.
Found CR+NL start=1750 end=1750.
Found CR+NL start=1796 end=1796.
Found CR+NL start=1842 end=1842.
Found CR+NL start=1873 end=1873.
Found CR+NL start=1902 end=1902.
Found CR+NL start=1929 end=1929.
Found CR+NL start=1955 end=1955.
Found CR+NL start=1991 end=1991.
Found CR+NL start=2033 end=2033.
Found CR+NL start=2072 end=2072.
Found CR+NL start=2114 end=2114.
Found CR+NL start=2159 end=2159.
Found CR+NL start=2201 end=2201.
Found CR+NL start=2251 end=2251.
Found CR+NL start=2299 end=2299.
Found CR+NL start=2346 end=2346.
Found CR+NL start=2381 end=2381.
Found CR+NL start=2414 end=2414.
Found CR+NL start=2445 end=2445.
Found CR+NL start=2475 end=2475.
Found CR+NL start=2517 end=2517.
Found CR+NL start=2562 end=2562.
Found CR+NL start=2604 end=2604.
Found CR+NL start=2654 end=2654.
Found CR+NL start=2702 end=2702.
Found CR+NL start=2749 end=2749.
Found CR+NL start=2784 end=2784.
Found CR+NL start=2817 end=2817.
Found CR+NL start=2848 end=2848.
Found CR+NL start=2877 end=2877.
Found CR+NL start=2904 end=2904.
Found CR+NL start=2929 end=2929.
Found CR+NL start=2952 end=2952.
Found CR+NL start=2973 end=2973.
Found CR+NL start=2992 end=2992.
Found CR+NL start=3009 end=3009.
Found CR+NL start=3024 end=3024.
Found CR+NL start=3037 end=3037.
Found CR+NL start=3049 end=3049.
Found CR+NL start=3074 end=3074.
Found CR+NL start=3096 end=3096.
Found CR+NL start=3120 end=3120.
Found CR+NL start=3133 end=3133.
Found CR+NL start=3144 end=3144.
Found CR+NL start=3153 end=3153.
Found CR+NL start=3161 end=3161.
Found CR+NL start=3182 end=3182.
Found CR+NL start=3208 end=3208.
Found CR+NL start=3215 end=3215.
Found CR+NL start=3220 end=3220.
::::::::LF:::::::

::::::::V:::::::

As demonstrated, vscodes api method calculates a position based on the normalized line endings.

You can however read files by way of nodes fs module

Using NodeJs fs module

::::::::NL:::::::

Found NL start=27 end=27.
Found NL start=50 end=50.
Found NL start=52 end=52.
Found NL start=62 end=62.
Found NL start=79 end=79.
Found NL start=95 end=95.
Found NL start=100 end=100.
Found NL start=123 end=123.
Found NL start=125 end=125.
Found NL start=126 end=126.
Found NL start=134 end=134.

::::::::CR NL:::::::

Found CR+NL start=136 end=136.
Found CR+NL start=154 end=154.
Found CR+NL start=173 end=173.
Found CR+NL start=193 end=193.
Found CR+NL start=220 end=220.
Found CR+NL start=243 end=243.
Found CR+NL start=271 end=271.
Found CR+NL start=301 end=301.
Found CR+NL start=328 end=328.
Found CR+NL start=359 end=359.
Found CR+NL start=391 end=391.
Found CR+NL start=420 end=420.
Found CR+NL start=451 end=451.
Found CR+NL start=486 end=486.
Found CR+NL start=524 end=524.
Found CR+NL start=559 end=559.
Found CR+NL start=600 end=600.
Found CR+NL start=642 end=642.
Found CR+NL start=683 end=683.
Found CR+NL start=721 end=721.
Found CR+NL start=785 end=785.
Found CR+NL start=844 end=844.
Found CR+NL start=890 end=890.
Found CR+NL start=934 end=934.
Found CR+NL start=977 end=977.
Found CR+NL start=1009 end=1009.
Found CR+NL start=1048 end=1048.
Found CR+NL start=1094 end=1094.
Found CR+NL start=1140 end=1140.
Found CR+NL start=1171 end=1171.
Found CR+NL start=1200 end=1200.
Found CR+NL start=1227 end=1227.
Found CR+NL start=1253 end=1253.
Found CR+NL start=1295 end=1295.
Found CR+NL start=1337 end=1337.
Found CR+NL start=1378 end=1378.
Found CR+NL start=1416 end=1416.
Found CR+NL start=1480 end=1480.
Found CR+NL start=1539 end=1539.
Found CR+NL start=1585 end=1585.
Found CR+NL start=1629 end=1629.
Found CR+NL start=1672 end=1672.
Found CR+NL start=1704 end=1704.
Found CR+NL start=1743 end=1743.
Found CR+NL start=1789 end=1789.
Found CR+NL start=1835 end=1835.
Found CR+NL start=1866 end=1866.
Found CR+NL start=1895 end=1895.
Found CR+NL start=1922 end=1922.
Found CR+NL start=1948 end=1948.
Found CR+NL start=1984 end=1984.
Found CR+NL start=2026 end=2026.
Found CR+NL start=2065 end=2065.
Found CR+NL start=2107 end=2107.
Found CR+NL start=2152 end=2152.
Found CR+NL start=2194 end=2194.
Found CR+NL start=2244 end=2244.
Found CR+NL start=2292 end=2292.
Found CR+NL start=2339 end=2339.
Found CR+NL start=2374 end=2374.
Found CR+NL start=2407 end=2407.
Found CR+NL start=2438 end=2438.
Found CR+NL start=2468 end=2468.
Found CR+NL start=2510 end=2510.
Found CR+NL start=2555 end=2555.
Found CR+NL start=2597 end=2597.
Found CR+NL start=2647 end=2647.
Found CR+NL start=2695 end=2695.
Found CR+NL start=2742 end=2742.
Found CR+NL start=2777 end=2777.
Found CR+NL start=2810 end=2810.
Found CR+NL start=2841 end=2841.
Found CR+NL start=2870 end=2870.
Found CR+NL start=2897 end=2897.
Found CR+NL start=2922 end=2922.
Found CR+NL start=2945 end=2945.
Found CR+NL start=2966 end=2966.
Found CR+NL start=2985 end=2985.
Found CR+NL start=3002 end=3002.
Found CR+NL start=3017 end=3017.
Found CR+NL start=3030 end=3030.
Found CR+NL start=3042 end=3042.
Found CR+NL start=3067 end=3067.
Found CR+NL start=3089 end=3089.
Found CR+NL start=3113 end=3113.
Found CR+NL start=3126 end=3126.
Found CR+NL start=3137 end=3137.
Found CR+NL start=3146 end=3146.
Found CR+NL start=3154 end=3154.
Found CR+NL start=3175 end=3175.
Found CR+NL start=3201 end=3201.
Found CR+NL start=3208 end=3208.
Found CR+NL start=3213 end=3213.
::::::::LF:::::::

::::::::V:::::::

As you can see the \n exclusive line endings are accounted for

But are they?

Save the file in vscode and poof they are gone again, even using nodes fs module because it was saved with vscode which does normalize the line endings as noted above.

So sadly, even if we were to go the route of committing to a purely custom solution there will never be a guaranteed outcome for this scenario due to many factors:

  • Where you got your position may be different than someone else's position for the same scenario
  • Is the file saved yet or not?
  • Is a custom solution guaranteed to work the same way for browser vscodes? (this extension is compatible with githubs .dev vscode env for example)
  • Saving files in vscode is both implicit and explicit so I feel like adding any custom logic to account for this is just a shot in the dark and going to lead to more head scratches from users (and most likely negative reviews)

I don't even think this will be something vscode can implement in-house unless they change their strategy on what 'normalized' means or how that affects positioning

I'm still going to file an issue on vscodes board for visibility but legitimately, once again, this is out of my hands much like the 'not working in large files' issue. I have no choice but to close this. It's very frustrating I keep getting the blame but what ya gonna do 🤷🏻‍♀️ I'll link that ticket to this one tomorrow when I create it. But at minimum thanks for at least bringing this to my attention, I haven't run into this yet!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

3 participants