Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

joni onigurama library does not fully implement oniguruma syntax #677

Open
sebthom opened this issue Jan 29, 2024 · 10 comments
Open

joni onigurama library does not fully implement oniguruma syntax #677

sebthom opened this issue Jan 29, 2024 · 10 comments

Comments

@sebthom
Copy link
Member

sebthom commented Jan 29, 2024

TextMate uses the oniguruma syntax for regular expressions.

TM4E currently uses jruby/joni as onigurama implementation. Unfortunately joni does not implement the the full onigurama syntax, e.g. variable length look-behind. This means that not all TextMate language files can be parsed with TM4E and loading fails with invalid pattern in look-behind. Since the goal of the joni library is to emulate Ruby's behavior and Ruby does not support variable length look-behinds, joni won't support them anytime soon.

vscode uses the original oniguruma C library from https://github.com/kkos/oniguruma/

It might make sense to investigate if we can do the same, e.g. using kkos/oniguruma via https://github.com/jnr/jnr-ffi/

@akurtakov
Copy link
Contributor

I would strongly recommend going for https://openjdk.org/jeps/442 (Java 21 requirement) as jnr-ffi is not fun with Mac OS signing and the whole Orbit fuzz (https://github.com/eclipse-orbit/orbit-simrel/blob/6926eeed8ffaf2174b0fd42bc3c4e94208bda216/maven-sign/tp/MavenSign.target#L6)

@sebthom
Copy link
Member Author

sebthom commented Jan 29, 2024

Well, I have no experience with JNI/JNA/JNR/FFM so I hope someone else picks this up :-)

@mickaelistria
Copy link
Contributor

FWIW, moving away from Joni to embedding upstream Oniguruma could be a good improvement for the ecosystem.

@sebthom
Copy link
Member Author

sebthom commented Jan 30, 2024

@mickaelistria any idea when we can expect eclipse to be bundled with justj 21 by default? I don't think moving forward with this via FFM is a good idea as long as Eclipse still uses justj 17

@akurtakov
Copy link
Contributor

@sebthom As per https://github.com/eclipse-simrel/.github/blob/main/wiki/SimRel/Simultaneous_Release_Requirements.md#execution-environment-partially-tested it should be fine for after 2024-03 release.

@angelozerr
Copy link
Contributor

angelozerr commented Jan 30, 2024

It should be amazing to consume original oniguruma C but as so many plugins uses tm4e (I m so happy to have spent so many time to create this plugin and so happy that you continue to improve it dragsticly, thanks for that!) we need to be carefull.

IMHO I think tm4e should try to parse the textmate with the original oniguruma C and if it fails it should fallback to use joni like today.

@mickaelistria
Copy link
Contributor

but as so many plugins uses tm4e

Those consumers will have to contribute back is they want to keep some Joni support. But at the moment, it seems like keeping Joni is not in the best interest of anyone. If APIs are broken, this would be worse moving to a new major release.

@angelozerr
Copy link
Contributor

Tm4e is working well with most textmates since several years.

User will not understand why their textmate will crash if oniguramm1 C using have a bug or if some OS doesnt support it correctly.

And users will havr bad feelings with tm4e

@sebthom
Copy link
Member Author

sebthom commented Jan 30, 2024

Btw. IntelliJ has the same issue: https://youtrack.jetbrains.com/issue/IDEA-336274

The main issue atm are variable length look-behind. we could try to dynamically rewrite such unparseable regex to some alternative syntax as suggested here https://stackoverflow.com/a/24591663/5116073 - if that is possible.

Using the C lib directly could also improve overall parsing speed as discussed here jruby/joni#43

@mickaelistria
Copy link
Contributor

Tm4e is working well with most textmates since several years.

And it's likely to work even better if we can use upstream/more used/better maintained/... oniguruma instead of Joni.

User will not understand why their textmate will crash if oniguramm1 C using have a bug or if some OS doesnt support it correctly.

I don't think using upstream oniguruma is more error-prone than using Joni. Usually, the less indirections there are are, and the less room there is for bugs, the higher quality is.

And users will havr bad feelings with tm4e

Moving to upstream is more likely to increase the quality of the stack than to decrease it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants