index() function returns wrong offset for non-ascii chars #1430
Comments
There is some documentation about this on the "Pitfalls" page (https://github.com/stedolan/jq/wiki/How-to:-Avoid-Pitfalls). In brief, you can use
This works in jq 1.5 and later. By the way, could you please give more details about the failure of
My bad. I hadn't even realized there is a wiki. I took all the information from the manual, which doesn't mention anything about index being byte-wise. I'll give match a go. I still haven't figured out exactly when the segmentation fault happens, as I haven't yet found the input that produces it. But I'm going by the assumption that it is related to issue 922 until I can prove otherwise. I guess we can close this one, and I'll open a new ticket in case my segmentation fault turns out not to be related to 922.
No, this is a bug. We should fix it.
Previously, the byte index was used. Fixes jqlang#1430 jqlang#1624 jqlang#3064
I'm trying to strip away part of a text. Using something like
sub("!.*"; "")
doesn't work, as it gives me a segmentation fault when the text is too long. So I tried this route instead:
$ jq '.msg | .[0:index("!")]'
which works fine with input like:
{"msg": "hello world!"}
but fails when text contains wide characters:
{"msg": "здравствуй мир!"}
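To make the mismatch concrete, here is a small sketch of the two offsets for the failing input. It uses Python rather than jq purely as an illustration, and assumes (as the comments in this thread state) that index counted UTF-8 bytes while the slice counts code points; the variable names are made up for the example.

```python
# Illustration only: Python stands in for jq to show the two ways of counting.
msg = "здравствуй мир!"

# Offset of "!" counted in code points (what the string slice expects).
codepoint_offset = msg.index("!")

# Offset of "!" counted in UTF-8 bytes (what a byte-wise index reports).
# Each Cyrillic letter takes 2 bytes, so this number is larger.
byte_offset = msg.encode("utf-8").index(b"!")

print(codepoint_offset, byte_offset)

# Slicing with the code-point offset strips the "!" and everything after it.
print(msg[:codepoint_offset])

# Slicing with the byte offset overshoots the end of the string,
# so nothing is stripped at all.
print(msg[:byte_offset])
```

For the ASCII input "hello world!" both offsets coincide, which is why that case appears to work.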