Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: captures unsafe casting from TSQueryMatch #24

Merged
merged 3 commits into from
Nov 24, 2024

Conversation

topi314
Copy link
Contributor

@topi314 topi314 commented Nov 21, 2024

I ran into the issue that captures didn't work correctly and apparently this code was breaking unsafe rules. I fixed it with some help of a friend.

I also changed QueryCapture.Index to uint which does not make the struct C-compatible anymore but nicer to use since you don't need to do that manually anymore

@amaanq
Copy link
Member

amaanq commented Nov 21, 2024

can you actually explain the captures "not working correctly" and what "unsafe rules" it broke?

I also would rather not change the layout of C-compatible structs, they're intentionally designed like that.

@topi314
Copy link
Contributor Author

topi314 commented Nov 21, 2024

can you actually explain the captures "not working correctly" and what "unsafe rules" it broke?

I can't really explain why this first doesn't work other than showing you an actual example:
this commit uses go-tree-sitter master
image

this commit uses this pr
image

They both print different things for some reason.

from https://pkg.go.dev/cmd/cgo#hdr-C_references_to_Go

Not all Go types can be mapped to C types in a useful way. Go struct types are not supported; use a C struct type. Go array types are not supported; use a C pointer.

I also would rather not change the layout of C-compatible structs, they're intentionally designed like that.

I can keep that 👍🏻

@zephyrtronium
Copy link
Contributor

The unsafe rules are enumerated under https://pkg.go.dev/[email protected]#Pointer. In particular, since [bigConstant]T is larger than T, converting a pointer to a slice by going through *[bigConstant]T violates rule 1:

(1) Conversion of a *T1 to Pointer to *T2.

Provided that T2 is no larger than T1 and that the two share an equivalent memory layout, this conversion allows reinterpreting data of one type as data of another type.

The fact that it violates that rule is my best guess for why adding an explicit loop to build the slice of Go types fixes the problem. The Go compiler optimizes rather aggressively around uses of package unsafe, which means that even seemingly innocent rule violations can have surprising consequences.

An unsafe conversion from []C.struct_TSQueryCapture to []QueryCapture instead of the explicit copying loop should also be correct if the slice construction uses unsafe.Slice.

@amaanq
Copy link
Member

amaanq commented Nov 23, 2024

Okay, I did some debugging of your highlighting code and the problem is you're trying to "copy" over each capture when iterating them. This is my bad on not clarifying this in the docs of the code, but QueryCaptures is stateful, and each *QueryMatch is overwritten on each successive call to Next. So what happens is, you have an array of say, 5 "queryCapture"s in your code, but they all point to the same memory location, that being the last match, hence why only the string is highlighted.

This was a problem upstream too, because in Rust we implemented Iterator, which doesn't have a concept of iterators that are stateful, so if anyone called collect without mapping/processing the data beforehand, they'd get the same issue. We solved this by using StreamingIterator, but that comes with its own drawbacks of introducing a 3rd party crate + losing out on a lot of iterator methods, and so the upstream highlight code is a bit of a mess because a) GitHub gets cranky when you mess w/ that code, so I cast it to a type that does implement Iterator and b) the code is really bad to begin with, and is in deep need of a rework.

That being said, the current code without this PR does work, provided that you understand that you have to copy/take out what you want during iteration of a QueryMatches or QueryCaptures, and not store the captures thinking the data is valid afterwards. The reason this fixes your problem is because you are now copying every element when you are iterating over the C struct, instead of using data from the C pointer, which is going to be more expensive 🙂. I would totally accept a PR that adds this wording to the code docs, but the fix in this PR is a red herring.

To also add, the go "unsafe" rule is not being violated, the large size is just a trick to say to the go compiler "an array at least this big, but then we set the length and capacity to the exact same size". A thread on it can be found here https://groups.google.com/g/golang-nuts/c/sV_f0VkjZTA

@zephyrtronium
Copy link
Contributor

Memory reuse is certainly a cleaner explanation than unsafe magic. However, it is still a fact that converting to a slice through a large array type violates unsafe rules. The thread you linked is over ten years old; Go has changed since then. This kind of pattern violating unsafe rules was a major motivation for adding unsafe.Slice in the first place. See golang/go#19367, especially around mentions of -d=checkptr (which is notably enabled implicitly when building with -race).

@amaanq
Copy link
Member

amaanq commented Nov 23, 2024

Oh nice, thanks for that thread - but for some reason I couldn't get any warning/error to come up when running go test -race or go test -gcflags=all=-d=checkptr, maybe I'm missing something.

Is it possible we can use unsafe.Slice and then cast directly to the Go type to reuse the memory? It'd be better if we avoided copying, but if not then that's fine

@zephyrtronium
Copy link
Contributor

There's a CL linked from that issue which disabled flagging specifically the cast-through-array pattern on -d=checkptr, but the commit message says it "should be revisited" after unsafe.Slice would be added (and also mentions that the pattern is a violation of unsafe rule 1). I wasn't sure whether that special case had ever been removed, but it must not have been if the code works under -race.

Assuming the C and Go types remain layout-compatible (which in particular should imply structs.HostLayout), the correct-est pattern to convert from the former to the latter without copying is to use unsafe.Slice to create the slice of []C.struct_TSQueryCapture (or whatever is the correct type name, I don't know where to find the C in question) and then convert that slice through unsafe.Pointer to []QueryCapture:

cc := unsafe.Slice(m.captures, m.capture_count)
captures = *(*[]QueryCapture)(unsafe.Pointer(&cc))

@topi314 topi314 changed the title fix: captures unsafe casting from TSQueryMatch refactor: captures unsafe casting from TSQueryMatch Nov 23, 2024
@topi314
Copy link
Contributor Author

topi314 commented Nov 23, 2024

I went ahead & applied the suggested code from @zephyrtronium & added the suggested documentation.

Thank your for patience with this issue and debugging it.

@topi314
Copy link
Contributor Author

topi314 commented Nov 24, 2024

maybe not exactly pretty, but this then resolves the issue in my code, thanks again for your time

gopad-dev/go-tree-sitter-highlight@aa913ce

captures := make([]_queryCapture, 0)
queryCaptures := cursor.Captures(config.Query, tree.RootNode(), source)
for {
	match, i := queryCaptures.Next()
	if match == nil {
		break
	}

+	match.Captures = slices.Clone(match.Captures)
	captures = append(captures, _queryCapture{
		Match: *match,
		Index: i,
	})
}

query.go Outdated Show resolved Hide resolved
query.go Outdated Show resolved Hide resolved
query.go Outdated Show resolved Hide resolved
@amaanq
Copy link
Member

amaanq commented Nov 24, 2024

Thank your for patience with this issue and debugging it.

No worries, thank you to you both 🙂 I'm no Go expert so it's nice to receive improvements like this. Just a few minor comments and then we can get this in

@amaanq amaanq merged commit 5e5682f into tree-sitter:master Nov 24, 2024
3 checks passed
@amaanq
Copy link
Member

amaanq commented Nov 24, 2024

thank you both!

@topi314
Copy link
Contributor Author

topi314 commented Nov 24, 2024

Thank you too, love the work in this

@topi314 topi314 deleted the fix/uint branch November 24, 2024 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants