Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add tags.scm queries #18

Merged
merged 4 commits into from
Jan 8, 2022
Merged

add tags.scm queries #18

merged 4 commits into from
Jan 8, 2022

Conversation

the-mikedavis
Copy link
Member

@the-mikedavis the-mikedavis commented Dec 17, 2021

tags.scm powers GitHub code navigation for languages that use tree-sitters.

For example with ruby...

You can see class/method declarations:

ruby

And see the references (where those classes/methods end up being used):

ruby


These queries ended up taking inspiration from other tags.scm in the @tree-sitter org repos, mostly python and ruby.

I don't think there's a framework in place for testing these queries, but I think it wouldn't be too much trouble to write a small shell script that diffs the output of tree-sitter tags fixture.ex against a known solution for some fixtures. There is the ability to test queries like so: tree-sitter query --test query/tags.scm test/query/fixture.ex (added in tree-sitter#775) but the precedence rules seem to be backward between querying and determining tags (e.g. module/function definitions end up as @reference.calls in the query output but are correctly identified by tree-sitter tags). I can look at making a PR to tree-sitter that adds a flag for testing tree-sitter tags or maybe somehow reverses precedence in tree-sitter query. We should be able to use tree-sitter#1547 to test these queries if/when merged.

A sample output of tree-sitter tags path/to/mint/mix.exs with this Mint mix.exs:

Mint.MixProject	| module  	def (0, 10) - (0, 25) `defmodule Mint.MixProject do`
Mix.Project	| call    	ref (1, 6) - (1, 17) `use Mix.Project`
project   	| function	def (6, 6) - (6, 13) `def project do`
Mix       	| call    	ref (11, 23) - (11, 26) `start_permanent: Mix.env() == :prod,`
env       	| call    	ref (11, 27) - (11, 30) `start_permanent: Mix.env() == :prod,`
elixirc_paths	| call    	ref (12, 21) - (12, 34) `elixirc_paths: elixirc_paths(Mix.env()),`
Mix       	| call    	ref (12, 35) - (12, 38) `elixirc_paths: elixirc_paths(Mix.env()),`
env       	| call    	ref (12, 39) - (12, 42) `elixirc_paths: elixirc_paths(Mix.env()),`
deps      	| call    	ref (13, 12) - (13, 16) `deps: deps(),`
package   	| call    	ref (29, 15) - (29, 22) `package: package(),`
application	| function	def (46, 6) - (46, 17) `def application do`
Mint.Application	| call    	ref (49, 12) - (49, 28) `mod: {Mint.Application, []}`
package   	| function	def (53, 7) - (53, 14) `defp package do`
elixirc_paths	| function	def (60, 7) - (60, 20) `defp elixirc_paths(:test), do: ["lib", "test/support"]`
elixirc_paths	| function	def (61, 7) - (61, 20) `defp elixirc_paths(_env), do: ["lib"]`
deps      	| function	def (64, 7) - (64, 11) `defp deps do`

So the queries tag function/module definitions and usages of functions/modules. It's notable that modules are tagged as @reference.call when not being defmoduled: this lines up with the behavior of the ruby queries and allows you to jump to definitions of modules, it's not saying that those modules are function invocations. These queries aren't smart enough to figure out some language semantics like resolving aliased modules. For "precise navigation" which respects language semantics, we'd need to write some stack-graph construction queries (see this blog post), but stack-graphs are very new and it may be wise to wait until they're matured. Also the python queries aren't open-sourced afaict so I haven't really figured out the tricks yet 😅. Once I figure all that out, I'll try sending in some stack-graphs queries :)

@patrickt is this something I can trouble you with setting up if/once it's merged in (and maybe lend some review eyes)? I appreciate all your help with the integration stuff :)

@the-mikedavis
Copy link
Member Author

the-mikedavis commented Dec 17, 2021

oh whoops I didn't see #17, I'll try these queries against that branch and see if it changes anything

edit: looks good, these queries are producing the same results on both branches

@josevalim
Copy link
Member

This is awesome @the-mikedavis! Quick question: do arities play a role in those queries? Should they?

If they do, then we need to consider foo |> bar() is calling bar() with one argument. If they don't, then we are good!

@the-mikedavis
Copy link
Member Author

the-mikedavis commented Dec 17, 2021

I believe arities are not considered, I think that wouldn't play well with some languages like javascript where arities can be flexible. It seems to me like it's very similar to ctags: the tokens on the left-hand side of the table (albeit parsed out by tree-sitter instead of ctags) become the search candidates.

I haven't dug far enough into stack-graphs and precise navigation to be sure, but that's probably something we'll have to keep in mind for writing those stack-graph queries 👍

@josevalim
Copy link
Member

@the-mikedavis perhaps one option is to include the arity in the token name:foo/3. But I am not sure if this is a good or bad idea.

@the-mikedavis
Copy link
Member Author

Ah I like that idea. Afaik you don't have any control over modifying that token with how tree-sitter tags curently works, but that would make navigation a breeze if it did work, especially in big projects where you have a bunch of functions with the same names but different arities

@jonatanklosko
Copy link
Member

jonatanklosko commented Dec 17, 2021

Hey @the-mikedavis, fantastic!

the precedence rules seem to be backward between querying and determining tags

This is definitely a tree-sitter bug, it's easy to verify using the highlight queries.

# Running highlighting test passes
npx tree-sitter test --filter test/highlight

# Running the query tests with the corresponding queries doesn't work
npx tree-sitter query --test queries/highlights.scm test/highlight/**/*

cc @patrickt @maxbrunsfeld


Do you know how github/semantic fits into the picture? It seems necessary to add Elixir bindings there, but having another look I see that tag matching does the same as the corresponding tag query, so I'm wondering how these two relate.

queries/tags.scm Outdated
right: (identifier) @name) @reference.call

; * modules
(alias) @name @reference.call
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we intentionally use @reference.call for all references or could we use @reference.module?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly the Python queries have neither of those for classes/modules and yet it works 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! Yep I tried out @reference.module initially but I thought it was not working. I just gave it another try and the above output changes to:

Mint.MixProject	| module  	def (0, 10) - (0, 25) `defmodule Mint.MixProject do`
Mix.Project	| module  	ref (1, 6) - (1, 17) `use Mix.Project`
project   	| function	def (6, 6) - (6, 13) `def project do`
Mix       	| module  	ref (11, 23) - (11, 26) `start_permanent: Mix.env() == :prod,`
env       	| call    	ref (11, 27) - (11, 30) `start_permanent: Mix.env() == :prod,`
elixirc_paths	| call    	ref (12, 21) - (12, 34) `elixirc_paths: elixirc_paths(Mix.env()),`
Mix       	| module  	ref (12, 35) - (12, 38) `elixirc_paths: elixirc_paths(Mix.env()),`
env       	| call    	ref (12, 39) - (12, 42) `elixirc_paths: elixirc_paths(Mix.env()),`
deps      	| call    	ref (13, 12) - (13, 16) `deps: deps(),`
package   	| call    	ref (29, 15) - (29, 22) `package: package(),`
application	| function	def (46, 6) - (46, 17) `def application do`
Mint.Application	| module  	ref (49, 12) - (49, 28) `mod: {Mint.Application, []}`
package   	| function	def (53, 7) - (53, 14) `defp package do`
elixirc_paths	| function	def (60, 7) - (60, 20) `defp elixirc_paths(:test), do: ["lib", "test/support"]`
elixirc_paths	| function	def (61, 7) - (61, 20) `defp elixirc_paths(_env), do: ["lib"]`
deps      	| function	def (64, 7) - (64, 11) `defp deps do`

which looks happy.

I see in some other queries now that there is a little variance like in these c-sharp tags.scm: https://github.com/tree-sitter/tree-sitter-c-sharp/blob/3104df21065af0f3d51e05a96cd0e2ff16a6f982/queries/tags.scm, which also makes me think that what is currently being tagged as @definition.implementation should actually be @reference.interface for code like

defimpl Jason.Encoder, for: List do

On the other hand, the ruby query has this line:

  [(identifier) (constant)] @name @reference.call

which tags modules/classes as @reference.calls though and I've seen module references work in ruby code (for example Appsignal), so I'm not sure what the right move is. Maybe all definitions/references are pooled together when performing the search? Or maybe it just helps with ranking?

I think I'll update this line and the @reference.interface as well 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, (constant) is the inner node for both class and module, I missed initially!

If the Jason.Encoder alias appears anywhere else we won't know if it's a module or a protocol. I think we should just use module for protocols as well (which is actually true), unless I'm missing some benefits of tagging them separately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yeah I think that makes sense, I could see things being ambiguous if you do a @derive {Jason.Encoder, ..} and search doesn't match up that @reference.module with the @definition.interface

I'll cut out all that interface stuff in preference of modules 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, definition.function is not consistent with reference.call (as opposed to reference.function), but we're probably missing some information here, so let's wait ^^

@the-mikedavis
Copy link
Member Author

the-mikedavis commented Dec 17, 2021

Based on this documentation, I think semantic is not in the navigation picture, it seems to all be driven by tree-sitter and tags.scm, but we'll need a GitHubber to weigh in to be sure

@jonatanklosko
Copy link
Member

Oh wow, maybe the "Used for code navigation on github.com" in github/semantic is no longer accurate then. I believe the GitHub docs were updated since I last checked too :D

Let's wait for some insights then :)

@patrickt
Copy link
Contributor

patrickt commented Jan 3, 2022

@the-mikedavis @jonatanklosko Correct: semantic is no longer part of the tags pipeline (we switched to tree-sitter native tags support so as to avoid the overhead of pulling in a whole program analysis framework to do a simple analysis).

I’ll take this patch for a spin locally with our tagging backend. I can’t make guarantees about if or when it’ll land—we have to figure out, product/documentation-wise, how to pull in this functionality without making it part of our core supported languages. But I’m super excited about this work; thank you all!

@patrickt
Copy link
Contributor

patrickt commented Jan 7, 2022

I can confirm that this patch works for our internal code-nav systems. Again, I can’t make promises as to when it will land—but this patch should be merged, in my opinion!

@jonatanklosko
Copy link
Member

@patrickt fantastic news, thanks a lot! Fingers crossed for the integration to happen :D

@the-mikedavis awesome job! :shipit:

@patrickt
Copy link
Contributor

Elixir code navigation is now active on GitHub!

@jonatanklosko
Copy link
Member

@patrickt fantastic, thanks again for all the integration work!

@the-mikedavis beautiful job with the tags 🐱

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants