Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

conditions memoization #600

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

atticus-sullivan
Copy link
Contributor

@atticus-sullivan atticus-sullivan commented Sep 15, 2022

first draft of memoization and condition objects as suggested in #592.

Just a draft PR so we have a place for discussion. If you've another idea of implementing, we cen delete/overwrite this one, no problem ; )

Collection of TODOs:

  • advise the users to put the heavier conditions at the end of the chain
  • more logic operators (currently: and, or, not) -> xnor
  • Prebuilt conditions (end of line, start of line, ...)
  • Prebuilt invalidation predicates
  • pass cursor position (etc.. ?) to the condition as well

@L3MON4D3
Copy link
Owner

This is pretty much what I had in mind as well, very nice 👍

The overhead of using memoization would be interesting, should be measurable by adding a bunch of snippets with the same trigger, but different conditions.

@L3MON4D3
Copy link
Owner

L3MON4D3 commented Sep 15, 2022

Ah, we should consider passing the cursor-position through from expand to check it in invalidate, querying that each time sounds horrible.

@atticus-sullivan
Copy link
Contributor Author

Ah, we should consider passing the cursor-position through from expand to check it in invalidate, querying that each time sounds horrible.

Oh yes, sounds reasonable. Then we'll need some changes in he code after all (and think about if we want to make these args accessible for normal conditions as well (but I've got no real idea how to hide them tbh))

@L3MON4D3
Copy link
Owner

Then we'll need some changes in he code after all

True, but as long as they don't explicitly accomodate the memoization, I'm happy :D

It should be alright to just pass the cursor behind the other stuff, and it could actually come in useful for regular conditions, so there's no reason to hide it

@leiserfg
Copy link
Contributor

leiserfg commented Sep 16, 2022

Also, remember that lua boolean operators don't do short-circuit, so this is basically doing all the calls all the time, I think it will be a good idea to do the short-circuit ourselves and advise the users to put the heavier conditions at the end of the chain.

Forget my comment, it does shortcut, I'm just speaking nonsense.

@L3MON4D3
Copy link
Owner

Are you sure they don't short-circuit?
Did a quick test and :lua =((function() print(1) return true end)() or (function() print(2) return true end)()) and 3 doesn't evaluate the second function.
(Or I'm misunderstanding something, does anything in our construction prevent short-circuiting?)

@leiserfg
Copy link
Contributor

Are you sure they don't short-circuit?

Forget my comment, I'm definitely mistaking languages features.

@atticus-sullivan
Copy link
Contributor Author

atticus-sullivan commented Sep 19, 2022

Just collected some TODOs in the initial PR message (let me know if I'm missing something).

What other logic operators do you think are usefull? (xnor since it is simple equality of course, but what else?) Remember that we need an operator to be overloaded with it (see https://www.lua.org/manual/5.1/manual.html#2.8 for what is possible)

What parameters for invalidation do you think are useful? (I still have to read about things like change ticks as my vim-lib background is pretty weak)

Should we make the changes to the luaSnip core (add additional parameters to condition function) in another PR?

EDIT: And we can already think of some useful conditions besides end_of_line and start_of_line (I personally don't use other ones tbh)

@atticus-sullivan
Copy link
Contributor Author

I'm using == for xnor. What do you think of ^ as the xor operator (inspired by C)?

@atticus-sullivan
Copy link
Contributor Author

All the _ are a bit annoying in the invalidate function, maybe we should discuss what of the possible arguments really are useful for that (on the other side it's always the question if we want to constrain the arguments any make it less flexible)

@atticus-sullivan
Copy link
Contributor Author

And I just though about invalidation conditions like same line. Not quite sure if it's done by tracking the number of the line. In most cases one would have some condition like line_start.
Just checking the line number wouldn't be enough in that case, as the condition relays on the line's content didn't change (checking if the line didn't change would be no problem (string interning -> simple pointer equality check) but storing all the strings would eat quite some memory in larger files).
But maybe it's possible to "ask" vim if the lines content has changed? 🤷 (we just have to keep in mind that the invalidation check has to be really fast as this is the part which will be executed every time without memoization)

@L3MON4D3
Copy link
Owner

What other logic operators do you think are usefull? (xnor since it is simple equality of course, but what else?) Remember that we need an operator to be overloaded with it (see https://www.lua.org/manual/5.1/manual.html#2.8 for what is possible)

Can't think of any others :/
But I I think and,or,xor,xnor,not should be pretty expressive already
(also +1 for ^ as xor)

What parameters for invalidation do you think are useful? (I still have to read about things like change ticks as my vim-lib background is pretty weak)

Should we make the changes to the luaSnip core (add additional parameters to condition function) in another PR?

Mhmmmm would be a tad cleaner, but I won't complain if you tack it on here, it originates here after all

EDIT: And we can already think of some useful conditions besides end_of_line and start_of_line (I personally don't use other ones tbh)

I started looking into treesitter-conditions in treesitter-ideas, basically I want to add

  • Am I currently inside some node? (immediately or all parents)
  • Does my cursor-position match some query?

Those would have to be invalidated on pretty much any change, unfortunately.

@L3MON4D3
Copy link
Owner

L3MON4D3 commented Sep 19, 2022

And I just though about invalidation conditions like same line. Not quite sure if it's done by tracking the number of the line. In most cases one would have some condition like line_start. Just checking the line number wouldn't be enough in that case, as the condition relays on the line's content didn't change (checking if the line didn't change would be no problem (string interning -> simple pointer equality check) but storing all the strings would eat quite some memory in larger files). But maybe it's possible to "ask" vim if the lines content has changed? shrug (we just have to keep in mind that the invalidation check has to be really fast as this is the part which will be executed every time without memoization)

line_start would also depend on the cursor-position inside the line :/
I can't really think of uses for same_line, but having the invalidation as a callback sounds very flexible (though we can only optimize predefined invalidations, so there might be a reason for accepting both)

In general, having nvim push the invalidations to us (via nvim_buf_attach) seems very wasteful, there will be many more changes than snippet-expansions, so we should definitely query changedtick (basically pull- instead of push-model).

We should really start measuring the impact of condition-evaluations before focusing this much on the memoization-options, it will most likely only be worth it if multiple snippets have the same condition and trigger (because only then will a condition be evaluated twice for a single expand-call)

For example, via

diff --git a/lua/luasnip/init.lua b/lua/luasnip/init.lua
index a79ddc5..390ba47 100644
--- a/lua/luasnip/init.lua
+++ b/lua/luasnip/init.lua
@@ -27,11 +27,15 @@ end
 -- parameters(trigger and captures). params are returned here because there's
 -- no need to recalculate them.
 local function match_snippet(line, type)
-	return snippet_collection.match_snippet(
+	require("jit.p").start("10,i1,s,m0,G", "match")
+	local match, expand_params = snippet_collection.match_snippet(
 		line,
 		util.get_snippet_filetypes(),
 		type
 	)
+	require("jit.p").stop()
+
+	return match, expand_params
 end
 
 -- ft:

(I get no samples at all, but I also don't have snippets with conditions)
(It would be very useful to provide an easy-to-use way to profile the match_snippet-sections exclusively, then we could try collecting some data from users. Also important for them to decide which functions to memoize, and measure performance-impact of those decisions)

@L3MON4D3
Copy link
Owner

All the _ are a bit annoying in the invalidate function, maybe we should discuss what of the possible arguments really are useful for that (on the other side it's always the question if we want to constrain the arguments any make it less flexible)

I'd say pass them in a table, that would at least be comfortable. (maybe create one table per condition and reuse it (don't create a table each call), but I'd say we measure first)

@atticus-sullivan
Copy link
Contributor Author

I started looking into treesitter-conditions in treesitter-ideas, basically I want to add

Can't find treesitter-ideas, is this a github repo?

@atticus-sullivan
Copy link
Contributor Author

so we should definitely query changedtick (basically pull- instead of push-model).

👍

having the invalidation as a callback sounds very flexible (though we can only optimize predefined invalidations, so there might be a reason for accepting both)

I agree, why close this interface for the user (providing custom invalidate functions). Not quite sure what you want to optimize tough.

We should really start measuring the impact of condition-evaluations

Do you refer to switching trigger checking and condition checking?

we could try collecting some data from users

I guess we should make a testing branch which every user can checkout if they want to provide us with data. Any ideas on how to get to the users? (making an announcement)
But I'm not sure how much the condition option is even used (I for instance only use it for checking for beginning of line to restrict some automatically triggered snippets to these occurrences). Without users using conditions we won't get a good result of this profiling.

@atticus-sullivan
Copy link
Contributor Author

Having all this discussion on memoization/invalidation, maybe we should outsource the implementation of condition objects and the logic operators to another (new) PR and handle memoization separately (still in this PR).
What do you think?

@atticus-sullivan
Copy link
Contributor Author

I'd say pass them in a table, that would at least be comfortable. (maybe create one table per condition and reuse it (don't create a table each call), but I'd say we measure first)

No idea how you want to share a table with is simply used to pass arguments 🤷, but that's for later

@atticus-sullivan
Copy link
Contributor Author

Just not sure about the vim stuff. But do you think there is any way a condition could change without setting another changedtick? (thinking about using this as the smallest unit of measure for some sort of invalidation caching)

But indeed we should run some benchmarks to check if memoization even has the potential to give some performance benefit (tbh I doubt it based on our current discussion, memoization sounds great but I don't think it's worth it. Maybe it's just a think to keep in the back of our mind to come back if someday this becomes an issue).

@L3MON4D3
Copy link
Owner

I started looking into treesitter-conditions in treesitter-ideas, basically I want to add

Can't find treesitter-ideas, is this a github repo?

Ah, my bad, it's a branch that I thought I'd pushed already 😅

@L3MON4D3
Copy link
Owner

I agree, why close this interface for the user (providing custom invalidate functions). Not quite sure what you want to optimize tough.

We could invalidate on "buffer" by actually clearing the value of conditions instead of checking the condition-function (otoh, this might also clean to often, maybe there won't be any expansions in the new buffere, and then we'd cleaned it for naught). But this is the only one where I can actually think of an optimization

What do you think?

Yeah, that makes sense, the logic-operators are pretty much complete, so no reason to wait with merging them.
Good call 👍

No idea how you want to share a table with is simply used to pass arguments 🤷

:D
I meant share across calls, so regularly we'd have

...
invalidate_args = {
  line = ...,
  cursor = ...
  ...
}
if invalidate(invalidate_args) then ... end

so we'd create a new table for each of those calls, but we could instead store this table in the condition-object, and just set the new values

...
self.invalidate_args.line = ...
self.invalidate_args.cursor = ...
...

if invalidate(self.invalidate_args) then ... end

Not sure if this has any performance-impact at all, I just thought of it and wanted to share :D

Just not sure about the vim stuff. But do you think there is any way a condition could change without setting another changedtick?

Well, maybe, depends on the condition. For example if filteype changes (oh, or the cursor moves ofc), I don't think that's covered by changedtick. We could handle any uncertainty here by calling that memoization just "changedtick" (otoh, "buffer" is clearer)

But indeed we should run some benchmarks to check if memoization even has the potential to give some performance benefit (tbh I doubt it based on our current discussion, memoization sounds great but I don't think it's worth it. Maybe it's just a think to keep in the back of our mind to come back if someday this becomes an issue).

Yup :/

@atticus-sullivan
Copy link
Contributor Author

Not sure if this has any performance-impact at all

Me neither (basically the question is if allocate + init is faster than assigning elements)

oh, or the cursor moves ofc

Oh yes of course.

@atticus-sullivan atticus-sullivan changed the title Condition objects conditions memoization Oct 1, 2022
@atticus-sullivan
Copy link
Contributor Author

I'd suggest we keep this in the back of our mind, if someone reports bad performance on conditions (I see no point in doing memoization if there is no real benefit (ofc would be cool to simply have it, but for that it's just too much work I think)). We'd need user support for benchmarking anyways.

So I'd close this PR, but maybe we could add a label (maybe something like poss_performance_gain) to not forget this.

@L3MON4D3
Copy link
Owner

L3MON4D3 commented Oct 1, 2022

I'd suggest we keep this in the back of our mind, if someone reports bad performance on conditions (I see no point in doing memoization if there is no real benefit (ofc would be cool to simply have it, but for that it's just too much work I think)). We'd need user support for benchmarking anyways.

Agreed 👍

I'd close it just in spirit, leaving it as draft seems the perfect thing to do, since we'll need to do more investigation to do this properly (or, to find out if this needs doing properly :D)

@L3MON4D3 L3MON4D3 added the performance - not planned Leads to better performance, but not currently planned label Oct 1, 2022
@atticus-sullivan
Copy link
Contributor Author

Fine with that. Just thought properly close this might avoid cluttering the PR section. But you're also right, closed PRs might get forgetten

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance - not planned Leads to better performance, but not currently planned
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants