Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support a "data"-like field on CompletionList that is also returned to the server in completionItem/resolve to avoid duplication in CompletionItem.data #1802

Closed
DanTup opened this issue Sep 7, 2023 · 61 comments · Fixed by #2018
Labels
info-needed Issue requires more information from poster

Comments

@DanTup
Copy link
Contributor

DanTup commented Sep 7, 2023

(Edit 2023-12-04: This request has changed slightly throughout the discussion - see #1802 (comment) below)


(from microsoft/vscode-languageserver-node#1237 (comment))

There's an itemDefaults.data field for completion that allows data to be included once in a completion response rather than duplicated across all items.

For Dart, the data field contains a mix of data that is the same for all items (eg. the file the completion is being inserted into so we can compute edits for adding imports where required - since /resolve doesn't get any context) and data that is different (an ID to get back to the element being inserted so we can resolve things like documentation). Since itemDefaults.data replaces the whole of data we can't use it here, so we end up with a large payload with a lot of duplicated into.

It would be very helpful to have an option to merge data from items over the default (for ex. Object.assign(itemDefaults.data, item.data)?).

I'm happy to send PRs for this, but I want to agree an approach first:

  • Do we support merge only for data or all fields?
  • What should the merged default be called? (itemDefaults.mergedData?)
  • Is Object.assign(itemDefaults.data, item.data) flexible enough? (you can use null to erase something from the defaults for a given item?)
  • The capability can just be the client including the new field (eg. mergedData) in the completionList.itemDefaults set?

@dbaeumer WDYT?

@puremourning
Copy link

Just remember that If we are going to specify a merge operation the client must perform, then the exact detailed merge behaviour and semantics must be fully specified by the protocol.

Saying 'what JavaScript does' is not helpful if the client isn't JavaScript.

@dbaeumer
Copy link
Member

I do see the need for something like this due to performance reasons. But before we go down the pass can you collect some numbers that show how much this will speed things up (e.g time it take to show the completion in the UI). It will complicate the protocol and adds effort for clients to implement this and I want to ensure it is worth the effort.

@dbaeumer dbaeumer added the info-needed Issue requires more information from poster label Sep 10, 2023
@DanTup
Copy link
Contributor Author

DanTup commented Sep 11, 2023

It's difficult to give concrete numbers because it varies a lot across environments and payloads, and I can't currently measure what difference it would make without implementing it. My main motivation is to reduce completion payload sizes because currently they can be a few megabytes (more on that below) and in Codespaces this can be really slow (like 5-6 seconds). I'm not sure why it's so slow, but I see the data over the websocket is batched into 256kb chunks and the timestamps seem much further apart than I'd expect ().

There are a number of contributing factors to the payload being so large:

  1. We avoid using isIncomplete=true where possible because of several issues in VS Code that make it feel slower (plus some implementation decisions). Users expect everything in the completion list (imported or not), so that makes the total number of items very large.
  1. Because there is no context round-tripped with /resolve we don't know the file that the completion will be imported into, so we cannot compute the edits to add imports. We end up round-tripping a bunch of information for each unimported symbol to be able to compute this, and that includes things like the filename - which is identical for every item:
    image

There are many other things making the payload large which I'm working on, but being able to round-trip some context without duplicating it on every item will definitely help. The whole purpose of itemDefaults was to reduce payload size, and this request is just the same, but being able to use data in both cases (some shared data, and some per-item data).

Another possibility would be to allow the server to keep some state between /completion and /resolve, but it would require LSP to clearly define the rules there. For example stating that /resolve is only valid for the last /completion request, so that the server is able to keep only one-requests worth of state. This would allow round-tripping some indices for some items instead of full strings.

If you have other ideas that would be better, I'm all ears :-)

@dbaeumer
Copy link
Member

Have you every tried not sending the data at all but instead doing the following:

  • keep them on the server and index by id
  • only send the id

I know that this holds onto memory on the server however code complete requests are so frequent that you can always free the data hold for the last request when a new one arrives.

This would not only benefit the communication between servers and the extension host but also in VS Code a remote setup. ItemDefaults are a LSP concept and don't exist in VS Code. The data is basically inflated when VS Code talks from the extension host to the renderer.

@DanTup
Copy link
Contributor Author

DanTup commented Oct 13, 2023

I've thought about this a few times (I think we actually did this in the past) but I wasn't sure how reasonable it was for a server to only support resolve for the "last completion request". I can imagine some situations where this might have issues:

  • completion request is sent
  • user types an additional character
  • editor sends both a new completion request (because isIncomplete=true) but also filters the current client-side list (to avoid not updating completions until the new request completes) which triggers a resolve of the new top item but the server has potentially just thrown away its state.

I don't think this would affect VS Code (I don't think it filters client-side when it's sending another request?) and it's probably likely that the new completion request would complete and a new item resolved anyway, so perhaps that's not an issue.

I'll think about this a bit more and maybe try it out and see how it works.

@rchiodo
Copy link

rchiodo commented Oct 13, 2023

For me I ran a test with a whole bunch of completions. Something that returned over 900 items for a completion list. With the data not being passed (I was only sending the data that was unique), it was about a 30% speedup.

@DanTup
Copy link
Contributor Author

DanTup commented Oct 23, 2023

@dbaeumer if we decide to do this, could we add something to the spec about it? Right now the spec doesn't say anything about whether calling resolve on a completion from 4 hours and 2,000 completions ago should work. If servers might start relying on state from completion requests it might be worth calling out that clients should not assume that resolve will work the same for a non-last completion request.

Ideally, the spec would say that clients should only call it for the last one, because that would avoid needing to even round-trip some identifier (which would still be a bit of junk to duplicate on every single completion item in a large list).

@dbaeumer
Copy link
Member

@DanTup only allowing to resolve the a completion item from the last completion request makes total sense to me.

@r3m0t
Copy link

r3m0t commented Oct 25, 2023

I think per filename would make more sense. You could have a completion running in one editor and switch away to make a change in another editor. Same with signature help.

@DanTup
Copy link
Contributor Author

DanTup commented Oct 25, 2023

@dbaeumer great, I'll open a PR (and post a link back here in case others in this thread have feedback/suggestions).

@r3m0t

I think per filename would make more sense. You could have a completion running in one editor and switch away to make a change in another editor. Same with signature help.

This wouldn't be possible without resolve actually knowing the filename (which it doesn't - which is actually the main issue here - I'm currently having to duplicate the filename on every single completion item that will need to compute imports).

But it also means the server would have to keep a lot more state (the last completion for every file). I think if you switch to another editor and back, it's probably reasonable to just re-invoke code completion (VS Code already does not keep the completion widget open if you switch editor).

@DanTup
Copy link
Contributor Author

DanTup commented Oct 30, 2023

I opened a PR here that says /resolve should only be called for items from the last /completion response (and that calling it with an older completion item might produce incorrect results):

#1834

@dbaeumer
Copy link
Member

dbaeumer commented Nov 1, 2023

I agree with @DanTup here. When switching editors the client should actually cancel the last completion request since its result might be not correct anymore anyways.

@DanTup
Copy link
Contributor Author

DanTup commented Nov 1, 2023

@dbaeumer are you happy with #1834? (there's a "Community PR Approvals" check that seems stuck?). I'd prefer to have that merged before I start making server changes assuming that's good 🙂

(if we merge that, I think we can close this, as keeping the state on the server provides the ability to do everything that this would)

@dbaeumer
Copy link
Member

dbaeumer commented Nov 10, 2023

We had a longer discussion about that problem and due to that fact the code could directly talk to the server we can't spec that a resolve request can only be sent for items from the last code complete request (although this is the case in 99%).

The only way I can see to tackle this is to have an explicit release call that client can send to the server. Something like this:

  • we add a special ID field to the completion item or have a special property name in the data field
  • when the client will not need to the completion item anymore it will send a textDocument\completion\release with an array of ids that can be freed.

This will allow servers to keep state for a completion item on the server instead of attaching all state to the completion item itself.

This approach however has to go behind a capability flag but it is implementable for VS Code.

@DanTup
Copy link
Contributor Author

DanTup commented Nov 10, 2023

@dbaeumer do you mean releasing each completion item? Adding a unique ID to each completion item feels like it's going to add more to the payload that the goal was to remove.

I wonder if we'd be better trying to do the original plan here (a mergeable data) instead?

Or, how about a new field (context?) that functions just the same as data but can only be supplied in the top level CompletionList (or even in itemDefaults) that is attached (alongside data) during resolve?

Something like that seems way simpler - both for LSP/spec, and for servers (no need to keep state, worry about it not being cleaned up, no extra per-item data to track IDs).

@dbaeumer
Copy link
Member

do you mean releasing each completion item? Adding a unique ID to each completion item feels like it's going to add more to the payload that the goal was to remove.

Yes. But I doubt that this will add more data since a single ID field / property would make the whole data property go away on these completion items. I am pretty sure that in total that will be a smaller payload in the cases were servers add state to the data property

The problem with the merge is that users will ask for more and more complicated merging algorithms. The next thing I already see users asking for is to allow to template paths since the majority of the paths only differ in small parts.

I do understand the need of lowering the payload but I am not convinced that the merging is the right solution.

@DanTup
Copy link
Contributor Author

DanTup commented Nov 14, 2023

Yes. But I doubt that this will add more data since a single ID field / property would make the whole data property go away on these completion items.

It would definitely be smaller than it is today, but it still feels needless verbose. My current goal is to strip everything I don't need from the payload, so trading a large property for a small property for potentially a large number of items is not as good as removing it :-)

The problem with the merge is that users will ask for more and more complicated merging algorithms.

To be clear, my last suggestion above involved no merge. I was asking that we have a second field (in addition to data) that goes back to the server that comes from the CompletionList, something like:

textDocument/completion result:

{
	"context": {
		"foo": "bar",
	},
	"items": [
		{
			"label": "...",
			"data": { "a": "b" }
		}
	]
}

completion / resolve:

{
	"label": "...",
	"data": { "a": "b" }
	"context": {
		"foo": "bar",
	},
}

This seems like a much simpler solution than having to release completion items (something I'm not sure clients would bother to implement), has no complexity of merge, and has no restrictions on the ability to call resolve later.

Edit: For my specific case, even just sending the original completion arguments as context would work. The above feels slightly more flexible though.

@mfussenegger
Copy link

For what's worth from a client implementation perspective I'd much rather have an additional property in the CompletionList that's transferred on completion/resolve than having to deal with extra state management with a release.
Having to release items that are no longer needed would be especially ugly to implement in Neovim, given that some aspects of completion are often extended in plugins.

@DanTup
Copy link
Contributor Author

DanTup commented Nov 28, 2023

@dbaeumer any thoughts/opinions on the above?

@dbaeumer
Copy link
Member

I am still not convinced that this will drastically reduce the amount of data servers add to the data field of a completion item. @DanTup could you provide some example of before and after.

@dibarbet and @jdneo do you have any insights / number you can share about the payload C# / Java encodes into the data field and if such a context on completion list would help lowering the payload significantly.

@DanTup
Copy link
Contributor Author

DanTup commented Nov 29, 2023

@dbaeumer it's difficult to give specifics because it depends very much on the specific context (for example how many completion items there are, how many of them need context to provide auto-imports, etc.) but as an example, I just created a new file in a Flutter project which has no dependencies other than Flutter and invoking completion at the top level has the file path repeated 6220 times:

image

That's 78 * 6220 = which is 485,160 characters just to provide the server with the filename of where it will be inserting imports (which it requires to compute the edit). There's also a cost to serialising and deserializing all of this data.

This number will go down if some of the items are already imported (because they won't need this in resolve), but it would also go up for each new dependency/package the user adds (because users want all symbols to show up in completion regardless of whether they're imported yet).

Of course, this can be reduced with isIncomplete and capping the number of results, but the experience of that is worse and visible different in VS Code (for example microsoft/vscode#147830) so my current aim is to reduce the payload sizes to avoid having to truncate the list too frequently.

(I'm aware there are other savings to be made in the screenshot above - I've made some that haven't shipped in the SDK I'm using, and I've still some to make :) )

@DanTup
Copy link
Contributor Author

DanTup commented Nov 29, 2023

Completing inside a function actually included more of this duplication, because there are symbols that weren't valid in the top level location where I took the first screenshot.

image

@jdneo
Copy link
Member

jdneo commented Nov 29, 2023

From the Java side, we don't have such request for now. But just in several months ago, we did some refactoring to remove some unnecessary data field in the completion items which helps to improve the completion performance.

One example is: We previously stored the document uri into the data field for each completion items (which looks similar as @DanTup mentioned in the dart), and then we found that is not necessary, so we remove it from the data field.

After removing that uri string, triggering completion via 'S' in Spring Petclinic project, the response (textDocument/completion) payload size can be reduced from 3.05MB to 2.63MB. (Directly copy the trace string to a text file). And the completion time becomes a little bit faster. More details: eclipse-jdtls/eclipse.jdt.ls#2614.

@DanTup
Copy link
Contributor Author

DanTup commented Nov 29, 2023

@jdneo do I understand correctly that you're stitching this data back in in the LSP client? (eg., it won't work for other generic LSP clients)?

I was also considering something like that for Dart if LSP doesn't support it, but it seems silly not to include it in LSP if clients and servers are going to build custom support for exactly this anyway. If both Dart and Java benefit significantly from extracting this from CompletionItem to CompletionResult, I think other languages might too.

@jdneo
Copy link
Member

jdneo commented Nov 29, 2023

In our case, the uri is used during resolve stage. The way we remove that uri is:

For every completion item, we have a generated unique id for it. And we add a cache at the server side that maps id -> the context of the completion item (things like uri, etc...). Then we only put that id to the data field.

At the completion resolve stage, the server side can recover the context via the id and do some further tasks.

@DanTup
Copy link
Contributor Author

DanTup commented Nov 29, 2023

@jdneo what happens for items that are never resolved? When do you clean up the context?

There were some suggestions about this above, but it seemed complicated to manage releasing the context, which is why I was advocating for just adding a new field to CompletionResult that can be also included in resolve. It would've been nice if resolve accepted {item: CompletionItem} instead of the CompletionItem directly, although that ship has sailed.

@DanTup
Copy link
Contributor Author

DanTup commented May 22, 2024

@dbaeumer the issue of payload sizes came up again and I'd like to make some progress on this. Would you accept PRs based on the above? To summarize, my understanding is:

Add to CompletionClientCapabilities.completionList:

  // Client supports the `data` field on `CompletionList` and will merge into CompletionItem.data
  // as described in the CompletionList.data docs
   data?: LSPAny;

Add to CompletionList:

  // If the client supports `CompletionClientCapabilities.completionList.data`, this field can contain
  // additional data that will be combined with `CompletionItem.data` when sending a
  // `completionItem/resolve` request.
  // If any fields are defined in both CompletionList and CompletionItem, the value from
  // CompletionItem will be used - no merging of values will occur.
  // This is equivalent to `Object.assign({}, list.data, item.data)` in JS (TODO: define how itemDefaults works here)
   data?: LSPAny;

Update docs on CompletionItem.data

  // A data entry field that is preserved on a completion item between a completion
  // and a completion resolve request.
  // If the client supports `CompletionClientCapabilities.completionList.data` and
  // `CompletionList.data` was provided by the server, this field will have been
  // supplemented with `CompletionList.data` by the client before calling `completionItem/resolve`.
  data?: LSPAny

The only thing that's not clear to me is what we do if itemDefaults has data too?

  • Object.assign({}, list.data, item.data ?? list.itemDefaults.data)?
  • Object.assign({}, list.data, list.itemDefaults.data, item.data)?

And for the VS Code npm client specifically, do we merge this data on the way in (during handling of the incoming completion list), or on the way out (when we call resolve)? This might impact how it appears to middleware?

@heejaechang
Copy link

heejaechang commented Jun 19, 2024

can we do this for every feature that has resolve or data field? such as inlayHint or TypeHierarchy and etc?

they all have the exact same issue. since people usually doesn't want to hold onto state that is not clear when to invalidate, all those features, item's data field contains some kind of self-sufficient data to resolve the given item and usually 80% of that data is exactly same. for example, name of workpace, filepath, or range to replace and etc.

so, this approach could be generally helpful for any feature that support data field or resolve.
...

that said, I think another generic approach to reduce size of message for any LSP message is providing a way to intern message data. Basically, making any message to have internedMap whose type is {id: string, data: any} and any part of LSP message can refer value as id for the internedMap

so rather than returning

{ ... additionalTextEdits: [ { start: {line: ...}, end: {line: ...}, newText: ...}, ... ], textEdits: [ { ... } ] ... }

one could return

{ ... additionalTextEdits: { { start: "id#start", end: "id#start", newText: ... }, ...], textEdit: [ { start: "id2#start", end: { ... }, newText: ... } ] ... }

that way when server return any message, they can get rid of any duplicated messages in any way that fits them.

and resolve either always get those part resolved or get internedMap with them.

@DanTup
Copy link
Contributor Author

DanTup commented Sep 4, 2024

@dbaeumer do you have any opinions on my proposal above? (#1802 (comment))

I'm happy to open PRs if we have a general agreement on this (otherwise I may do it using custom flags/middleware - but that's not ideal because it won't be supported by clients other than VS Code).

@dbaeumer
Copy link
Member

dbaeumer commented Sep 5, 2024

@DanTup now that I read the issue again I am a little bit puzzled why we need a CompletionList.data field at all since there is a CompletionList.itemDefaults.data field which comes back in the resolve request. Can you provide me with an example why you would need two of those.

@DanTup
Copy link
Contributor Author

DanTup commented Sep 5, 2024

@dbaeumer the issue is that there are two sets of data, the "per completionItem data", and the "same for every completionItem data". Because itemDefaults.data is not merged with completionItem.data, we currently have to either have the same data sent on every resolve (that comes from itemDefaults), or we have to duplicate the same data on each individual completionItems data.

Concretely, to support resolve correctly in Dart, we need to know the file that the completion item will be inserted into, because we need to compute edits for inserting import statements. This file is currently repeated in every completionItem.data, because we need additional per-item data. This results in a payload that repeats some fairly large strings (and the field name, and "data") for every completion item.

In my original comment I suggested supporting merging those, but in #1802 (comment) you didn't seem keen on that.

@dbaeumer
Copy link
Member

dbaeumer commented Sep 6, 2024

@DanTup now that I finally understand what the problem is (your original description with a new mergeData field made it too complicated for me :-)). From what I understand now I would do the following:

  • add a property applyKind: 'merge' | 'replace' to CompletionItemDefaults. Default it replace
  • depending on its value the defaults either replace the current values or are merged.
  • merge will be a shallow merge (only first level) and only for the following properties: commitCharacters and data
  • I wouldn't support any null values to delete stuff since this needs to be guarded by capabilities to not break.

Will this solve your issue?

@DanTup
Copy link
Contributor Author

DanTup commented Sep 6, 2024

@dbaeumer yes, it would :) But some comments:

merge will be a shallow merge (only first level) and only for the following properties: commitCharacters and data

commitCharacters is an array, so I presume here you mean to concatanate (or a set of both values)?

I wouldn't support any null values to delete stuff since this needs to be guarded by capabilities to not break.

Doesn't there need to be a capability for this anyway (so the server knows the client suports merge, in which case could supporting nulls to delete just be specified as part of merge? (this is less critical to me, but I do think it'd be nice to support if there are cases where something is included for 90% of items). Handling null differently to undefined would also allow a difference between inheriting the default versus not having data at all, for ex:

{
  itemDefaults: { data: { foo: "bar" } },
  items: [
    { label: 'x' } // gets itemDefaults.data,
    { label: 'y', data: null } // does not get itemDefaults.data
  ],
}

Again, I don't think that's as critical, but might be nice to support.

Thanks!

@dbaeumer
Copy link
Member

Actually you are correct that we need a client capability to ensure that servers doesn't request a merge and the client doesn't support it. And then we can support null as well.

Are you still up for a PR?

@DanTup
Copy link
Contributor Author

DanTup commented Sep 10, 2024

Are you still up for a PR?

Yep! I'm still a bit unsure about commitCharacters though and wonder if they're best left alone (eg. you can supply defaults, and you can replace those defaults with a new set, but there is never a merge).

Otherwise, if you want to use applyKind: merge for data, you now can't use itemDefaults.commitCharacters for a default set if you might ever want to remove/replace those defaults for even a single entry.

So I think my suggestion would be either, applyKind is only for data, or the kind for data and commitCharacters should be seperate, perhaps something like:

applyKind?: {
    data?: "replace" | "merge",
    commitCharacters?: "replace" | "merge",
}

And some ability for the client to indicate which fields it supports applyKind for (like it does for itemDefaults)?

@dbaeumer
Copy link
Member

Then I would go with

applyKind?: {
    data?: "replace" | "merge",
    commitCharacters?: "replace" | "merge",
}

but only one client capability. If it supports applyKind it must support it for data and commitCharacters

@DanTup
Copy link
Contributor Author

DanTup commented Sep 11, 2024

but only one client capability. If it supports applyKind it must support it for data and commitCharacters

Is it possible in future there might be new fields that might make sense to support merge? If so, wouldn't using a single flag make it more difficult to add that?

DanTup added a commit to DanTup/language-server-protocol that referenced this issue Sep 11, 2024
…om `completionList.itemDefaults` and `completion` are combined.

Fixes microsoft#1802
@DanTup
Copy link
Contributor Author

DanTup commented Sep 11, 2024

@dbaeumer I've started a PR at #2018 for feedback. I use a list as noted above for the reasons I gave, but happy to change if you think we shouldn't do this (I slightly worry that in future there may be new fields we want to support merge for, but the boolean doesn't let us add them or the applyKind boolean becomes code for applyKindForCommitCharactersAndData and we add additional applyKind flags).

@dbaeumer
Copy link
Member

Is it possible in future there might be new fields that might make sense to support merge? If so, wouldn't using a single flag make it more difficult to add that?

Yes, but the new field must be guarded by a capability. If a client supports the new filed and merge it MUST then support the new filed in the applyKind literal. I think that is fair.

@DanTup
Copy link
Contributor Author

DanTup commented Sep 12, 2024

If a client supports the new filed and merge it MUST then support the new filed in the applyKind literal. I think that is fair.

Ok, I'll change this to just a flag then - thanks!

@DanTup
Copy link
Contributor Author

DanTup commented Sep 12, 2024

Yes, but the new field must be guarded by a capability. If a client supports the new filed and merge it MUST then support the new filed in the applyKind literal. I think that is fair.

Oh, one thing to note is that this means when new fields are added, it must be decided in that same version if they will get applyKind support. We can't add a new completion field in one version of LSP and then in the next version decide it needs applyKind support.

@mfussenegger
Copy link

My vote would also go for the previous approach with applyKinds being a list of the properties that are supported.
If it's a boolean flag it makes it difficult for a client to implement this in a forward compatible way where servers might start showing an additional field within applyKind

@DanTup
Copy link
Contributor Author

DanTup commented Sep 12, 2024

If it's a boolean flag it makes it difficult for a client to implement this in a forward compatible way where servers might start showing an additional field within applyKind

I think @dbaeumer's point is that a client needs to be updated to use any new completion fields, so at the point of adding support for a new field in a client, if that client supports merge (and it is a mergeable field), the client should also add support for merging it.

However, I do also think listing the fields may be slightly better, because it's harder to accidentally do it wrong (support a new field for forget to support merge for it).

@dbaeumer
Copy link
Member

I think it is fair to request this from clients. And having separate lists makes it harder for servers. They need to honor them :-)

DanTup added a commit to DanTup/vscode-languageserver-node that referenced this issue Sep 18, 2024
…d per-item commitCharacters/data are combined

Implements the changes in the LSP spec PR at microsoft/language-server-protocol#2018.

(Also see microsoft/language-server-protocol#1802)
dbaeumer added a commit to microsoft/vscode-languageserver-node that referenced this issue Oct 4, 2024
…d per-item commitCharacters/data are combined (#1558)

* Add support for CompletionList "applyKind" to control how defaults and per-item commitCharacters/data are combined

Implements the changes in the LSP spec PR at microsoft/language-server-protocol#2018.

(Also see microsoft/language-server-protocol#1802)

* Update meta model

* Add non-null falsy test

* Change ApplyKind to ints

* Tweaks + typos

---------

Co-authored-by: Dirk Bäumer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
info-needed Issue requires more information from poster
Projects
None yet
9 participants