-
Notifications
You must be signed in to change notification settings - Fork 29.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change format of cell uris so that it is harder to guess #141717
Comments
/cc @rebornix |
/cc @weinand I believe debugging does or did make assumptions about the cell uri format |
@jrieken in the notebook debugging samples the debugging functionality did not make assumptions about cell uri format. But since VS Code persists breakpoints independent from debuggers via their resource URI, it is possible that a persisted cell breakpoint "gets broken" when the cells of a notebook are rearranged and then the notebook cell URIs get new sequential numbers in their URI fragment. For this problem the notebook debugging samples tried to "fix" the persisted breakpoints when they detected that cell rearrangements happened. If cell URI would be robust, this ugly workaround would not be necessary. @roblourens is my assessment still true for the current notebook debugging approach? |
That's currently not possible because cells don't have a stable identifier. So, it's not really part of this effort |
I can't think of anything in debugging that makes assumptions about the format. Changing the format will break anything that is stored by cellURI, like persisted breakpoints. So if we change it we might consider a simple one-time migrationwith the update |
I am not sure how |
I do understand the desire for stable URIs but this specific case is also a good sample for why the format should be obfuscated. This specific "massage" isn't working since roughly a year more (due to ce02917). We don't want to invite people to make assumptions... Stable cell uris/identity is likely another topic and while I believe this is coming (has come?) to jupyter it isn't defined for other notebooks. There is also a specific issue with displaying cell uris in the workbench: we don't have the concept for pluggable resource sorters and therefore cell uris are crafted so that they sort lexicographically well. This makes the problems and references view look nice... |
/cc @rchiodo I found something that looks like an assumption about the format of the fragment. What's that for and can that work without these assumptions? That was the only usage I could find but I am curious if you know about more such places? |
Yes actually this is used here: We use this in pylance for computing where a notebook cell is in a document when it is created. I think @dbaeumer's new LSP messages will eliminate this need though (as the cells positions are contained in his new messages). |
The first location you mentioned is a debug tool we use for replaying customer's logs. We'd need some way to pull apart the cell URI for that to continue to work because we don't have the actual notebook originally associated with the URI. |
Can you explain what pulling apart uris means? |
Take the cell index out of the fragment. We need that (during the log replay) to determine what cell a 'textDocument/didChange' event to apply an edit to. The log replace service is used to try and repro intellisense problems. Customer sends a log of all the LSP requests and a notebook that they started with. We open the notebook and for every |
Ok, thanks for clarifying. I believe a better approach is checking for the if(step.textDocument.uri.scheme === 'vscode-notebook-scheme') { // check fixed scheme
const replaced = this.activeNotebook
.cellAt(0)
.document.uri.with({ fragment: step.textDocument.uri.fragement }); // use with, no string-replace
step.textDocument.uri = replaced
} |
Do you have an ETA for adopting that? |
The pylance team would also need to switch to adopting it. Probably this month some time. |
@rchiodo has the new LSP messages makes it possible to not rely on the cell URL format. |
No it did not. |
I believe @msfterictraut is in the process of making an experimental version of pylance to adopt this new API. |
@jrieken The changes for the experiment were checked in today. Prerelease builds with experiment support will ship tomorrow. Current plan is to start the experiment on Monday. |
👯 Thanks for the update |
@jrieken, our experiment is still running. Also, the Interactive Window is currently still using the old (non-LSP) approach due to microsoft/vscode-jupyter#10630. |
Post that being fixed we'd still make assumptions about notebook URIs. We need this in order to determine if the TextDocument for the input box in the interactive window is tied to the notebook for the interactive window or not. @jrieken I'm not sure if you're changes will make this impossible or are they only dealing with the fragment portion? |
The 'logReplayService' above isn't really necessary. It's just a debug tool. |
@rchiodo Only the fragment portion. The new format is more obscure but has the same properties: sorts by cell orders and encodes the cell index. @debonte Assuming we are going ahead with this change. What's the impact on your end? What will break and what intermediate change can you make? |
Because we depend on the lsp-notebook-concat code that Rich referenced above, I believe this would cause the ordering of cells to be wrong for anyone using Pylance's concat doc-based notebook support. Depending on how the cells are misordered, this could cause most language server features to misbehave. This would impact any user that is not in the LSP notebooks experiment treatment group or who is using Interactive Window. The Jupyter team owns this code and would be best able to say how it might be adjusted to support both the old and new formats. FYI, Rich is OOF this week. |
The new format makes the same ordering guarantees, e.g when sorting by the toString()'ed uri of the cell-documents you get the same order as before. It is just harder to extract the actual (numeric) value from the URI |
The lsp-notebook-concat code we are currently using gets the cell fragment as a number like this: parseInt(cellUri.fragment.substring(2) || '0') My understanding is that this won't work properly with the new format. That was my point. |
Yeah, parsing the fragment into a number won't work anymore but looking at how the parse result is actually used gives me confidence. The value is kept in Let's wait for @rchiodo to comment but I am fairly confident that a small change in lsp-notebook-concat will do the job. (edit) I didn't bother to create a fork but I believe this change will do it diff --git a/src/notebookConcatDocument.ts b/src/notebookConcatDocument.ts
index f3e8cec..9aee8ab 100644
--- a/src/notebookConcatDocument.ts
+++ b/src/notebookConcatDocument.ts
@@ -26,7 +26,7 @@ import { createPosition, createRange } from './helper';
type NotebookSpan = {
uri: vscodeUri.URI;
- fragment: number;
+ fragment: string;
inRealCell: boolean;
startOffset: number;
endOffset: number;
@@ -233,7 +233,7 @@ export class NotebookConcatDocument implements ITextDocument {
// Compute 'fragment' portion of URI. It's the tentative cell index
const fragment =
- cellUri.scheme === InteractiveInputScheme ? -1 : parseInt(cellUri.fragment.substring(2) || '0');
+ cellUri.scheme === InteractiveInputScheme ? '' : cellUri.fragment;
// That fragment determines order in the list (if we're not forcing append)
const insertIndex = forceAppend ? this._spans.length : this.computeInsertionIndex(fragment);
@@ -659,7 +659,7 @@ export class NotebookConcatDocument implements ITextDocument {
): NotebookSpan {
// Compute fragment based on cell uri
const fragment =
- cellUri.scheme === InteractiveInputScheme ? -1 : parseInt(cellUri.fragment.substring(2) || '0');
+ cellUri.scheme === InteractiveInputScheme ? '' : cellUri.fragment;
return {
fragment,
uri: cellUri,
@@ -676,7 +676,7 @@ export class NotebookConcatDocument implements ITextDocument {
private createTypeIgnoreSpan(cellUri: vscodeUri.URI, offset: number, realOffset: number): NotebookSpan {
// Compute fragment based on cell uri
const fragment =
- cellUri.scheme === InteractiveInputScheme ? -1 : parseInt(cellUri.fragment.substring(2) || '0');
+ cellUri.scheme === InteractiveInputScheme ? '' : cellUri.fragment;
return {
fragment,
uri: cellUri,
@@ -700,7 +700,7 @@ export class NotebookConcatDocument implements ITextDocument {
}
return [
{
- fragment: -1,
+ fragment: '',
uri: cellUri,
inRealCell: false,
startOffset: 0,
@@ -715,7 +715,7 @@ export class NotebookConcatDocument implements ITextDocument {
return [
{
- fragment: -1,
+ fragment: '',
uri: cellUri,
inRealCell: false,
startOffset: 0,
@@ -877,13 +877,13 @@ export class NotebookConcatDocument implements ITextDocument {
return result as protocol.Range;
}
- private computeInsertionIndex(fragment: number): number {
+ private computeInsertionIndex(fragment: string): number {
// Remember if last cell is already the input box
const inputBoxPresent = this._spans[this._spans.length - 1]?.uri?.scheme === InteractiveInputScheme;
const totalLength = inputBoxPresent ? this._spans.length - 1 : this._spans.length;
// Find index based on fragment
- const index = fragment == -1 ? this._spans.length : this._spans.findIndex((c) => c.fragment > fragment);
+ const index = fragment === '' ? this._spans.length : this._spans.findIndex((c) => c.fragment > fragment);
return index < 0 ? totalLength : index;
}
|
I think you're saying a string comparison of the fragments will order them correctly? Maybe it should be more explicit that it's a string compare (instead of using '<'). |
I was also thinking the lsp concat stuff could be refactored to use the new messages that Dirk created. We were trying to build the order ourselves but I believe that isn't necessary anymore? |
Unsure - the comparison operator might be best suited here because it faster and more narrow-spec'd than localeCompare (I believe). I also see this as a temporary fix, e.g using the LSP concat is the way to go but the question is what do you do assuming we make the fragment-change at the end of the week? The patch I have suggested is a good stepping-stone to prevent breakage but it isn't an alternative to LSP. It just less fragile: today we assume the fragment (1) parses to a number and (2) the numbers are comparable and with my patch we assume (1) the fragment is comparable. That's better but not the optimum, using LSP is the optimum but I haven't seen that happening in the last months and TBH I want to make progress |
Shortly this will be the case. That's the work that Erik has been doing. It's already shipped but it's just behind an experiment flag at the moment. Once that experiment is at 100%, this middleware code won't be needed anymore. Given that, I think your temporary fix would be good. Although it requires the fragment change, right? So it needs to go in at the same time? Oh maybe it doesn't. Fragment today would compare the same I think. |
Yes, the change can happen today because comparing the fragment is already behaving like that |
Okay the fragment change made it through all the layers for jupyter. Our build tonight will have the new fragment change. |
Notebooks use the the
CellUri
to create a uri for a specific cell of a notebook. The rules arevscode-notebook-cell
file
Samples:
Cell 1 of notebook
file:///some/notebook.ipynb
isvscode-notebook-cell:///some/notebook.ipynb#ch0000001
Cell 21 of notebook
memfs:///some/notebook.ipynb
isvscode-notebook-cell:///some/notebook.ipynb#ch0000021memfs
This format is an implementation details and the problem is that it is so obvious that folks starting to program against it (instead of using API, e.g for discovering a notebook for a cell etc). To prevent this we should make the format a little less obvious, e.g. by encoding the fragment a little.
The text was updated successfully, but these errors were encountered: