-
-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Package location of document models #149
Comments
Before voting, I'd appreciate a bit more clarification on the extent to which this proposal impacts some of the specific points raised by @dmonad during the #12359 discussion. If I understand correctly, this is mostly about the package locations, but @dmonad was also concerned about other technical issues, specifically related to the complexity of correctly initializing these data structures and later manipulating them without introducing inconsistencies. Does this particular proposal intersect with those issues, or is the intent that for now, the package "home" for these APIs is the only change, with those issues being left for later discussion? I'd just like a bit more clarity on the scope of this particular vote. I realize there's probably been a massive amount of discussion at meetings I haven't participated in, and I'm not requesting a rehash of all the details, but in key public votes I think it's useful to summarize key points with sufficient precision for anyone who reads these logs to understand the decision-making process (Python PEPs do an outsanding job with this, and I can attest to having read the discussion/pros/cons sections of important PEPs many times in detail, to understand the evolution of the language). |
I don't think this approach strictly obviates any RTC implementation beyond defining where the underlying classes reside. So I think @dmonad can reasonably describe this proposal as orthogonal to the conversation in the RTC pull request (Kevin, please correct me if I have misstated your position). The proposal in this issue does coincide with the implementation in Carlos' PR but a different code implementation could still contain a package structure that accords with the proposed package architecture. I think the intent of this issue is to shrink the contested API surface areas by proposing a first principles question about how we organize the packages and APIs related to cells, notebooks, and their respective data models. The idea is to propose a question that does not require delving into a huge code base or a huge proposed change to that code base while still being able to offer an informed opinion. |
In the past few weeks, there have been a lot of discussions on the best next steps for RTC, as the deadline for JupyterLab 4.0 is approaching. Some contextThe first version of JupyterLab including collaborative editing features was JupyterLab 3.1, although it is only enabled by setting a flag. In JupyterLab 3.1, collaborative editing is built upon the Yjs CRDT implementation, which provides a framework for building collaborative applications. For this first iteration, a key objective was to provide a collaborative experience while retaining backwards compatibility for extension authors that were relying on the current ModelDB-based model implementation. This was a requirement for integrating this work into a 3.x release. This first version reached that objective by having the Yjs-based models (the so-called shared documents) exist in parallel with ModelDB, and synchronize with it. This first iteration has major issues. First, the bidirectional binding of Yjs and ModelDB model states created significant problems with respect to state initialization, such as occasionally causing data loss. Second, saving the documents was still done through a request from the front-end to the contents API, also causing issues with race conditions, forcing us to make use of a locking approach for the initial loading and the saving of documents from disk. Finally, this created significant overhead both in terms of performance and memory. Major goals for JupyterLab 4 were (among others) to
The first objective has been implemented - and merged into the relevant packages. The second objective is the subject of this issue. Carlos' PRIn PR 12359, Carlos (@hbcarlos) resolves the duplication problem by removing the new "shared documents" added in Lab 3.1 altogether, and by adopting an approach similar to ModelDB, except that the ModelDB replacement is based on Yjs. This ModelDB replacement is essentially Yjs, except that it exposes a subset of API and prevents exposing Yjs data types to other JupyterLab components. This PR has been significantly iterated upon, and is technically ready to go. It does resolve the catastrophic data loss issues that we were experiencing with JupyterLab 3, and gets us to the point of providing a real working collaborative editing experience in JupyterLab. Notably, Carlos' PR also includes a large number of new tests and docstrings. The ModelDB equivalent that is implemented on top of Yjs also adds to the line count. Kevin's objectionsKevin objects to Carlos' approach in PR 12359 and proposes instead to retain the "shared documents" introduced in 3.x, and to remove ModelDB. I don't want to misrepresent the arguments, but here is my short summary of their nature: Kevin's arguments in favor of this alternative approach are of several natures. Some pertain to code organization, such as that "shared state" and "application state" should live in two different classes and packages, architectural decisions, such as that the way abstractions for e.g. cells and output areas should be articulated around shared types. He made a draft implementation of this approach in PR 12695. What is common in both approaches - Yjs becoming the main source of truthIn both Carlos' and Kevin's approaches, the model state is solely represented as a Yjs document even in non-collaborative mode, making Yjs very core to JupyterLab. A single YDoc represents the entire state of a shared document. In both approaches, Yjs becomes the source of truth. So it better work, otherwise JupyterLab users will face significant regressions in JupyterLab 4.0. For example, it becomes critical that the so-called "move feature" that is currently missing in Yrs (the rust implementation of Yjs used in the backend) works as expected, or users of JupyterLab won't be able to make "undo/redo'' operations in cells after moving them. We were also resolving significant issues in Yrs/Ypy until recently, that caused some crashes during the test collaborative sessions. Hence, the move feature is an important priority for JupyterLab 4.0. The approach of constraining the decision makingThere have been many debates on this issue, in the public RTC meeting and in the various PRs and issues, one approach to get closer to an agreement that was proposed was to constrain the problem in an exogenous manner. If there were intrinsic constraints coming from our goals for JupyterLab that could guide some of the decisions, this would be a way to arbitrate the issue. This is why Afshin opened two votes for the JupyterLab council in this issue and Issue 150 that concern the location of the model implementation for documents, and the modeling of metadata. (I am not touching the question of modeling metadata in this post. This is another place where Carlos' approach differs from the current state of shared documents, although either implementation could probably implement the modeling of the notebook metadata in the way of the other). |
Now, on the crux of the matter of this issue (package location of document models), I think it is important that the document model lives in the same package as the document implementation. If we were to write a spreadsheet extension for JupyterLab, we would have spreadsheet model be in the spreadsheet NPM package. One thing that could (and should probably) be separate is a static schema definition fot the shared state, so that it can be consumed by other implementation. |
The voting period for this issues has closed:
Given the 14 council members, the 50% quorum is met and the proposal passes. |
(This vote is closed but the issue is open to allow others to find it more easily.) |
There are already users of our shared-models package that currently has very few dependencies. Ideally - and I believe everyone agrees with that - we want everyone to be able to consume Jupyter packages to build derivates. This will allow others (e.g. VSCode Jupyter extension) to build their product around the same sync API as JupyterLab (they can collaborate via the same backend!). Merging the shared models into the other packages might still allow others to consume the shared models, but we will force them to install almost the complete jupyter ecosystem. The argument about "bundlers will just eliminate this away" is a really awkward one (700MB of dependencies still need to be downloaded). We implemented this really awesome, complex feature "collaborative editing". It is fine to dedicate a single package to this very complex problem. We are not really missing out on much (maybe this is not the most ideomatic code architecture for some). But it is very practical and might mean a lot to others. |
The entire I don't think it is helpful to re-ignite an argument to save authors of non-JupyterLab applications from an additional 63MB download. I don't think that's a huge imposition. We're not talking about extension authors — they already need to install many other JupyterLab packages, after all. We have spent months in this exact conversation and as recently as last week's RTC call, I thought we had consensus on a path forward. In addition, we held this vote for this issue specifically because we had difficulty arriving at the answer otherwise. It's absolutely fine to disagree. If somebody on the @jupyterlab/jupyterlab-council feels like proposing the opposite of this issue and putting it to a vote that is okay. I advise against doing that. I'd rather release software than legislation. I recommend we move forward and actually release RTC one day soon. There has been plenty of discussion, disagreement, and compromise so far. To re-litigate the issue now is a step backward. We've agreed to go with @dmonad's approach despite how difficult the conversation around it has been and how much time and effort was poured into a different approach. We even agreed to ship the exact same API footprint. Is it really worth arguing over whether a package adds 7MB to |
I agree with @afshin that the decision has already been made. If a JupyterLab council member wants us to reconsider that decision/vote, they are free to propose a new vote. Discussion should continue, but at this point, I don't see any reason to not move forward with the approved plan. I realize that this is a new decision making approach for Jupyter and will likely take time for us to get used to. For those who want to read up on how it all works the decision making approach is described here: https://github.com/jupyter/governance/blob/master/decision_making.md |
This issue is to call a vote on the proposal in jupyterlab/jupyterlab#12708
(Full text of top-line comment)
Problem
We have an open PR jupyterlab/jupyterlab#12359 from @hbcarlos that refactors our models for notebooks and text documents to improve the real-time collaboration (RTC) user experience and APIs for extension developer. There are a couple of questions that are coming up in the review of that PR that are not really RTC questions. I am opening a few issues to help us reach consensus/vote to resolve those questions to make it easier for us to review and make decisions about #12359.
In the current issue, we are trying to answer the following question:
In what npm package should our models live?
In JupyterLab master, our models live in a couple of places:
ISharedFile
,ISharedNotebook
,ISharedCell
, etc. that wrap Yjs and expose public API.shared-models
as the output model and is basically the JSON representation of a single output.ICellModel
that "has a"ISharedCell
and is also a public model API.OutputModel
that wraps theIOutput
into a TS object with a public API.OutputAreaModel
object that has a list ofOutputModel
instances.CellList
model that is an observable list ofCellModel
objects.NotebookModel
that "has a"ISharedNotebook
andCellLlist
The key point here is that we have multiple layers of model APIs across those in the shared-models package and the (notebook, cells, outputarea packages). Both are currently public APIs that are high-level and semantic - they refer to native Jupyter things rather than the lower-level of Yjs types or ModelDB types.
The approach in #12359 removes the public API in shared-models and incorporates all of that logic into the (notebook, cells, outputarea) packages. As a result of these changes all of the models are in the npm package for that entity – (cells, notebook, outputarea) respectively.
Separate from all of the details of how we are doing RTC, we need to decide which of these two approaches is best for where our models are:
Proposed Solution
The proposal in this issue is to remove the shared-models npm package and move all of our models to the (notebook, cells, outputarea) packages. This decision only pertains to which npm package these models are in, rather than to the details of what those models are.
Pros:
shared-models
package reduces the number of npm packages across which our model APIs are spread.shared-models
package is only a partial representation of the overall model APIs, so they are less useful on their own.Cons:
shared-models
to build something without our (notebook, cells, outputareas) packages.shared-models
is a breaking API. to be clear though, there are other breaking API changes in #12359Summary
Vote Yes if you agree with the following:
Vote No if you disagree.
@jupyterlab/jupyterlab-council votes
Edited to correct issue link
The text was updated successfully, but these errors were encountered: