-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incremental Parsing #49
Comments
Update: I tried implementing the |
I've been thinking about this a little bit and I can think of a completely different paradigm for handling this case neatly - probably enough of a change for a v2.0 type hypothetical version though... Instead of parsing into whatever the top-level (owned) #[rust_sitter::language]
pub enum Expr {
#[rust_sitter::leaf(pattern = r"\d+", transform = |v| v.parse().unwrap())]
Number(u32),
#[rust_sitter::leaf(pattern = r#"\"([^\"]|\\.)*\""#, transform = |v| v)]
String(&'src text),
Add(Box<Expr>, Box<Expr>)
} we would generate the following: pub struct Program {
src: String,
exprs: Arena<Expr>,
root: Handle<Expr>,
}
#[derive(Copy, Clone)]
pub struct TextRange(std::ops::Range<usize>);
pub enum Expr {
Number(u32),
String(TextRange),
Add(Handle<Expr>, Handle<Expr>)
}
impl std::ops::Index<TextRange> for Program {
type Output = str;
fn index(&self, index: TextRange) -> &str {
&self.src[index.0]
}
}
impl std::ops::Index<Handle<Expr>> for Program {
type Output = Expr;
fn index(&self, index: Handle<Expr>) -> &Expr {
&self.exprs[index]
}
} Besides just a performance advantage (as we see from the arena approach of Naga and the potential for 0-copy string literals etc), this would let us track exactly what changes are made to the AST and enforce that the user either only make changes by modifying the underlying source and re-parsing (in which case the AST nodes can be re-used where possible via a traversal) or allow the user to make changes to the parsed AST directly but by doing so they invalidate the ability to re-parse: pub struct ReParseable(());
pub struct Modifiable(());
#[derive(Clone)]
pub struct Program<T> {
...
_phantom: PhantomData<T>,
}
impl Program<ReParseable> {
pub fn change_source(&mut self, delta: ...) {
...
self.reparse()
}
/// Disables reparsing - can't be undone
/// (but `Program` is cloneable so clone before modifying if you want the best of both worlds)
pub fn make_modifiable(self) -> Program<Modifiable> {
Program {
..self
}
}
}
impl std::ops::IndexMut<Handle<Expr>> for Program<Modifiable> {
fn index_mut(&mut self, index: Handle<Expr>) -> &mut Expr {
&mut self.exprs[index]
}
}
... Interested in any thoughts/criticisms before I (potentially) look further into this. |
I know the original blogpost mentions this, so I'm surprised there's not also a tracking issue for this enhancement. I've been playing with this a little bit, and there are a few issues that I think might need to be solved:
In my version this is just a wrapper around the internal tree, but maybe we don't want that to be exposed to the user? Because of macro sanitation stuff this might overlap with Deduplicate field logic into runtime utility #14, because moving the logic of the generated parser function into the runtime means the internals of the EditHandle can be private to that crate.
Either way, this would definitely be a breaking API change.
Clone
(so we can keep a clone in our possession), this can't really be overcome.The solution would be to require the user to pass ownership to their object back for reparsing. Which they could then choose to avoid by making everything cloneable, or they could just deal with the consequences.
As a compromise, maybe there is a way to allow the user to give only the cloneable parts back? Like, a
PseudoClone
wrapper around non-clonable values, which (likeBox
andSpanned
and such) would be ignored by the grammar, but could be filled with a default value for the edit pass? But this would mean we need to keep track of the branches that have these wrappers in them, as those definitely need to be visited on every extraction.The most trivial, but also most annoying solution would be to never give the user full ownership: instead have the structure wrapped inside the EditHandle, and only give out immutable references.
But perhaps even more trivial: maybe we can just ignore this. Changing the contents of a syntax tree like this doesn't seem like a very common use case, and if the user already decides to keep it up to date with edits then they probably wouldn't want to do that anyway. It's just important to make them aware of the consequences: "All nodes that are involved in an edit will have their changes overwritten, but the remaining nodes aren't automatically reset." This could be slightly unpredictable, but a risk they'd have to take.
PseudoClone
nodes could then also have the role of allowing the user to explicitly control what parts of the tree should be reused.The text was updated successfully, but these errors were encountered: