RFC: AST node IDs #5689

overlookmotel · 2024-09-11T01:04:21Z

overlookmotel
Sep 11, 2024
Maintainer

We have decided to add NodeIds to AST node types. Here are my suggestions on how we should approach this.

Motivating use cases

Store data about nodes out of band (e.g. comments).
Get parent of a node via an ID -> parent ID table.
Get children of a node via an ID -> child IDs table.
Uniquely identify AST nodes (as requested by Rolldown, also useful for e.g. traverse: insert new statements(If any) after calling each walk_statement in walk_statements #4767).
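
To make the first three use cases concrete, here is a minimal sketch of out-of-band tables indexed by a node ID. The `NodeId` newtype here, `SideTables`, and all of its methods are hypothetical names chosen for illustration. They are not the existing Oxc API.

```rust
/// Hypothetical ID type: a plain index into side tables.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(u32);

impl NodeId {
    pub fn index(self) -> usize {
        self.0 as usize
    }
}

/// Hypothetical side tables. Each `Vec` is indexed by `NodeId`.
#[derive(Default)]
pub struct SideTables {
    /// Parent of each node. In this sketch the root is its own parent.
    parent_ids: Vec<NodeId>,
    /// Arbitrary out-of-band data, e.g. a comment attached to a node.
    comments: Vec<Option<String>>,
}

impl SideTables {
    /// Register a new node and return its ID. `parent` is `None` for the root.
    pub fn add_node(&mut self, parent: Option<NodeId>) -> NodeId {
        let id = NodeId(self.parent_ids.len() as u32);
        self.parent_ids.push(parent.unwrap_or(id));
        self.comments.push(None);
        id
    }

    pub fn parent(&self, id: NodeId) -> NodeId {
        self.parent_ids[id.index()]
    }

    pub fn set_comment(&mut self, id: NodeId, text: &str) {
        self.comments[id.index()] = Some(text.to_string());
    }

    pub fn comment(&self, id: NodeId) -> Option<&str> {
        self.comments[id.index()].as_deref()
    }
}
```

Nothing in the node itself changes when a new table is added, which is the appeal of storing data out of band.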

I feel storing data out of band is the strongest motivation. There are viable alternatives for the other 3:

Getting parents: Build stack of ancestors in visitor (like Traverse does).
Getting children: Traverse down the AST.
Uniquely identify nodes: Use pointer equality (it's faster).

Blocking issues:

Related discussions:

Previous attempts:

Proposed design

What kind of IDs?

I suggest that we have a single node ID for all nodes, rather than separate IDs (StatementId, ExpressionId etc). This aligns with the existing AstNodeId.

We may want in future to split into multiple IDs (StatementId etc). But for first implementation, a single ID is simpler.

What to call it?

NodeId. AstNodeId seems overly verbose. What other kinds of nodes do we have?

Rename the existing AstNodeId. refactor(semantic): s/AstNodeId/NodeId #5740

What gets an ID?

Only structs should have an ID. Enums e.g. Statement should not itself have an ID. But all the enum's variant "payloads" (e.g. BlockStatement) will have an ID. Rationale:

It is difficult to find a place to store IDs in enums.
This aligns with our current AstNodeId.
Unclear if any use cases exist which require Statement or Expression to have its own ID.
We can use a trait GetNodeId implemented on Statement etc to get its "payload" type's ID.

In my opinion, best to not add IDs to enums initially, and we'll discover if that blocks any use cases, and if the workarounds to support those use cases are painful/expensive.

ID type

Currently our IDs are wrappers around a NonMaxU32. That's motivated by making Option<Id> 4 bytes (vs Option<u32> which is 8 bytes).

This imposes some costs:

Every read or write of an ID has to convert from the internal representation to a zero-based ID. This is mostly a single XOR assembly instruction (id = internal_id ^ u32::MAX), but in some cases may have further knock-on effects. A small overhead, but it's a very hot path e.g. in linter.
In a post-semantic AST, you know that all IDs are Some, but constantly have to call unwrap on Option<Id>s. This complicates our code, and adds panicking branches to a lot of code.
Extra dependency on nonmax crate.
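
For readers unfamiliar with the trick, the following is a minimal sketch of the idea behind a "non-max" integer, written against `std` only. `NonMaxU32Sketch` is a hypothetical name, and this is not the code of the `nonmax` crate. It only illustrates where the XOR on every read and write comes from.

```rust
use std::num::NonZeroU32;

/// Sketch of a "non-max" u32: stores `value ^ u32::MAX` in a `NonZeroU32`,
/// so `u32::MAX` is the one value that cannot be represented.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NonMaxU32Sketch(NonZeroU32);

impl NonMaxU32Sketch {
    /// Returns `None` if `value` is `u32::MAX`.
    pub fn new(value: u32) -> Option<Self> {
        // `u32::MAX ^ u32::MAX == 0`, which `NonZeroU32::new` rejects.
        NonZeroU32::new(value ^ u32::MAX).map(Self)
    }

    /// Every read pays for this XOR to recover the zero-based value.
    pub fn get(self) -> u32 {
        self.0.get() ^ u32::MAX
    }
}
```

Because the all-zero bit pattern is never used by a valid value, the compiler can use it to represent `None`, which is why `Option<NonMaxU32Sketch>` fits in 4 bytes.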

I propose an alternative:

Use 0 as a sentinel value equivalent to Option::<NodeId>::None.
If we have the need to store Option<NodeId> in places where extra bytes are undesirable, we can get a niche in NodeId with a different NonMaxU32 implementation.

(Ideally we would also change ScopeId, SymbolId and ReferenceId to be stored as plain u32s, instead of Option<NonMaxU32>s, for same reasons as above)

When to create IDs

This is the tricky part.

Ideally we'd create IDs in parser and set the node_id field on AST nodes when they are created. This:

is the most performant solution, as you don't have to write to the node_id field twice (dummy value first, then real value later).
means node_id fields can be a plain NodeId, not Cell<NodeId>.

However, this is not currently feasible. We'll need to generate NodeIds in SemanticBuilder initially.

Why not in parser?

1. Order of IDs

Currently AstNodeIds are in traversal order. Program is node ID 0. So iterating over AstNodes::nodes gives you nodes in traversal order.

But the parser creates nodes in a different order. Program is the last node to be created, so would have the highest NodeId.

Note: The parser does not create nodes in reverse traversal order either. Numbered in order of node creation:

foo * bar + qux
_1_   _2_
____3____
            _4_
_______5_______

When traversing this AST, node IDs in visitation order are: 5, 3, 1, 2, 4.
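
The same point as a small runnable sketch. `Expr`, `IdCounter`, `build` and `visit` are hypothetical stand-ins, not Oxc types. IDs are handed out in creation order, bottom-up as a parser would, and then collected in pre-order.

```rust
/// Hypothetical mini-AST for `foo * bar + qux`. `id` is assigned in creation order.
pub enum Expr {
    Ident { id: u32 },
    Binary { id: u32, left: Box<Expr>, right: Box<Expr> },
}

/// Hands out IDs in the order nodes are created, starting from 1.
#[derive(Default)]
pub struct IdCounter(u32);

impl IdCounter {
    fn next_id(&mut self) -> u32 {
        self.0 += 1;
        self.0
    }

    pub fn ident(&mut self) -> Expr {
        Expr::Ident { id: self.next_id() }
    }

    pub fn binary(&mut self, left: Expr, right: Expr) -> Expr {
        Expr::Binary { id: self.next_id(), left: Box::new(left), right: Box::new(right) }
    }
}

/// Build `foo * bar + qux` bottom-up, matching the numbering in the diagram.
pub fn build() -> Expr {
    let mut ids = IdCounter::default();
    let foo = ids.ident(); // 1
    let bar = ids.ident(); // 2
    let mul = ids.binary(foo, bar); // 3
    let qux = ids.ident(); // 4
    ids.binary(mul, qux) // 5
}

/// Collect IDs in pre-order (parent before children), i.e. visitation order.
pub fn visit(expr: &Expr, out: &mut Vec<u32>) {
    match expr {
        Expr::Ident { id } => out.push(*id),
        Expr::Binary { id, left, right } => {
            out.push(*id);
            visit(left, out);
            visit(right, out);
        }
    }
}
```

Calling `visit` on the result of `build()` collects `[5, 3, 1, 2, 4]`.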

It is unclear if any linter rules rely on current iteration order of AstNodes::nodes or not. If they do, they can be refactored so they don't, so this isn't a deal-breaker, but we'd need to do that first.

Don thinks that visiting nodes in parse order rather than traversal order might actually turn out to be a perf gain, as nodes would be visited in the same order that they're laid out in the arena - a perfect access pattern for CPU caches.

2. Unused node IDs

This is the hard problem.

In some places (e.g. when parsing what may or may not be an arrow function) the parser creates nodes, but then may rewind and discard them, and then re-parse that section of code again. If parser creates a NodeId for every AST node created (e.g. in AstBuilder), this would result in gaps in the sequence of NodeIds which appear in the AST.

Why is this a problem?

Later on SemanticBuilder will read NodeIds from the AST and insert data into side arrays indexed by NodeId (AstNodes::nodes, AstNodes::parent_ids).

If we cannot guarantee that there are no gaps in the sequence of NodeIds, then these side arrays are likely to end up containing sections of uninitialized bytes. There is nothing to stop user creating a NodeId from an arbitrary u32 and then reading from AstNodes::parent_ids or AstNodes::nodes with that NodeId. If that NodeId hits an uninitialized "gap", then that's instant UB. Bounds checks cannot prevent this, as gaps will be in middle of the sequence, so the NodeId can be in bounds, but still read uninitialized bytes.

How can we solve this?

Fill the side arrays at the start with zeros (assuming that's a legal bit pattern for the types stored in the arrays). This means all data is initialized so UB is not possible. But it's expensive writing zeros over a large chunk of memory.
Track in SemanticBuilder which IDs have been seen, and where gaps are. At the end of SemanticBuilder's traversal, fill any gaps with zeros, so no uninitialized gaps remain. Unfortunately, because IDs will be encountered out of order, keeping track of what ranges of IDs have been seen, and where gaps are is non-trivial, and probably expensive.
Fill a side array with bools to represent "node exists" or "node does not exist" for each NodeId. Initialize it with zeros, and set each entry to true when that NodeId is found. This array is only bools so will not take as much memory as the main arrays, and so is cheaper to set up. But downside is it imposes a cost on every read from the arrays - now every lookup into nodes or parent_ids needs to first check the node_exists array, to make sure the node exists before reading from nodes.
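
A minimal sketch of the third option, to show where the extra cost on reads comes from. `ParentIds` and its methods are hypothetical, and the real side arrays would hold richer types than `u32`. The main array is left uninitialized, and only the `bool` array is zeroed up front.

```rust
use std::mem::MaybeUninit;

pub struct ParentIds {
    /// Left uninitialized up front. Only slots whose flag is `true` have been written.
    parent_ids: Vec<MaybeUninit<u32>>,
    /// One flag per possible ID, initialized to `false`.
    node_exists: Vec<bool>,
}

impl ParentIds {
    /// `len` must be greater than the highest ID handed out, gaps included.
    pub fn new(len: usize) -> Self {
        let mut parent_ids = Vec::with_capacity(len);
        parent_ids.resize_with(len, MaybeUninit::uninit);
        Self { parent_ids, node_exists: vec![false; len] }
    }

    pub fn set(&mut self, id: u32, parent_id: u32) {
        let index = id as usize;
        self.parent_ids[index] = MaybeUninit::new(parent_id);
        self.node_exists[index] = true;
    }

    /// Every lookup pays for the extra read of `node_exists`.
    pub fn get(&self, id: u32) -> Option<u32> {
        let index = id as usize;
        if *self.node_exists.get(index)? {
            // SAFETY: the flag is only set in `set`, after the slot has been written.
            Some(unsafe { self.parent_ids[index].assume_init() })
        } else {
            None
        }
    }
}
```

An ID that falls in a gap, or out of bounds, yields `None` instead of reading uninitialized memory.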

All of these are possible, but all come with a significant performance penalty.

I can imagine these replies:

"We can just make sure we don't look up non-existent NodeIds and tests will catch any bugs." I don't think that's a good idea. Undefined behavior is not a normal bug. It can cause strange behavior in unrelated and unexpected places. It may even cause tests to pass when they shouldn't, and UB is extremely hard to debug. e.g. inserting a dbg!() can magically make the problem you're trying to track down disappear. We've seen exactly this problem in transformer due to AstBuilder::copy.
"We can make sure the parser doesn't produce any gaps in NodeId sequence". We can try to do that, but can we statically prove it's impossible? I don't think we can, in which case we open the door to UB.
"Prevent user creating arbitrary NodeIds, so any NodeId that exists is always valid". Yes, but nothing prevents using a NodeId from one AST to index into the arrays relating to another AST. UB!

There is one possibility which could work, but is quite a sizeable undertaking (see below).

3. Cost of growing `Vec`s

Let's say we are creating NodeIds in parser, and also building the side arrays e.g. parent_ids in parser too.

How many nodes are there going to be? It's impossible to know at the outset. #4367 demonstrated that growing large Vecs because you didn't allocate enough capacity initially is very costly. And for large source files, the number of AST nodes can be huge (100,000+) so these Vecs are big.

Maybe we can just allocate capacity initially to the number of bytes in source text + 2 (I think that's the maximum number of nodes you can get per byte of source e.g. 0+0+0 = 5 bytes, 7 nodes). That will be way too big for normal JS code, but it guarantees no reallocations. Then shrink the arrays to fit after parsing is complete, to free the excess memory again.

We would need to check:

Shrinking doesn't cause a reallocation and copy - depends on behavior of system allocator.
This works OK on WASM, which doesn't have virtual memory.
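
A rough sketch of the "over-allocate, then shrink" idea using a plain `Vec`. `fill_then_shrink` is a hypothetical helper and the pushed values are placeholders for real parser output. Whether the final `shrink_to_fit` copies the data is exactly the allocator-dependent question raised above, and this sketch does not answer it.

```rust
/// Fill a side array without it ever growing, then release the excess capacity.
/// `source_len + 2` is the upper bound on node count suggested in the text.
pub fn fill_then_shrink(source_len: usize, node_count: usize) -> Vec<u32> {
    let max_nodes = source_len + 2;
    assert!(node_count <= max_nodes);

    let mut parent_ids: Vec<u32> = Vec::with_capacity(max_nodes);
    let ptr_before = parent_ids.as_ptr();

    for id in 0..node_count as u32 {
        parent_ids.push(id); // placeholder data
    }

    // Capacity was sufficient, so the buffer did not move while filling.
    assert_eq!(ptr_before, parent_ids.as_ptr());

    // Release unused capacity. This may or may not reallocate and copy.
    parent_ids.shrink_to_fit();
    parent_ids
}
```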

So maybe this isn't a deal-breaker, but there are potential challenges.

What to do

I suggest that we tackle this in 3 phases:

Phase 1: Generate IDs in semantic.
Phase 2: Optimize.
Phase 3: Move generating IDs into parser.

Phase 1: Add `NodeId`s

`node_id` fields

Add node_id: Cell<NodeId> fields to all AST types.

In my view, the fields should be visible in the type defs and not "hidden away" by adding them in a macro. So we'd need to add them manually. But we could do it in #[ast] macro initially while we're testing this out.

node_id would ideally be the first field on each type (before span).

`GetNodeId` trait

Similar to GetSpan. We can #[generate_derive(GetNodeId)].

If we make node_id the first field on each type, because the AST is now #[repr(C)], the memory offset of the field is guaranteed the same for every type. This makes GetNodeId::node_id extremely cheap even for the massive enums like Expression, as compiler sees it needs no branches to fetch the ID: https://godbolt.org/z/PGfWx6T57
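
A cut-down sketch of what the trait could look like, assuming `node_id` is the first field of every `#[repr(C)]` struct. The two statement structs and the `Statement` enum below are simplified stand-ins for the real AST types, and the impls are written by hand here rather than generated.

```rust
use std::cell::Cell;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(u32);

/// Stand-in AST structs. `node_id` is the first field and the layout is `repr(C)`,
/// so it sits at offset 0 in both.
#[repr(C)]
pub struct BlockStatement {
    pub node_id: Cell<NodeId>,
    pub statement_count: u32,
}

#[repr(C)]
pub struct EmptyStatement {
    pub node_id: Cell<NodeId>,
}

/// Stand-in AST enum. The enum itself has no ID, only its payloads do.
pub enum Statement {
    Block(Box<BlockStatement>),
    Empty(Box<EmptyStatement>),
}

pub trait GetNodeId {
    fn node_id(&self) -> NodeId;
}

impl GetNodeId for BlockStatement {
    fn node_id(&self) -> NodeId {
        self.node_id.get()
    }
}

impl GetNodeId for EmptyStatement {
    fn node_id(&self) -> NodeId {
        self.node_id.get()
    }
}

impl GetNodeId for Statement {
    /// Delegates to the payload. Every arm reads the same offset behind the `Box`.
    fn node_id(&self) -> NodeId {
        match self {
            Self::Block(it) => it.node_id(),
            Self::Empty(it) => it.node_id(),
        }
    }
}
```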

2nd `AstBuilder` for transformer

Create a 2nd AstBuilder for use in transformer + minifier (AstNodeBuilder?).

struct AstNodeBuilder<'a> {
    allocator: &'a Allocator,
    next_node_id: u32,
}

Contains a counter for next NodeId.
Sets node_id on new nodes.
Is not Copy, as needs to be used as &mut AstNodeBuilder.
Stored as a field on TraverseCtx, so only a single instance exists - no RefCells or unsafe required.

If we don't pass the "parser" version of AstBuilder to any transformers, then they have to create nodes via AstNodeBuilder, making it impossible to accidentally create any AST nodes without an ID.
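
A simplified sketch of how the counter might be used. The allocator field is left out to keep the example self-contained, `NullLiteral` is reduced to just its ID, and the `new(first_free_id)` constructor is an assumption: presumably in a transformer the counter would start after the last ID assigned by SemanticBuilder.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(u32);

/// Stand-in AST node holding only its ID.
pub struct NullLiteral {
    node_id: NodeId,
}

impl NullLiteral {
    pub fn node_id(&self) -> NodeId {
        self.node_id
    }
}

/// Simplified stand-in for the proposed builder (no allocator in this sketch).
pub struct AstNodeBuilder {
    next_node_id: u32,
}

impl AstNodeBuilder {
    /// `first_free_id` is the first ID not already used in the AST (assumption).
    pub fn new(first_free_id: u32) -> Self {
        Self { next_node_id: first_free_id }
    }

    fn next_id(&mut self) -> NodeId {
        let id = NodeId(self.next_node_id);
        self.next_node_id += 1;
        id
    }

    /// Takes `&mut self`, so the builder cannot be `Copy`.
    pub fn null_literal(&mut self) -> NullLiteral {
        NullLiteral { node_id: self.next_id() }
    }
}
```

Since the only way to obtain a `NullLiteral` here is through the builder, every node created this way carries a fresh ID.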

AstNodeBuilder can also help us get other things right in the transformer. It can enforce that:

New IdentifierReferences must have a valid ReferenceId.
New BindingIdentifiers must have a valid SymbolId.
Nodes with a scope_id field must be created with a valid ScopeId.

Make all AST node creation happen via an `AstBuilder`

refactor(parser): use AstBuilder #5743

Mark all AST structs #[non_exhaustive]. This means they cannot be constructed by code outside of the oxc_ast crate.
Remove any methods from oxc_ast which construct AST nodes except in AstBuilder (e.g. ForStatement::new).
Go through all Oxc's codebase and replace any explicit AST node construction (e.g. IdentifierReference {span, name, reference_id}) with AstBuilder calls.

If we did not create the 2nd AstBuilder for transformer yet, we'll need to add methods to current AstBuilder for the transformer to use:

Create IdentifierReference with a reference_id.
Create BindingIdentifier with a symbol_id.
Create various AST nodes with a scope_id.

Without these methods, we won't be able to transition all the transformer over to creating all AST nodes via AstBuilder.

Other IDs

I think we should leave the other ID fields (scope_id, symbol_id, reference_id) present in AST, as they are now. @rzvxa suggested accessing them via side tables indexed by NodeId. I don't think this is ideal because:

Those fields are not so common, so they take up little memory.
Fetching ScopeId etc via a side table adds an extra indirection.
These side tables would be mostly empty space - most nodes do not have a SymbolId.
Transformer only has access to ScopeTree + SymbolTable, not Nodes.

Phase 2: Optimize

Type layouts

Because NodeId is a u32, the field order of types will not be optimal, and in most types it will have 4 bytes padding after it. So every AST type will grow by 8 bytes.

To fix this, optimize field order in oxc_ast_tools, filling the 4-byte gap after NodeId with other u32 / bool / single-byte enum fields. As many types currently have 4 or more bytes padding, rearranging the fields will win back that 8 bytes in most cases, and adding NodeId will be free for many/most types.
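
The effect of field order can be seen with two toy `#[repr(C)]` structs. These are hypothetical types, not real AST nodes, and the sizes in the comments assume a typical 64-bit target where `u64` has 8-byte alignment.

```rust
/// `node_id` first, followed by an 8-byte-aligned field:
/// 4 bytes of padding after `node_id`, and 7 more after `flag`. 24 bytes in total.
#[repr(C)]
pub struct Unoptimized {
    pub node_id: u32,
    pub name_ptr: u64, // stand-in for a pointer-sized field
    pub flag: bool,
}

/// Same fields, with `flag` moved into the gap after `node_id`:
/// 3 bytes of padding after `flag`, none at the end. 16 bytes in total.
#[repr(C)]
pub struct Optimized {
    pub node_id: u32,
    pub flag: bool,
    pub name_ptr: u64,
}
```

On such a target `std::mem::size_of::<Unoptimized>()` is 24 and `std::mem::size_of::<Optimized>()` is 16.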

`AstNodes` SoA

Convert AstNodes into full struct-of-arrays (split AstNode up).

AstNode will no longer be required (though we may need to keep it around until we've transitioned the linter to stop using it).

Access methods on `NodeId`

Rather than getting info about a node via methods on Semantic, I propose that we implement methods on NodeId itself.

// Before
let flags = ctx.semantic.nodes.get_flags(node_id);
// After
let flags = node_id.flags(ctx);

Along the same lines as: https://github.com/oxc-project/backlog/issues/99

I feel this is cleaner and easier to read. It also avoids the huge pile-up of verbosely-named methods on Semantic.
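
A sketch of how this could fit together with struct-of-arrays storage. `NodeFlags`, `AstNodes`, `Semantic` and `LintContext` below are simplified stand-ins with hypothetical fields, not the real Oxc definitions.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeFlags(u8);

/// Struct-of-arrays storage: one `Vec` per property, each indexed by `NodeId`.
pub struct AstNodes {
    pub flags: Vec<NodeFlags>,
    pub parent_ids: Vec<NodeId>,
}

pub struct Semantic {
    pub nodes: AstNodes,
}

pub struct LintContext {
    pub semantic: Semantic,
}

impl NodeId {
    /// Look up this node's flags in the side array.
    pub fn flags(self, ctx: &LintContext) -> NodeFlags {
        ctx.semantic.nodes.flags[self.0 as usize]
    }

    /// Look up this node's parent in the side array.
    pub fn parent(self, ctx: &LintContext) -> NodeId {
        ctx.semantic.nodes.parent_ids[self.0 as usize]
    }
}
```

Call sites then read as `node_id.flags(&ctx)` and `node_id.parent(&ctx)`, with the storage layout hidden behind the methods.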

Count nodes, scopes, symbols, references in parser

SemanticBuilder::build traverses the entire AST to count nodes, scopes, symbols and references. These counts are used to allocate sufficient capacity in Nodes, ScopeTree and SymbolTable to avoid those Vecs growing during the main traversal.

oxc/crates/oxc_semantic/src/builder.rs

Lines 223 to 238 in 5482e73

    
    // Count the number of nodes, scopes, symbols, and references.
    // Use these counts to reserve sufficient capacity in `AstNodes`, `ScopeTree`
    // and `SymbolTable` to store them.
    // This means that as we traverse the AST and fill up these structures with data,
    // they never need to grow and reallocate - which is an expensive operation as it
    // involves copying all the memory from the old allocation to the new one.
    // For large source files, these structures are very large, so growth is very costly
    // as it involves copying massive chunks of memory.
    // Avoiding this growth produces up to 30% perf boost on our benchmarks.
    // TODO: It would be even more efficient to calculate counts in parser to avoid
    // this extra AST traversal.
    let mut counter = Counter::default();
    counter.visit_program(program);
    self.nodes.reserve(counter.nodes_count);
    self.scope.reserve(counter.scopes_count);
    self.symbols.reserve(counter.symbols_count, counter.references_count);

Move this logic into parser. @DavidHancu mentioned on Discord that he has a working implementation, but he's not submitted a PR yet.

Probably we could implement the counters in AstBuilder.

The tricky part will be getting an accurate count when the parser rewinds - currently counts will be too high because some nodes are created and then discarded again.

Or possibly we could just allocate too much capacity (source text length + 2), and then shrink the arrays to fit once we know the real number (see "3. Cost of growing Vecs" section above). An investigation into this option: #5703

Phase 3: Move generating `NodeId`s into parser

TODO: I'll write this up later.

Boshen · 2024-09-11T02:03:41Z

Boshen
Sep 11, 2024
Maintainer

Previous comment from @TzviPM

Don't we store AST nodes in a bumpalo arena? Can we just use their index as an ID for the node, to avoid the idea of a "lazy" ID? My concern with lazy IDs is that code which needs an ID will have to either branch or risk a panic. Either case will lead to a performance degradation.

5 replies

Boshen Sep 11, 2024
Maintainer

I am also confused about the sequential ID requirement. Why can't it be a UID?

Can we somehow workaround the ast nodes count requirement for reserving the nodes SoA 🤔?

Boshen Sep 11, 2024
Maintainer

We should state that NodeId is UID, it has no characteristics.

overlookmotel Sep 11, 2024
Maintainer Author

Don't we store ast nodes in a bumpalo arena? Can we just use their index as an id for the node

Bumpalo does not have indexes, only pointers. We could use pointers as unique IDs, but then Nodes, ScopeTree etc need to become hash maps instead of arrays. I imagine this would hurt perf.
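
To illustrate the trade-off, here is a sketch of what pointer-keyed side data would look like. `Node`, `key` and `ParentTable` are hypothetical, and `Box` stands in for an arena allocation. The point is only that lookups become hash map probes rather than array indexing.

```rust
use std::collections::HashMap;

pub struct Node {
    pub name: &'static str,
}

/// Use a node's address as its key. Only sound as an identity while nodes never move,
/// which holds for arena-allocated nodes (and for the boxed nodes used here).
pub fn key(node: &Node) -> usize {
    node as *const Node as usize
}

/// Side table keyed by address instead of by a dense index.
#[derive(Default)]
pub struct ParentTable {
    parents: HashMap<usize, usize>,
}

impl ParentTable {
    pub fn set_parent(&mut self, child: &Node, parent: &Node) {
        self.parents.insert(key(child), key(parent));
    }

    /// Each lookup hashes the key and probes the map.
    pub fn parent_key(&self, child: &Node) -> Option<usize> {
        self.parents.get(&key(child)).copied()
    }
}
```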

Can we somehow workaround the ast nodes count requirement for reserving the nodes SoA 🤔?

I've added a section on counting nodes to main text above.

We should state that NodeId is UID, it has no characteristics.

Agreed. In particular, consumers cannot rely on NodeIds being numbered in traversal order (or any other order).

DavidHancu Sep 11, 2024

Can we use a slotmap? It would also help with maintaining cache locality when mutating ASTs. It's also indexed, so we can use the IDs as the key.

overlookmotel Sep 12, 2024
Maintainer Author

Slotmaps:

The problem is that AST types are of widely varying sizes - NullLiteral is 8 bytes, Class is 160 bytes. So:

1. Storing all AST nodes in a single array indexed by NodeId would take 160 bytes for each node. A lot of wasted memory.
2. Can solve problem (1) by making nodes array contain pointers, so 8 bytes per node. But then there's another problem - every time you dereference a Box, it's double-indirection (look up the pointer in nodes array, then follow the pointer).
3. The number of nodes is unknown at start of the parser. How much capacity to allocate in the array? perf(semantic): calculate number of nodes, scopes, symbols, references before visiting AST #4367 demonstrated that growing large Vecs because you didn't allocate enough capacity initially is very costly.

slotmap crate adds an overhead of 8 bytes to each entry, which is also pretty inefficient.

The other solution is to have a separate array for each AST type (i.e. Vec<NullLiteral>, Vec<Class> etc etc). Then each AST type would have its own ID (NullLiteralId, ClassId). I believe this is usually referred to as a "slab allocator".

The slab design does have some advantages. It would solve problems (1) and (2) above, but:

It doesn't solve problem (3).
Unergonomic API - if Box stores just an ID, then you can't auto-dereference a Box any more, you need something like boxed.deref_with(&allocator).
As every node type is stored in separate memory region, unclear if this would be good for cache locality. It could actually be worse.
(this is the main reason) This would be a huge breaking change, and a very large re-architecting of Oxc's fundamentals. We have considered this option, but it's too massive a step to consider taking at present. There are other more pressing priorities.

Perhaps I've misunderstood you, and missed the point. If so, please tell me!

DavidHancu · 2024-09-11T11:45:41Z

DavidHancu
Sep 11, 2024

Regarding the parser improvements, a concern has been raised in Discord regarding binary size increases.

1 reply

overlookmotel Sep 12, 2024
Maintainer Author

As I replied on Discord, I'm not concerned by the 0.3% cost of counting nodes/scopes/symbols/references in the parser. We can do that unconditionally, at which point the binary size increase problem disappears.

I would be very keen to see your WIP implementation. Would you be able to make a PR? (rough, incomplete, or buggy are all fine at this stage!) It'd just be useful to see it and be able to run our benchmarks on it via CodSpeed to confirm the 0.3% cost.

overlookmotel · 2024-09-12T12:39:56Z

overlookmotel
Sep 12, 2024
Maintainer Author

@Boshen, @rzvxa and I had a meeting earlier today. We reached consensus between the 3 of us to follow the plan discussed above at least up to end of Phase 1.

I've tweaked the text above to reflect a couple of conclusions we came to in that meeting (#[non_exhaustive]).

We plan to start implementing in a few days' time, but hope that others who have comments will have time to give feedback in the next couple of days.

0 replies

rzvxa · 2024-09-12T18:28:49Z

rzvxa
Sep 12, 2024

Please disregard my previous comment!

We need both of them, I wasn't thinking clearly. However, maybe we should implement this second AST builder as part of the oxc_semantic crate, and perhaps use the same name, since these are both AST builders, just in different domains (and you would never want them both at the same time; it is one or the other). This second AST builder can be behind a feature.

5 replies

overlookmotel Sep 12, 2024
Maintainer Author

I think putting it in oxc_semantic is a good idea. But... I don't think we can. We plan to make all the AST types #[non_exhaustive], so any code to construct nodes will have to be in oxc_ast crate.

But:

We can put both builders behind cargo features - as you say, we only need 1 or the other, never both.
oxc_semantic can re-export the "with IDs" builder under the name AstBuilder.

So that's almost as good as what you suggested.

overlookmotel Sep 12, 2024
Maintainer Author

PS I still like this: oxc-project/backlog#79 But it seems no-one else does!

rzvxa Sep 12, 2024

I think putting it in oxc_semantic is a good idea. But... I don't think we can. We plan to make all the AST types #[non_exhaustive], so any code to construct nodes will have to be in oxc_ast crate.

I was thinking that we could use the semantic AST builder as a wrapper around the one in oxc_ast. But yeah, doing what you've suggested is better, since we can feature-gate both. With the wrapper approach, we are at the mercy of the compiler to optimize it, even though both builders are going to be inlined and the extra instructions (zeroing cells) would ideally go away.

rzvxa Sep 12, 2024

PS I still like this: oxc-project/backlog#79 But it seems no-one else does!

I'm not really opposed to this change; however, I'm not too keen on it either. As Boshen mentioned on that issue, they both have their pros and cons, so if I use Occam's razor I'd say let's stick to what we already have. Obviously, my opinion will change as soon as I see some sort of win from this refactor.

So basically, if we commit to this change I don't mind it at all. But even if you want to do it, let's wait until after all three phases here. It'd be best to wait and find out about the new requirements of AstBuilder now that we are adding state to it. Then we can do this with a clearer vision (so we don't back ourselves into a corner).

overlookmotel Sep 12, 2024
Maintainer Author

Yes, you're right. If we're ever going to do it, we're certainly not going to do it now, or any time soon. I probably shouldn't have brought it up.

DavidHancu · 2024-09-12T20:13:53Z

DavidHancu
Sep 12, 2024

Prior art: rustc uses pointers as node identifiers and they are using them as keys in HashMap with FxHasher.

1 reply

rzvxa Sep 12, 2024

There are multiple versions of AST/IR and IDs for different parts of rustc, but AFAIK it's defined like so in rustc_ast, which seems like the place you are referring to.

https://github.com/rust-lang/rust/blob/master/compiler/rustc_ast/src/node_id.rs

Can you link this pointer-based node identifier? It might help to see how they are achieving this, since we had a similar idea before.

Boshen · 2024-09-13T04:58:08Z

Boshen
Sep 13, 2024
Maintainer

Does it make sense to add the ID generator to Allocator, given Allocator is a singleton?

AstBuilder will be able to call Allocator::next_id from anywhere, without needing to make AstBuilder a singleton.

4 replies

overlookmotel Sep 13, 2024
Maintainer Author

Hmm. Interesting idea. Allocator only achieves that by using Cell to wrap its mutable state. So, yes, we could add Cell<NodeId> to Allocator.

But... what problem are we trying to solve? Is there a difficulty with AstBuilder in the transformer being a singleton?

We never need to hold multiple copies of AstBuilder, or pass it around. So we can store it in TraverseCtx, which is already also a singleton. Every transformer visitor has access to it via ctx.ast.

Cell does not have the large runtime cost of RefCell, so it's probably fairly cheap in this case. But there's a reason why Rust has separate & and &mut types, rather than everything being "maybe mutable" like Cell is - compiler can optimize better. So in my view it's preferable to avoid Cell, unless we really need it.
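
The two options side by side, as a sketch. `SharedIdGenerator` and `ExclusiveIdGenerator` are hypothetical names. The first models a counter living inside something like Allocator, the second models a counter owned by a builder that is passed as `&mut`.

```rust
use std::cell::Cell;

/// Counter usable through a shared reference, via interior mutability.
#[derive(Default)]
pub struct SharedIdGenerator {
    next: Cell<u32>,
}

impl SharedIdGenerator {
    /// Takes `&self`: any holder of a shared reference can generate IDs.
    pub fn next_id(&self) -> u32 {
        let id = self.next.get();
        self.next.set(id + 1);
        id
    }
}

/// Counter that requires exclusive access.
#[derive(Default)]
pub struct ExclusiveIdGenerator {
    next: u32,
}

impl ExclusiveIdGenerator {
    /// Takes `&mut self`: the borrow checker guarantees no one else touches the counter.
    pub fn next_id(&mut self) -> u32 {
        let id = self.next;
        self.next += 1;
        id
    }
}
```

Both produce the same sequence of IDs. The difference is in what the compiler is allowed to assume about aliasing, which is the optimization concern raised above.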

Boshen Sep 13, 2024
Maintainer

Userland can still use Visit and VisitMut, or even ad hoc &mut Program with AstBuilder.

There is already userland code using the ad hoc approach to change top-level import statements.

overlookmotel Sep 13, 2024
Maintainer Author

If you're using VisitMut, you can put the AstBuilder in the visitor:

struct MyVisitor<'a> {
    ast: AstBuilder<'a>,
}

impl<'a> VisitMut<'a> for MyVisitor<'a> {
    fn visit_expression(&mut self, expr: &mut Expression<'a>) {
        *expr = self.ast.expression_boolean_literal(SPAN, true);
    }
}

or ad hoc code:

let mut ast = AstBuilder::new(&allocator);
for stmt in &mut program.body {
  *stmt = ast.expression_statement(SPAN, ast.expression_boolean_literal(SPAN, true));
}

I don't think these are so unergonomic. AstBuilder wasn't Copy until about 2 months ago when I made it so (which I now regret, obviously!)

(edited after I originally erroneously said that the "parser-style" AstBuilder can still be Copy - it can for now, but not once we use it to count AST nodes)

overlookmotel Sep 13, 2024
Maintainer Author

This also reminds me that we need some way to explain to consumers what the difference between the two AstBuilders is, and when to use each.

Probably we should name them something different from each other, otherwise it's quite hard to talk about. "parser-style AstBuilder" and "transformer-style AstBuilder" is a bit "you whaaaat?"

overlookmotel · 2024-09-19T13:27:52Z

overlookmotel
Sep 19, 2024
Maintainer Author

In my opinion, we should make the node_id fields on AST types pub(crate) only, so you can only access a node's ID via GetNodeId trait.

Only AstBuilder will create NodeIds, and only AstBuilder will set node_id fields on nodes.

This will prevent user from creating arbitrary NodeIds, or copying a NodeId from one node to another. This makes it impossible to accidentally produce 2 nodes in same AST with the same NodeId, which upholds the uniqueness property that we're after.

1 reply

overlookmotel Sep 19, 2024
Maintainer Author

As an optimization, we can add APIs to reuse the old NodeId when you replace one node with another in transformer:

// Wrapper around a `NodeId` which is not used in AST.
// Not `Copy` or `Clone`, so you can only use it once.
struct FreeNodeId(NodeId);

impl FreeNodeId {
    /// Consume AST node and get its node ID, wrapped in a `FreeNodeId`.
    /// This method takes `node`, an owned type, so can only get `FreeNodeId` from a node once.
    pub fn from_node<N: GetNodeId>(node: N) -> Self {
        Self(node.node_id())
    }

    /// Not public - only `AstBuilder` can unwrap `FreeNodeId`s
    fn into_inner(self) -> NodeId {
        self.0
    }
}

impl<'a> AstBuilder<'a> {
    /// Create `NullLiteral`, reusing an old node ID.
    /// Consumes the `FreeNodeId`, so it can't be used again.
    pub fn null_literal_replacement(&self, span: Span, node_id: FreeNodeId) -> NullLiteral {
        NullLiteral { node_id: node_id.into_inner(), span }
    }
}

The advantage of reusing old NodeId, rather than generating a new one, is that any data in side arrays indexed by NodeId is still valid - it's associated with the replacement node automatically.

DavidHancu · 2024-09-26T17:59:37Z

DavidHancu
Sep 26, 2024

The (draft) PR for node counting in the parser has been submitted: #6083

0 replies
