Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: refactor go schema extraction #1700

Merged
merged 1 commit into from
Jun 7, 2024

Conversation

worstell
Copy link
Contributor

@worstell worstell commented Jun 7, 2024

initial step towards #1518

  • integrates with (forked) go/analysis to run static analysis tasks for schema extraction
  • implements first extraction task, type aliases
  • refactors existing build stack to use the new extractor, combining results with legacy so we can incrementally migrate
  • removes typealias extraction logic from legacy code

@worstell worstell requested a review from alecthomas as a code owner June 7, 2024 03:50
@worstell worstell requested review from a team and wesbillman and removed request for a team June 7, 2024 03:50
@@ -1,5 +1,8 @@
package compile

// TODO: This file is now duplicated in go-runtime/schema/analyzers/parser.go. It should be removed
// from here once the schema extraction refactoring is complete.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just duplicated this file to avoid having to export everything from it's new location (so the legacy extractor can continue using it) only to make it all private again once the refactor is complete

@ftl-robot ftl-robot mentioned this pull request Jun 7, 2024
@@ -106,44 +107,34 @@ type ParseResult struct {
Errors []*schema.Error
}

// ExtractModuleSchema statically parses Go FTL module source into a schema.Module.
func ExtractModuleSchema(dir string, sch *schema.Schema) (optional.Option[ParseResult], error) {
func legacyExtractModuleSchema(dir string, sch *schema.Schema, out *analyzers.ExtractResult) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got a bit hairy attempting to run the legacy extractor as a completely isolated process since things are so entangled

instead modified the legacy code to accept an output pointer as param (out *analyzers.ExtractResult). the initial caller provides the partial result from the prior extraction step, which gives the legacy extractor access to details (native names and decls) that it needs. the output/result is updated in-place during the legacy extraction

@worstell worstell force-pushed the worstell/20240604-refactor-go-schema-parsing branch 3 times, most recently from 4546309 to 751069c Compare June 7, 2024 05:33
@worstell
Copy link
Contributor Author

worstell commented Jun 7, 2024

so many go.mod changes 😭 separated them into their own commit for easier reviewing

@worstell worstell force-pushed the worstell/20240604-refactor-go-schema-parsing branch 2 times, most recently from 0f2c7ec to 7850af9 Compare June 7, 2024 06:03
Copy link
Collaborator

@alecthomas alecthomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing Lizzy. This will be so much cleaner.

A couple of thoughts/questions/comments:

This is not an immediate concern, but if we could switch to using the analysis package's Diagnostic system, we'd get cleaner integration with the package.

Typically with the analysis package each pass will produce a distinct type that other analysers can require, and they will then be run in the correct order, and values passed through. It seems we could utilise this system rather than having a single extractorContext passed through and mutated, and I think (though am not sure) that we would benefit from parallelisation then. For example, the TypeAliasExtractor could output a []*schema.TypeAlias.

Another one for later, but I think we could also utilise the facts system. My first thought along those lines is that a first pass could add a schema.Ref fact to every node that is a reference to an FTL type or value. This would allow latter passes to just retrieve these facts directly from the node, rather than having to recompute them.

But yeah, overall, absolutely awesome.

if err != nil {
return optional.None[analyzers.ExtractResult](), err
}
result, ok := maybeResult.Get()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just return an error from extract.Extract() in this case, so we can make these results not be optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think i got a bit optional-happy 😅

@alecthomas
Copy link
Collaborator

so many go.mod changes 😭 separated them into their own commit for easier reviewing

Fear not, I just found this which made it actually quite okay :)
image

@worstell
Copy link
Contributor Author

worstell commented Jun 7, 2024

This is not an immediate concern, but if we could switch to using the analysis package's Diagnostic system, we'd get cleaner integration with the package.

ah in place of shoving the schema errors into the result? will check it out!

Typically with the analysis package each pass will produce a distinct type that other analysers can require, and they will then be run in the correct order, and values passed through. It seems we could utilise this system rather than having a single extractorContext passed through and mutated, and I think (though am not sure) that we would benefit from parallelisation then.

currently the extractors actually do produce their own results (rather than mutate a single result), they just share the same type (extractorContext, maybe a confusing name)— so i believe we're already getting parallelization except for dependencies, which may end up being just finalizer -> everything else

For example, the TypeAliasExtractor could output a []*schema.TypeAlias.

i initially had the same thought, actually, to return []*schema.TypeAlias directly. i moved to the current approach to handle transitive visibility, e.g.:

pass A: export typealias Alias foo.Thing, export data Thing{}
pass B: data Thing{}, data Other{}
final result (merged): export typealias Alias foo.Thing, export data Thing{}, data Other{}

basically everyone producing a partial view of the world; then these get merged and the highest visibility Decl wins out in the case of conflicts. now in thinking about it more, this does feel cleaner as a separate calculation in finalizer that looks at the complete schema and uses refs to update visibility as needed. will take a look at updating tomorrow

Another one for later, but I think we could also utilise the facts system. My first thought along those lines is that a first pass could add a schema.Ref fact to every node that is a reference to an FTL type or value. This would allow latter passes to just retrieve these facts directly from the node, rather than having to recompute them.

amazing idea!!

@worstell worstell force-pushed the worstell/20240604-refactor-go-schema-parsing branch 4 times, most recently from a51bf8c to 6ecedcd Compare June 7, 2024 19:49
@worstell
Copy link
Contributor Author

worstell commented Jun 7, 2024

@alecthomas this has been updated with the following:

  • uses diagnostics instead of manually propagating schema errors
  • extractors just return list of relevant Decls (also native names for now, but we might be able to get rid of this down the line by managing it as a separate pass)
  • Decl visibility is updated transitively in finalizer — at the moment we're also running the same logic in compile/build.go after the legacy extractor runs, but we will remove this once we're migrated and just rely on the finalizer

initial step towards #1518

- integrates with (forked) go/analysis to run static analysis tasks for schema extraction
- implements first extraction task, type aliases
- refactors existing build stack to use the new extractor, combining results with legacy so we can incrementally migrate
- removes typealias extraction logic from legacy code
@worstell worstell force-pushed the worstell/20240604-refactor-go-schema-parsing branch from 6ecedcd to cb33912 Compare June 7, 2024 20:02
@worstell worstell merged commit 21930e3 into main Jun 7, 2024
37 checks passed
@worstell worstell deleted the worstell/20240604-refactor-go-schema-parsing branch June 7, 2024 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants