Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Automatic extraction #58

Merged
merged 17 commits into from
Mar 11, 2021
Merged

[FEATURE] Automatic extraction #58

merged 17 commits into from
Mar 11, 2021

Conversation

darionco
Copy link
Contributor

@darionco darionco commented Mar 2, 2021

Addresses: #48

First pass at auto-generating the type definitions and adding JSDoc style (TypeDoc) comments to the declarations.

I have tested this locally but I haven't tested it with gpuweb/cts so it's not ready to merge but I thought I'd submit this PR to start getting feedback early.

What it does:

  • Added the gpuweb/gpuweb repo as a submodule
  • Added a script to generate the types from the bikeshed file in that repo
  • In the future a hook could be created to automatically generate the types as soon as a new version of the spec is pushed

Screenshot from 2021-03-02 15-15-17

@kainino0x kainino0x self-requested a review March 3, 2021 02:28
@kainino0x
Copy link
Collaborator

Amazing work! I will take a look soon, right now a bit bogged down.
Especially cool that you got documentation importing working.

I'll also be interested to look more closely at how the upstream projects work, and whether I think the style they use will be sufficient. In particular, it looks like it makes similar decisions to https://github.com/microsoft/TypeScript-DOM-lib-generator, which I used to generate the very first version of this file but later diverged from heavily. One reason I did this was to prevent mostly-empty interfaces from being compatible (e.g. prevent a GPUTexture from being implicitly convertible to a GPUSampler), so I wonder if we can find a solution to that (either with private __brands like we have now, or some other solution).

@darionco
Copy link
Contributor Author

darionco commented Mar 3, 2021

Thanks for looking at this PR.

I had looked at TypeScript-DOM-lib-generator but decided against using it because it requires a json object and proposing changes that fix edge cases (like the __brand hack) is out of the question. Instead, I opted for using webidl2 directly and webidl2ts (which is heavily inspired by TypeScript-DOM-lib-generator) for a couple of its parsing functions.

I created bikeshed-to-ts to handle the rest:

  • it loads a .bs file and extracts the IDL using a technique very similar to the code in extract-idl.py
  • extracts all definition blocks (<dl dfn-type=... dfn-for=...>) using regexes
  • parses the result with webidl2
  • uses webidl2ts to generate TS AST per ADL declaration in the bikeshed file
  • consolidates the AST and its objects
  • enriches the type declarations in the AST with documentation comments using the extracted type definitions

I had seen #11 and was wondering about it, hopefully this gets released soon. In the mean time I can implement a hack similar to __brand, do you know of anything that works with interface rather than class? I know that doing something like

interface A {
  readonly __brand: unique symbol;
}

works but it exposes __brand as a public member that would crash if accessed at runtime.

@kainino0x
Copy link
Collaborator

kainino0x commented Mar 3, 2021

Thanks for the details!

  • extracts all definition blocks (<dl dfn-type=... dfn-for=...>) using regexes

Aha, this makes sense. Unfortunately the spec may not ALWAYS use <dl> to define every item in the spec, but this technique should at least work for MOST items.

In the mean time I can implement a hack similar to __brand, do you know of anything that works with interface rather than class? I know that doing something like

Unfortunately I don't. Putting a non-private brand in the interface is probably fine - we can limit this only to our empty interfaces.

There are a number of alternatives out there, which we could consider. enums might work since these aren't numeric types?
https://michalzalecki.com/nominal-typing-in-typescript/
https://betterprogramming.pub/nominal-typescript-eee36e9432d2

I know TypeScript's built-in dom types don't use class, but instead define both an interface (interface HTMLImageElement) and a var (declare var HTMLImageElement). I think that's more accurate as it expresses overrideability of built-in interfaces (e.g. globalThis.HTMLImageElement = ...;).

We definitely want these types to be as high quality as possible, and that probably means at least dropping the class thing if not nominal typing altogether. It's not a huge deal if we lose the nominal typing for now, though it would be great to bring back eventually.

@kainino0x
Copy link
Collaborator

Aha, this makes sense. Unfortunately the spec may not ALWAYS use <dl> to define every item in the spec, but this technique should at least work for MOST items.

A thought: Since the bikeshed source is markdown+html-ish, regular expressions could potentially break down in certain cases (like nested <dl>s?). Wonder how possible it would be to either pull the definitions out of the generated HTML (which is full of a lot of generated stuff, but might be possible) or use Bikeshed's actual parser?

@darionco
Copy link
Contributor Author

darionco commented Mar 4, 2021

It's not a huge deal if we lose the nominal typing for now, though it would be great to bring back eventually.

I feel like we should try to keep nominal typing, the main reason to use TypeScript is type safety/checks and the definitions should provide just that.

There are a number of alternatives out there, which we could consider. enums might work since these aren't numeric types?

yeah, I'll implement the enum approach and we can revisit once TypeScript releases the unique keyword update.

A thought: Since the bikeshed source is markdown+html-ish, regular expressions could potentially break down in certain cases (like nested <dl>s?). Wonder how possible it would be to either pull the definitions out of the generated HTML (which is full of a lot of generated stuff, but might be possible) or use Bikeshed's actual parser?

right now it handles the file line-by-line, which makes dealing with regexes and XML easier so it should be sufficient for now. I am still learning the bikeshed spec though so it might be incorrect, I'll give switching the custom parsing with the bikeshed parser a try and report back.

@darionco
Copy link
Contributor Author

darionco commented Mar 4, 2021

I decided to go with adding a __brand declaration to the interfaces because it can be written in a way that produces less cryptic errors:
Screenshot from 2021-03-04 14-00-36

I used bikeshed-js to parse the .bs, since bikeshed is written in python the integration is awkward, I did toy with the idea of writing this in phyton but decided against it.

I can get an HTML file from the bikeshed parser now, but I am having trouble wrapping my head around what to do with it. It is much easier to parse HTML but it seems to me that things are not as neatly organized/printed as in the bs file, feels like some metadata is missing from the HTML (or it's harder to find, rather).

Can we define what needs to be included in the JSDoc comments?
Using that we could come up with a set of heuristics to extract the data from either the bikeshed file or HTML file and decide which one would be easier to use.

@kainino0x
Copy link
Collaborator

I can get an HTML file from the bikeshed parser now, but I am having trouble wrapping my head around what to do with it. It is much easier to parse HTML but it seems to me that things are not as neatly organized/printed as in the bs file, feels like some metadata is missing from the HTML (or it's harder to find, rather).

I wasn't sure whether this would be the case, but I am not too surprised. Indeed there is a ton of HTML generated for bikeshed code, but I wasn't sure if all the metadata would be included in the end. If not, then I guess it's not a good suggestion, sorry :/

Can we define what needs to be included in the JSDoc comments?

That's a good question. Unfortunately there's no immediately obvious answer - there's no strict delineation between "summary" and "detailed algorithm" for various items in our spec, though it did seem like you had hit a pretty good balance here. For requestAdapter for example, I think we want exactly what you got. More generally:

  • Include summary of the item (interface, attribute, method/constructor, argument, dictionary, dict-member, typedef/callback, etc.)
    • For items defined in <dl>s, I don't know if this should be the full <dd> or just the first paragraph. First paragraph probably works well in general - in fact we could tweak the spec to make sure the first paragraph is always a good summary.
      • (But if we did want to include more, it should include Note:/<div class=note> items, but still exclude algorithm/Issue:/<div class=issue> items.)
    • For anything defined at the top level, like interfaces, I don't know how to extract the right info. Unfortunately right now we don't have a consistent style in the spec - sometimes a <dfn> is inside a header (<h1-6>) and other times it's inside a paragraph.

Anyway, we don't have to solve the documentation perfectly right now. Auto-generating even with NO documentation would be a huge help!

In fact, we don't even have to fully auto-generate, and in fact right now we probably won't want to, because at any given time there are usually some deprecated items that we still include in the types because implementations (*cough*chrome) isn't quite up to spec yet. I'm imagining this workflow for a developer updating these types:

  • Run the autogenerator, it generates into a temporary, local, .gitignored file
  • Open vimdiff/similar between generated and checked-in file
  • "Pull" changes from generated to checked-in file manually, one-by-one, to leave intentional diffs in.

Someday we probably won't need this, but right now it would be a very nice way to reduce the maintenance burden but maintain the flexibility.

@darionco
Copy link
Contributor Author

darionco commented Mar 9, 2021

I'm imagining this workflow for a developer updating these types:

  • Run the autogenerator, it generates into a temporary, local, .gitignored file
  • Open vimdiff/similar between generated and checked-in file
  • "Pull" changes from generated to checked-in file manually, one-by-one, to leave intentional diffs in.

Agreed, that makes sense to me, I have pushed a commit that changes the generation script to write its result to dist/generated.d.ts (which is git ignored). The first version of this is going to be painful as the diff between index.d.ts and generated.d.ts is huge right now.

I don't feel I am equipped to decide which changes should be kept and which shouldn't though :/

Copy link
Collaborator

@kainino0x kainino0x left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries - thanks for this awesome contribution!

I think for the first version we can basically replace the checked-in version with the generated version, then look through the old version and port any necessary changes into it. I can take care of that.

I'm happy to start landing this and then we can work through the remainder afterward. Left a few minor comments and then LGTM :)

package.json Outdated Show resolved Hide resolved
dist/index.d.ts Outdated Show resolved Hide resolved
package.json Outdated Show resolved Hide resolved
yarn.lock Outdated Show resolved Hide resolved
Copy link
Collaborator

@kainino0x kainino0x left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@kainino0x kainino0x merged commit a3b8dd8 into gpuweb:main Mar 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants