Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(all): use remark-validate-links to check internal links, convert to file-path links #1022

Merged
merged 96 commits into from
Feb 22, 2023

Conversation

alicewriteswrongs
Copy link
Contributor

@alicewriteswrongs alicewriteswrongs commented Feb 2, 2023

This adds a gh actions workflow which runs this action which in turns runs this CLI program which will check links in Markdown files in order to see if any are returning 404s.

It also checks that #anchor links have a matching anchor in the same file, which is pretty neat! So if you have some markdown like this:

# my anchor

And also see [my anchor](#my-anchovy)

you'll get an error! neat!

Between this program and the internal link checking that Docusaurus already does we have a fairly robust 'test suite' here for our markdown.

@vercel
Copy link

vercel bot commented Feb 2, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated
stencil-docs ✅ Ready (Inspect) Visit Preview 💬 Add your feedback Feb 22, 2023 at 8:58PM (UTC)

Copy link
Contributor Author

@alicewriteswrongs alicewriteswrongs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noting one thing

mlc_config.json Outdated
{ "pattern": "^../" }
],
"timeout": "20s"
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the configuration for the markdown link check program.

  • the timeout is 20s because we have a few archive.org links which can be erroneously marked as dead because they take a while to respond
  • the ignorePatterns are set to /, ./ and ../ because that lets us have the markdown link checker ignore all the internal links, the checking of which is left to Docusaurus (it does a pretty good job already)

One consequence of that ignorePatterns bit that you'll notice is that it requires us to always use ./ or ../ for internal links in order to ensure that the link checker doesn't report spuriously broken links.

basically, if it encounters

link to [some page](dir/page)

it will try to resolve dir/page as a URL, since it doesn't understand Docusaurus' format and algorithm for turning that string into a link to, say, /dir/page.

So we just need to use the ./dir/page path instead, which is, as far as I can tell, equally supported by Docusaurus.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love it - I hate this config uses JSON (not that it's our choice) and we can't add this as comments :-(

@alicewriteswrongs alicewriteswrongs marked this pull request as ready for review February 2, 2023 21:26
@alicewriteswrongs alicewriteswrongs changed the title try out a gh action for checking status of links in md files use markdown-link-checker gh action to verify that external links are still fresh Feb 2, 2023
Copy link
Contributor

@rwaskiewicz rwaskiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there's some false positives with respect to links with anchors in them 😬 I'm a little concerned about adding something that says everything's OK, but technically has broken links. Perhaps this is better than nothing though 🤔

docs/README.md Outdated
@@ -1,11 +1,11 @@
# Table of Contents
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a question for another day, but does this file get used anymore? In the previous version of the site, it was used to generate the sidebar. I wonder if we have Docusaurus configured to explicitly exclude this file 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #1037 while waiting for the Stencil nightly CI fix to finish running

runs-on: ubuntu-latest
steps:
- uses: actions/checkout@master
- uses: gaurav-nelson/github-action-markdown-link-check@v1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a SHA for the version, and a comment to the right of it so dependabot will update both?

e.g.

Suggested change
- uses: gaurav-nelson/github-action-markdown-link-check@v1
- uses: gaurav-nelson/github-action-markdown-link-check@9710f0fec812ce0a3b98bef4c9d842fc1f39d976 # 1.0.13

- [@Event()](events#event-decorator) declares a DOM event the component might emit
- [@Listen()](events#listen-decorator) listens for DOM events
- [@Component()](./component#component-decorator) declares a new web component
- [@Prop()](./properties#prop-decorator) declares an exposed property/attribute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this link exist? I see docs/properties#the-prop-decorator-prop, but I don't see one for this particular anchor 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may not, I think it will unfortunately only check anchors within the same file. What I ended up doing was basically converting all of these links to be relative links that start with ./. This works with Docusaurus properly, and it will check that the link resolves to a page, but I don't think that Docusaurus verifies that specific anchors within a linked page exist.

The markdown link checker, by contrast, checks that links either resolve to anchor links within the same file (if they look like [foo](#foo) or that they'll return a 200 when it makes a GET request to them (more or less). So basically it won't know what to do with the links that Docusaurus renders as internal links.

So anyhow, what I did to get Docusaurus' link-checking and the markdown link checker's link checking to work somewhat well together is converting Docusaurus links to use the relative path and then ignoring those links in the markdown link checker.

However, one thing I just tried is using the alternate file path linking supported in Docusaurus, like

[link](./foo.md#foo)

Both Docusaurus and the markdown link checker will check this and report that yes, this is pointing to a file in the right spot, but neither will check that the anchor tag foo is present in that file. Boo!

There's an issue on the Docusaurus repo here about this: facebook/docusaurus#3321

One suggestion in there is to convert to file-path links ([](./foo.md)) and then use a project like this one: https://github.com/remarkjs/remark-validate-links which might provide a better way to validate all the internal links between files. It looks like it checks that anchor tags are present in any linked file.

That option of course doesn't cover external links.

So my takeaway here is that there's not a perfect solution here. I think I'd be in favor of the following:

  • adding the markdown link checker, along the lines of how it's set up here where it's constrained to only checking external URLs
  • refactoring internal links to use the filepath-based linking
  • adding remark-validate-links to check that all internal links + anchor tags are valid (this would make it basically an extra layer of validation on top of what will already get checked by docusaurus build).

Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remark plugin seems pretty cool as long as all of the tools can play together nicely. Writing relative file paths might be a bit tedious, but I feel like we had to do that sometimes in the old docs too so nbd. I don't really have a strong opinion here for next steps, but think this is a step in the right direction and probably better than the nothing we have today 😬

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a plan to me. The only thing I'm hesitant on is converting internal links to use the filepath-based linking. I think this could make some maintenance a little tricky if we rename a file, but perhaps that's the cost you have to pay if you want anchor tag validation at the moment. I think I'm willing to pay that cost if it means automated tests. Does remark-validate-links flag you if you don't use file-path links? My other concern here is education across teams contributing to the site. If remark-validate-links doesn't flag non-file-path links, maybe the compromise here is to just move forward with mlc only here, and see how that works for us (taking that usage period as a time to better understand our needs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I was just able to confirm that remark-validate-links does show a warning if you don't use file-path-based links. If you have, for instance, something like this:

| `single-export-module`        | All component and custom element definition helper functions will be exported from the `index.js` file in the output directory. This file can be used as the root module when distributing your component library, see [below](../custom-elements#distributing-custom-elements) for more details. |

it will say this:

Screenshot 2023-02-09 at 10 33 40 AM

this can be addressed (in this case) by changing the markdown to this:

| `single-export-module`        | All component and custom element definition helper functions will be exported from the `index.js` file in the output directory. This file can be used as the root module when distributing your component library, see [below](../output-targets/custom-elements.md#distributing-custom-elements) for more details. |

A slightly confusing thing with re-writing these to relative-file-path based links is that right now they are effectively relative paths based on their final url, so for instance from docs/introduction/upgrading-to-stencil-three.md, which has the url set to url: /docs/upgrading-to-stencil-3 in the frontmatter the relative url to the page rendered based on docs/output-targets/custom-elements.md is actually just ../custom-elements because that file has this in frontmatter: slug: /custom-elements

So it's a little bit funky to rewrite them, you can sometimes just put .md at the end of what's there already, but sometimes you have to figure out which .md file the relative URL would normally point to and then find it. Not impossible, but a little funky.

One consequence of that is that right now individual markdown files can be somewhat 'mobile', i.e. they can be moved around without all their relative links being broken. With this change, moving a file would require fixing all of the relative paths to other files (possibly some IDEs can fix such things easily). Anyway, not a major concern because we don't move files around all that much, but worth flagging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so update — that only addressed a small portion of the internal links. There's a lot more!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think the relative file-path based links are the way to go though, the checking with remark-validate-links is just quite good at checking for headings and whatnot. A further advantage is that if the links are filepath-based then that is not a docusaurus-specific way to represent links between the files, but a more 'cross-platform' (if you will) way to do so, and, for instance, links on Github between files should be clickable (as they are, for instance, in this README file)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Would you consider this ready to re-review then? I'd like to pull the branch down and do some faux-development with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel free to play with it, I am going through and changing over to relative links everywhere -- this requires editing ~400 or so of them and I'm not quite done yet!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to add remark-validate-links to dev dependencies and then add running it to the CI as well

- [@Component()](./component#component-decorator) declares a new web component
- [@Prop()](./properties#prop-decorator) declares an exposed property/attribute
- [@State()](./state#the-state-decorator-state) declares an internal state of the component
- [@Watch()](./reactive-data#watch-decorator) declares a hook that runs when a property or state changes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this link exist? I see docs/reactive-data#the-watch-decorator-watch, but I don't see this particular anchor

@@ -645,7 +645,7 @@ export class ToDoListItem {

## Prop Validation

To do validation of a Prop, you can use the [@Watch()](reactive-data/#watch-decorator) decorator:
To do validation of a Prop, you can use the [@Watch()](./reactive-data/#watch-decorator) decorator:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this link exist? I see docs/reactive-data/#the-watch-decorator-watch, but not what's listed here

@@ -274,10 +274,10 @@ export const config: Config = {
| `default` | No additional re-export or auto-definition behavior will be performed.<br /><br />This value will be used if no explicit value is set in the config, or if a given value is not a valid option. |
| `auto-define-custom-elements` | A component and its children will be automatically defined with the `CustomElementRegistry` when the component's module is imported. |
| `bundle` | A utility `defineCustomElements()` function is exported from the `index.js` file of the output directory. This function can be used to quickly define all Stencil components in a project on the custom elements registry. |
| `single-export-module` | All component and custom element definition helper functions will be exported from the `index.js` file in the output directory (see [Defining Exported Custom Elements](#defining-exported-custom-elements) for more information on this file's purpose). This file can be used as the root module when distributing your component library, see [below](#distributing-custom-elements) for more details. |
| `single-export-module` | All component and custom element definition helper functions will be exported from the `index.js` file in the output directory. This file can be used as the root module when distributing your component library, see [below](../custom-elements#distributing-custom-elements) for more details. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the verbiage got changed here. In the v2 version of the docs, this table different a little from what was in v3 (note how we changed from 'See Defining Exported Custom Elements' to 'see below').

mlc_config.json Outdated
{ "pattern": "^../" }
],
"timeout": "20s"
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love it - I hate this config uses JSON (not that it's our choice) and we can't add this as comments :-(

@alicewriteswrongs
Copy link
Contributor Author

@rwaskiewicz @tanner-reits Ok so I've made some changes on here. These are the key changes:

  • added the remark linting thingy for checking internal links, including checking that headers exist in linked-to files
  • converted a fairly large number of links to be relative, file-path-based links
  • added that internal link checker to the CI

As of right now there is more to do in order to convert all of the links to file-path links, which is what would give us complete checking of all of them (including headers). We could either do that here or do some subsequent PRs in order to make those changes, what do you two think?

@rwaskiewicz
Copy link
Contributor

@alicewriteswrongs I say we just do it all here if you think the effort is relatively well sized. Unsure if we need to do all of it now to get CI to pass or not. Not that that precludes us from splitting the effort up. I think my vote would be if the effort is small-medium sized we do it here. Otherwise maybe it makes sense to split this PR up and the subsequent work as well.

@alicewriteswrongs
Copy link
Contributor Author

Unfortunately there seems to be an issue with the external link checker (that is what is preventing CI from passing) where it's showing non-broken links as broken. There's an issue about it here: tcort/markdown-link-check#201

@alicewriteswrongs
Copy link
Contributor Author

ok I think those issues should be addressed, ready for a re-review

@alicewriteswrongs alicewriteswrongs changed the title use markdown-link-checker gh action to verify that external links are still fresh docs(all): use remark-validate-links to check internal links, convert to file-path links Feb 22, 2023
@alicewriteswrongs alicewriteswrongs merged commit c9246f8 into main Feb 22, 2023
@alicewriteswrongs alicewriteswrongs deleted the ap/gh-actions-link-check branch June 14, 2023 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants