-
Notifications
You must be signed in to change notification settings - Fork 354
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Infrastructure: Fix failing link checker on GitHub README link #2931
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@evmiguel thanks for doing the research into figuring what the issue was! I've also confirmed this solution works as expected!
I also really appreciate the consideration for the brittleness of this approach as you've mentioned.
My initial thought for this, was if there were any alternative ways explored? So as to not risk GH pulling the rug from under link-checker.js
once more 🙂. If not, that doesn't have to be something explored in this PR but was wondering if there was anything else to consider other than how the page ids are currently being extracted.
I've left some additional comments in the thread, design-related questions and possible brittleness mitigation thoughts.
.link-checker.js
Outdated
@@ -18,13 +20,27 @@ module.exports = { | |||
{ | |||
name: 'github', | |||
pattern: /^https:\/\/github\.com\/.*/, | |||
matchHash: (ids, hash) => | |||
ids.includes(hash) || ids.includes(`user-content-${hash}`), | |||
matchHash: (ids, hash, ssr) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matchHash: (ids, hash, ssr) => { | |
matchHash: (ids, hash, { ssr }) => { |
To me, passing in something like ssr
here feels like an additional 'option', and beyond just a 'parameter required for matching against known hashes'.
I'm wondering what you think of representing this (and updating other references in this PR where relevant). Also, If future edge cases like this pop up that require new 'options', it may make this easier to extend if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Howard's point about adding an object here.
I would say though that I don't think this is properly called ssr
. After all, if GitHub was doing SSR properly then we would be dealing with correctly formatted HTML already and none of these workarounds would be necessary.
As you can see in my other comment, I think we need to add another option to the configuration file to contain the complexity that GitHub is forcing us to handle. When you do that maybe you can try out another name and/or way of passing data to this function that makes sense based on how you tackle it.
scripts/link-checker.js
Outdated
@@ -33,7 +33,7 @@ async function checkLinks() { | |||
return getLineNumber; | |||
}; | |||
|
|||
const checkPathForHash = (hrefOrSrc, ids = [], hash) => { | |||
const checkPathForHash = (hrefOrSrc, ids = [], hash, ssr) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const checkPathForHash = (hrefOrSrc, ids = [], hash, ssr) => { | |
const checkPathForHash = (hrefOrSrc, ids = [], hash, { ssr } = {}) => { |
To align with my previous thought
@@ -142,7 +142,18 @@ async function checkLinks() { | |||
.querySelectorAll('[id]') | |||
.map((idElement) => idElement.getAttribute('id')); | |||
|
|||
return { ok: response.ok, status: response.status, ids }; | |||
// Handle GitHub README links. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is solely related to GitHub-related urls with hashes, I think checking hrefOrSrc
to verify if the link follows those rules before running this would be okay.
.link-checker.js
Outdated
const overviewFiles = | ||
ssr['props']['initialPayload']['overview']['overviewFiles']; | ||
for (let file of overviewFiles) { | ||
if (file['richText']) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could recursively check the sub-objects, and arrays found in ssr
until you find the richText
attribute. Doing this could lessen the brittle-ness by putting the main point of failure on richText
existing, which is probably more manageable, than if any of the intermediate objects disappear or the path gets shifted around.
Purely for demonstration, I was thinking something along the lines of:
const getAttributeValue = (obj, attribute) => {
if (typeof obj !== 'object' || obj === null) return undefined;
if (obj.hasOwnProperty(attribute)) return obj[attribute];
if (Array.isArray(obj)) {
for (const element of obj) {
const attributeValue = getAttributeValue(element, attribute);
if (attributeValue !== undefined) return attributeValue;
}
} else {
for (const key in obj) {
const attributeValue = getAttributeValue(obj[key], attribute);
if (attributeValue !== undefined) return attributeValue;
}
}
return undefined;
}
// ...
const richText = getAttributeValue(ssr, 'richText');
if (richText !== undefined) {
// ...
[My thoughts on its maintainability vs performance is another discussion though but for now, we only have that 1 offending link which shouldn't cause much impact]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple comments but my first bit of feedback is a biggie, which is I verified that this does indeed fix the issue with the page. Nice!
There is one link failing though. If you merge in the latest changes from the main branch there is another PR, merged yesterday, which fixes it.
scripts/link-checker.js
Outdated
) | ||
.flatMap((element) => element.getElementsByTagName('script')) | ||
.map((element) => JSON.parse(element.innerHTML))[0]; | ||
return { ok: response.ok, status: response.status, ids, ssr }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to add an option for this work in the .link-checker
file? If we can avoid dumping this complexity in the main script that would be a big win.
.link-checker.js
Outdated
@@ -18,13 +20,27 @@ module.exports = { | |||
{ | |||
name: 'github', | |||
pattern: /^https:\/\/github\.com\/.*/, | |||
matchHash: (ids, hash) => | |||
ids.includes(hash) || ids.includes(`user-content-${hash}`), | |||
matchHash: (ids, hash, ssr) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Howard's point about adding an object here.
I would say though that I don't think this is properly called ssr
. After all, if GitHub was doing SSR properly then we would be dealing with correctly formatted HTML already and none of these workarounds would be necessary.
As you can see in my other comment, I think we need to add another option to the configuration file to contain the complexity that GitHub is forcing us to handle. When you do that maybe you can try out another name and/or way of passing data to this function that makes sense based on how you tackle it.
ef648d0
to
d1babdc
Compare
@howard-e @alflennik thank you for your feedback! I've made some changes based on them. Looking forward to your thoughts. |
The ARIA Authoring Practices (APG) Task Force just discussed The full IRC log of that discussion<jugglinmike> Topic: Link checker<jugglinmike> github: https://github.com//pull/2931 <jugglinmike> howard-e: The link checker fails to find a GitHub link with an ID matching the anchor <jugglinmike> howard-e: It worked once. The reason it no longer works is that there is a server-rendered element now rendered on that page which is not easily identified using the link checker implementation <jugglinmike> howard-e: The server-side-rendered element seems to be hiding the one we want <jugglinmike> howard-e: I have some thoughts about the stability of this solution, but I haven't reviewed the latest changes (Erika has made more changes since I last shared feedback) <jugglinmike> howard-e: I will be taking a look again this week <jugglinmike> Matt_King: This is really about testing the fragment part of the link. We're not worried about the link checker failing because the URL of the page is wrong. It's just that if we specify a fragment and the fragment is invalid--that's a problem <jugglinmike> Matt_King: We want to catch those kinds of errors when we can <jugglinmike> Matt_King: Okay howard-e, you merge this when you think it's ready <jugglinmike> howard-e: Will do <jugglinmike> Zakim, end the meeting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thanks for addressing my feedback and thanks for the research put into this work!
We'll want to pay close attention to the link-checker checks, to determine how this holds up with any future GitHub changes -- but having this a bit more structured is great and makes it easier to think about adding future support. Not only for GitHub, but also if other domains are changing unexpectedly.
@@ -1,3 +1,24 @@ | |||
const HTMLParser = require('node-html-parser'); | |||
|
|||
const getAttributeValue = (obj, attribute) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a brief comment here on what this does for future context? Something like:
// Checks object for attribute and returns value. If not found on first pass, recursively checks nested objects and arrays of nested object(s) until attribute is found. If not found, returns undefined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing my feedback, I'm happy with how this turned out!
This PR addresses #2907, where the link checker is failing due to a reference to a GitHub README link. The link checker fails because the page uses a
react-partial
element to wrap the README content.Questions for reviewers:
WAI Preview Link (Last built on Thu, 22 Feb 2024 21:30:45 GMT).