Allow option to ignore redirects warnings #54
Comments
Hmm, I can see that some redirects are useful indeed. Though, I am sure some are to be avoided too? Could there be a regex pattern system for redirects? |
Also, could different status codes be used to differentiate? |
This is really a problematic change as it basically prohibits canonical links which always redirect to the latest version of something. |
Can you read the above comments and add something useful to the conversation... You likely don’t want some redirects. You likely want to allow some redirects. How could we make that distinction? |
I can and did. Can you read the README and CHANGELOG ? Nowhere in the documentation of this project does it mention anything about forbidding redirection of URLs. Now, I get that redirects should only be followed up to a point (2, maybe 3 redirects), but forbidding/failing on any redirect is an undocumented change in behaviour (not mentioned in the README or changelog) and should therefore be considered a bug. It is also outside the scope of this plugin. People use this package for what it says it does: check for dead URLs. URLs which redirect to a valid URL are not dead.
You can't, but you shouldn't have to. The only distinction you need to make is whether a URL redirects to a 4## or 5## code or gets into a redirect loop. Anything else should not be considered problematic. |
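The rule proposed in this comment (follow a bounded number of redirects, and treat a URL as dead only when the chain ends in a 4xx/5xx status or loops) could be sketched roughly as below. This is an illustration only, not how remark-lint-no-dead-urls or dead-or-alive is implemented. `checkUrl`, `lookup`, the limit of 3, and the example.com URLs are all hypothetical; `lookup` stands in for a real HTTP request so the sketch runs offline.

```javascript
// Hypothetical sketch: decide whether a URL is dead by following redirects.
// `lookup(url)` is an assumed stand-in for an HTTP request; it returns
// `{status, location}`, where `location` is set for 3xx responses.
function checkUrl(url, lookup, maxRedirects = 3) {
  const seen = new Set()
  let current = url
  let redirects = 0

  while (true) {
    // Visiting the same URL twice means the chain loops.
    if (seen.has(current)) return {ok: false, reason: 'redirect loop'}
    seen.add(current)

    const {status, location} = lookup(current)

    if (status >= 300 && status < 400 && location) {
      if (redirects === maxRedirects) {
        return {ok: false, reason: 'too many redirects'}
      }
      redirects++
      // Resolve relative `Location` values against the current URL.
      current = new URL(location, current).href
      continue
    }

    if (status >= 400) return {ok: false, reason: 'status ' + status}
    return {ok: true, finalUrl: current}
  }
}

// Usage with canned responses instead of the network (made-up data).
const responses = new Map([
  ['https://example.com/stable/', {status: 302, location: '/v1.2/'}],
  ['https://example.com/v1.2/', {status: 200}],
  ['https://example.com/gone', {status: 301, location: '/missing'}],
  ['https://example.com/missing', {status: 404}],
  ['https://example.com/a', {status: 302, location: '/b'}],
  ['https://example.com/b', {status: 302, location: '/a'}]
])
const lookup = (url) => responses.get(url) ?? {status: 404}

console.log(checkUrl('https://example.com/stable/', lookup))
console.log(checkUrl('https://example.com/gone', lookup))
console.log(checkUrl('https://example.com/a', lookup))
```

Under this rule a `/stable/` link that redirects to a live page passes, while a redirect that ends in a 404 or loops back on itself is reported.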
ok well it’s an intended major change to improve docs — here’s your money back; feel free to use other things. In the previous comments I was discussing how more useful features can be added
I think it might be possible! there’s differences between 301 and 302 and 303 and 307 and 308. |
Seriously ? Look I'm a maintainer too and I feel your pain, but these types of remarks are not helpful and only go to alienate people who do want to help (and understand how OS works). I use this package in an action, often run on a weekly cronjob, to ensure that URLs in the README, CHANGELOG and other docs are not broken when end-users browse them. I've had to make emergency fixes to a dozen or so projects over the last few days because of this undocumented change as otherwise contributors to my projects would be blocked by failing builds. However, as canonical links are now "forbidden", this means these "fixed" links will soon go out of date and the docs of my projects will point to outdated information, which I then have to fix again. Now, I realize that that is not your concern, but I just want to give you some perspective of the consequences of this bug.
Well, that presumes sites actually send the correct HTTP code ;-) Then again, I've seen 404 pages being served with a 200 status, so I suppose that's something we shouldn't even consider. I think limiting to the 301 and 308 statuses ("Moved Permanently"/"Permanent Redirect") may already be an improvement, but a 308 might still flag canonical links. Having said that, I would still prefer an option to turn the warning on redirection off completely, like the OP suggested above.
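The distinction between permanent and temporary redirects discussed here could look something like the sketch below. It assumes the server sends the correct status code, which the comment above notes is not guaranteed. `classifyStatus` and `shouldWarn` are hypothetical names for illustration, not part of any library mentioned in this thread.

```javascript
// Hypothetical helper: classify an HTTP status code for link checking.
// 301 (Moved Permanently) and 308 (Permanent Redirect) are permanent;
// 302 (Found), 303 (See Other), and 307 (Temporary Redirect) are not.
const permanentRedirects = new Set([301, 308])
const temporaryRedirects = new Set([302, 303, 307])

function classifyStatus(status) {
  if (permanentRedirects.has(status)) return 'permanent-redirect'
  if (temporaryRedirects.has(status)) return 'temporary-redirect'
  if (status >= 400 && status < 600) return 'dead'
  if (status >= 200 && status < 300) return 'alive'
  return 'other'
}

// One possible policy: warn on dead links and permanent redirects only.
function shouldWarn(status) {
  const kind = classifyStatus(status)
  return kind === 'permanent-redirect' || kind === 'dead'
}

for (const status of [200, 301, 302, 307, 308, 404]) {
  console.log(status, classifyStatus(status), shouldWarn(status))
}
```

With a policy like this, a canonical `/latest/` link served with a 302 or 307 stays quiet, while one served with a 301 or 308 would still be flagged.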
|
Yes, seriously. You should have used versioning: this was in a major release. I understand that the Python ecosystem has different traditions. But you should know that about JS too. This is between an abandoned v1 up to a maintained rewritten v2 with breaking changes. Users are good at raising problems. I don’t think they are always good at coming up with solutions. I think I am in a better position, as a maintainer, for that. If you see OPs log, you will see that they indeed use several intentionally redirecting URLs. But there are also redirecting URLs that can be improved. I would appreciate it if you can share what your inputs are. What the actual current output is. And what your expected output is. Perhaps you can even look at which status codes you are getting. Thanks |
That's a fair point, if it wasn't for predefined action runners and Docker containers including this package and me not having any control over those. And even if I changed that setup to directly request the packages from within the action workflows (which I have done for some), Dependabot does not keep those up to date, which makes that a maintenance nightmare in the making.
Interesting about the Python ecosystem. I have no idea about that at all. In the PHP world, using versioning is common and straight-forward. The problem with versioning in the JS world - IMO - is that it appears to be a rule that a package can't be released without at least a 1000 dependencies, leading to regular dependency version conflicts, also known as "dependency hell". Please note: this is not criticism of this package, this is just a generic observation on JS/Node and dependencies.
Well, thank you for that. I had no idea the package was abandoned and can only compliment you on taking it on. Regarding breaking changes: it would be great if the consequences of those were annotated in a human understandable way in the changelog/release.
Happy to. Would you like me to share some links to failing builds ? Or would you prefer examples ? To start, here are some examples of what I would say are redirections which should be left alone:
Here are some which can be argued should be reported (and are):
*: Status codes determined via a Curl request to the original URLs.

I'm also seeing anchors being reported as missing when the target website uses Example:
The
<li class="details-toggler version" data-version-id="dev-develop" data-load-more="/versions/3558491.json">
<a href="#dev-develop" class="version-number">dev-develop / 1.x-dev
</a>

Other errors I'm seeing intermittently which may be related: Any links to
Hope this helps. |
I thought you were doing python but it’s php, ok! I think you are wrong about JavaScript; take a look at the 100s of projects that are being maintained by us. Or see https://github.com/wooorm/npm-high-impact/blob/main/lib/top.js. They don’t have 1000s of dependencies. I also think that what you call “dependency hell” is the reason JavaScript is more popular than PHP/Ruby/etc. And, the main problems that the JS ecosystem has — and it has many — I ascribe to being so popular. If docker makes it hard to version things: don’t use docker. You can
Pass
It may be. Or see the options https://github.com/wooorm/dead-or-alive#options. If the website is slow, perhaps tweak the |
Thanks for these! So this looks like we can differentiate between 301 and 302! As for your note on shields.io: I kinda get it. But every http request also takes time for your users. HTTP on 301:
https://datatracker.ietf.org/doc/html/rfc9110#section-15.4.2 |
|
OK, I’ll publish a fix in a second. Where now this tool warns for each link in the following document:

# URLs
## Fine
[a](https://keepachangelog.com/)
[b](https://coveralls.io/repos/github/PHPCSStandards/PHPCSExtra/badge.svg)
[c](http://getcomposer.org/)
[e](https://github.com/Yoast/PHPUnit-Polyfills/issues/new/choose)
## Changable
[f](https://eslint.org/docs/rules/no-lonely-if)
[g](http://semver.org/)
[h](https://img.shields.io/packagist/php-v/phpcsstandards/phpcsextra.svg)

It will then warn for the last 3 URLs, as they are permanently redirected (solely 301 or 308s). This tool then proposes:

# URLs
## Fine
[a](https://keepachangelog.com/)
[b](https://coveralls.io/repos/github/PHPCSStandards/PHPCSExtra/badge.svg)
[c](http://getcomposer.org/)
[e](https://github.com/Yoast/PHPUnit-Polyfills/issues/new/choose)
## Changable
[f](https://eslint.org/docs/latest/rules/no-lonely-if)
[g](https://semver.org/)
[h](https://img.shields.io/packagist/dependency-v/phpcsstandards/phpcsextra/php.svg?)

Which then works without warnings |
Will start using that more diligently in the repos where I converted the workflow away from Docker already - in combination with watching releases of all those repos as I won't be able to rely on Dependabot. Thanks.
Thanks for that suggestion. I think I'll go with the
In both these cases, it's the same team behind them, so I'll open some issues for them and see if I can convince them to update both. Might take a while, but still worth a try ;-)
Yes, currently ignoring. Retriggering the workflow normally fixes it. Was just weird as I can't remember seeing that one before 2.0.
That's awesome! Thank you so much for listening and addressing this! |
OK, can you both please check out https://github.com/remarkjs/remark-lint-no-dead-urls/releases/tag/2.0.1? Non-permanent redirects are now fine. In both your logs there’s a ton of things that I believe should be improved in the docs (or, sometimes, on the servers). Is this new behavior sufficient? Is something else needed? Is an option to allow permanent redirects still needed too (if so: please provide strong arguments)? Thanks! |
@wooorm Currently on my way to a conference, so probably won't get to this until I'm back. I've made a note to revert the "fix" commits I made earlier after the 2.0 release and to re-evaluate the run results. |
OK I’ll close this as I think it’s done. But open to further discussion! |
@wooorm Reporting back, looking good so far, but I didn't manage to get the

The below did not work for me:

[
"remark-lint-no-dead-urls",
{
"deadOrAliveOptions": {
"anchorAllowlist": [
["https://packagist.org/packages/phpcsstandards/phpcsextra", "dev-develop"]
],
"maxRetries": 3
}
}
], |
Right! Regexes are expected. JSON and YAML don’t support regular expressions. In JS it would be something like:

anchorAllowlist: [
[/^https:\/\/packagist\.org\/packages\/phpcsstandards\/phpcsextra\/$/i, /^dev-develop$/i]
]

…but the point of using regexes is that you can use expressions to match different things instead of single, literal values |
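To illustrate the point about regexes matching more than one literal value, here is a small sketch of an allowlist made of `[urlPattern, anchorPattern]` tuples. The patterns and the `isAnchorAllowed` helper are hypothetical and exist only to show the matching idea; how dead-or-alive applies `anchorAllowlist` internally is not shown in this thread and may differ.

```javascript
// Illustrative allowlist: each entry is a [urlPattern, anchorPattern] tuple.
// These patterns are assumptions for this sketch: any Packagist package
// page (with or without a trailing slash) and any anchor starting with `dev-`.
const anchorAllowlist = [
  [/^https:\/\/packagist\.org\/packages\/[^/]+\/[^/]+\/?$/i, /^dev-/i]
]

// Hypothetical helper: an anchor is allowed when some tuple matches both
// the URL and the anchor name.
function isAnchorAllowed(url, anchor, allowlist) {
  return allowlist.some(
    ([urlPattern, anchorPattern]) =>
      urlPattern.test(url) && anchorPattern.test(anchor)
  )
}

const page = 'https://packagist.org/packages/phpcsstandards/phpcsextra'
console.log(isAnchorAllowed(page, 'dev-develop', anchorAllowlist))
console.log(isAnchorAllowed(page, 'requires', anchorAllowlist))
console.log(isAnchorAllowed('https://example.com/', 'dev-develop', anchorAllowlist))
```

A single tuple like this covers every `dev-*` anchor on every package page, which a list of literal strings in JSON could not express.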
Thanks for fixing this so quickly. You are correct our website did have a bunch of links that actually needed to be fixed. Now on to figure out why |
Curl might offer some pointers for your example:

curl -v -L www.mysql.com

I don't see a 403, but I do see an
I get a 403 as well. Some garbage old HTML with:

<td width="68%" valign="top">
<div align="center">
<p align="justify"><font face="Arial, Helvetica, sans-serif">This site http://www.mysql.com/ </font><font face="Arial, Helvetica, sans-serif"> is experiencing technical difficulty. We are aware of the issue and are working as quick as possible to correct the issue. <br />
<br />
We apologize for any inconvenience this may have caused. <br />
<br />
To speak with an Oracle sales representative: 1.800.ORACLE1.<br />
<br />
To contact Oracle Corporate Headquarters from anywhere in the world: 1.650.506.7000.<br />
<br />
To get technical support in the United States: 1.800.633.0738. </font><br />
</p>
</div>

Well, that’s the web 🤷♂️ Maybe contact an Oracle rep 😉 |
Initial checklist
Problem
There are many redirects that are considered valid for our website https://github.com/k3s-io/docs.
With the bump to 2.0.0 we are now seeing warnings such as:
It is very common for /stable/ or /latest/ URLs to auto-redirect to the version in question.
See https://github.com/k3s-io/docs/actions/runs/11113288516 for all the specific errors we ran into suddenly.
Solution
Add an option to ignore/skip the check on whether a url has been redirected. I don't care that the URL redirects, I only care that the URL hits a 404 or does not exist.
Alternatives
Currently we have simply started running this plugin without --frail and will manually check whether or not some URLs are actually 404/dead.