-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exclude the MISRA Website from CI-CD link verifier checks #91
Conversation
This fixes the issue seen in FreeRTOS/FreeRTOS-Kernel#880 |
32c8041
to
4a98c0d
Compare
…sequence. Also trying to fix the issue with trailing comas and slashes being counted as part of the URLs.
…token to the action so that workflows can use the CLI
@@ -160,7 +160,7 @@ jobs: | |||
org: AWS, | |||
branch: main, | |||
run-link-verifier: true, | |||
run-complexity: true, | |||
run-complexity: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jobs is undergoing changes still
For now skip running build related tests.
@@ -210,7 +210,7 @@ jobs: | |||
with: | |||
path: repo/${{ matrix.inputs.repository }} | |||
exclude-dirs: complexity, formatting | |||
exclude-urls: https://dummy-url.com/ota.bin | |||
exclude-urls: https://dummy-url.com/ota.bin, https://s3.region.amazonaws.com/joe-ota |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jobs has a new URL for its own tests, add to this list.
@@ -20,6 +20,7 @@ inputs: | |||
exclude-urls: | |||
description: 'Comma separated list of URLS not to check' | |||
required: false | |||
default: https://www.misra.org.uk/misra-c, https://www.misra.org.uk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Placing these here just so that if not adding them explicitly it shows up the in action log
# now has a CAPTCHA landing page, as such always exclude it from this check. | ||
touch allowList.txt | ||
echo "https://www.misra.org.uk/misra-c" >> allowList.txt | ||
echo "https://www.misra.org.uk" >> allowList.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea here is to make this fix more "portable" as any repos that are already using the above parameters would need an individual PR
Adding the exclusion of these URLs means that we only need to make this change in this repo, not ever repo
# Test that it will find this url and drop the slash | ||
https://www.google.com/ | ||
# Test that it will find this url by dropping the coma | ||
https://www.google.com, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test to make sure we don't try and use the trailing coma as part of the URL, currently an issue with the checker.
@@ -14,7 +14,7 @@ | |||
import traceback | |||
from collections import defaultdict | |||
|
|||
MARKDOWN_SEARCH_TERM = r'\.md$' | |||
MARKDOWN_SEARCH_TERM = r"\.md$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still throws a warning about it not being a valid escape sequence? But not sure what it wants.
@@ -14,7 +14,7 @@ | |||
import traceback | |||
from collections import defaultdict | |||
|
|||
MARKDOWN_SEARCH_TERM = r'\.md$' | |||
MARKDOWN_SEARCH_TERM = r"\.md$" | |||
# Regex to find a URL | |||
URL_SEARCH_TERM = r'(\b(https?)://[^\s\)\]\\"<>]+[^\s\)\.\]\\"<>])' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would make sense to update this regex to exclude the trailing slash, or coma, but I honestly have no idea how this even matches currently.
@@ -151,7 +151,11 @@ def identify_broken_links(self, files, verbose): | |||
cprint(f'\t{link}','green') | |||
|
|||
for link in self.external_links: | |||
is_broken, status_code = test_url(link) | |||
# Remove the trailing slash or trailing coma |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing slash - So that we don't do a duplicate search of <URL>/
and <URL>
Trailing coma - There are a few places I've seen links be put into files like this
/* <COMMENT> <URL>, <MORE COMMENT> */
Where this then breaks the URL checker, since the current regex grabs the coma
@@ -166,7 +170,7 @@ def parse_file(html_file): | |||
return HtmlFile(html_file) | |||
|
|||
def html_name_from_markdown(filename): | |||
md_pattern = re.compile('\.md', re.IGNORECASE) | |||
md_pattern = re.compile("\.md$", re.IGNORECASE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why this uses this, compared to the global one at the top of the file, but just tried using quotes to see if that helped with the warning
Added the $ to make sure it only looks for files that fully end in .md
@@ -254,7 +258,10 @@ def fetch_issues(repo, issue_type, limit): | |||
if process.returncode == 0: | |||
key = issue_type + 's' | |||
for issue in process.stdout.split(): | |||
main_repo_list[repo][key].add(int(issue)) | |||
if(issue.isnumeric()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If for some reason the GitHub issues can't be accessed this will throw an error attempting to convert it to an int
Check if we have an actual number first, if we do not that means there was an error reading the actual Issues from the repo. When this occurs it returns an output of:
Stdout = ['gh:', 'To', 'use', 'GitHub', 'CLI', 'in', 'a', 'GitHub', 'Actions', 'workflow,', 'set', 'the', 'GH_TOKEN', 'environment', 'variable.', 'Example:', 'env:', 'GH_TOKEN:', '${{', 'github.token', '}}']
@@ -347,7 +353,7 @@ def main(): | |||
if any(file.endswith(file_type) for file_type in args.include_files): | |||
f_path = os.path.join(root, file) | |||
if args.verbose: | |||
print("Processing File: {}".format(f_path)) | |||
print("\nProcessing File: {}".format(f_path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a newline just to help space the log out when doing a verbose run.
if ( ( link[-1] == "/" ) or ( link[-1] == "," ) ): | ||
is_broken, status_code = test_url(link[:-1]) | ||
else: | ||
is_broken, status_code = test_url(link) | ||
if is_broken: | ||
broken_links.append(link) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intentionally use the version of the URL with the coma or slash for the error list. This way the exact link can be searched for in the source file easily.
No description provided.