Commit

Open link-verifier target files with encoding="utf8", errors='ignore' options (#29)

* Open link-verifier target files with encoding="utf8", errors='ignore' options
* Print each file path that is processed to stdout for debugging purposes.
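The first bullet's effect can be seen in isolation. The sketch below uses only the standard library; the file name and its contents are made up for illustration. It writes a byte (0xFF) that can never appear in valid UTF-8 and reads the file back twice. The first read uses the default `errors='strict'`, which raises `UnicodeDecodeError`. The second uses `errors='ignore'`, as this commit does, which silently drops the byte. Passing `encoding` explicitly also means the result no longer depends on the platform's locale default encoding.

```python
import os
import tempfile

# Illustrative bytes: 0xFF is never valid in UTF-8, so a strict decode must fail.
raw = b"See https://example.com \xff for details\n"


def read_strict(path):
    """Read as UTF-8 with the default errors='strict' handler."""
    with open(path, "r", encoding="utf8") as f:
        return f.read()


def read_ignoring_errors(path):
    """Read the way the patched script does: undecodable bytes are dropped."""
    with open(path, "r", encoding="utf8", errors="ignore") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.md")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(raw)

    try:
        read_strict(path)
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    ignored = read_ignoring_errors(path)

print(strict_failed)   # True
print(repr(ignored))   # 'See https://example.com  for details\n'
```

The URL survives because it is plain ASCII. Only the 0xFF byte is removed, which leaves two adjacent spaces in the output.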

Co-authored-by: Archit Aggarwal <[email protected]>
paulbartell and aggarw13 authored Jun 2, 2021
1 parent d9eced0 commit 53ff74c
Showing 1 changed file with 5 additions and 1 deletion.
link-verifier/verify-links.py: 6 changes (5 additions, 1 deletion)
@@ -335,7 +335,11 @@ def main():
             dirs[:] = [dir for dir in dirs if dir.lower() not in exclude_dirs]
             for file in files:
                 if any(file.endswith(file_type) for file_type in args.include_files):
-                    with open(os.path.join(root, file), 'r') as f:
+                    f_path = os.path.join(root, file)
+                    print("Processing File: {}".format(f_path))
+                    with open(f_path, 'r', encoding="utf8", errors='ignore') as f:
+                        # errors='ignore' argument Suppresses UnicodeDecodeError
+                        # when reading invalid UTF-8 characters.
                         text = f.read()
                         urls = re.findall(URL_SEARCH_TERM, text)
                         for url in urls:
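The patched loop can be exercised outside the script with a small stand-alone sketch. This is not the script's actual code. `collect_urls` is a hypothetical helper, and `include_files` and `exclude_dirs` only mirror `args.include_files` and `exclude_dirs` from the diff. The `URL_SEARCH_TERM` pattern is an assumed stand-in, because the real regex is not shown in this hunk. The sketch collects URLs rather than verifying them.

```python
import os
import re
import tempfile

# Assumed stand-in: the real URL_SEARCH_TERM in verify-links.py is not shown in this diff.
URL_SEARCH_TERM = r"https?://[^\s<>()]+"


def collect_urls(root_dir, include_files, exclude_dirs):
    """Hypothetical helper: walk root_dir and map each matching file path to its URLs."""
    found = {}
    for root, dirs, files in os.walk(root_dir):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if d.lower() not in exclude_dirs]
        for file in files:
            if any(file.endswith(file_type) for file_type in include_files):
                f_path = os.path.join(root, file)
                # Same debug output the commit adds.
                print("Processing File: {}".format(f_path))
                # Same open() options the commit adds: invalid UTF-8 bytes are dropped.
                with open(f_path, "r", encoding="utf8", errors="ignore") as f:
                    text = f.read()
                found[f_path] = re.findall(URL_SEARCH_TERM, text)
    return found


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "build"))
    # README.md contains an invalid byte (0xFE) after the URL.
    with open(os.path.join(tmp, "README.md"), "wb") as f:
        f.write(b"Docs: https://www.freertos.org \xfe\n")
    # This file lives in an excluded directory and should never be opened.
    with open(os.path.join(tmp, "build", "skip.md"), "wb") as f:
        f.write(b"https://skipped.example\n")
    result = collect_urls(tmp, [".md"], {"build"})
    urls = {os.path.basename(p): u for p, u in result.items()}

print(urls)  # {'README.md': ['https://www.freertos.org']}
```

One trade-off applies here. `errors='ignore'` hides corruption completely, and a bad byte inside a URL would silently splice its neighbours together. `errors='replace'` would instead insert U+FFFD, which keeps the damage visible while still avoiding the `UnicodeDecodeError`.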
