TST: Newlines in text extraction #807

Merged
merged 1 commit into from
Apr 23, 2022
19 changes: 18 additions & 1 deletion Resources/crazyones.txt
@@ -1 +1,18 @@
-The Cr azy Ones Octob er 14, 1998 Heres to the crazy ones. The mis˝ts. The reb els. The troublemak ers. The round p egs in the square holes. The ones who see things di˙eren tly . Theyre not fond of rules. And they ha v e no resp ect for the status quo. Y ou can quote them, disagree with them, glorify or vilify them. Ab out the only thing y ou cant do is ignore them. Because they c hange things. They in v en t. They imagine. They heal. They explore. They create. They inspire. They push the h uman race forw ard. Ma yb e they ha v e to b e crazy . Ho w else can y ou stare at an empt y can v as and see a w ork of art? Or sit in silence and hear a song thats nev er b een written? Or gaze at a red planet and see a lab oratory on wheels? W e mak e to ols for these kinds of p eople. While some see them as the crazy ones, w e see genius. Because the p eople who are crazy enough to think they can c hange the w orld, are the ones who do.
+The Cr azy Ones
+Octob er 14, 1998
+Heres to the crazy ones. The mis˝ts. The reb els. The troublemak ers.
+The round p egs in the square holes.
+The ones who see things di˙eren tly . Theyre not fond of rules. And
+they ha v e no resp ect for the status quo. Y ou can quote them,
+disagree with them, glorify or vilify them.
+Ab out the only thing y ou cant do is ignore them. Because they c hange
+things. They in v en t. They imagine. They heal. They explore. They
+create. They inspire. They push the h uman race forw ard.
+Ma yb e they ha v e to b e crazy .
+Ho w else can y ou stare at an empt y can v as and see a w ork of art? Or
+sit in silence and hear a song thats nev er b een written? Or gaze at
+a red planet and see a lab oratory on wheels?
+W e mak e to ols for these kinds of p eople.
+While some see them as the crazy ones, w e see genius. Because the
+p eople who are crazy enough to think they can c hange the w orld,
+are the ones who do.
5 changes: 4 additions & 1 deletion Tests/test_workflows.py
@@ -31,9 +31,12 @@ def test_PdfReaderFileLoad():
        with open(os.path.join(RESOURCE_ROOT, "crazyones.txt"), "rb") as pdftext_file:
            pdftext = pdftext_file.read()

        text = page.extractText().replace("\n", "").encode("utf-8")
        text = page.extractText().encode("utf-8")

        # Compare the text of the PDF to a known source
        for expected_line, actual_line in zip(text.split(b"\n"), pdftext.split(b"\n")):
            assert expected_line == actual_line

        assert text == pdftext, (
            "PDF extracted text differs from expected value.\n\nExpected:\n\n%r\n\nExtracted:\n\n%r\n\n"
            % (pdftext, text)
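In the new comparison loop, `zip` stops at the shorter of the two line lists, so an extra or missing line is caught only by the final whole-text `assert`, whose message dumps both texts in full. (The loop variables are also swapped relative to their names: `expected_line` comes from the extracted `text` and `actual_line` from the fixture `pdftext`. Equality is symmetric, so the check itself is unaffected.) A minimal sketch of a helper that reports the first differing line, including a difference in line count, could look like the following. The names `first_differing_line` and `assert_text_matches` are hypothetical and are not part of the project's test suite.

```python
from itertools import zip_longest
from typing import Optional, Tuple

# (line number, expected line, actual line); a side is None if that input ran out of lines
Mismatch = Tuple[int, Optional[bytes], Optional[bytes]]


def first_differing_line(expected: bytes, actual: bytes) -> Optional[Mismatch]:
    """Return the first mismatching line pair as (lineno, expected, actual), or None.

    zip_longest pads the shorter side with None, so a missing or extra line
    is reported instead of being silently skipped as it would be with zip().
    """
    pairs = zip_longest(expected.split(b"\n"), actual.split(b"\n"))
    for lineno, (exp, act) in enumerate(pairs, start=1):
        if exp != act:
            return lineno, exp, act
    return None


def assert_text_matches(expected: bytes, actual: bytes) -> None:
    """Fail with a message focused on one line instead of dumping both texts."""
    mismatch = first_differing_line(expected, actual)
    # The message is only formatted when the assertion fails (mismatch is a tuple then)
    assert mismatch is None, (
        "line %d differs:\n  expected:  %r\n  extracted: %r" % mismatch
    )


if __name__ == "__main__":
    fixture = b"The Cr azy Ones\nOctob er 14, 1998\n"
    assert_text_matches(fixture, fixture)  # identical input: passes silently
    # A truncated extraction is reported at line 2 rather than ignored:
    print(first_differing_line(fixture, b"The Cr azy Ones\n"))
    # → (2, b'Octob er 14, 1998', b'')
```

With such a helper, the loop and the final assert in the test could be collapsed into a single `assert_text_matches(pdftext, text)` call. Using `itertools.zip_longest` instead of `zip` is the key design choice: it pads the shorter side with `None`, so a truncated or over-long extraction surfaces as a mismatch on a specific line number.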