check for header row #73

DerMoehre · 2022-10-03T11:03:41Z

I used your code to implement a small test for a header row.
I also updated the code in transcriptprocessor to check for the header row and jump to the next row.
Because sometimes there are no rows following the header, the next() call also has to be in a try:except block.

chrisbrickhouse

This is really good! One small point: test files should be named after the module they test so that it's easier to track down where the failures are. If test_readFile.py fails, we need to go into the test to figure out what it's testing before we can go fix it. If test_transcriptprocessor.py fails, we know exactly where the issue is: fave/align/transcriptprocessor.py. So the tests in test_readFile.py should be moved into test_transcriptprocessor.py instead of living in their own file.

Besides that top-level point, my code comments cover two main points:

You seem to have a good grasp of the testing basics, so I've given you some feedback on how to improve your test writing so that you can write them more easily in the future. You've duplicated a lot of code, but with a few tricks you can actually simplify a lot of what you need to write.
Your changes to transcriptprocessor.py are good, and my main complaint is how the try-except blocks are scoped. Except blocks can cause a lot of problems if they catch exceptions we didn't predict, so a few comments on how to improve that.

chrisbrickhouse · 2022-10-04T18:18:14Z

fave/align/transcriptprocessor.py

-            lines = self.replace_smart_quotes(f.readlines())
-        self.lines = lines
+            try:
+                lines = self.replace_smart_quotes(f.readlines())


Move this outside try block. If there is a ValueError in replace_smart_quotes, we don't want to catch that.
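A minimal sketch of the narrower scoping suggested here. The `replace_smart_quotes` below is a simplified stand-in and `has_header` is a hypothetical helper name; neither is FAVE's actual code. The point is only that the preprocessing call sits outside the `try`, so a `ValueError` raised there is not mistaken for a header row.

```python
def replace_smart_quotes(lines):
    """Simplified stand-in (assumption), not the real FAVE method."""
    if not all(isinstance(line, str) for line in lines):
        raise ValueError("expected a list of strings")
    return [line.replace("\u2018", "'").replace("\u2019", "'") for line in lines]


def has_header(raw_lines):
    """Hypothetical helper: True if the first line looks like a header row."""
    # Not guarded: a ValueError raised here is a real problem and propagates.
    lines = replace_smart_quotes(raw_lines)

    try:
        # Only the one statement we expect to fail is inside the try block.
        float(lines[0].split("\t")[2])
    except ValueError:
        # The third field is not a number, so assume a header row.
        return True
    return False
```

With the wider scope from the diff, a `ValueError` from `replace_smart_quotes` would have been silently reported as "header row detected".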

chrisbrickhouse · 2022-10-04T18:18:42Z

fave/align/transcriptprocessor.py

@@ -33,6 +33,7 @@
 import re
 import sys
 import os
+import csv


It seems like we're not using CSV, so we can remove this import.

chrisbrickhouse · 2022-10-04T18:29:28Z

fave/align/transcriptprocessor.py

+                float(lines[0].split('\t')[2]) 
+            except ValueError:
+                # Log a warning about having detected a header row
+                print("Header row was detected")


We use python's logging library instead of print because the logger gives users more control over how much information the program puts out. This would probably be a "warning" level because things are still working, but the program is making an assumption (a header row exists) that might lead to errors later if the assumption is wrong. You can read more about what the log levels mean in this article.

The logger is accessed using self.logger and using this format, self.logger.warning('Message text'), you can convert the print statement to a log entry.
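As a rough illustration of that conversion, here is a standalone sketch. It uses a module-level logger with a made-up name instead of `self.logger`, and `warn_if_header` is a hypothetical function, so treat it as an outline, not the project's code.

```python
import logging

# Hypothetical logger name; inside the class this would be self.logger.
logger = logging.getLogger("transcript_example")


def warn_if_header(first_line):
    """Return True and log a warning when the third field is not a number."""
    try:
        float(first_line.split("\t")[2])
    except ValueError:
        # Warning level: processing continues, but on an assumption.
        logger.warning("Header row was detected")
        return True
    return False
```

A caller who wants less output can raise the threshold, for example with `logger.setLevel(logging.ERROR)`, which is not possible with a bare `print`.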

Ah okay, never used this :-)

chrisbrickhouse · 2022-10-04T19:02:22Z

fave/align/transcriptprocessor.py

+                # check, if there is a next line
+                try:
+                    # jump to the next line of the file
+                    next(f)
+                except:
+                    pass


Two things:

This approach is too cautious. Try-except should be used when we expect a specific situation could arise and want to handle that case. If we cannot handle that situation, we should let the function exit with the error and let the parent function deal with it (and so on up the chain until it is handled or the program exits with an error). If something goes wrong with next() we want that error to "bubble up" to the top of the program. See exception handling best practices 4 and 9.

You don't want to use next(f) here. You can see in line 249/251 that we've already read the file into memory as lines so we should be working with that instead of the file. I'm realizing now that next() is probably the wrong approach. This function sets self.lines and it should be a list. If we use next() the type will be an iterator and not a list. So we should just remove the first entry from the list, del lines[0] should do the trick.

Okay, that would also make it easier to check if there is another line. This is why I added the extra try:except block: I ran into some errors because there were no next lines.

edit: ah, now I understand. We don't have to check for another line if we just delete the first row, right? :-)
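A small sketch of the `del lines[0]` approach discussed in this thread, assuming the lines are already in memory as a list. `drop_header` is a hypothetical name used only for illustration.

```python
def drop_header(lines):
    """Remove the first entry if its third tab-separated field is not a float."""
    try:
        float(lines[0].split("\t")[2])
    except ValueError:
        # Deleting from the list keeps the result a list (not an iterator)
        # and needs no extra check for whether more lines follow.
        del lines[0]
    return lines


header_only = drop_header(["Style\tSpeaker\tBeginning\tEnd\tDuration"])
with_data = drop_header(["Style\tSpeaker\tBeginning\n", "Foo\tBar\t0.0\n"])
```

A file that contains only a header simply becomes an empty list, so the case that made `next(f)` fail no longer needs its own try-except.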

chrisbrickhouse · 2022-10-04T19:08:50Z

tests/fave/align/test_readFile.py

+    for test_case in provide_value_error_file():
+        test_text = test_case[0]
+        flags = test_case[1]
+        expected = test_case[2]
+        tmp_file.write_text(test_text)
+        tp_obj = transcriptprocessor.TranscriptProcessor(
+                tmp_file,
+                cmu_dict,
+                **flags
+            )
+        tp_obj.read_transcription_file()
+
+        assert tp_obj.lines == expected
+
+    for test_case in provide_no_error_file():
+        test_text = test_case[0]
+        flags = test_case[1]
+        expected = test_case[2]
+        tmp_file.write_text(test_text)
+        tp_obj = transcriptprocessor.TranscriptProcessor(
+                tmp_file,
+                cmu_dict,
+                **flags
+            )
+        tp_obj.read_transcription_file()
+
+        assert tp_obj.lines == expected


You can (and should) combine these. Notice that these loops iterate over the return of the provider functions, and notice that the provider functions return a list of lists. If you add new test cases to the list of lists returned by the provider function, you only need one of these loops. This is known as the data provider pattern.
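To make the pattern concrete, here is a self-contained sketch with one provider and one loop. The function under test, `split_lines`, is a deliberately trivial stand-in (an assumption) so that the focus stays on the provider; the real test would build a `TranscriptProcessor` as in the diff above.

```python
def split_lines(text):
    """Trivial stand-in for the code under test (assumption)."""
    return text.splitlines(keepends=True)


def provide_transcripts():
    """Data provider: each entry is [test_text, expected]."""
    return [
        [   # two data rows
            "Foo\tBar\t0.0\t3.2\t3.2\nTest\tBaz\t1.0\t4.5\t3.5",
            ["Foo\tBar\t0.0\t3.2\t3.2\n", "Test\tBaz\t1.0\t4.5\t3.5"],
        ],
        [   # a single data row
            "Foo\tBar\t0.0\t3.2\t3.2",
            ["Foo\tBar\t0.0\t3.2\t3.2"],
        ],
    ]


def test_split_lines():
    # One loop covers every case; new cases are added to the provider only.
    for test_text, expected in provide_transcripts():
        assert split_lines(test_text) == expected
```

Adding a third scenario means appending one more entry to the list returned by `provide_transcripts`; the test body stays unchanged.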

chrisbrickhouse · 2022-10-04T19:15:14Z

tests/fave/align/test_readFile.py

+def provide_value_error_file():
+    return [
+        [
+            "Style\tSpeaker\tBeginning\tEnd\tDuration\nFoo\tBar\t0.0\t3.2\t3.2",
+            {
+                'prompt': "IDK what this is -CJB",
+                'check' : '',
+                'verbose': logging.DEBUG
+            },
+            ['Style\tSpeaker\tBeginning\tEnd\tDuration\n', 'Foo\tBar\t0.0\t3.2\t3.2']
+        ]
+    ]
+
+# this does not raise a ValueError
+def provide_no_error_file():
+    return [
+        [
+            "Foo\tBar\t0.0\t3.2\t3.2\nTest\t1.0\4.5\t3.5",
+            {
+                'prompt': "IDK what this is -CJB",
+                'check' : '',
+                'verbose': logging.DEBUG
+            },
+            ['Foo\tBar\t0.0\t3.2\t3.2\n', 'Test\t1.0\4.5\t3.5']
+        ]
+    ]


Following up on the above comment, notice that the parameters for the test are a list:

    [
        "foo\tbar...\n...4.5\t3.5",   # test_text
        {
            ...                       # flags
        },
        ['foo...', 'Test...']         # expected
    ]

Meanwhile, that list is an entry in a list returned by the function, and each element of that list of lists is iterated over and tested. So instead of having multiple providers for a test, we can just add new test cases as entries in the list:

    def provide_foo():
        return [
            ['Test case 1'],
            ['Test case 2'],
            ['Test case 3']
        ]

Thank you :) I will read your comments and try to learn from them.

DerMoehre · 2022-10-05T16:24:26Z

Okay, I hope everything went right.
Something was off when I tried to git push the directory for the pull request.

In the end it worked, but maybe it added too many files.

chrisbrickhouse

LGTM. A few typos, but once those are fixed I can merge.

chrisbrickhouse · 2022-10-05T18:08:42Z

tests/fave/align/test_transcriptprocessor.py

+            ['Foo\tBar\t0.0\t3.2\t3.2']
+        ],
+        [   # test with one line 
+            "Foo\tBar\t0.0\t3.2\t3.2\nTest\t1.0\4.5\t3.5",


You forgot a tab: 1.0\4.5 should be 1.0\t4.5. Same typo in the expected field (which is why the test passes) and in the following test cases.
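For anyone wondering why the typo is silent: in a Python string literal, `\4` is an octal escape for the control character U+0004, so the string is still valid, just different. The variable names in this short sketch are illustrative only.

```python
typo = "Test\t1.0\4.5\t3.5"    # "\4" is an octal escape, not a tab
fixed = "Test\t1.0\t4.5\t3.5"

# The escape collapses to a single control character.
assert "\4" == "\x04"

typo_fields = typo.split("\t")    # the time fields stay glued together
fixed_fields = fixed.split("\t")  # start and end are separate fields
```

Because the same literal appeared in both the input and the expected value, the comparison was between two identically wrong strings and the assertion still held.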

chrisbrickhouse · 2022-10-05T18:09:44Z

tests/fave/align/test_transcriptprocessor.py

+            },
+            ['Foo\tBar\t0.0\t3.2\t3.2\n', 'Test\t1.0\4.5\t3.5']
+        ],
+        [   # test with more lines 


See above. Happens twice in this line and the expected value.

chrisbrickhouse

LGTM, merged

Resolves JoFrhwld#65 by checking the data type of the first time field. If it's not a float, we assume it's a header row and remove it from the returned list. Otherwise the function returns as previously. Squashed commit of DerMoehre's PR JoFrhwld#73 Co-authored-by: JoFrhwld <[email protected]> Co-authored-by: Christian Brickhouse <[email protected]>

JoFrhwld added 4 commits September 22, 2022 13:38

readme

4147b8a

update export script names for backwards compatibility

638314e

Merge pull request JoFrhwld#78 from JoFrhwld/hotfix

d0c2a94

hotfix - exported script names

incrementing version number

f514fde

chrisbrickhouse requested changes Oct 4, 2022

DerMoehre force-pushed the master branch from 57ab0b6 to f514fde Compare October 5, 2022 05:10

DerMoehre and others added 14 commits October 5, 2022 07:22

removed the link to the mailing group

913d6c6

added a bug reporting form

614c519

added a link to the bug report form

e21cef9

added a description of bug triage

a865e50

added the changes to the bug report and readme

0a03242

deleted .code-workspace

acbfb43

Delete Documents.code-workspace

6d95cd9

added the test_readFile and updated the code in read_transcription_fi…

4e38a1e

…le to detec header

added the requested changes

50bc9b5

Delete test.csv

0836312

Delete test_readFile.py

2293dba

changes in the transcriptprocessor file

8e245e3

added the requested changes

f305e54

added the requested changes

400a060

DerMoehre requested a review from chrisbrickhouse October 5, 2022 16:06

chrisbrickhouse requested changes Oct 5, 2022

typo correction

27c6fde

DerMoehre requested a review from chrisbrickhouse October 6, 2022 04:24

Merge branch 'dev' into master

66611d2

chrisbrickhouse merged commit ce4ee71 into JoFrhwld:dev Oct 6, 2022

chrisbrickhouse reviewed Oct 6, 2022

chrisbrickhouse mentioned this pull request Mar 14, 2024

Prep 2.0.3 release #98

Merged
