-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
check for header row #73
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really good! One small point: test files should be named after the module they test so that it's easier to track down where the failures are. If test_readFile.py
fails, we need to go into the test to figure out what it's testing before we can go fix it. If test_transcriptprocessor.py
fails when we know exactly where the issue is: fave/align/transcriptprocessor.py. So test_readFile.py
should be added to test_transcriptprocessor.py
instead of its own file.
Besides that top-level point, my code comments cover two main points:
- You seem to have a good grasp of the testing basics, so I've given you some feedback on how to improve your test writing so that you can write them more easily in the future. You've duplicated a lot of code, but with a few tricks you can actually simplify a lot of what you need to write.
- Your changes to transcriptprocessor.py are good, and my main complaint is how the try-except blocks are scoped. Except blocks can cause a lot of problems if they catch exceptions we didn't predict, so a few comments on how to improve that.
fave/align/transcriptprocessor.py
Outdated
lines = self.replace_smart_quotes(f.readlines()) | ||
self.lines = lines | ||
try: | ||
lines = self.replace_smart_quotes(f.readlines()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this outside try block. If there is a ValueError in replace_smart_quotes, we don't want to catch that.
fave/align/transcriptprocessor.py
Outdated
@@ -33,6 +33,7 @@ | |||
import re | |||
import sys | |||
import os | |||
import csv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we're not using CSV, so we can remove this import.
fave/align/transcriptprocessor.py
Outdated
float(lines[0].split('\t')[2]) | ||
except ValueError: | ||
# Log a warning about having detected a header row | ||
print("Header row was detected") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use python's logging library instead of print because the logger gives users more control over how much information the program puts out. This would probably be a "warning" level because things are still working, but the program is making an assumption (a header row exists) that might lead to errors later if the assumption is wrong. You can read more about what the log levels mean in this article.
The logger is accessed using self.logger
and using this format, self.logger.warning('Message text')
, you can convert the print statement to a log entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay, never used this :-)
fave/align/transcriptprocessor.py
Outdated
# check, if there is a next line | ||
try: | ||
# jump to the next line of the file | ||
next(f) | ||
except: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things:
- This approach is too cautious. Try-except should be used when we expect a specific situation could arise and want to handle that case. If we cannot handle that situation, we should let the function exit with the error code and let the parent function deal with it (and so on up the chain until it is handled or the program exits with an error). If something goes wrong with
next()
we want that to error to "bubble up" to the top of the program. See exception handling best practices 4 and 9. - You don't want to use
next(f)
here. You can see in line 249/251 that we've already read the file into memory aslines
so we should be working with that instead of the file. I'm realizing now thatnext()
is probably the wrong approach. This function setsself.lines
and it should be a list. If we usenext()
the type will be an iterator and not a list. So we should just remove the first entry from the list,del lines[0]
should do the trick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, that would also make it easier to check, if there is another line. This is why I added the extra try:except block, because I ran in some errors because there were no next lines
edit: ah now I understand. We dont have to check for another line, if we just delete the first row, right? :-)
tests/fave/align/test_readFile.py
Outdated
for test_case in provide_value_error_file(): | ||
test_text = test_case[0] | ||
flags = test_case[1] | ||
expected = test_case[2] | ||
tmp_file.write_text(test_text) | ||
tp_obj = transcriptprocessor.TranscriptProcessor( | ||
tmp_file, | ||
cmu_dict, | ||
**flags | ||
) | ||
tp_obj.read_transcription_file() | ||
|
||
assert tp_obj.lines == expected | ||
|
||
for test_case in provide_no_error_file(): | ||
test_text = test_case[0] | ||
flags = test_case[1] | ||
expected = test_case[2] | ||
tmp_file.write_text(test_text) | ||
tp_obj = transcriptprocessor.TranscriptProcessor( | ||
tmp_file, | ||
cmu_dict, | ||
**flags | ||
) | ||
tp_obj.read_transcription_file() | ||
|
||
assert tp_obj.lines == expected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can (and should) combine these. Notice that these loops iterate over the return of the provider functions, and notice that the provider functions return a list of lists. If you add new test cases to the list of lists returned by the provider function, you only need one of these loops. This is known as the data provider pattern.
tests/fave/align/test_readFile.py
Outdated
def provide_value_error_file(): | ||
return [ | ||
[ | ||
"Style\tSpeaker\tBeginning\tEnd\tDuration\nFoo\tBar\t0.0\t3.2\t3.2", | ||
{ | ||
'prompt': "IDK what this is -CJB", | ||
'check' : '', | ||
'verbose': logging.DEBUG | ||
}, | ||
['Style\tSpeaker\tBeginning\tEnd\tDuration\n', 'Foo\tBar\t0.0\t3.2\t3.2'] | ||
] | ||
] | ||
|
||
# this does not raise a ValueError | ||
def provide_no_error_file(): | ||
return [ | ||
[ | ||
"Foo\tBar\t0.0\t3.2\t3.2\nTest\t1.0\4.5\t3.5", | ||
{ | ||
'prompt': "IDK what this is -CJB", | ||
'check' : '', | ||
'verbose': logging.DEBUG | ||
}, | ||
['Foo\tBar\t0.0\t3.2\t3.2\n', 'Test\t1.0\4.5\t3.5'] | ||
] | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following up on the above comment, notice that the parameters for the test are a list:
[
"foo\tbar...\n...4.5\t3.5", # test_text
{
... # flags
},
['foo...', 'Test...'] # expected
]
Meanwhile, that list is an entry in a list returned by the function, and each element of that list of lists is iterated over and tested. So instead of having multiple providers for a test, we can just add new test cases as entries in the list:
def provide_foo():
return [
[ 'Test case 1'],
[ 'Test case 2'],
[ 'Test case 3']
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you :) I will read your comments and try to learn from them.
…le to detec header
OKay, I hope everything went right. In the end it worked, but maybe it added to much files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. A few typos, but once those are fixed I can merge.
['Foo\tBar\t0.0\t3.2\t3.2'] | ||
], | ||
[ # test with one line | ||
"Foo\tBar\t0.0\t3.2\t3.2\nTest\t1.0\4.5\t3.5", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You forgot a tab: 1.0\4.5
should be 1.0\t4.5
. Same typo in the expected field (which is why the test passes) and in the following test cases.
}, | ||
['Foo\tBar\t0.0\t3.2\t3.2\n', 'Test\t1.0\4.5\t3.5'] | ||
], | ||
[ # test with more lines |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above. Happens twice in this line and the expected value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, merged
Resolves JoFrhwld#65 by checking the data type of the first time field. If it's not a float, we assume it's a header row and remove it from the returned list. Otherwise the function returns as previously. Squashed commit of DerMoehre's PR JoFrhwld#73 Co-authored-by: JoFrhwld <[email protected]> Co-authored-by: Christian Brickhouse <[email protected]>
Resolves #65 by checking the data type of the first time field. If it's not a float, we assume it's a header row and remove it from the returned list. Otherwise the function returns as previously. Squashed commit of DerMoehre's PR #73 Co-authored-by: JoFrhwld <[email protected]> Co-authored-by: Christian Brickhouse <[email protected]>
I used your code to implement a small test for a header row
Also updated the code in transcriptprocessor to check for the header row and jump to the next row.
Because sometimes, there are no rows following, the next() has also to be in a try:except block