-
-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port USFM code from Machine up to commit a9058ce #111
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this ready to be reviewed? Does this include all of the commits that need to be ported?
Reviewable status: 0 of 4 files reviewed, all discussions resolved (waiting on @mshannon-sil)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet, currently finishing up the commit for non-verse text support. Should be done with that by the end of tomorrow. There's still more commits to port after that, but if you'd like to review the PR midway through the porting process, that could be a good point to do so.
Reviewable status: 0 of 4 files reviewed, all discussions resolved (waiting on @mshannon-sil)
98e645f
to
fb630bf
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #111 +/- ##
==========================================
+ Coverage 87.97% 88.12% +0.15%
==========================================
Files 243 247 +4
Lines 14239 14799 +560
==========================================
+ Hits 12527 13042 +515
- Misses 1712 1757 +45 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 35 files reviewed, 7 unresolved discussions
machine/corpora/scripture_ref_usfm_parser_handler.py
line 23 at r6 (raw file):
def __init__(self) -> None: self._cur_verse_ref: VerseRef = VerseRef() self._cur_elements_stack: List[ScriptureElement] = []
For variables that were a Stack type in C#, I used a list in python, since it's fast to append and pop items from a list like a stack. I added _stack
to the end of these variables to document that they should be used as a stack.
machine/corpora/scripture_ref_usfm_parser_handler.py
line 173 at r6 (raw file):
) # No need to reverse unlike in Machine, elements are already added in correct order path = [e for e in self._cur_elements_stack if e.position > 0]
In Machine, there's a need to reverse the order of the stack based on how elements are added to the stack. However when I ported it to machine.py and used a list, all the elements were actually being added in the correct order, and reversing just caused bugs, so I removed that in machine.py.
machine/corpora/scripture_text.py
line 43 at r6 (raw file):
else: yield from self._create_rows_scripture_ref(ref, text, is_sentence_start)
In Machine, this method was overloaded to have two separate implementations for either a VerseRef or a List of ScriptureRef as input. Since I can't do exactly the same thing in python, I created a function with the general create_rows
name, which checks the type of the input and calls the corresponding create_rows
method for that type.
machine/corpora/text_corpus.py
line 203 at r6 (raw file):
class _FilterTextCorpus(TextCorpus):
The class _FilterTextCorpus in machine.py is called WhereTextCorpus in Machine. I thought this might be due to different naming conventions in python vs c#, but then _TextFilterTextCorpus in machine.py has the same name in Machine. Is the naming difference here intentional?
machine/scripture/scripture_ref.py
line 18 at r6 (raw file):
self._path: List[ScriptureElement] = path if path is not None else [] _empty: Optional[ScriptureRef] = None
I was concerned about properties of the _empty
ScriptureRef (verse_ref and path) getting modified and affecting all instances so that they are no longer empty. But since the public properties only have getters not setters, it should be fine as long as the corresponding private properties aren't modified.
machine/scripture/verse_ref.py
line 380 at r6 (raw file):
return 0 def __eq__(self, other: object) -> bool:
The Equals method in SIL.Scripture/VerseRef.cs is the same as the exact_equals method that was in machine.py. I changed the machine.py version to be the eq method instead so that it is called when == is used, since this seemed to me to be what we would want, and it didn't break any tests. Let me know if there's a reason to not make it the default equality comparison.
machine/scripture/verse_ref.py
line 488 at r6 (raw file):
def are_overlapping_verse_ranges_vref(verse_ref1: VerseRef, verse_ref2: VerseRef) -> bool: if verse_ref1.is_default or verse_ref2.is_default:
I think there's actually a bug in SIL.Scripture/VerseRef.cs. The corresponding section in OverlappingVersesRanges for the VerseRef type input is if (verseRef1.IsDefault || verseRef1.IsDefault)
but it should be if (verseRef1.IsDefault || verseRef2.IsDefault)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r5, 34 of 34 files at r6, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @mshannon-sil)
machine/corpora/scripture_ref_usfm_parser_handler.py
line 23 at r6 (raw file):
Previously, mshannon-sil wrote…
For variables that were a Stack type in C#, I used a list in python, since it's fast to append and pop items from a list like a stack. I added
_stack
to the end of these variables to document that they should be used as a stack.
Makes sense to me.
machine/corpora/scripture_text.py
line 43 at r6 (raw file):
Previously, mshannon-sil wrote…
In Machine, this method was overloaded to have two separate implementations for either a VerseRef or a List of ScriptureRef as input. Since I can't do exactly the same thing in python, I created a function with the general
create_rows
name, which checks the type of the input and calls the correspondingcreate_rows
method for that type.
That is a good solution.
machine/corpora/scripture_text.py
line 82 at r6 (raw file):
) def _create_row(
Because Python doesn't support overloading, we sometimes have to deviate from the C# code a bit, as you discovered for the _create_rows
method. I think it would be good to name this something else, maybe _create_scripture_row
, then you wouldn't have to call the base class _create_row
method using super()
. If you rename this method, then you should also rename the _create_rows
method to match.
machine/corpora/scripture_text_corpus.py
line 111 at r6 (raw file):
def is_scripture(text_corpus: TextCorpus) -> bool:
We want to export this function from the __init__.py
of the corpora
package.
machine/corpora/standard_parallel_text_corpus.py
line 255 at r6 (raw file):
trg_refs = [] if trg_row is None else [trg_row.ref] if len(trg_refs) == 0 and isinstance(self._target_corpus, ScriptureTextCorpus):
This was a bug in the C# code, you should use the is_scripture
function.
machine/corpora/text_corpus.py
line 203 at r6 (raw file):
Previously, mshannon-sil wrote…
The class _FilterTextCorpus in machine.py is called WhereTextCorpus in Machine. I thought this might be due to different naming conventions in python vs c#, but then _TextFilterTextCorpus in machine.py has the same name in Machine. Is the naming difference here intentional?
As you guessed, this is because of the difference in function names between Python and C#. Python uses map
and filter
. C# uses Select
and Where
. I wanted to match the standard names in each language.
machine/scripture/scripture_element.py
line 10 at r6 (raw file):
@total_ordering class ScriptureElement(Comparable):
This class should be moved to the corpora
package. This class should also be exported from __init__.py
in the corpora
package.
machine/scripture/scripture_ref.py
line 13 at r6 (raw file):
@total_ordering class ScriptureRef(Comparable):
This class should be moved to the corpora
package.
machine/scripture/scripture_ref.py
line 18 at r6 (raw file):
Previously, mshannon-sil wrote…
I was concerned about properties of the
_empty
ScriptureRef (verse_ref and path) getting modified and affecting all instances so that they are no longer empty. But since the public properties only have getters not setters, it should be fine as long as the corresponding private properties aren't modified.
You can make this a top-level constant, i.e. EMPTY_SCRIPTURE_REF
.
machine/scripture/verse_ref.py
line 380 at r6 (raw file):
Previously, mshannon-sil wrote…
The Equals method in SIL.Scripture/VerseRef.cs is the same as the exact_equals method that was in machine.py. I changed the machine.py version to be the eq method instead so that it is called when == is used, since this seemed to me to be what we would want, and it didn't break any tests. Let me know if there's a reason to not make it the default equality comparison.
The Comparable
base class provides an implementation of __eq__
based on the compare_to
method. This makes all of the comparison methods consistent with each other. exact_equals
does not convert the versification, so it serves a different purpose. This does deviate from the C# implementation, but it has more consistent behavior.
machine/scripture/verse_ref.py
line 488 at r6 (raw file):
Previously, mshannon-sil wrote…
I think there's actually a bug in SIL.Scripture/VerseRef.cs. The corresponding section in OverlappingVersesRanges for the VerseRef type input is
if (verseRef1.IsDefault || verseRef1.IsDefault)
but it should beif (verseRef1.IsDefault || verseRef2.IsDefault)
.
Yeah, that looks like a bug in the C# code.
tests/corpora/test_scripture_ref.py
line 6 at r6 (raw file):
class TestScriptureRef(unittest.TestCase):
Is there some reason why you used a test case here? I would prefer it to be consistent with other test fixtures. If it is because you need to specify a description for each assert, you can do so with a normal assert.
Previously, ddaspit (Damien Daspit) wrote…
Should I do this for the get_rows() method as well? Renaming it to get_scripture_rows() so that in the method, rather than calling super().get_rows(), I just call self.get_rows()? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @ddaspit)
machine/scripture/scripture_element.py
line 10 at r6 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
This class should be moved to the
corpora
package. This class should also be exported from__init__.py
in thecorpora
package.
I realize I didn't export most of the classes, I'll go ahead and do that.
machine/scripture/scripture_ref.py
line 13 at r6 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
This class should be moved to the
corpora
package.
Okay. I had put it in the scripture
folder since it seemed to parallel the VerseRef class in some ways. Is the idea of the scripture
folder just to include python equivalents of SIL.Scripture
?
machine/scripture/verse_ref.py
line 488 at r6 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
Yeah, that looks like a bug in the C# code.
Do we need to submit a PR to SIL.Scripture to fix it?
tests/corpora/test_scripture_ref.py
line 6 at r6 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
Is there some reason why you used a test case here? I would prefer it to be consistent with other test fixtures. If it is because you need to specify a description for each assert, you can do so with a normal assert.
I was trying to follow the format of the corresponding test file in Machine, since it uses test cases unlike the other corpora test files. But I agree, it makes sense to keep it consistent. I'll go ahead and change it to match the other files in machine.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @mshannon-sil)
machine/corpora/scripture_text.py
line 82 at r6 (raw file):
Previously, mshannon-sil wrote…
Should I do this for the get_rows() method as well? Renaming it to get_scripture_rows() so that in the method, rather than calling super().get_rows(), I just call self.get_rows()?
In the case of get_rows
, we are overriding the method, so we don't want to change the name. It is also pretty normal to call the base class method when overriding, because you are often adding behavior to the base class implementation.
machine/scripture/scripture_ref.py
line 13 at r6 (raw file):
Previously, mshannon-sil wrote…
Okay. I had put it in the
scripture
folder since it seemed to parallel the VerseRef class in some ways. Is the idea of thescripture
folder just to include python equivalents ofSIL.Scripture
?
Yes, that is correct.
machine/scripture/verse_ref.py
line 488 at r6 (raw file):
Previously, mshannon-sil wrote…
Do we need to submit a PR to SIL.Scripture to fix it?
It would be the neighborly thing to do, but probably not a high priority.
…to corpora folder, use top level constant for empty ScriptureRef, change __eq__ back to exact_equals, keep test files consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @ddaspit)
machine/corpora/scripture_text.py
line 82 at r6 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
In the case of
get_rows
, we are overriding the method, so we don't want to change the name. It is also pretty normal to call the base class method when overriding, because you are often adding behavior to the base class implementation.
Done.
machine/corpora/scripture_text_corpus.py
line 111 at r6 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
We want to export this function from the
__init__.py
of thecorpora
package.
Done.
machine/corpora/standard_parallel_text_corpus.py
line 255 at r6 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
This was a bug in the C# code, you should use the
is_scripture
function.
Done.
machine/scripture/scripture_element.py
line 10 at r6 (raw file):
Previously, mshannon-sil wrote…
I realize I didn't export most of the classes, I'll go ahead and do that.
Done.
machine/scripture/scripture_ref.py
line 13 at r6 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
Yes, that is correct.
Done.
machine/scripture/scripture_ref.py
line 18 at r6 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
You can make this a top-level constant, i.e.
EMPTY_SCRIPTURE_REF
.
Done.
machine/scripture/verse_ref.py
line 380 at r6 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
The
Comparable
base class provides an implementation of__eq__
based on thecompare_to
method. This makes all of the comparison methods consistent with each other.exact_equals
does not convert the versification, so it serves a different purpose. This does deviate from the C# implementation, but it has more consistent behavior.
Okay, I changed it back.
tests/corpora/test_scripture_ref.py
line 6 at r6 (raw file):
Previously, mshannon-sil wrote…
I was trying to follow the format of the corresponding test file in Machine, since it uses test cases unlike the other corpora test files. But I agree, it makes sense to keep it consistent. I'll go ahead and change it to match the other files in machine.py.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 19 of 19 files at r7, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @mshannon-sil)
This change is