Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update machine.py to reflect usfm changes in machine #103

Merged
merged 3 commits into from
Apr 4, 2024

Conversation

mshannon-sil
Copy link
Collaborator

@mshannon-sil mshannon-sil commented Mar 27, 2024

Based on the changes in Machine from sillsdev/machine#160, I've updated machine.py's handling of USFM accordingly.

Main changes:

  1. Files in machine/corpora were either created or modified to align with new files/changes in Machine
  2. Versification.load() in machine/scripture/verse_ref.py was updated to accept a stream as input, like it can in Machine
  3. Test files were created or modified to reflect new test files/changes in Machine
  4. I added a new test case test_utf_16_encoding_stream() in tests/scripture/test_versification.py to test code I added which handles Unicode errors for loading versification from a stream.

This change is Reviewable

@mshannon-sil mshannon-sil added the enhancement New feature or request label Mar 27, 2024
@mshannon-sil mshannon-sil requested a review from ddaspit March 27, 2024 19:40
@mshannon-sil mshannon-sil self-assigned this Mar 27, 2024
Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 20 of 20 files at r1, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @mshannon-sil)


machine/corpora/paratext_backup_terms_corpus.py line 15 at r1 (raw file):

class ParatextBackupTermsCorpus(DictionaryTextCorpus):
    def __init__(self, filename: str, term_categories: List[str]) -> None:
        self._predefined_terms_list_types = ["Major", "All", "SilNt", "Pt6"]

You can move this to a constant (_PREDEFINED_TERMS_LIST_TYPES) at the top of the file.


machine/corpora/paratext_backup_terms_corpus.py line 61 at r1 (raw file):

            self._add_text(text)

    def _get_renderings(self, rendering: str) -> List[str]:

You can refactor these methods to functions at the top level of the file. Generally, I port any private static methods to module-level "private" functions.


machine/corpora/paratext_project_settings.py line 7 at r1 (raw file):

class ParatextProjectSettings(ABC):

This can be a dataclass.


machine/corpora/paratext_project_settings_parser_base.py line 15 at r1 (raw file):

    @abstractmethod
    def __enter__(self) -> "ParatextProjectSettingsParserBase": ...

I don't think this class needs to be a context manager.


machine/corpora/zip_paratext_project_settings_parser.py line 20 at r1 (raw file):

        return file_name in self._archive.namelist()

    def find(self, extension: str) -> Union[str, None]:

Optional[str] is better than Union[str, None].


machine/corpora/zip_paratext_project_settings_parser_base.py line 15 at r1 (raw file):

                    stylesheet_temp_file.write(source.read())
                stylesheet_path = stylesheet_temp_file.name
            custom_stylesheet_path = None

You forgot to specify the type here.


tests/corpora/test_usfm_verse_text_updater.py line 5 at r1 (raw file):

from testutils.corpora_test_helpers import USFM_TEST_PROJECT_PATH

from machine.corpora.usfm_parser import parse_usfm

You should import from the package, i.e. from machine.corpora import parse_usfm.


tests/corpora/test_usfm_verse_text_updater.py line 13 at r1 (raw file):

    rows = [
        (
            [VerseRef.from_string("MAT 1:1", Versification.get_builtin(VersificationType.ENGLISH))],

You can import the versification as a constant from machine.scripture, i.e. from machine.scripture import ENGLISH_VERSIFICATION.

Copy link
Collaborator Author

@mshannon-sil mshannon-sil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 13 of 21 files reviewed, 8 unresolved discussions (waiting on @ddaspit and @mshannon-sil)


machine/corpora/paratext_backup_terms_corpus.py line 15 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You can move this to a constant (_PREDEFINED_TERMS_LIST_TYPES) at the top of the file.

Done.


machine/corpora/paratext_backup_terms_corpus.py line 61 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You can refactor these methods to functions at the top level of the file. Generally, I port any private static methods to module-level "private" functions.

Done.


machine/corpora/paratext_project_settings.py line 7 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

This can be a dataclass.

Done.


machine/corpora/paratext_project_settings_parser_base.py line 15 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I don't think this class needs to be a context manager.

Done.


machine/corpora/zip_paratext_project_settings_parser.py line 20 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Optional[str] is better than Union[str, None].

Done.


machine/corpora/zip_paratext_project_settings_parser_base.py line 15 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You forgot to specify the type here.

Done.


tests/corpora/test_usfm_verse_text_updater.py line 5 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You should import from the package, i.e. from machine.corpora import parse_usfm.

Done.


tests/corpora/test_usfm_verse_text_updater.py line 13 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You can import the versification as a constant from machine.scripture, i.e. from machine.scripture import ENGLISH_VERSIFICATION.

Done. Also, there's a handful of other places in the codebase that use Versification.get_builtin(VersificationType.ENGLISH) rather than ENGLISH_VERSIFICATION. Would you like me to change those too so that it's consistent?

@mshannon-sil
Copy link
Collaborator Author

This PR is porting over the changes from sillsdev/machine#163 as well now.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 8 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @mshannon-sil)


machine/corpora/paratext_project_settings.py line 7 at r1 (raw file):

Previously, mshannon-sil wrote…

Done.

I don't think it needs to be an abstract base class.


tests/corpora/test_usfm_verse_text_updater.py line 13 at r1 (raw file):

Previously, mshannon-sil wrote…

Done. Also, there's a handful of other places in the codebase that use Versification.get_builtin(VersificationType.ENGLISH) rather than ENGLISH_VERSIFICATION. Would you like me to change those too so that it's consistent?

I believe that there are a couple of places that need to use ``Versification.get_builtin()`, but other than that everywhere else should use the constants.

@codecov-commenter
Copy link

codecov-commenter commented Apr 4, 2024

Codecov Report

Attention: Patch coverage is 92.22462% with 36 lines in your changes are missing coverage. Please review.

Project coverage is 88.33%. Comparing base (61d49d8) to head (4e6ccde).

Files Patch % Lines
machine/corpora/usfm_parser.py 50.00% 11 Missing ⚠️
...rpora/zip_paratext_project_settings_parser_base.py 66.66% 6 Missing ⚠️
...e/corpora/paratext_project_settings_parser_base.py 91.80% 5 Missing ⚠️
...ne/corpora/zip_paratext_project_settings_parser.py 72.22% 5 Missing ⚠️
machine/corpora/usfm_verse_text_updater.py 96.39% 4 Missing ⚠️
machine/corpora/corpora_utils.py 66.66% 1 Missing ⚠️
machine/corpora/dbl_bundle_text_corpus.py 66.66% 1 Missing ⚠️
...e/corpora/file_paratext_project_settings_parser.py 94.11% 1 Missing ⚠️
machine/corpora/paratext_project_settings.py 97.29% 1 Missing ⚠️
machine/corpora/usfm_parser_state.py 85.71% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #103      +/-   ##
==========================================
+ Coverage   87.62%   88.33%   +0.71%     
==========================================
  Files         226      234       +8     
  Lines       13520    13816     +296     
==========================================
+ Hits        11847    12205     +358     
+ Misses       1673     1611      -62     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator Author

@mshannon-sil mshannon-sil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 19 of 25 files reviewed, 1 unresolved discussion (waiting on @ddaspit and @mshannon-sil)


machine/corpora/paratext_project_settings.py line 7 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I don't think it needs to be an abstract base class.

Done.


tests/corpora/test_usfm_verse_text_updater.py line 13 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I believe that there are a couple of places that need to use ``Versification.get_builtin()`, but other than that everywhere else should use the constants.

Okay, I changed most of the places that I saw using Versification.get_builtin(VersificationType.ENGLISH) to use ENGLISH_VERSIFICATION instead, except in verse_ref.py to avoid circular imports. All the test cases are still passing. Let me know if there's any of those places that I need to change back to the original method.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 6 of 6 files at r3, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @mshannon-sil)

@mshannon-sil mshannon-sil changed the title update machine.py to reflect usfm changes in machine Update machine.py to reflect usfm changes in machine Apr 4, 2024
@mshannon-sil mshannon-sil merged commit bda3b54 into main Apr 4, 2024
14 checks passed
@mshannon-sil mshannon-sil deleted the machine_usfm branch April 4, 2024 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

3 participants