Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip parsing excluded books when preprocessing #207

Merged
merged 1 commit into from
Jun 7, 2024
Merged

Conversation

ddaspit
Copy link
Contributor

@ddaspit ddaspit commented Jun 6, 2024

@codecov-commenter
Copy link

codecov-commenter commented Jun 6, 2024

Codecov Report

Attention: Patch coverage is 42.92929% with 113 lines in your changes missing coverage. Please review.

Project coverage is 67.23%. Comparing base (3182ff2) to head (02c3efb).
Report is 30 commits behind head on master.

Files Patch % Lines
src/SIL.Machine/Corpora/CorporaExtensions.cs 23.18% 51 Missing and 2 partials ⚠️
src/SIL.Machine/Corpora/DictionaryTextCorpus.cs 34.78% 15 Missing ⚠️
src/SIL.Machine/Corpora/AlignmentCorpusBase.cs 0.00% 12 Missing ⚠️
src/SIL.Machine/Corpora/TextCorpusBase.cs 0.00% 12 Missing ⚠️
...c/SIL.Machine/Corpora/DictionaryAlignmentCorpus.cs 50.00% 7 Missing ⚠️
src/SIL.Machine/Corpora/ParallelTextCorpus.cs 73.91% 5 Missing and 1 partial ⚠️
...chine.AspNetCore/Services/NmtPreprocessBuildJob.cs 83.33% 4 Missing and 1 partial ⚠️
src/SIL.Machine/Corpora/ParallelTextCorpusBase.cs 80.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #207      +/-   ##
==========================================
- Coverage   67.35%   67.23%   -0.13%     
==========================================
  Files         441      441              
  Lines       35144    35277     +133     
  Branches     4703     4721      +18     
==========================================
+ Hits        23672    23717      +45     
- Misses      10381    10466      +85     
- Partials     1091     1094       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@johnml1135 johnml1135 self-requested a review June 7, 2024 14:51
@johnml1135
Copy link
Collaborator

src/SIL.Machine.AspNetCore/Services/NmtPreprocessBuildJob.cs line 236 at r1 (raw file):

    private static bool IsInTrain(Row row, Corpus corpus)
    {
        if (corpus.TrainOnChapters is not null)

TrainOnChapters is never null (a bug I found). Either we check for null or length == 0, or we change how it is made up at the Serval level to send a true null down.

@johnml1135
Copy link
Collaborator

src/SIL.Machine.AspNetCore/Services/NmtPreprocessBuildJob.cs line 269 at r1 (raw file):

        if (!corpus.TrainOnAll)
        {
            IEnumerable<string> textIds = corpus.TrainOnChapters is not null

is null or empty.

@johnml1135
Copy link
Collaborator

src/SIL.Machine.AspNetCore/Services/NmtPreprocessBuildJob.cs line 393 at r1 (raw file):

        if (!corpus.PretranslateAll)
        {
            IEnumerable<string> textIds = corpus.PretranslateChapters is not null

null or empty.

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 13 of 13 files at r1, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @ddaspit)

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @ddaspit)

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)

@johnml1135 johnml1135 merged commit 8efc011 into master Jun 7, 2024
4 checks passed
@johnml1135 johnml1135 deleted the skip-books branch June 7, 2024 19:19
Enkidu93 added a commit to sillsdev/serval that referenced this pull request Aug 26, 2024
johnml1135 added a commit to sillsdev/serval that referenced this pull request Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't crash everything if USFM books are not included in training
3 participants