-
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A better fix for #516. #521
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #521 +/- ##
==========================================
+ Coverage 56.75% 56.86% +0.11%
==========================================
Files 299 299
Lines 15647 15686 +39
Branches 2159 2165 +6
==========================================
+ Hits 8880 8920 +40
Misses 6114 6114
+ Partials 653 652 -1 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93 and @johnml1135)
src/Serval/src/Serval.Translation/Services/EngineService.cs
line 632 at r1 (raw file):
TrainingCorpus? trainingCorpus, PretranslateCorpus? pretranslateCorpus, bool noTrainingCorpusDefined,
Could we rename these parameters to trainOnAll
and pretranslateAll
(or trainOnAllCorpora
and pretranslateAllCorpora
)? That helps the readability of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, John! I'm fine with this as long as it's the behavior we want. It depends on how interdependent you'd like trainOn
and pretranslate
to be and what kinds of filters you want to make easier to compose and which more difficult - i.e., something is implicit now which wasn't implicit before and something would have to be made explicit which was previously implicit.
I apologize if I did this logic last week differently than you expected; I thought you were expecting it to work differently. This underscores the issue I've mentioned before which is that we need to have a clearer definition of intended behavior across all filtering arrangements. There are so many fields that can be null, empty, or set or situations in which corpora can specified multiple times in different places. It might be good to work on a document to describe this more fully & then make sure our tests cover that.
Reviewed 2 of 3 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)
src/Serval/test/Serval.Translation.Tests/Services/EngineServiceTests.cs
line 470 at r1 (raw file):
[Test] public async Task StartBuildAsync_OneEachOfMultipleCorpora()
There's already a test named StartBuildAsync_ParallelCorpus_OneOfMultipleCorpora
, I believe. I think a more descriptive name for each of them is in order to differentiate the two (or merge them into one).
Previously, ddaspit (Damien Daspit) wrote…
How about noTrainingFilter and noPretranslationFilter? That allowed the logic that it is used within to be clearer - either there is "noTrainingFilter" or "the source filter is null" . It is used to combine the logic for when to "trainOnAll." |
Previously, johnml1135 (John Lambert) wrote…
or "noTrainingCorpusFilter". |
Previously, Enkidu93 (Eli C. Lowry) wrote…
TrainOnOnePretranslateOnTheOther |
Updated documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where?
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ddaspit and @johnml1135)
src/Serval/test/Serval.Translation.Tests/Services/EngineServiceTests.cs
line 470 at r1 (raw file):
Previously, johnml1135 (John Lambert) wrote…
TrainOnOnePretranslateOnTheOther
Or TrainOnOnePretranslateTheOther
?
Coming... |
Updated. |
Previously, Enkidu93 (Eli C. Lowry) wrote…
better. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r1, 3 of 4 files at r2, all commit messages.
Reviewable status: 4 of 5 files reviewed, 5 unresolved discussions (waiting on @ddaspit and @johnml1135)
src/Serval/src/Serval.Translation/Controllers/TranslationEnginesController.cs
line 994 at r2 (raw file):
/// <remarks> /// Specify the corpora and textIds/scriptureRanges within those corpora to train on. Only one type of corpus may be used: either (legacy) corpora (see /translation/engines/{id}/corpora) or parallel corpora (see /translation/engines/{id}/parallel-corpora). /// A (legacy) corpus is selected by specifying CorpusId and a parallel corpus is selected for training by specifying the appropriate CorpusId in SourceFilters or TargetFilters.
I would make the language consistent across the legacy corpus and the parallel corpus. I'd also mention the fact that the parallel corpus id does need to be specified. Something like: "A (legacy) corpus is selected by specifying CorpusId and a parallel corpus is selected by specifying ParallelCorpusId, A parallel corpus can be further filtered by specifying particular CorpusIds in SourceFilters or TargetFilters"
src/Serval/src/Serval.Translation/Controllers/TranslationEnginesController.cs
line 998 at r2 (raw file):
/// Filters can also be supplied via scriptureRange parameter as ranges of biblical text. See [here](https://github.com/sillsdev/serval/wiki/Filtering-Paratext-Project-Data-with-a-Scripture-Range) /// All Paratext project filtering follows original versification. See [here](https://github.com/sillsdev/serval/wiki/Versification-in-Serval) for more information. /// If `trainOn` or `pretranslate` is not provided, all corpora will be used for the respective task.
It might be more clear to say "...provided, all corpora will be used for training or pretranslation respectively", but this is fine too
src/Serval/src/Serval.Translation/Controllers/TranslationEnginesController.cs
line 999 at r2 (raw file):
/// All Paratext project filtering follows original versification. See [here](https://github.com/sillsdev/serval/wiki/Versification-in-Serval) for more information. /// If `trainOn` or `pretranslate` is not provided, all corpora will be used for the respective task. /// If a corpus is selected for training or pretranslation and neither scriptureRange or textIds are defined, all of the selected corpus will be used.
Not necessary but I'd suggest: "...neither...nor..."
src/Serval/src/Serval.Translation/Controllers/TranslationEnginesController.cs
line 1001 at r2 (raw file):
/// If a corpus is selected for training or pretranslation and neither scriptureRange or textIds are defined, all of the selected corpus will be used. /// If a corpus is selected for training or pretranslation and an empty scriptureRange or textIds is defined, none of the selected corpus will be used. /// If a corpus is selected for training or pretranslation, all corpora that are not selected will not be used for the respective task.
I might further qualify it; something like: "If a corpus is selected for training or pretranslation but no further filters are provided, all selected corpora will be used for training or pretranslation respectively"
Previously, Enkidu93 (Eli C. Lowry) wrote…
Updated. |
Previously, Enkidu93 (Eli C. Lowry) wrote…
updated |
Previously, Enkidu93 (Eli C. Lowry) wrote…
done |
Previously, Enkidu93 (Eli C. Lowry) wrote…
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still like to see Damien's approval too, but it looks good to me. Thank you for doing this, John!
Reviewed 2 of 2 files at r3, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ddaspit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Peter asked us to mark the deprecated endpoints and properties. Should we do that in this PR or a separate PR? We just need to add the Obsolete
attribute to the appropriate controller methods and DTO properties.
Reviewed 2 of 4 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
src/Serval/src/Serval.Translation/Services/EngineService.cs
line 632 at r1 (raw file):
Previously, johnml1135 (John Lambert) wrote…
or "noTrainingCorpusFilter".
Although noTrainingCorpusFilter
and noPretranslateCorpusFilter
accurately describe the state of the data model, I would prefer to use names that represent what those states actually mean. When no training corpora is defined, it means that we want to train on all corpora. This makes the code clearer and more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I attempted to do that in my original mixed source PR, and it yelled at me, so it might be more complicated than it seems and worth separating into a different PR.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)
I agree, let's make it a separate PR. |
Previously, ddaspit (Damien Daspit) wrote…
Updated naming. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)
#516 - this should be applied independently for pretranslate and trainOn.
This change is