Fix pretranslation filtering #520
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #520      +/-   ##
==========================================
+ Coverage   56.67%   56.75%   +0.07%
==========================================
  Files         299      299
  Lines       15627    15647      +20
  Branches     2155     2159       +4
==========================================
+ Hits         8857     8880      +23
  Misses       6114     6114
+ Partials      656      653       -3
@ddaspit, I think John is hoping we can get this through today in his absence. If you could review this ASAP and confirm or reject my review comments, I can follow up with any changes and move forward with deploying everything to QA.
Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ddaspit and @johnml1135)
src/Serval/src/Serval.Translation/Services/EngineService.cs
line 709 at r1 (raw file):
} } return new V1.ParallelCorpus
Won't this result in corpora that have no specified filters? Would it be simpler or more readable to filter the corpora before the map function is called to include only the parallel corpora that are specified in a trainOn/pretranslate, but if there are none, just use all? Then we wouldn't have to pass new arguments either. But this works too, so either way is fine by me.
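A minimal sketch of the pre-filtering idea, not the actual Serval code. The `Corpus` and `CorpusFilter` records and the `SelectCorpora` helper are hypothetical stand-ins, and it assumes that "specified" means a corpus ID appears in either the trainOn or the pretranslate list:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration only; not Serval's real types.
public record Corpus(string Id);
public record CorpusFilter(string CorpusId);

public static class CorpusSelection
{
    // Keep only the corpora named in trainOn or pretranslate.
    // If neither list names any corpus, fall back to all corpora.
    public static IReadOnlyList<Corpus> SelectCorpora(
        IEnumerable<Corpus> corpora,
        IEnumerable<CorpusFilter>? trainOn,
        IEnumerable<CorpusFilter>? pretranslate)
    {
        HashSet<string> ids = (trainOn ?? Enumerable.Empty<CorpusFilter>())
            .Concat(pretranslate ?? Enumerable.Empty<CorpusFilter>())
            .Select(f => f.CorpusId)
            .ToHashSet();

        if (ids.Count == 0)
            return corpora.ToList();

        return corpora.Where(c => ids.Contains(c.Id)).ToList();
    }
}
```

With this shape, the map function would only ever see corpora that were already selected, so it would not need extra arguments.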
src/Serval/test/Serval.E2ETests/ServalApiTests.cs
line 125 at r1 (raw file):
true ); _helperClient.TranslationBuildConfig.Pretranslate = [new() { CorpusId = cId2, TextIds = ["2JN.txt"] }];
Can't this be tested at the unit test level?
Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)
src/Serval/src/Serval.Translation/Services/EngineService.cs
line 709 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Won't this result in corpora that have no specified filters? Would it be simpler or more readable to filter the corpora before the map function is called to include only the parallel corpora that are specified in a trainOn/pretranslate, but if there are none, just use all? Then we wouldn't have to pass new arguments either. But this works too, so either way is fine by me.
Can you try implementing your approach? It sounds cleaner.
src/Serval/test/Serval.E2ETests/ServalApiTests.cs
line 125 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Can't this be tested at the unit test level?
I agree. We should test this at the unit test level.
Previously, ddaspit (Damien Daspit) wrote…
If you can make it cleaner, go for it. I was just trying to get down to logic as quickly as I could.
Previously, ddaspit (Damien Daspit) wrote…
I agree, a unit test would be great. I would still keep this end-to-end function though, just to confirm that the filter is actually applied IRL.
Reviewed 2 of 2 files at r2.
Reviewable status: 2 of 3 files reviewed, all discussions resolved (waiting on @johnml1135)
src/Serval/src/Serval.Translation/Services/EngineService.cs
line 709 at r1 (raw file):
Previously, johnml1135 (John Lambert) wrote…
If you can make it cleaner, go for it. I was just trying to get down to logic as quickly as I could.
Done. This seems cleaner to me. Also, the unit tests are now passing with this implementation.
src/Serval/test/Serval.E2ETests/ServalApiTests.cs
line 125 at r1 (raw file):
Previously, johnml1135 (John Lambert) wrote…
I agree, a unit test would be great. I would still keep this end-to-end function though, just to confirm that the filter is actually applied IRL.
Yep, I agree, John. Done.
Can we revert the temporary fix? Should we do it in this PR or a separate one?
Reviewed 1 of 2 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @johnml1135)
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @johnml1135)
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @johnml1135)
Don't train/pretranslate on other corpora if one is already defined.
Test for pretranslation filtering
This resolves #516