Fix pretranslation filtering #520

Merged
Enkidu93 merged 4 commits into main from fix_pretranslation_filtering
Oct 25, 2024

johnml1135
Collaborator

@johnml1135 johnml1135 commented Oct 25, 2024

Don't train or pretranslate on other corpora if one is already specified in the build.
Add a test for pretranslation filtering.
This resolves #516.



@codecov-commenter

codecov-commenter commented Oct 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 56.75%. Comparing base (0b06fbf) to head (8342610).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #520      +/-   ##
==========================================
+ Coverage   56.67%   56.75%   +0.07%     
==========================================
  Files         299      299              
  Lines       15627    15647      +20     
  Branches     2155     2159       +4     
==========================================
+ Hits         8857     8880      +23     
  Misses       6114     6114              
+ Partials      656      653       -3     


Collaborator

@Enkidu93 Enkidu93 left a comment

@ddaspit, I think John is hoping we can get this through today in his absence, so if you could review this ASAP + confirm or reject my review comments, I can follow up with any changes and move forward with deploying everything to QA.

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ddaspit and @johnml1135)


src/Serval/src/Serval.Translation/Services/EngineService.cs line 709 at r1 (raw file):

            }
        }
        return new V1.ParallelCorpus

Won't this result in corpora that have no specified filters? Would it be simpler or more readable to filter the corpora before the map function is called to include only the parallel corpora that are specified in a trainOn/pretranslate, but if there are none, just use all? Then we wouldn't have to pass new arguments either. But this works too, so either way is fine by me.
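
The pre-filtering idea in the comment above can be sketched roughly as follows. This is a minimal illustration, not the code in `EngineService.cs`: the `Corpus` and `CorpusFilter` records and the `SelectCorpora` helper are hypothetical stand-ins, and the sketch assumes that a corpus counts as "specified" if it is named in either the train-on or the pretranslate list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration only; these are not Serval's actual types.
public record Corpus(string Id);
public record CorpusFilter(string CorpusId, IReadOnlyList<string>? TextIds = null);

public static class CorpusSelection
{
    // Keep only the corpora named in trainOn or pretranslate.
    // If neither list names any corpus, fall back to using all corpora.
    public static IReadOnlyList<Corpus> SelectCorpora(
        IReadOnlyList<Corpus> corpora,
        IReadOnlyList<CorpusFilter>? trainOn,
        IReadOnlyList<CorpusFilter>? pretranslate)
    {
        // Collect every corpus id that any filter refers to.
        HashSet<string> referenced = (trainOn ?? Array.Empty<CorpusFilter>())
            .Concat(pretranslate ?? Array.Empty<CorpusFilter>())
            .Select(f => f.CorpusId)
            .ToHashSet();

        // No filters at all: every corpus is used.
        if (referenced.Count == 0)
            return corpora;

        // Otherwise drop the corpora that no filter mentions.
        return corpora.Where(c => referenced.Contains(c.Id)).ToList();
    }
}
```

With a selection step like this running before the mapping, the map function would only ever see corpora that are in use, so it would not need extra arguments. For example, given two corpora and a single pretranslate filter on the second one (as in the E2E test, which restricts pretranslation to `2JN.txt` in one corpus), only the second corpus would be passed on.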


src/Serval/test/Serval.E2ETests/ServalApiTests.cs line 125 at r1 (raw file):

            true
        );
        _helperClient.TranslationBuildConfig.Pretranslate = [new() { CorpusId = cId2, TextIds = ["2JN.txt"] }];

Can't this be tested at the unit test level?

@ddaspit ddaspit requested a review from Enkidu93 October 25, 2024 16:48
Contributor

@ddaspit ddaspit left a comment

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)


src/Serval/src/Serval.Translation/Services/EngineService.cs line 709 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Won't this result in corpora that have no specified filters? Would it be simpler or more readable to filter the corpora before the map function is called to include only the parallel corpora that are specified in a trainOn/pretranslate, but if there are none, just use all? Then we wouldn't have to pass new arguments either. But this works too, so either way is fine by me.

Can you try implementing your approach? It sounds cleaner.


src/Serval/test/Serval.E2ETests/ServalApiTests.cs line 125 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Can't this be tested at the unit test level?

I agree. We should test this at the unit test level.

@johnml1135
Collaborator Author

src/Serval/src/Serval.Translation/Services/EngineService.cs line 709 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Can you try implementing your approach? It sounds cleaner.

If you can make it cleaner, go for it. I was just trying to get down to logic as quickly as I could.

@johnml1135
Collaborator Author

src/Serval/test/Serval.E2ETests/ServalApiTests.cs line 125 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I agree. We should test this at the unit test level.

I agree, a unit test would be great. I would still keep this end-to-end test though, just to confirm that the filter is actually applied in practice.

Collaborator

@Enkidu93 Enkidu93 left a comment

Reviewed 2 of 2 files at r2.
Reviewable status: 2 of 3 files reviewed, all discussions resolved (waiting on @johnml1135)


src/Serval/src/Serval.Translation/Services/EngineService.cs line 709 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

If you can make it cleaner, go for it. I was just trying to get down to logic as quickly as I could.

Done. This seems cleaner to me. Also, the unit tests are now passing with this implementation.


src/Serval/test/Serval.E2ETests/ServalApiTests.cs line 125 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

I agree, a unit test would be great. I would still keep this end-to-end test though, just to confirm that the filter is actually applied in practice.

Yep, I agree, John. Done.

Contributor

@ddaspit ddaspit left a comment

Can we revert the temporary fix? Should we do it in this PR or a separate one?

Reviewed 1 of 2 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @johnml1135)

Contributor

@ddaspit ddaspit left a comment

:lgtm:

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @johnml1135)

Collaborator

@Enkidu93 Enkidu93 left a comment

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @johnml1135)

@Enkidu93 Enkidu93 merged commit 81333af into main Oct 25, 2024
3 checks passed
@Enkidu93 Enkidu93 deleted the fix_pretranslation_filtering branch October 25, 2024 18:28
Development

Successfully merging this pull request may close these issues.

If one corpora is specified in a build, it should be assumed that all others are not used
4 participants