Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flag for deleting files when deleting corpus #441

Merged
merged 4 commits into from
Jul 29, 2024

Conversation

Enkidu93
Copy link
Collaborator

@Enkidu93 Enkidu93 commented Jul 22, 2024

Fixes #342


This change is Reviewable

@Enkidu93 Enkidu93 requested review from ddaspit and johnml1135 July 22, 2024 20:00
@codecov-commenter
Copy link

codecov-commenter commented Jul 22, 2024

Codecov Report

Attention: Patch coverage is 96.42857% with 1 line in your changes missing coverage. Please review.

Project coverage is 61.89%. Comparing base (f5b7620) to head (1ed1ce9).

Files Patch % Lines
...rval/src/Serval.Shared/Contracts/DeleteDataFile.cs 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #441      +/-   ##
==========================================
+ Coverage   61.77%   61.89%   +0.11%     
==========================================
  Files         232      234       +2     
  Lines       11830    11856      +26     
  Branches     1510     1513       +3     
==========================================
+ Hits         7308     7338      +30     
+ Misses       3995     3993       -2     
+ Partials      527      525       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 9 files at r1, all commit messages.
Reviewable status: 2 of 9 files reviewed, 2 unresolved discussions (waiting on @Enkidu93 and @johnml1135)


src/Serval/src/Serval.Translation/Controllers/TranslationEnginesController.cs line 497 at r1 (raw file):

        [NotNull] string id,
        [NotNull] string corpusId,
        [FromQuery] bool? deleteFiles,

We want to use kebab case for query parameters. There are some older parameters that are incorrect that we need to fix, but for all new parameters we want to make sure they use kebab case. You can explicitly specify the name using the FromQuery attribute, i.e. [FromQuery(Name = "delete-files")].


src/Serval/src/Serval.Translation/Services/EngineService.cs line 9 at r1 (raw file):

    IRepository<Build> builds,
    IRepository<Pretranslation> pretranslations,
    IDataFileService dataFileService,

We want to keep the different domains (translation, data files, webhooks, etc.) loosely coupled. We have achieved this by publishing messages to a mediator that different domains can subscribe to. For an example, you can see how a translation engine corpus is updated when a data file is deleted. The DataFileService.DeleteAsync publishes a DataFileDeleted message to the mediator. The translation domain subscribes to the message by adding a consumer, i.e. DataFileDeletedConsumer.

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 9 files reviewed, 2 unresolved discussions (waiting on @ddaspit and @johnml1135)


src/Serval/src/Serval.Translation/Controllers/TranslationEnginesController.cs line 497 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

We want to use kebab case for query parameters. There are some older parameters that are incorrect that we need to fix, but for all new parameters we want to make sure they use kebab case. You can explicitly specify the name using the FromQuery attribute, i.e. [FromQuery(Name = "delete-files")].

Ok, I'm on it.


src/Serval/src/Serval.Translation/Services/EngineService.cs line 9 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

We want to keep the different domains (translation, data files, webhooks, etc.) loosely coupled. We have achieved this by publishing messages to a mediator that different domains can subscribe to. For an example, you can see how a translation engine corpus is updated when a data file is deleted. The DataFileService.DeleteAsync publishes a DataFileDeleted message to the mediator. The translation domain subscribes to the message by adding a consumer, i.e. DataFileDeletedConsumer.

Yeah, I wondered about this, but I figured it was easier to just push what I had and wait for the critique 😄. Thank you for pointing me in the right direction.

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've properly addressed the issues you mentioned, @ddaspit.

Reviewable status: 0 of 12 files reviewed, 2 unresolved discussions (waiting on @ddaspit and @johnml1135)


src/Serval/src/Serval.Translation/Controllers/TranslationEnginesController.cs line 497 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Ok, I'm on it.

Done.


src/Serval/src/Serval.Translation/Services/EngineService.cs line 9 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Yeah, I wondered about this, but I figured it was easier to just push what I had and wait for the critique 😄. Thank you for pointing me in the right direction.

Done.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 9 files at r1, 9 of 9 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93 and @johnml1135)


src/Serval/src/Serval.Translation/Services/EngineService.cs line 466 at r2 (raw file):

            )
            {
                await _mediator.Publish<DeleteDataFile>(new { DataFileId = id }, cancellationToken);

This should be Send instead of Publish, since this is an operation and not an event.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Translation/Controllers/TranslationEnginesController.cs line 497 at r2 (raw file):

        [NotNull] string id,
        [NotNull] string corpusId,
        [FromQuery(Name = "delete-files")] bool? deleteFiles,

You need some documentation on how this works - specifically it will delete all files associated with the engine, even if the files are associated with other engines (I assume).

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 9 of 12 files reviewed, 2 unresolved discussions (waiting on @ddaspit and @johnml1135)


src/Serval/src/Serval.Translation/Controllers/TranslationEnginesController.cs line 497 at r2 (raw file):

Previously, johnml1135 (John Lambert) wrote…

You need some documentation on how this works - specifically it will delete all files associated with the engine, even if the files are associated with other engines (I assume).

Is that sufficient? Or would you rather put something in the remark as well?


src/Serval/src/Serval.Translation/Services/EngineService.cs line 466 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

This should be Send instead of Publish, since this is an operation and not an event.

Done.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @johnml1135)

@Enkidu93
Copy link
Collaborator Author

@johnml1135 , are you satisfied with the updates?

@johnml1135
Copy link
Collaborator

src/Serval/test/Serval.ApiServer.IntegrationTests/TranslationEngineTests.cs line 785 at r3 (raw file):

        ICollection<TranslationCorpus> resultsAfterDelete = await client.GetAllCorporaAsync(ECHO_ENGINE1_ID);
        Assert.That(resultsAfterDelete, Has.Count.EqualTo(0));
        Assert.That(await _env.DataFiles.GetAllAsync(), Has.Count.EqualTo(2)); //Paratext files still exist

it would be great to have a few files that were added that were not deleted. I don't want this deleting all files, just the ones in the database. Also, if you don't say "delete", I want to make sure that It doesn't delete them.

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 9 files at r1, 6 of 9 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.DataFiles/Consumers/DeleteDataFileConsumer.cs line 9 at r3 (raw file):

    public async Task Consume(ConsumeContext<DeleteDataFile> context)
    {
        await _dataFileService.DeleteAsync(context.Message.DataFileId, context.CancellationToken);

If the same file is deleted multiple times, what happens? Will it throw an error?

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Translation/Services/EngineService.cs line 467 at r3 (raw file):

            {
                await _mediator.Send<DeleteDataFile>(new { DataFileId = id }, cancellationToken);
            }

If multiple corpses have the same files, what will happen?

@Enkidu93
Copy link
Collaborator Author

src/Serval/test/Serval.ApiServer.IntegrationTests/TranslationEngineTests.cs line 785 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

it would be great to have a few files that were added that were not deleted. I don't want this deleting all files, just the ones in the database. Also, if you don't say "delete", I want to make sure that It doesn't delete them.

Sure, OK, I can do that! What kind of files do you mean for me to add and have not deleted? There are already Paratext files that are not deleted but have been added (see the comment in line 785). You mean Text format files?

I think the not deleting is implicitly tested elsewhere, but I'll make it explicit.

@Enkidu93
Copy link
Collaborator Author

src/Serval/src/Serval.Translation/Services/EngineService.cs line 467 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

If multiple corpses have the same files, what will happen?

Currently, it will remove those files from other corpora via the DeleteAllCorpusFilesAsync function.

@Enkidu93
Copy link
Collaborator Author

src/Serval/src/Serval.DataFiles/Consumers/DeleteDataFileConsumer.cs line 9 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

If the same file is deleted multiple times, what happens? Will it throw an error?

I'm not sure how that could happen. Once the corpus is deleted and all the files have been deleted (and removed from other corpora), there wouldn't be a way to get at this code and delete those files. If that were to happen, it would throw an error (I believe) which would appear in the logs but not bubble up and break any code publishing to this consumer.

@johnml1135
Copy link
Collaborator

src/Serval/test/Serval.ApiServer.IntegrationTests/TranslationEngineTests.cs line 785 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Sure, OK, I can do that! What kind of files do you mean for me to add and have not deleted? There are already Paratext files that are not deleted but have been added (see the comment in line 785). You mean Text format files?

I think the not deleting is implicitly tested elsewhere, but I'll make it explicit.

Ah I see the other files now....

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 11 of 12 files reviewed, 3 unresolved discussions (waiting on @ddaspit and @johnml1135)


src/Serval/test/Serval.ApiServer.IntegrationTests/TranslationEngineTests.cs line 785 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Ah I see the other files now....

Done. Added a test.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Translation/Services/EngineService.cs line 467 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Currently, it will remove those files from other corpora via the DeleteAllCorpusFilesAsync function.

Specifically imagine that two corporas on the same engine each have the same file. I believe that two "delete file 123" commands would be sent out. This is fine, I just probably want handling that if a file is deleted even if it doesn't exist anymore, that it just silently continues on rather than crashing and not deleting the other files that are in the queue.

@ddaspit ddaspit requested a review from johnml1135 July 26, 2024 17:48
Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: 11 of 12 files reviewed, 3 unresolved discussions (waiting on @johnml1135)


src/Serval/src/Serval.Translation/Services/EngineService.cs line 467 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Specifically imagine that two corporas on the same engine each have the same file. I believe that two "delete file 123" commands would be sent out. This is fine, I just probably want handling that if a file is deleted even if it doesn't exist anymore, that it just silently continues on rather than crashing and not deleting the other files that are in the queue.

Further, what about two different engines that use the same file? Right now, it will silently delete the file from the corpus in the other engine, which might be an unintended side effect for the calling application. Maybe we should only delete the file if this is the last engine that is referencing the file. This seems to be getting pretty complicated. I don't want to add something that creates more problems than it solves. What is the motivation for this feature? Did the SF team specifically request it?

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.DataFiles/Consumers/DeleteDataFileConsumer.cs line 9 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

I'm not sure how that could happen. Once the corpus is deleted and all the files have been deleted (and removed from other corpora), there wouldn't be a way to get at this code and delete those files. If that were to happen, it would throw an error (I believe) which would appear in the logs but not bubble up and break any code publishing to this consumer.

https://masstransit.io/documentation/concepts/exceptions - are we handling this exception properly? If there is no file, we want to just stop trying.

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)

@Enkidu93
Copy link
Collaborator Author

src/Serval/src/Serval.Translation/Services/EngineService.cs line 467 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Further, what about two different engines that use the same file? Right now, it will silently delete the file from the corpus in the other engine, which might be an unintended side effect for the calling application. Maybe we should only delete the file if this is the last engine that is referencing the file. This seems to be getting pretty complicated. I don't want to add something that creates more problems than it solves. What is the motivation for this feature? Did the SF team specifically request it?

Right. It is a little sketchy. I don't know that SF reuses file ids across engines like that though, do they? The flag does default to false, so it shouldn't change any behavior for now. But I could see the utility when wanting to quickly remove all corpus-related data (i.e., not end up with lots of orphaned data files). I don't know about the requirement though, no. @johnml1135 would have to answer.

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)


src/Serval/src/Serval.DataFiles/Consumers/DeleteDataFileConsumer.cs line 9 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

https://masstransit.io/documentation/concepts/exceptions - are we handling this exception properly? If there is no file, we want to just stop trying.

Yes, I was looking in the same place. I don't think that retries are configured by default, so we should be fine.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Translation/Services/EngineService.cs line 467 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Right. It is a little sketchy. I don't know that SF reuses file ids across engines like that though, do they? The flag does default to false, so it shouldn't change any behavior for now. But I could see the utility when wanting to quickly remove all corpus-related data (i.e., not end up with lots of orphaned data files). I don't know about the requirement though, no. @johnml1135 would have to answer.

It was more relevant (and requested) when Peter was uploading a file per chapter and there were thousands of files. It still has value, but less so now. Is there a mapping from files to engines? Can we easily do the logic "delete if it is the last engine holding it"?

@Enkidu93
Copy link
Collaborator Author

src/Serval/src/Serval.Translation/Services/EngineService.cs line 467 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

It was more relevant (and requested) when Peter was uploading a file per chapter and there were thousands of files. It still has value, but less so now. Is there a mapping from files to engines? Can we easily do the logic "delete if it is the last engine holding it"?

I see. One simple thing we could do (for now anyways) would be to only delete files that are not associated with other corpora.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)


src/Serval/src/Serval.Translation/Services/EngineService.cs line 467 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

I see. One simple thing we could do (for now anyways) would be to only delete files that are not associated with other corpora.

Will SF use this feature if we add it? If not, I think we should defer this feature until we have clearer requirements.

@Enkidu93
Copy link
Collaborator Author

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)

src/Serval/src/Serval.Translation/Services/EngineService.cs line 467 at r3 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…

Will SF use this feature if we add it? If not, I think we should defer this feature until we have clearer requirements.

Who should we ask? Peter?

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)


src/Serval/src/Serval.Translation/Services/EngineService.cs line 467 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Who should we ask? Peter?

Yes, Peter would be right person to ask.

@Enkidu93
Copy link
Collaborator Author

src/Serval/src/Serval.Translation/Services/EngineService.cs line 467 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Yes, Peter would be right person to ask.

OK, so the word is that: 1. This would still be helpful/convenient and 2. SF never uses the same data file across engines (not to say some other user of Serval couldn't).

@johnml1135
Copy link
Collaborator

:lgtm:

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)

@johnml1135 johnml1135 merged commit 0e0e9f5 into main Jul 29, 2024
2 checks passed
@johnml1135 johnml1135 deleted the delete_files_on_corpus_delete branch July 29, 2024 21:18
@Enkidu93
Copy link
Collaborator Author

@johnml1135 Was it worth changing it so it only deletes not-shared-between-engines files? I guess it's at least safe since it's not the default behavior to delete files anyways.

@johnml1135
Copy link
Collaborator

It's not the default behavior and it documents what it actually does.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @johnml1135. We document the behavior clearly and it meets SF's current requirements. If the requirements change in the future, we can re-evaluate.

Reviewed 1 of 1 files at r4.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

delete engine - and then all files
4 participants