-
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flag for deleting files when deleting corpus #441
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #441 +/- ##
==========================================
+ Coverage 61.77% 61.89% +0.11%
==========================================
Files 232 234 +2
Lines 11830 11856 +26
Branches 1510 1513 +3
==========================================
+ Hits 7308 7338 +30
+ Misses 3995 3993 -2
+ Partials 527 525 -2 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 9 files at r1, all commit messages.
Reviewable status: 2 of 9 files reviewed, 2 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
src/Serval/src/Serval.Translation/Controllers/TranslationEnginesController.cs
line 497 at r1 (raw file):
[NotNull] string id, [NotNull] string corpusId, [FromQuery] bool? deleteFiles,
We want to use kebab case for query parameters. There are some older parameters that are incorrect that we need to fix, but for all new parameters we want to make sure they use kebab case. You can explicitly specify the name using the FromQuery
attribute, i.e. [FromQuery(Name = "delete-files")]
.
src/Serval/src/Serval.Translation/Services/EngineService.cs
line 9 at r1 (raw file):
IRepository<Build> builds, IRepository<Pretranslation> pretranslations, IDataFileService dataFileService,
We want to keep the different domains (translation, data files, webhooks, etc.) loosely coupled. We have achieved this by publishing messages to a mediator that different domains can subscribe to. For an example, you can see how a translation engine corpus is updated when a data file is deleted. The DataFileService.DeleteAsync
publishes a DataFileDeleted
message to the mediator. The translation domain subscribes to the message by adding a consumer, i.e. DataFileDeletedConsumer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 9 files reviewed, 2 unresolved discussions (waiting on @ddaspit and @johnml1135)
src/Serval/src/Serval.Translation/Controllers/TranslationEnginesController.cs
line 497 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
We want to use kebab case for query parameters. There are some older parameters that are incorrect that we need to fix, but for all new parameters we want to make sure they use kebab case. You can explicitly specify the name using the
FromQuery
attribute, i.e.[FromQuery(Name = "delete-files")]
.
Ok, I'm on it.
src/Serval/src/Serval.Translation/Services/EngineService.cs
line 9 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
We want to keep the different domains (translation, data files, webhooks, etc.) loosely coupled. We have achieved this by publishing messages to a mediator that different domains can subscribe to. For an example, you can see how a translation engine corpus is updated when a data file is deleted. The
DataFileService.DeleteAsync
publishes aDataFileDeleted
message to the mediator. The translation domain subscribes to the message by adding a consumer, i.e.DataFileDeletedConsumer
.
Yeah, I wondered about this, but I figured it was easier to just push what I had and wait for the critique 😄. Thank you for pointing me in the right direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've properly addressed the issues you mentioned, @ddaspit.
Reviewable status: 0 of 12 files reviewed, 2 unresolved discussions (waiting on @ddaspit and @johnml1135)
src/Serval/src/Serval.Translation/Controllers/TranslationEnginesController.cs
line 497 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Ok, I'm on it.
Done.
src/Serval/src/Serval.Translation/Services/EngineService.cs
line 9 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Yeah, I wondered about this, but I figured it was easier to just push what I had and wait for the critique 😄. Thank you for pointing me in the right direction.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 9 files at r1, 9 of 9 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93 and @johnml1135)
src/Serval/src/Serval.Translation/Services/EngineService.cs
line 466 at r2 (raw file):
) { await _mediator.Publish<DeleteDataFile>(new { DataFileId = id }, cancellationToken);
This should be Send
instead of Publish
, since this is an operation and not an event.
You need some documentation on how this works - specifically it will delete all files associated with the engine, even if the files are associated with other engines (I assume). |
…turn code description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 12 files reviewed, 2 unresolved discussions (waiting on @ddaspit and @johnml1135)
src/Serval/src/Serval.Translation/Controllers/TranslationEnginesController.cs
line 497 at r2 (raw file):
Previously, johnml1135 (John Lambert) wrote…
You need some documentation on how this works - specifically it will delete all files associated with the engine, even if the files are associated with other engines (I assume).
Is that sufficient? Or would you rather put something in the remark as well?
src/Serval/src/Serval.Translation/Services/EngineService.cs
line 466 at r2 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
This should be
Send
instead ofPublish
, since this is an operation and not an event.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @johnml1135)
@johnml1135 , are you satisfied with the updates? |
it would be great to have a few files that were added that were not deleted. I don't want this deleting all files, just the ones in the database. Also, if you don't say "delete", I want to make sure that It doesn't delete them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 9 files at r1, 6 of 9 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)
If the same file is deleted multiple times, what happens? Will it throw an error? |
If multiple corpses have the same files, what will happen? |
Previously, johnml1135 (John Lambert) wrote…
Sure, OK, I can do that! What kind of files do you mean for me to add and have not deleted? There are already Paratext files that are not deleted but have been added (see the comment in line 785). You mean Text format files? I think the not deleting is implicitly tested elsewhere, but I'll make it explicit. |
Previously, johnml1135 (John Lambert) wrote…
Currently, it will remove those files from other corpora via the |
Previously, johnml1135 (John Lambert) wrote…
I'm not sure how that could happen. Once the corpus is deleted and all the files have been deleted (and removed from other corpora), there wouldn't be a way to get at this code and delete those files. If that were to happen, it would throw an error (I believe) which would appear in the logs but not bubble up and break any code publishing to this consumer. |
Previously, Enkidu93 (Eli C. Lowry) wrote…
Ah I see the other files now.... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 11 of 12 files reviewed, 3 unresolved discussions (waiting on @ddaspit and @johnml1135)
src/Serval/test/Serval.ApiServer.IntegrationTests/TranslationEngineTests.cs
line 785 at r3 (raw file):
Previously, johnml1135 (John Lambert) wrote…
Ah I see the other files now....
Done. Added a test.
Previously, Enkidu93 (Eli C. Lowry) wrote…
Specifically imagine that two corporas on the same engine each have the same file. I believe that two "delete file 123" commands would be sent out. This is fine, I just probably want handling that if a file is deleted even if it doesn't exist anymore, that it just silently continues on rather than crashing and not deleting the other files that are in the queue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: 11 of 12 files reviewed, 3 unresolved discussions (waiting on @johnml1135)
src/Serval/src/Serval.Translation/Services/EngineService.cs
line 467 at r3 (raw file):
Previously, johnml1135 (John Lambert) wrote…
Specifically imagine that two corporas on the same engine each have the same file. I believe that two "delete file 123" commands would be sent out. This is fine, I just probably want handling that if a file is deleted even if it doesn't exist anymore, that it just silently continues on rather than crashing and not deleting the other files that are in the queue.
Further, what about two different engines that use the same file? Right now, it will silently delete the file from the corpus in the other engine, which might be an unintended side effect for the calling application. Maybe we should only delete the file if this is the last engine that is referencing the file. This seems to be getting pretty complicated. I don't want to add something that creates more problems than it solves. What is the motivation for this feature? Did the SF team specifically request it?
Previously, Enkidu93 (Eli C. Lowry) wrote…
https://masstransit.io/documentation/concepts/exceptions - are we handling this exception properly? If there is no file, we want to just stop trying. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)
Previously, ddaspit (Damien Daspit) wrote…
Right. It is a little sketchy. I don't know that SF reuses file ids across engines like that though, do they? The flag does default to false, so it shouldn't change any behavior for now. But I could see the utility when wanting to quickly remove all corpus-related data (i.e., not end up with lots of orphaned data files). I don't know about the requirement though, no. @johnml1135 would have to answer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)
src/Serval/src/Serval.DataFiles/Consumers/DeleteDataFileConsumer.cs
line 9 at r3 (raw file):
Previously, johnml1135 (John Lambert) wrote…
https://masstransit.io/documentation/concepts/exceptions - are we handling this exception properly? If there is no file, we want to just stop trying.
Yes, I was looking in the same place. I don't think that retries are configured by default, so we should be fine.
Previously, Enkidu93 (Eli C. Lowry) wrote…
It was more relevant (and requested) when Peter was uploading a file per chapter and there were thousands of files. It still has value, but less so now. Is there a mapping from files to engines? Can we easily do the logic "delete if it is the last engine holding it"? |
Previously, johnml1135 (John Lambert) wrote…
I see. One simple thing we could do (for now anyways) would be to only delete files that are not associated with other corpora. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)
src/Serval/src/Serval.Translation/Services/EngineService.cs
line 467 at r3 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
I see. One simple thing we could do (for now anyways) would be to only delete files that are not associated with other corpora.
Will SF use this feature if we add it? If not, I think we should defer this feature until we have clearer requirements.
Who should we ask? Peter? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)
src/Serval/src/Serval.Translation/Services/EngineService.cs
line 467 at r3 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Who should we ask? Peter?
Yes, Peter would be right person to ask.
Previously, ddaspit (Damien Daspit) wrote…
OK, so the word is that: 1. This would still be helpful/convenient and 2. SF never uses the same data file across engines (not to say some other user of Serval couldn't). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)
@johnml1135 Was it worth changing it so it only deletes not-shared-between-engines files? I guess it's at least safe since it's not the default behavior to delete files anyways. |
It's not the default behavior and it documents what it actually does. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @johnml1135. We document the behavior clearly and it meets SF's current requirements. If the requirements change in the future, we can re-evaluate.
Reviewed 1 of 1 files at r4.
Reviewable status: complete! all files reviewed, all discussions resolved
Fixes #342
This change is