Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add assessment API #455

Merged
merged 1 commit into from
Aug 21, 2024
Merged

Add assessment API #455

merged 1 commit into from
Aug 21, 2024

Conversation

ddaspit
Copy link
Contributor

@ddaspit ddaspit commented Aug 15, 2024

This change is Reviewable

  • add assessment library
  • assessment API disabled using a feature flag
  • add separate health gRPC API
  • add GHA unit test report
  • add constants for all endpoint names
  • add assessment engine gRPC API
  • add new assessment scopes
  • update webhook library to push assessment events
  • add initial set of assessment integration tests

@codecov-commenter
Copy link

codecov-commenter commented Aug 15, 2024

Codecov Report

Attention: Patch coverage is 43.46591% with 597 lines in your changes missing coverage. Please review.

Project coverage is 56.68%. Comparing base (d9e928d) to head (ec397d3).

Files Patch % Lines
...Assessment/Services/AssessmentPlatformServiceV1.cs 0.00% 224 Missing ⚠️
...al/src/Serval.Assessment/Services/EngineService.cs 38.20% 105 Missing and 5 partials ⚠️
...essment/Controllers/AssessmentEnginesController.cs 67.22% 70 Missing and 8 partials ⚠️
...erval/src/Serval.Assessment/Services/JobService.cs 7.50% 37 Missing ⚠️
...ebhooks/Consumers/AssessmentJobFinishedConsumer.cs 0.00% 29 Missing ⚠️
...Webhooks/Consumers/AssessmentJobStartedConsumer.cs 0.00% 26 Missing ⚠️
...sessment/Configuration/IServalBuilderExtensions.cs 35.48% 16 Missing and 4 partials ⚠️
src/Serval/src/Serval.ApiServer/Startup.cs 77.35% 11 Missing and 1 partial ⚠️
...l.Machine.Shared/Services/ServalHealthServiceV1.cs 0.00% 7 Missing ⚠️
...c/Serval.Shared/Contracts/AssessmentJobFinished.cs 0.00% 7 Missing ⚠️
... and 27 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #455      +/-   ##
==========================================
- Coverage   61.16%   56.68%   -4.48%     
==========================================
  Files         240      275      +35     
  Lines       11999    14115    +2116     
  Branches     1520     1895     +375     
==========================================
+ Hits         7339     8001     +662     
- Misses       4132     5529    +1397     
- Partials      528      585      +57     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@johnml1135 johnml1135 self-requested a review August 16, 2024 22:31
@johnml1135
Copy link
Collaborator

Only 85 files changed? Easy peasy. I'll get to this early next week.

@johnml1135
Copy link
Collaborator

It would be nice to have a 5-10 point list of what was changed.

@johnml1135
Copy link
Collaborator

.github/workflows/ci.yml line 42 at r1 (raw file):

          name: NUnit Tests
          path: src/**/TestResults/test-results.trx
          reporter: dotnet-trx

Ah - https://github.com/dorny/test-reporter.

@johnml1135
Copy link
Collaborator

src/Machine/src/Serval.Machine.Shared/Services/ServalHealthServiceV1.cs line 4 at r1 (raw file):

using Serval.Health.V1;

namespace Serval.Machine.Shared.Services;

Should this be in the new service toolkit?

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Assessment/Configuration/IMongoDataAccessConfiguratorExtensions.cs line 14 at r1 (raw file):

            {
                await c.Indexes.CreateOrUpdateAsync(
                    new CreateIndexModel<Engine>(Builders<Engine>.IndexKeys.Ascending(e => e.Owner))

We will be looking up engines by ID. The engine ID should be a second index.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Assessment/Configuration/IMongoDataAccessConfiguratorExtensions.cs line 36 at r1 (raw file):

                );
                await c.Indexes.CreateOrUpdateAsync(
                    new CreateIndexModel<Result>(Builders<Result>.IndexKeys.Ascending(pt => pt.TextId))

Will we want to also be getting verses by reference? Will there be corpora defined as well? pretranslations have more indexing specified.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Assessment/Consumers/DataFileDeletedConsumer.cs line 3 at r1 (raw file):

namespace Serval.Assessment.Consumers;

public class DataFileDeletedConsumer(IEngineService engineService) : IConsumer<DataFileDeleted>

Could this be shared?

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Assessment/Contracts/AssessmentCorpusConfigDto.cs line 13 at r1 (raw file):

    /// The language tag.
    /// </summary>
    public required string Language { get; init; }

So, a corpus will only be one language? Wouldn't they need to be two languages? How flexible are we imagining this API to be? I could imagine different types of assessments needing different input corpora - as well as referencing other engines for AQUA.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Assessment/Contracts/AssessmentCorpusConfigDto.cs line 13 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

So, a corpus will only be one language? Wouldn't they need to be two languages? How flexible are we imagining this API to be? I could imagine different types of assessments needing different input corpora - as well as referencing other engines for AQUA.

I see how languages are handled - using an optional second corpora. That sounds acceptable.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Assessment/Contracts/AssessmentJobConfigDto.cs line 14 at r1 (raw file):

    /// }
    /// </example>
    public object? Options { get; init; }

I assume that referencing other jobs from other engines would be done in these options for AQUA assessments.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Assessment/Contracts/AssessmentResultDto.cs line 6 at r1 (raw file):

{
    public required string TextId { get; init; }
    public required string Ref { get; init; }

Could any more of this be shared with the Machine engines?

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Assessment/Controllers/AssessmentEnginesController.cs line 446 at r1 (raw file):

    /// <response code="503">A necessary service is currently unavailable. Check `/health` for more details.</response>
    [Authorize(Scopes.ReadAssessmentEngines)]
    [HttpGet("{id}/jobs/{jobId}/results")]

You are organizing results by job instead of by engine and corpus (as with pretranslations). Can we reasonably assume that people would want legacy results? Wouldn't the tracking, correlation resolution of assessment results be the responsibility of the client?
If we want to correlate different sets of results, what would be the best way to present the correlated results to the user? Wouldn't it be to have a unique ID for the result that is preserved with updates? Would we want to say "this was there before, but it no longer applies"? Would we update an existing record or make a new one?

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much overlap/separation is there meant to be between the assessment pipeline and the translation pipeline?

Also, are there plans to make an issue to track adding additional testing? How soon do we expect this to be added to production? Is it worth exploring ignoring the assessment code for code coverage for now?

Reviewed 85 of 85 files at r1, all commit messages.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @ddaspit and @johnml1135)


.github/workflows/ci.yml line 42 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Ah - https://github.com/dorny/test-reporter.

Nice addition!


src/Serval/src/Serval.Shared/Controllers/Endpoints.cs line 5 at r1 (raw file):

public static class Endpoints
{
    public const string GetDataFile = "GetDataFile";

Very nice

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Assessment/Models/Result.cs line 11 at r1 (raw file):

    public required string TextId { get; init; }
    public required string Ref { get; init; }
    public double? Score { get; init; }

Is this what we really want? I think when we were looking at this before, I recommended using something fasioned after the language server's diagnostic message. Severity, range, code, description, tags, data, etc.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Assessment/Services/AssessmentPlatformServiceV1.cs line 51 at r1 (raw file):

            },
            cancellationToken: context.CancellationToken
        );

Is there any way that more of this could be shared with Machine? Could a ServiceBase class be created that is shared? There is so much that overlaps.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.DataFiles/Services/DataFileService.cs line 37 at r1 (raw file):

    }

    public async Task<IEnumerable<DataFile>> GetAllAsync(string owner, CancellationToken cancellationToken = default)

Do we want a cleanup function "show me everything that is not connected to a corpus"? Removing this endpoint disables cleanup of hanging files. If a person loses the id of the file, they lose it completely.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Grpc/Protos/serval/assessment/v1/platform.proto line 22 at r1 (raw file):

    optional double percent_completed = 2;
    optional string message = 3;
}

Is there anything tiered? Can't we share using the ServiceToolkit?

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Grpc/Protos/serval/translation/v1/platform.proto line 55 at r1 (raw file):

}

message DeleteAllPretranslationsRequest {

See other comment.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Shared/Contracts/AssessmentJobFinished.cs line 9 at r1 (raw file):

    public required string Owner { get; init; }
    public required JobState JobState { get; init; }
    public required string Message { get; init; }

Why is the message here, but not in the Dto?

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Shared/Services/GrpcServiceHealthCheck.cs line 17 at r1 (raw file):

            $"{context.Registration.Name}-Health"
        );
        HealthCheckResponse? healthCheckResponse = await client.HealthCheckAsync(

Ah, some Grpc stuff is in shared now.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Webhooks/Services/WebhookService.cs line 9 at r1 (raw file):

    private readonly IBackgroundJobClient _jobClient = jobClient;

    public async Task<IEnumerable<Webhook>> GetAllAsync(string owner, CancellationToken cancellationToken = default)

Same with files - are we sure we want to have dangling webhooks? Will they auto-die?

@johnml1135
Copy link
Collaborator

We should likely have an issue for finishing this, detailing all of what would be needed - including testing and the first integration.

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 85 of 85 files at r1, all commit messages.
Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @ddaspit)

Copy link
Contributor Author

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @johnml1135)


src/Machine/src/Serval.Machine.Shared/Services/ServalHealthServiceV1.cs line 4 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Should this be in the new service toolkit?

No, the service toolkit wouldn't be appropriate, but it could be moved to the Serval.Grpc library.


src/Serval/src/Serval.Assessment/Configuration/IMongoDataAccessConfiguratorExtensions.cs line 14 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

We will be looking up engines by ID. The engine ID should be a second index.

The id property automatically has an index.


src/Serval/src/Serval.Assessment/Configuration/IMongoDataAccessConfiguratorExtensions.cs line 36 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Will we want to also be getting verses by reference? Will there be corpora defined as well? pretranslations have more indexing specified.

Right now, we only support filtering by book. We can always add more indexes when we add more filtering options.


src/Serval/src/Serval.Assessment/Consumers/DataFileDeletedConsumer.cs line 3 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Could this be shared?

No, this is specific to the assessment API. It depends on the assessment engine service.


src/Serval/src/Serval.Assessment/Contracts/AssessmentJobConfigDto.cs line 14 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

I assume that referencing other jobs from other engines would be done in these options for AQUA assessments.

Right now, I haven't designed it to reference engines from other domains. I would like to understand better why we would need to do that and how it might work before we design the interaction.


src/Serval/src/Serval.Assessment/Contracts/AssessmentResultDto.cs line 6 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Could any more of this be shared with the Machine engines?

I would like to keep the domains separate so they are free to evolve separately from each other.


src/Serval/src/Serval.Assessment/Controllers/AssessmentEnginesController.cs line 446 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

You are organizing results by job instead of by engine and corpus (as with pretranslations). Can we reasonably assume that people would want legacy results? Wouldn't the tracking, correlation resolution of assessment results be the responsibility of the client?
If we want to correlate different sets of results, what would be the best way to present the correlated results to the user? Wouldn't it be to have a unique ID for the result that is preserved with updates? Would we want to say "this was there before, but it no longer applies"? Would we update an existing record or make a new one?

Those are good questions. To tell you the truth, I don't know exactly how the API will be used yet. I am not far enough into designing Lynx to know. I want to have something in place, so that the AQuA work isn't blocked and I have something to work with when implementing Lynx. Initially, I am taking cues from the AQuA API, but I fully expect that we will be making changes to it. The nice thing about this design is that we don't have to worry about only allowing one job to run at a time.


src/Serval/src/Serval.Assessment/Models/Result.cs line 11 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Is this what we really want? I think when we were looking at this before, I recommended using something fasioned after the language server's diagnostic message. Severity, range, code, description, tags, data, etc.

This model is certainly not finalized yet. I'm not sure exactly what will be needed. Properties like severity are Lynx concepts and not AQuA concepts, so those probably won't be added.


src/Serval/src/Serval.Assessment/Services/AssessmentPlatformServiceV1.cs line 51 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Is there any way that more of this could be shared with Machine? Could a ServiceBase class be created that is shared? There is so much that overlaps.

There are similarities, but they are different domains with different data models, so I'm not what more could be shared.


src/Serval/src/Serval.DataFiles/Services/DataFileService.cs line 37 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Do we want a cleanup function "show me everything that is not connected to a corpus"? Removing this endpoint disables cleanup of hanging files. If a person loses the id of the file, they lose it completely.

This method was not removed. It was moved to the base class.


src/Serval/src/Serval.Grpc/Protos/serval/assessment/v1/platform.proto line 22 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Is there anything tiered? Can't we share using the ServiceToolkit?

I am intentionally keeping the APIs for different domains separate.


src/Serval/src/Serval.Shared/Contracts/AssessmentJobFinished.cs line 9 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Why is the message here, but not in the Dto?

I forgot to update the webhook library to consume the event.


src/Serval/src/Serval.Webhooks/Services/WebhookService.cs line 9 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Same with files - are we sure we want to have dangling webhooks? Will they auto-die?

Same as with the data files, the method is now defined in the base class.


src/Serval/src/Serval.Shared/Services/GrpcServiceHealthCheck.cs line 17 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Ah, some Grpc stuff is in shared now.

I created a separate health API that all engines can use.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Assessment/Configuration/IMongoDataAccessConfiguratorExtensions.cs line 14 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

The id property automatically has an index.

Ah - I just forgot. This is to add other indexes from the default.

@johnml1135
Copy link
Collaborator

src/Machine/src/Serval.Machine.Shared/Services/ServalHealthServiceV1.cs line 4 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

No, the service toolkit wouldn't be appropriate, but it could be moved to the Serval.Grpc library.

Nevermind - it's already shared. I was mostly not wanting it to be duplicated.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Assessment/Configuration/IMongoDataAccessConfiguratorExtensions.cs line 36 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Right now, we only support filtering by book. We can always add more indexes when we add more filtering options.

True.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Assessment/Consumers/DataFileDeletedConsumer.cs line 3 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

No, this is specific to the assessment API. It depends on the assessment engine service.

Corpora are handled significantly differently - I see it now.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Assessment/Contracts/AssessmentJobConfigDto.cs line 14 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Right now, I haven't designed it to reference engines from other domains. I would like to understand better why we would need to do that and how it might work before we design the interaction.

I think the build options is flexible enough to handle future requirements.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Assessment/Controllers/AssessmentEnginesController.cs line 446 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Those are good questions. To tell you the truth, I don't know exactly how the API will be used yet. I am not far enough into designing Lynx to know. I want to have something in place, so that the AQuA work isn't blocked and I have something to work with when implementing Lynx. Initially, I am taking cues from the AQuA API, but I fully expect that we will be making changes to it. The nice thing about this design is that we don't have to worry about only allowing one job to run at a time.

For any testing or diagnostics done in VSCode or any other platform, I have not seen history on diagnostics. The one exception is baselining for regulated industries, which may want to be done with Bible translation, to work off the issues. For that though, the diagnostics would be stored in a separate system to make sure that they are actioned, not the original test platform. I think it adds an unneeded feature and would recommend it being removed.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Assessment/Models/Result.cs line 11 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

This model is certainly not finalized yet. I'm not sure exactly what will be needed. Properties like severity are Lynx concepts and not AQuA concepts, so those probably won't be added.

How would Lynx determine severity? It would need data fed up to it from the lower levels. If we want to work on it more, let's open an issue for "finishing Assessment API" and adding these issues including test coverage.

@johnml1135
Copy link
Collaborator

src/Serval/src/Serval.Shared/Contracts/AssessmentJobFinished.cs line 9 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I forgot to update the webhook library to consume the event.

I'll wait for the update then.

Copy link
Contributor Author

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can think of this PR as a "partial" or "in-development" implementation of the assessment API to enable Lynx development. This way I don't have to continue to maintain a long-lived development branch. Once we have a clearer idea of what more will need to be done, we can open follow-up issues.

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @johnml1135)


src/Serval/src/Serval.Assessment/Models/Result.cs line 11 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

How would Lynx determine severity? It would need data fed up to it from the lower levels. If we want to work on it more, let's open an issue for "finishing Assessment API" and adding these issues including test coverage.

Serval would be responsible for generating the assessment results. Lynx would be responsible for "interpreting" the assessment results. Severity might be determined by assessment type or score threshold.

I agree that we should open further issues to finish up the work, but I would like to hold off on doing that for now until we have a better idea of what we will need.

- add assessment library
- assessment API disabled using a feature flag
- add separate health gRPC API
- add GHA unit test report
- add constants for all endpoint names
- add assessment engine gRPC API
- add new assessment scopes
- update webhook library to push assessment events
- add initial set of assessment integration tests
Copy link
Contributor Author

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a list of changes to the commit message.

Reviewable status: 79 of 90 files reviewed, 17 unresolved discussions (waiting on @Enkidu93 and @johnml1135)


src/Serval/src/Serval.Shared/Contracts/AssessmentJobFinished.cs line 9 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

I'll wait for the update then.

Done.

@johnml1135
Copy link
Collaborator

-- commits line 7 at r2:
This is exactly what I was looking for! Thank you.

@johnml1135
Copy link
Collaborator

Good enough for this stage.

@johnml1135
Copy link
Collaborator

:lgtm:

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 11 of 11 files at r2, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @ddaspit)

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @Enkidu93 from a discussion.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)

@johnml1135 johnml1135 merged commit 0d33b4f into main Aug 21, 2024
3 checks passed
@johnml1135 johnml1135 deleted the assessment-api branch August 21, 2024 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants