Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: SearchFlatRuns api call for flat runs table support #8852

Merged
merged 46 commits into from
Mar 19, 2024
Merged

Conversation

AmanuelAaron
Copy link
Contributor

@AmanuelAaron AmanuelAaron commented Feb 16, 2024

Description

Creating SearchFlatRuns endpoint to support the flat runs table

PRD

Test Plan

Api is not exposed to UI or CLI in this PR, integration tests should pass

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

Ticket

ET-14

@cla-bot cla-bot bot added the cla-signed label Feb 16, 2024
Copy link

codecov bot commented Feb 16, 2024

Codecov Report

Attention: Patch coverage is 66.75192% with 130 lines in your changes are missing coverage. Please review.

Project coverage is 47.70%. Comparing base (674cd73) to head (a0b9bae).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8852      +/-   ##
==========================================
+ Coverage   47.63%   47.70%   +0.06%     
==========================================
  Files        1159     1159              
  Lines      142794   143165     +371     
  Branches     2338     2338              
==========================================
+ Hits        68020    68292     +272     
- Misses      74621    74720      +99     
  Partials      153      153              
Flag Coverage Δ
backend 42.91% <89.53%> (+0.33%) ⬆️
harness 63.85% <22.04%> (-0.14%) ⬇️
web 40.64% <33.33%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
webui/react/src/ioTypes.ts 90.88% <100.00%> (+0.05%) ⬆️
...act/src/pages/F_ExpList/glide-table/GlideTable.tsx 16.15% <0.00%> (-0.08%) ⬇️
master/internal/experiment_filter.go 95.14% <92.10%> (-0.77%) ⬇️
master/internal/api_runs.go 86.50% <87.50%> (+7.55%) ⬆️
harness/determined/common/api/bindings.py 40.11% <22.04%> (-0.23%) ⬇️

... and 4 files with indirect coverage changes

Copy link

netlify bot commented Feb 16, 2024

Deploy Preview for determined-ui canceled.

Name Link
🔨 Latest commit a0b9bae
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/65f9c3cebadc640008c64bf7

@AmanuelAaron AmanuelAaron marked this pull request as ready for review February 20, 2024 20:21
@AmanuelAaron AmanuelAaron requested a review from a team as a code owner February 20, 2024 20:21
Copy link
Contributor

@NicholasBlaskey NicholasBlaskey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible we start from very few columns then add more as use cases are developed? Copying things from experiments I think will make this endpoint bloated and complicated

master/internal/experiment_filter.go Outdated Show resolved Hide resolved

filterExperimentColMap := map[string]string{
"id": "e.id",
"description": "e.config->>'description'",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some runs won't have associated experiments, I don't think we can except to use e.config

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using LEFT JOIN for this so that runs without it will just have NULL. I believe frontend and simply block actions that require this data if the experiment id field is NULL. @ashtonG could this work just like what fontend does with 'unmanaged' experiments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes for individual actions, but no on bulk actions. in situations where the user is operating off of a "select all" selection, the runs in the set will possibly include runs that haven't been paginated in yet, so we wouldn't know if the experiment id field is null or not.

master/internal/experiment_filter.go Outdated Show resolved Hide resolved
"progress": "ROUND(COALESCE(progress, 0) * 100)::INTEGER", // multiply by 100 for percent
"user": "e.owner_id",
"forkedFrom": "e.parent_id",
"resourcePool": "e.config->'resources'->>'resource_pool'",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does it even mean for a run to have a resource pool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still want the fields for experiments to be in this table. They won't be shown by default but can be available for the user. All this field would mean is that "the experiment of this run has a resource pool of name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way of writing this API makes runs without an experiment feel like some they are missing some features.

I guess what I'm saying is the runs and tasks split is like trying to solve a big issue in our system which is the interconnected nature of everything. It feels like having resource_pool is like clawing onto this idea that runs and execution are still coupled. A run is just a collection of metadata. Resource pools belong to tasks.

@@ -348,7 +382,12 @@ func (e experimentFilter) toSQL(q *bun.SelectQuery,
}
switch location {
case projectv1.LocationType_LOCATION_TYPE_EXPERIMENT.String():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reusing projectv1.LocationType_LOCATION_TYPE_EXPERIMENT feels a little strange to me since the location is runs not experiments

but I'm not 100% up to date on get project columns and how the webui uses this endpoint exactly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created new location type for Runs and Run Hyperparameter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is a separate change updating project columns?

proto/src/determined/trial/v1/trial.proto Outdated Show resolved Hide resolved
proto/src/determined/trial/v1/trial.proto Outdated Show resolved Hide resolved
proto/src/determined/trial/v1/trial.proto Outdated Show resolved Hide resolved
"github.com/determined-ai/determined/proto/pkg/apiv1"
)

func TestSearchRuns(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we test that every field can be filtered on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests for all filter types (isEmpty, contains, etc...) as well as test for standard columns & hyperparameters

proto/src/determined/api/v1/api.proto Show resolved Hide resolved
query.OrderExpr("id ASC")
}

pagination, err := runPagedBunExperimentsQuery(ctx, query, int(req.Offset), int(req.Limit))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is another point that also applies to SearchExpeirments

but allow filtering on any field effectively means some of these columns won't have indexes on them leading to poor performance for larger customers. Is there any plans in place to mitigate this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently no -- as you noted, theoretically any column, metric, or hyperparameter can act as a sort or filter. furthermore, those sorts and filters can be on multiple columns, meaning there isn't a straightforward set of indices that will satisfy the product needs while also allowing for performance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to change the product needs then?

@mackrorysd what do you think?

@AmanuelAaron AmanuelAaron requested a review from a team as a code owner February 27, 2024 15:16
func (a *apiServer) SearchRuns(
ctx context.Context, req *apiv1.SearchRunsRequest,
) (*apiv1.SearchRunsResponse, error) {
resp := &apiv1.SearchRunsResponse{Runs: []*runv1.FlatRun{}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: declare variables closer to when they are used

Its kinda weird how we have curUser after these declarations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean here. resp, curUser, and query are all used in the following lines for getting the project and doing the auth check

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it just looks off to me since most other API calls start with getting the user

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the same order that exists in SearchExperiments. I can move the get user stuff to the top though

Column("r.checkpoint_count").
Column("r.external_run_id").
ColumnExpr(
"(SELECT COUNT(*) FROM runs r WHERE e.id = r.experiment_id) AS num_runs").
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a weird column to me. Is there a product ask around it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was initially done so that frontend could know if the run is part of a multi-trial experiment. Instead I have renamed and turned the val into a bool, where it is true if > 1. Makes the field easier to understand and moves the logic off the frontend

ColumnExpr("COALESCE(u.display_name, u.username) as display_name").
ColumnExpr("r.hparams AS hyperparameters").
ColumnExpr("r.summary_metrics AS summary_metrics").
ColumnExpr("CASE WHEN e.parent_id IS NULL THEN NULL ELSE " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is confusing to me

Like are we returning an experiment ID as the column forked from? Someone might assume that it is a runID that is forked from. Do we have a specific product ask for this column?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general we want all experiment fields available except ones that deal with "best trial"

master/internal/api_runs.go Outdated Show resolved Hide resolved
master/internal/api_runs.go Outdated Show resolved Hide resolved
master/internal/api_runs.go Outdated Show resolved Hide resolved
master/internal/api_runs.go Show resolved Hide resolved
master/internal/api_runs.go Outdated Show resolved Hide resolved
master/internal/api_runs_intg_test.go Outdated Show resolved Hide resolved
"progress": "ROUND(COALESCE(progress, 0) * 100)::INTEGER", // multiply by 100 for percent
"user": "e.owner_id",
"forkedFrom": "e.parent_id",
"resourcePool": "e.config->'resources'->>'resource_pool'",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way of writing this API makes runs without an experiment feel like some they are missing some features.

I guess what I'm saying is the runs and tasks split is like trying to solve a big issue in our system which is the interconnected nature of everything. It feels like having resource_pool is like clawing onto this idea that runs and execution are still coupled. A run is just a collection of metadata. Resource pools belong to tasks.

// summary metrics.
optional google.protobuf.Struct summary_metrics = 13;
// The id of the experiment linked to the run.
optional int32 experiment_id = 14;
Copy link
Contributor

@NicholasBlaskey NicholasBlaskey Mar 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked to Ilia and he mentioned that having all these fields is a compromise and that can later be removed.

What if we put all experiment related fields in a nested message like optional experimentType experiment so it clear from the API what is on the run and what is on the experiment. We could even reuse the experiment proto type.

Copy link
Contributor

@EmilyBonar EmilyBonar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes on the type interface. I'd also like to request that any parameters that can be made required should be, that makes using the resulting type much easier

webui/react/src/services/api-ts-sdk/api.ts Show resolved Hide resolved
webui/react/src/services/api-ts-sdk/api.ts Show resolved Hide resolved
webui/react/src/services/api-ts-sdk/api.ts Show resolved Hide resolved
webui/react/src/services/api-ts-sdk/api.ts Show resolved Hide resolved
webui/react/src/services/api-ts-sdk/api.ts Outdated Show resolved Hide resolved
webui/react/src/services/api-ts-sdk/api.ts Outdated Show resolved Hide resolved
webui/react/src/services/api-ts-sdk/api.ts Outdated Show resolved Hide resolved
@AmanuelAaron AmanuelAaron changed the title feat: Get Runs api call for flat runs table support feat: SearchFlatRuns api call for flat runs table support Mar 13, 2024
Copy link
Contributor

@EmilyBonar EmilyBonar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Api return type looks good, thanks for working with me on it!

Copy link
Contributor

@NicholasBlaskey NicholasBlaskey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a few comments but looks good to me. Thanks for changing the proto design.

func (a *apiServer) SearchRuns(
ctx context.Context, req *apiv1.SearchRunsRequest,
) (*apiv1.SearchRunsResponse, error) {
resp := &apiv1.SearchRunsResponse{Runs: []*runv1.FlatRun{}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it just looks off to me since most other API calls start with getting the user

master/internal/api_runs.go Outdated Show resolved Hide resolved
master/internal/api_runs.go Show resolved Hide resolved
master/internal/api_runs.go Outdated Show resolved Hide resolved
for i := 0; i < len(hp); i++ {
// for last element
if i == len(hp)-1 {
hp[i] = fmt.Sprintf(`>'%s'`, hp[i])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes me unconformable for potential of sql injection

can we write this as a parametrized query?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with ?

proto/src/determined/run/v1/run.proto Outdated Show resolved Hide resolved
proto/src/determined/run/v1/run.proto Outdated Show resolved Hide resolved
@@ -348,7 +382,12 @@ func (e experimentFilter) toSQL(q *bun.SelectQuery,
}
switch location {
case projectv1.LocationType_LOCATION_TYPE_EXPERIMENT.String():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is a separate change updating project columns?

master/internal/experiment_filter.go Outdated Show resolved Hide resolved
proto/src/determined/run/v1/run.proto Outdated Show resolved Hide resolved
@AmanuelAaron AmanuelAaron merged commit 93b6aa2 into main Mar 19, 2024
69 of 81 checks passed
@AmanuelAaron AmanuelAaron deleted the get-runs branch March 19, 2024 17:16
maxrussell pushed a commit that referenced this pull request Mar 21, 2024
* init

* basic request

* order + sort by

* addtional filters

* linting

* update api pb

* bindings lint

* get -> search

* add intg test

* fix columns

* swap columns

* fix query and update to flatruns

* filters for hyper params + improve filter logic

* move flat runs

* add filter test

* small fixes

* more location fixes

* update proto gen

* format tsx file

* lint

* tsx lint

* add num runs

* nested hyper parameters

* summary metrics struct

* add test cases for all filter types

* return error for glide table

* add performance tests

* column fixes

* fix param injection

* update return vals

* update tags to labels from experiments

* remove username field

* remove display name field

* owner_id to user_id

* make project & workspace id/name required

* make parent archived required

* add minimanl experiment in return

* lint

* update after merge

* update proto fields

* lint proto

* make experiment fields required in experiment proto

* fixes

* fix sort sql volnurability
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants