feat: SearchFlatRuns api call for flat runs table support #8852

AmanuelAaron · 2024-02-16T18:27:27Z

Description

Creating SearchFlatRuns endpoint to support the flat runs table

PRD

Test Plan

Api is not exposed to UI or CLI in this PR, integration tests should pass

Checklist

Changes have been manually QA'd
User-facing API changes need the "User-facing API Change" label.
Release notes should be added as a separate file under docs/release-notes/.
See Release Note for details.
Licenses should be included for new code which was copied and/or modified from any external code.

Ticket

ET-14

codecov · 2024-02-16T18:27:42Z

Codecov Report

Attention: Patch coverage is 66.75192% with 130 lines in your changes are missing coverage. Please review.

Project coverage is 47.70%. Comparing base (674cd73) to head (a0b9bae).
Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #8852      +/-   ##
==========================================
+ Coverage   47.63%   47.70%   +0.06%     
==========================================
  Files        1159     1159              
  Lines      142794   143165     +371     
  Branches     2338     2338              
==========================================
+ Hits        68020    68292     +272     
- Misses      74621    74720      +99     
  Partials      153      153

Flag	Coverage Δ
backend	`42.91% <89.53%> (+0.33%)`	⬆️
harness	`63.85% <22.04%> (-0.14%)`	⬇️
web	`40.64% <33.33%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
webui/react/src/ioTypes.ts	`90.88% <100.00%> (+0.05%)`	⬆️
...act/src/pages/F_ExpList/glide-table/GlideTable.tsx	`16.15% <0.00%> (-0.08%)`	⬇️
master/internal/experiment_filter.go	`95.14% <92.10%> (-0.77%)`	⬇️
master/internal/api_runs.go	`86.50% <87.50%> (+7.55%)`	⬆️
harness/determined/common/api/bindings.py	`40.11% <22.04%> (-0.23%)`	⬇️

... and 4 files with indirect coverage changes

netlify · 2024-02-16T18:27:46Z

✅ Deploy Preview for determined-ui canceled.

Name	Link
🔨 Latest commit	`a0b9bae`
🔍 Latest deploy log	https://app.netlify.com/sites/determined-ui/deploys/65f9c3cebadc640008c64bf7

NicholasBlaskey

Is it possible we start from very few columns then add more as use cases are developed? Copying things from experiments I think will make this endpoint bloated and complicated

master/internal/experiment_filter.go

NicholasBlaskey · 2024-02-22T18:11:31Z

master/internal/experiment_filter.go

+
+	filterExperimentColMap := map[string]string{
+		"id":                   "e.id",
+		"description":          "e.config->>'description'",


some runs won't have associated experiments, I don't think we can except to use e.config

Using LEFT JOIN for this so that runs without it will just have NULL. I believe frontend and simply block actions that require this data if the experiment id field is NULL. @ashtonG could this work just like what fontend does with 'unmanaged' experiments

yes for individual actions, but no on bulk actions. in situations where the user is operating off of a "select all" selection, the runs in the set will possibly include runs that haven't been paginated in yet, so we wouldn't know if the experiment id field is null or not.

master/internal/experiment_filter.go

NicholasBlaskey · 2024-02-22T18:14:11Z

master/internal/experiment_filter.go

+		"progress":             "ROUND(COALESCE(progress, 0) * 100)::INTEGER", // multiply by 100 for percent
+		"user":                 "e.owner_id",
+		"forkedFrom":           "e.parent_id",
+		"resourcePool":         "e.config->'resources'->>'resource_pool'",


what does it even mean for a run to have a resource pool?

We still want the fields for experiments to be in this table. They won't be shown by default but can be available for the user. All this field would mean is that "the experiment of this run has a resource pool of name

This way of writing this API makes runs without an experiment feel like some they are missing some features.

I guess what I'm saying is the runs and tasks split is like trying to solve a big issue in our system which is the interconnected nature of everything. It feels like having resource_pool is like clawing onto this idea that runs and execution are still coupled. A run is just a collection of metadata. Resource pools belong to tasks.

NicholasBlaskey · 2024-02-22T18:16:25Z

master/internal/experiment_filter.go

@@ -348,7 +382,12 @@ func (e experimentFilter) toSQL(q *bun.SelectQuery,
 		}
 		switch location {
 		case projectv1.LocationType_LOCATION_TYPE_EXPERIMENT.String():


reusing projectv1.LocationType_LOCATION_TYPE_EXPERIMENT feels a little strange to me since the location is runs not experiments

but I'm not 100% up to date on get project columns and how the webui uses this endpoint exactly

Created new location type for Runs and Run Hyperparameter

is a separate change updating project columns?

proto/src/determined/trial/v1/trial.proto

NicholasBlaskey · 2024-02-22T18:24:26Z

master/internal/api_runs_intg_test.go

+	"github.com/determined-ai/determined/proto/pkg/apiv1"
+)
+
+func TestSearchRuns(t *testing.T) {


can we test that every field can be filtered on?

Added tests for all filter types (isEmpty, contains, etc...) as well as test for standard columns & hyperparameters

proto/src/determined/api/v1/api.proto

NicholasBlaskey · 2024-02-22T18:29:54Z

master/internal/api_runs.go

+		query.OrderExpr("id ASC")
+	}
+
+	pagination, err := runPagedBunExperimentsQuery(ctx, query, int(req.Offset), int(req.Limit))


this is another point that also applies to SearchExpeirments

but allow filtering on any field effectively means some of these columns won't have indexes on them leading to poor performance for larger customers. Is there any plans in place to mitigate this?

Currently no -- as you noted, theoretically any column, metric, or hyperparameter can act as a sort or filter. furthermore, those sorts and filters can be on multiple columns, meaning there isn't a straightforward set of indices that will satisfy the product needs while also allowing for performance.

Do we need to change the product needs then?

@mackrorysd what do you think?

NicholasBlaskey · 2024-03-01T16:28:48Z

master/internal/api_runs.go

+func (a *apiServer) SearchRuns(
+	ctx context.Context, req *apiv1.SearchRunsRequest,
+) (*apiv1.SearchRunsResponse, error) {
+	resp := &apiv1.SearchRunsResponse{Runs: []*runv1.FlatRun{}}


nit: declare variables closer to when they are used

Its kinda weird how we have curUser after these declarations

Not sure what you mean here. resp, curUser, and query are all used in the following lines for getting the project and doing the auth check

I think it just looks off to me since most other API calls start with getting the user

I was following the same order that exists in SearchExperiments. I can move the get user stuff to the top though

NicholasBlaskey · 2024-03-01T16:32:17Z

master/internal/api_runs.go

+		Column("r.checkpoint_count").
+		Column("r.external_run_id").
+		ColumnExpr(
+			"(SELECT COUNT(*) FROM runs r WHERE e.id = r.experiment_id) AS num_runs").


this is a weird column to me. Is there a product ask around it?

This was initially done so that frontend could know if the run is part of a multi-trial experiment. Instead I have renamed and turned the val into a bool, where it is true if > 1. Makes the field easier to understand and moves the logic off the frontend

NicholasBlaskey · 2024-03-01T16:34:52Z

master/internal/api_runs.go

+		ColumnExpr("COALESCE(u.display_name, u.username) as display_name").
+		ColumnExpr("r.hparams AS hyperparameters").
+		ColumnExpr("r.summary_metrics AS summary_metrics").
+		ColumnExpr("CASE WHEN e.parent_id IS NULL THEN NULL ELSE " +


this is confusing to me

Like are we returning an experiment ID as the column forked from? Someone might assume that it is a runID that is forked from. Do we have a specific product ask for this column?

In general we want all experiment fields available except ones that deal with "best trial"

master/internal/api_runs.go

master/internal/api_runs_intg_test.go

NicholasBlaskey · 2024-03-01T16:59:09Z

master/internal/experiment_filter.go

+		"progress":             "ROUND(COALESCE(progress, 0) * 100)::INTEGER", // multiply by 100 for percent
+		"user":                 "e.owner_id",
+		"forkedFrom":           "e.parent_id",
+		"resourcePool":         "e.config->'resources'->>'resource_pool'",


This way of writing this API makes runs without an experiment feel like some they are missing some features.

I guess what I'm saying is the runs and tasks split is like trying to solve a big issue in our system which is the interconnected nature of everything. It feels like having resource_pool is like clawing onto this idea that runs and execution are still coupled. A run is just a collection of metadata. Resource pools belong to tasks.

NicholasBlaskey · 2024-03-01T21:36:02Z

proto/src/determined/run/v1/run.proto

+  // summary metrics.
+  optional google.protobuf.Struct summary_metrics = 13;
+  // The id of the experiment linked to the run.
+  optional int32 experiment_id = 14;


I talked to Ilia and he mentioned that having all these fields is a compromise and that can later be removed.

What if we put all experiment related fields in a nested message like optional experimentType experiment so it clear from the API what is on the run and what is on the experiment. We could even reuse the experiment proto type.

EmilyBonar

Some notes on the type interface. I'd also like to request that any parameters that can be made required should be, that makes using the resulting type much easier

webui/react/src/services/api-ts-sdk/api.ts

EmilyBonar

Api return type looks good, thanks for working with me on it!

webui/react/src/services/api-ts-sdk/api.ts

NicholasBlaskey

Have a few comments but looks good to me. Thanks for changing the proto design.

NicholasBlaskey · 2024-03-18T19:49:41Z

master/internal/api_runs.go

+func (a *apiServer) SearchRuns(
+	ctx context.Context, req *apiv1.SearchRunsRequest,
+) (*apiv1.SearchRunsResponse, error) {
+	resp := &apiv1.SearchRunsResponse{Runs: []*runv1.FlatRun{}}


I think it just looks off to me since most other API calls start with getting the user

master/internal/api_runs.go

NicholasBlaskey · 2024-03-18T20:00:08Z

master/internal/experiment_filter.go

+	for i := 0; i < len(hp); i++ {
+		// for last element
+		if i == len(hp)-1 {
+			hp[i] = fmt.Sprintf(`>'%s'`, hp[i])


this makes me unconformable for potential of sql injection

can we write this as a parametrized query?

Updated with ?

proto/src/determined/run/v1/run.proto

NicholasBlaskey · 2024-03-18T20:08:23Z

master/internal/experiment_filter.go

@@ -348,7 +382,12 @@ func (e experimentFilter) toSQL(q *bun.SelectQuery,
 		}
 		switch location {
 		case projectv1.LocationType_LOCATION_TYPE_EXPERIMENT.String():


is a separate change updating project columns?

master/internal/experiment_filter.go

proto/src/determined/run/v1/run.proto

* init * basic request * order + sort by * addtional filters * linting * update api pb * bindings lint * get -> search * add intg test * fix columns * swap columns * fix query and update to flatruns * filters for hyper params + improve filter logic * move flat runs * add filter test * small fixes * more location fixes * update proto gen * format tsx file * lint * tsx lint * add num runs * nested hyper parameters * summary metrics struct * add test cases for all filter types * return error for glide table * add performance tests * column fixes * fix param injection * update return vals * update tags to labels from experiments * remove username field * remove display name field * owner_id to user_id * make project & workspace id/name required * make parent archived required * add minimanl experiment in return * lint * update after merge * update proto fields * lint proto * make experiment fields required in experiment proto * fixes * fix sort sql volnurability

AmanuelAaron added 2 commits February 16, 2024 09:53

init

dc052e0

basic request

4ac3696

cla-bot bot added the cla-signed label Feb 16, 2024

AmanuelAaron and others added 8 commits February 16, 2024 13:58

order + sort by

4263146

addtional filters

c2e0857

linting

3086c8f

Merge branch 'main' into get-runs

dd9d486

update api pb

a063d6e

bindings lint

d3039da

get -> search

aa95c84

add intg test

0a44bb4

AmanuelAaron requested review from NicholasBlaskey and ashtonG February 20, 2024 19:54

AmanuelAaron marked this pull request as ready for review February 20, 2024 20:21

AmanuelAaron requested a review from a team as a code owner February 20, 2024 20:21

AmanuelAaron added 2 commits February 21, 2024 11:29

fix columns

ce3893f

swap columns

314d44f

NicholasBlaskey reviewed Feb 22, 2024

View reviewed changes

AmanuelAaron added 4 commits February 23, 2024 11:19

fix query and update to flatruns

45dfd39

filters for hyper params + improve filter logic

afbcf97

move flat runs

3fdf9f2

add filter test

7632812

AmanuelAaron assigned corban-beaird Feb 27, 2024

small fixes

ee8a4fe

AmanuelAaron requested a review from a team as a code owner February 27, 2024 15:16

AmanuelAaron added 2 commits February 27, 2024 10:35

more location fixes

3bca919

update proto gen

cb5ea71

add performance tests

c5d1fa4

NicholasBlaskey reviewed Mar 1, 2024

View reviewed changes

column fixes

d38c961

NicholasBlaskey reviewed Mar 1, 2024

View reviewed changes

fix param injection

e6f95ee

EmilyBonar reviewed Mar 13, 2024

View reviewed changes

AmanuelAaron force-pushed the get-runs branch from 7759f1b to e6f95ee Compare March 13, 2024 19:30

update return vals

6684cda

AmanuelAaron changed the title ~~feat: Get Runs api call for flat runs table support~~ feat: SearchFlatRuns api call for flat runs table support Mar 13, 2024

AmanuelAaron added 6 commits March 14, 2024 15:17

update tags to labels from experiments

8d72386

remove username field

52476db

remove display name field

3d08e46

owner_id to user_id

d2eafff

make project & workspace id/name required

3cddf6b

make parent archived required

34daed5

EmilyBonar approved these changes Mar 18, 2024

View reviewed changes

AmanuelAaron added 7 commits March 18, 2024 15:02

add minimanl experiment in return

b844fd5

lint

c98fe20

Merge branch 'main' into get-runs

d1eb522

update after merge

9bfc5bf

update proto fields

6c3ee9f

lint proto

676d8e7

make experiment fields required in experiment proto

f5b2ff1

EmilyBonar reviewed Mar 18, 2024

View reviewed changes

webui/react/src/services/api-ts-sdk/api.ts Show resolved Hide resolved

NicholasBlaskey approved these changes Mar 18, 2024

View reviewed changes

AmanuelAaron added 2 commits March 19, 2024 09:56

fixes

cbd184e

fix sort sql volnurability

a0b9bae

AmanuelAaron merged commit 93b6aa2 into main Mar 19, 2024
69 of 81 checks passed

AmanuelAaron deleted the get-runs branch March 19, 2024 17:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: SearchFlatRuns api call for flat runs table support #8852

feat: SearchFlatRuns api call for flat runs table support #8852

AmanuelAaron commented Feb 16, 2024 •

edited

Loading

codecov bot commented Feb 16, 2024 •

edited

Loading

netlify bot commented Feb 16, 2024 •

edited

Loading

NicholasBlaskey left a comment

NicholasBlaskey Feb 22, 2024

AmanuelAaron Feb 26, 2024

ashtonG Feb 28, 2024

NicholasBlaskey Feb 22, 2024

AmanuelAaron Feb 26, 2024

NicholasBlaskey Mar 1, 2024

NicholasBlaskey Feb 22, 2024

AmanuelAaron Feb 26, 2024

NicholasBlaskey Mar 18, 2024

NicholasBlaskey Feb 22, 2024

AmanuelAaron Feb 28, 2024

NicholasBlaskey Feb 22, 2024

ashtonG Feb 28, 2024

NicholasBlaskey Feb 29, 2024

NicholasBlaskey Mar 1, 2024

AmanuelAaron Mar 1, 2024

NicholasBlaskey Mar 18, 2024

AmanuelAaron Mar 19, 2024

NicholasBlaskey Mar 1, 2024

AmanuelAaron Mar 1, 2024

NicholasBlaskey Mar 1, 2024

AmanuelAaron Mar 13, 2024

NicholasBlaskey Mar 1, 2024

NicholasBlaskey Mar 1, 2024 •

edited

Loading

EmilyBonar left a comment

EmilyBonar left a comment

NicholasBlaskey left a comment

NicholasBlaskey Mar 18, 2024

NicholasBlaskey Mar 18, 2024

AmanuelAaron Mar 19, 2024

NicholasBlaskey Mar 18, 2024

feat: SearchFlatRuns api call for flat runs table support #8852

feat: SearchFlatRuns api call for flat runs table support #8852

Conversation

AmanuelAaron commented Feb 16, 2024 • edited Loading

Description

Test Plan

Checklist

Ticket

codecov bot commented Feb 16, 2024 • edited Loading

Codecov Report

netlify bot commented Feb 16, 2024 • edited Loading

✅ Deploy Preview for determined-ui canceled.

NicholasBlaskey left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

NicholasBlaskey Mar 1, 2024 • edited Loading

Choose a reason for hiding this comment

EmilyBonar left a comment

Choose a reason for hiding this comment

EmilyBonar left a comment

Choose a reason for hiding this comment

NicholasBlaskey left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AmanuelAaron commented Feb 16, 2024 •

edited

Loading

codecov bot commented Feb 16, 2024 •

edited

Loading

netlify bot commented Feb 16, 2024 •

edited

Loading

NicholasBlaskey Mar 1, 2024 •

edited

Loading