-
Notifications
You must be signed in to change notification settings - Fork 357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: SearchFlatRuns api call for flat runs table support #8852
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #8852 +/- ##
==========================================
+ Coverage 47.63% 47.70% +0.06%
==========================================
Files 1159 1159
Lines 142794 143165 +371
Branches 2338 2338
==========================================
+ Hits 68020 68292 +272
- Misses 74621 74720 +99
Partials 153 153
Flags with carried forward coverage won't be shown. Click here to find out more.
|
✅ Deploy Preview for determined-ui canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible we start from very few columns then add more as use cases are developed? Copying things from experiments I think will make this endpoint bloated and complicated
master/internal/experiment_filter.go
Outdated
|
||
filterExperimentColMap := map[string]string{ | ||
"id": "e.id", | ||
"description": "e.config->>'description'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some runs won't have associated experiments, I don't think we can except to use e.config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using LEFT JOIN for this so that runs without it will just have NULL. I believe frontend and simply block actions that require this data if the experiment id field is NULL. @ashtonG could this work just like what fontend does with 'unmanaged' experiments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes for individual actions, but no on bulk actions. in situations where the user is operating off of a "select all" selection, the runs in the set will possibly include runs that haven't been paginated in yet, so we wouldn't know if the experiment id field is null or not.
master/internal/experiment_filter.go
Outdated
"progress": "ROUND(COALESCE(progress, 0) * 100)::INTEGER", // multiply by 100 for percent | ||
"user": "e.owner_id", | ||
"forkedFrom": "e.parent_id", | ||
"resourcePool": "e.config->'resources'->>'resource_pool'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does it even mean for a run to have a resource pool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still want the fields for experiments to be in this table. They won't be shown by default but can be available for the user. All this field would mean is that "the experiment of this run has a resource pool of name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This way of writing this API makes runs without an experiment feel like some they are missing some features.
I guess what I'm saying is the runs and tasks split is like trying to solve a big issue in our system which is the interconnected nature of everything. It feels like having resource_pool
is like clawing onto this idea that runs and execution are still coupled. A run is just a collection of metadata. Resource pools belong to tasks.
@@ -348,7 +382,12 @@ func (e experimentFilter) toSQL(q *bun.SelectQuery, | |||
} | |||
switch location { | |||
case projectv1.LocationType_LOCATION_TYPE_EXPERIMENT.String(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reusing projectv1.LocationType_LOCATION_TYPE_EXPERIMENT
feels a little strange to me since the location is runs not experiments
but I'm not 100% up to date on get project columns and how the webui uses this endpoint exactly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created new location type for Runs and Run Hyperparameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is a separate change updating project columns?
"github.com/determined-ai/determined/proto/pkg/apiv1" | ||
) | ||
|
||
func TestSearchRuns(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we test that every field can be filtered on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added tests for all filter types (isEmpty, contains, etc...) as well as test for standard columns & hyperparameters
query.OrderExpr("id ASC") | ||
} | ||
|
||
pagination, err := runPagedBunExperimentsQuery(ctx, query, int(req.Offset), int(req.Limit)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is another point that also applies to SearchExpeirments
but allow filtering on any field effectively means some of these columns won't have indexes on them leading to poor performance for larger customers. Is there any plans in place to mitigate this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently no -- as you noted, theoretically any column, metric, or hyperparameter can act as a sort or filter. furthermore, those sorts and filters can be on multiple columns, meaning there isn't a straightforward set of indices that will satisfy the product needs while also allowing for performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to change the product needs then?
@mackrorysd what do you think?
master/internal/api_runs.go
Outdated
func (a *apiServer) SearchRuns( | ||
ctx context.Context, req *apiv1.SearchRunsRequest, | ||
) (*apiv1.SearchRunsResponse, error) { | ||
resp := &apiv1.SearchRunsResponse{Runs: []*runv1.FlatRun{}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: declare variables closer to when they are used
Its kinda weird how we have curUser
after these declarations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean here. resp
, curUser
, and query
are all used in the following lines for getting the project and doing the auth check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it just looks off to me since most other API calls start with getting the user
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was following the same order that exists in SearchExperiments. I can move the get user stuff to the top though
master/internal/api_runs.go
Outdated
Column("r.checkpoint_count"). | ||
Column("r.external_run_id"). | ||
ColumnExpr( | ||
"(SELECT COUNT(*) FROM runs r WHERE e.id = r.experiment_id) AS num_runs"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a weird column to me. Is there a product ask around it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was initially done so that frontend could know if the run is part of a multi-trial experiment. Instead I have renamed and turned the val into a bool, where it is true if > 1. Makes the field easier to understand and moves the logic off the frontend
master/internal/api_runs.go
Outdated
ColumnExpr("COALESCE(u.display_name, u.username) as display_name"). | ||
ColumnExpr("r.hparams AS hyperparameters"). | ||
ColumnExpr("r.summary_metrics AS summary_metrics"). | ||
ColumnExpr("CASE WHEN e.parent_id IS NULL THEN NULL ELSE " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is confusing to me
Like are we returning an experiment ID as the column forked from? Someone might assume that it is a runID that is forked from. Do we have a specific product ask for this column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general we want all experiment fields available except ones that deal with "best trial"
master/internal/experiment_filter.go
Outdated
"progress": "ROUND(COALESCE(progress, 0) * 100)::INTEGER", // multiply by 100 for percent | ||
"user": "e.owner_id", | ||
"forkedFrom": "e.parent_id", | ||
"resourcePool": "e.config->'resources'->>'resource_pool'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This way of writing this API makes runs without an experiment feel like some they are missing some features.
I guess what I'm saying is the runs and tasks split is like trying to solve a big issue in our system which is the interconnected nature of everything. It feels like having resource_pool
is like clawing onto this idea that runs and execution are still coupled. A run is just a collection of metadata. Resource pools belong to tasks.
// summary metrics. | ||
optional google.protobuf.Struct summary_metrics = 13; | ||
// The id of the experiment linked to the run. | ||
optional int32 experiment_id = 14; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I talked to Ilia and he mentioned that having all these fields is a compromise and that can later be removed.
What if we put all experiment related fields in a nested message like optional experimentType experiment
so it clear from the API what is on the run and what is on the experiment. We could even reuse the experiment proto type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some notes on the type interface. I'd also like to request that any parameters that can be made required should be, that makes using the resulting type much easier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Api return type looks good, thanks for working with me on it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a few comments but looks good to me. Thanks for changing the proto design.
master/internal/api_runs.go
Outdated
func (a *apiServer) SearchRuns( | ||
ctx context.Context, req *apiv1.SearchRunsRequest, | ||
) (*apiv1.SearchRunsResponse, error) { | ||
resp := &apiv1.SearchRunsResponse{Runs: []*runv1.FlatRun{}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it just looks off to me since most other API calls start with getting the user
master/internal/experiment_filter.go
Outdated
for i := 0; i < len(hp); i++ { | ||
// for last element | ||
if i == len(hp)-1 { | ||
hp[i] = fmt.Sprintf(`>'%s'`, hp[i]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes me unconformable for potential of sql injection
can we write this as a parametrized query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated with ?
@@ -348,7 +382,12 @@ func (e experimentFilter) toSQL(q *bun.SelectQuery, | |||
} | |||
switch location { | |||
case projectv1.LocationType_LOCATION_TYPE_EXPERIMENT.String(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is a separate change updating project columns?
* init * basic request * order + sort by * addtional filters * linting * update api pb * bindings lint * get -> search * add intg test * fix columns * swap columns * fix query and update to flatruns * filters for hyper params + improve filter logic * move flat runs * add filter test * small fixes * more location fixes * update proto gen * format tsx file * lint * tsx lint * add num runs * nested hyper parameters * summary metrics struct * add test cases for all filter types * return error for glide table * add performance tests * column fixes * fix param injection * update return vals * update tags to labels from experiments * remove username field * remove display name field * owner_id to user_id * make project & workspace id/name required * make parent archived required * add minimanl experiment in return * lint * update after merge * update proto fields * lint proto * make experiment fields required in experiment proto * fixes * fix sort sql volnurability
Description
Creating
SearchFlatRuns
endpoint to support the flat runs tablePRD
Test Plan
Api is not exposed to UI or CLI in this PR, integration tests should pass
Checklist
docs/release-notes/
.See Release Note for details.
Ticket
ET-14