Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add archive experiment feature in backend #3359

Merged
merged 46 commits into from
Apr 15, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
46 commits
Select commit Hold shift + click to select a range
492ee05
add new field in db schema and api schema
jingzhang36 Mar 24, 2020
e5ce3f7
auto genereted types for experiment storage state
jingzhang36 Mar 24, 2020
331b780
add archive and unarchive methods to backend for experiments.
jingzhang36 Mar 24, 2020
4509f15
auto generated archive/unarchive methods for epxeriments
jingzhang36 Mar 24, 2020
eb372ec
add archive and unarchive to client
jingzhang36 Mar 24, 2020
b86fb0a
set proper storage state when creating experiment
jingzhang36 Mar 24, 2020
eccc09a
retrieve storage state when we get/list epxeriment(s)
jingzhang36 Mar 24, 2020
1397b2b
change expection in test to have storage state
jingzhang36 Mar 24, 2020
f460bb8
add storage state in resource manager test
jingzhang36 Mar 24, 2020
7724c50
revise experiemnt server test
jingzhang36 Mar 24, 2020
de59fe5
revise api converter test
jingzhang36 Mar 24, 2020
8b6dc2b
integration test of experiment archive
jingzhang36 Mar 24, 2020
3ec8ccc
archive/unarchive experiment affect the storage state of runs in it
jingzhang36 Mar 24, 2020
1e82eaf
test all the runs in archive/unarchive experiment
jingzhang36 Mar 25, 2020
664b0e2
test all runs are archived/unarchived with their experiment in experi…
jingzhang36 Mar 25, 2020
7c47129
integration test
jingzhang36 Mar 25, 2020
68a42ec
integration test: value type mismatch in assertion
jingzhang36 Mar 25, 2020
3c82eea
unused import; default value for storage state
jingzhang36 Mar 25, 2020
5e5d476
autogen code for frontend
jingzhang36 Mar 25, 2020
2ddd451
reorder the fields in api experiment schema
jingzhang36 Mar 26, 2020
cfb8ea6
switch the position of the two enum to verify a hypothesis
jingzhang36 Mar 26, 2020
d5c7d65
Put a place hodler to prevent any valid item to take the value 0
jingzhang36 Mar 26, 2020
8473c4d
Get rid of the place holder since the cause of issue related to value…
jingzhang36 Mar 26, 2020
8a3a07e
The returned api experiment now has storage state field
jingzhang36 Mar 26, 2020
6851be1
create experiment return doesn't contain storege state
jingzhang36 Mar 26, 2020
9daaa03
Cleanup needs to clean runs and pipeliens now
jingzhang36 Mar 27, 2020
d0b4205
a missing client
jingzhang36 Mar 27, 2020
5cf6f1c
use resource reference as fileter instead of experiment uuid
jingzhang36 Mar 27, 2020
a3c7513
use same namespace in archive unit test
jingzhang36 Mar 27, 2020
d63dbbb
Leave archive/unarchive experiment integration test to a separate PR
jingzhang36 Mar 27, 2020
b79ba43
also need to update jobs when experiments are archived
jingzhang36 Apr 1, 2020
c0cb1b5
Merge remote-tracking branch 'origin/master' into arch_exp
jingzhang36 Apr 1, 2020
c86dc48
Change of unarchiving logic. When experiment is unarchived, jobs/runs in
jingzhang36 Apr 1, 2020
0676ddf
add unit test for the job status in archived/unarchived experiment
jingzhang36 Apr 1, 2020
8d8516d
change archive state to 3 value enum; add experiment integration test
jingzhang36 Apr 2, 2020
4ba3621
make archive state 3 value enum to avoid 0 value mapped to available;…
jingzhang36 Apr 2, 2020
a74bac0
run swagger autogen
jingzhang36 Apr 2, 2020
4888ea7
fix an expected value
jingzhang36 Apr 2, 2020
16d59be
Merge branch 'arch_exp' of github.com:jingzhang36/pipelines into arch…
jingzhang36 Apr 2, 2020
faba821
fix experiment server test
jingzhang36 Apr 2, 2020
9b8fadb
add job check in experiment server test
jingzhang36 Apr 2, 2020
13260b0
update job crds
jingzhang36 Apr 6, 2020
e796869
Merge remote-tracking branch 'upstream/master' into arch_exp
jingzhang36 Apr 7, 2020
018d8d9
fix a typo
jingzhang36 Apr 8, 2020
934c7f9
Merge remote-tracking branch 'origin/master' into arch_exp
jingzhang36 Apr 8, 2020
ed821d0
remove accidentally included irrelevant changes
jingzhang36 Apr 8, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 30 additions & 0 deletions backend/api/experiment.proto
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,20 @@ service ExperimentService {
delete: "/apis/v1beta1/experiments/{id}"
};
}

//Archive an experiment.
rpc ArchiveExperiment(ArchiveExperimentRequest) returns (google.protobuf.Empty) {
option (google.api.http) = {
post: "/apis/v1beta1/experiments/{id}:archive"
};
}

//Restore an archived experiment.
rpc UnarchiveExperiment(UnarchiveExperimentRequest) returns (google.protobuf.Empty) {
option (google.api.http) = {
post: "/apis/v1beta1/experiments/{id}:unarchive"
};
}
}

message CreateExperimentRequest {
Expand Down Expand Up @@ -150,4 +164,20 @@ message Experiment {
// Optional input field. Specify which resource this run belongs to.
// For Experiment, the only valid resource reference is a single Namespace.
repeated ResourceReference resource_references = 5;

enum StorageState {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity, does it make sense to have the default value as STORAGESTATE_UNSPECIFIED = 0;?

Copy link
Contributor Author

@jingzhang36 jingzhang36 Mar 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually want to do that!
I have been debating with myself since RUN's storage _state has only two enum values (archived and unarchived). But I actually want to have one more enum value like RESERVED or UNSPECIFIED as you mentioned. The reason is that I found the first enum value being 0 will cause some wierd behavior when we return the api type; and thus I want to have archive and unarchive take non-0 values (and i.e., has RESERVED or UNSPECIFIED taking 0 value)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we return api type, if something is 0, it will be omitted in returned json. Is that the problem you are describing?

I think our clients should handle that explicitly (we are already doing so). this behavior is specified as the omitempty annotation like in

Enabled bool `json:"enabled,omitempty"`

I think storage state shouldn't have an invalid state (there's no use case), everything should be active or archived.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to know the json omitempty behavior! My understanding is that boolean and generic string typed fields can be omitted because the default value (used when field not explicitly set) happens to be the same as the omitted value and hence the value is correct. Our StorageState type is a enum string which takes two values, e.g., taking Run as an example,


I don't think enum string typed field can be marked omitempty. Something else might be happening here...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM if you decides. Just wondering if you can make STORAGESTATE_AVAILABLE the default value for backfilled DB data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose not to do the data backfill. While manually testing this implementation on a KFP deployment, the existing rows seem to have "" populated in the new column (storage state). So when we retrieve the rows in experiment_store.go, I put a comment on the treatment of "", basically explaining that why we ran into "" in this column (while we shouldn't in a normal case) and how we patch for this legacy value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are still dealing with "" in our code, what has been improved by adding an extra enum value?
Sounds to me keeping the enum as 2 values and deal with "" in clients have fewer changes needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The column got a "" value as that's the default for type string. I wonder what was the thought behind storing as string type vs int type (or even tinyint). As far as I can think of, string only benefits readability if one looks directly into the DB, whereas int type can save storage, and it defaults to 0, which will naturally match whatever the default enum value is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, both string and int types seem fine to me. I don't think this is a super critical decision (or life-and-death decision) in our case. E.g., even considering size, our strings are fixed length (since only predefined strings are allowed) and thus not a huge deal. Then why choosing string? To be consistent with Run storage schema. So the later developer might get better code readability (note: not readability of data in db). They'll naturally get the hint that storage state in Run and Experiment are meant to express similar notation. In other words, if Run has this field as int in the first place, int can b used here as well. Again, IMHO, not a crucial enough decision either in terms of performance or in terms of code/data readability, and arguments can be made on either side .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree it's not a super critical decision. And as you mentioned, consistency with existing Run state might be a good reason to stay with string.

STORAGESTATE_UNSPECIFIED = 0;
STORAGESTATE_AVAILABLE = 1;
STORAGESTATE_ARCHIVED = 2;
}

StorageState storage_state = 6;
}

message ArchiveExperimentRequest {
string id = 1;
}

message UnarchiveExperimentRequest {
string id = 1;
}
318 changes: 254 additions & 64 deletions backend/api/go_client/experiment.pb.go

Large diffs are not rendered by default.

102 changes: 102 additions & 0 deletions backend/api/go_client/experiment.pb.gw.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,8 @@ load("@io_bazel_rules_go//go:def.bzl", "go_library")
go_library(
name = "go_default_library",
srcs = [
"archive_experiment_parameters.go",
"archive_experiment_responses.go",
"create_experiment_parameters.go",
"create_experiment_responses.go",
"delete_experiment_parameters.go",
Expand All @@ -12,6 +14,8 @@ go_library(
"get_experiment_responses.go",
"list_experiment_parameters.go",
"list_experiment_responses.go",
"unarchive_experiment_parameters.go",
"unarchive_experiment_responses.go",
],
importpath = "github.com/kubeflow/pipelines/backend/api/go_http_client/experiment_client/experiment_service",
visibility = ["//visibility:public"],
Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading