Add events to indicate start and end of skaffold dev iterations #4037

Merged

Conversation

tejal29
Contributor

@tejal29 tejal29 commented Apr 28, 2020

/cc @quoctruong
Fixes: #4036

Description

User-facing changes
No user-facing changes.
Tools consuming the Event API will see the following new events, in this sequence:

$ curl localhost:50052/v1/events
{"result":{"timestamp":"2020-04-28T06:16:30.579021Z","event":{"metaEvent":{"entry":"Starting Skaffold: \u0026{Version:v1.8.0-67-ge5e6a38f3-dirty ConfigVersion:skaffold/v2beta3 GitVersion: GitCommit:e5e6a38f3abd6273ab14b6e7a71c41ec58b8b363 GitTreeState:dirty BuildDate:2020-04-27T23:16:05Z GoVersion:go1.14.1 Compiler:gc Platform:darwin/amd64}","metadata":{"build":{"numberOfArtifacts":2,"builders":[{"type":"DOCKER","count":2}],"type":"LOCAL"},"deploy":{"deployers":[{"type":"KUBECTL","count":1}],"cluster":"GKE"}}}}}}

{"result":{"timestamp":"2020-04-28T06:16:30.579067Z",
   "event":{"devLoopEvent":{"status":"In Progress"}},
  "entry":"Dev Iteration 0 in progress"}}
....

{"result":{"timestamp":"2020-04-28T06:16:35.192693Z",
   "event":{"devLoopEvent":{"status":"Succeeded"}},
   "entry":"Dev Iteration 0 successful"}}

{"result":{"timestamp":"2020-04-28T06:16:59.094247Z",
   "event":{"devLoopEvent":{"iteration":1, "changeType":"BUILD", "status":"In Progress"}},
   "entry":"Dev Iteration 1 in progress"}}

### COMPILE FAILURE - CAUSING A BUILD FAILURE
{"result":{"timestamp":"2020-04-28T06:17:03.748666Z",
   "event":{"buildEvent":{"artifact":"leeroy-app",
    "status":"Failed",
    "err":"unable to stream build output: The command '/bin/sh -c go build -o /app .' returned a non-zero code: 2",
    "errCode":"ErrorCode_UNKNOWN"}},
   "entry":"Build failed for artifact leeroy-app"}}

{"result":{"timestamp":"2020-04-28T06:17:03.748736Z",
   "event":{"devLoopEvent":{"iteration":1,"status":"Failed",
   "err":"couldn't build \"leeroy-app\": unable to stream build output: The command '/bin/sh -c go build -o /app .' returned a non-zero code: 2",
   "errCode":"ErrorCode_UNKNOWN"}},
  "entry":"Dev Iteration 1 failed with error code ErrorCode_UNKNOWN"}}


{"result":{"timestamp":"2020-04-28T06:17:51.727711Z",
   "event":{"devLoopEvent":{"iteration":2, "changeType":"BUILD", "status":"In Progress"}},
    "entry":"Dev Iteration 2 in progress"}}
...
{"result":{"timestamp":"2020-04-28T06:18:12.363222Z",
   "event":{"devLoopEvent":{"status":"Succeeded"}},
    "entry":"Dev Iteration 2 successful"}}

{"result":{"timestamp":"2020-04-30T23:49:16.348194Z",
  "event":{
   "endEvent":{
      "errCode":1,
      "loops":[
         {"errCode":"SUCCESS"},   // iteration and changeType are omitted for the initial loop because both have the zero value
         {"iteration":1,"changeType":"BUILD","errCode":"ErrorCode_UNKNOWN"},
         {"iteration":2,"changeType":"BUILD","errCode":"SUCCESS"}
        ]}},
     "entry":"Skaffold end event with 3 total dev loops"}}


curl: (18) transfer closed with outstanding read data remaining
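
For tools consuming this stream, a minimal Go sketch of picking the dev loop events out of newline-delimited output might look like the following. The struct shapes are inferred only from the sample output above (they are not the generated proto types), it assumes one JSON object per line as the API emits them, and `devLoopStatuses` is a hypothetical helper name.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// devLoopEvent mirrors only the fields visible in the sample output.
type devLoopEvent struct {
	Iteration  int    `json:"iteration"`
	ChangeType string `json:"changeType"`
	Status     string `json:"status"`
}

// envelope is an assumed decoding target for one line of the stream.
type envelope struct {
	Result struct {
		Event struct {
			DevLoopEvent *devLoopEvent `json:"devLoopEvent"`
		} `json:"event"`
	} `json:"result"`
}

// devLoopStatuses scans newline-delimited JSON and returns
// "iteration:status" for every devLoopEvent it finds.
// Other event kinds (buildEvent, metaEvent, ...) are skipped.
func devLoopStatuses(stream string) ([]string, error) {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var e envelope
		if err := json.Unmarshal([]byte(line), &e); err != nil {
			return nil, fmt.Errorf("decoding event: %w", err)
		}
		if d := e.Result.Event.DevLoopEvent; d != nil {
			out = append(out, fmt.Sprintf("%d:%s", d.Iteration, d.Status))
		}
	}
	return out, sc.Err()
}

func main() {
	stream := `{"result":{"event":{"devLoopEvent":{"status":"In Progress"}}}}
{"result":{"event":{"buildEvent":{"artifact":"leeroy-app","status":"Failed"}}}}
{"result":{"event":{"devLoopEvent":{"iteration":1,"status":"Failed"}}}}`
	statuses, err := devLoopStatuses(stream)
	if err != nil {
		panic(err)
	}
	fmt.Println(statuses) // [0:In Progress 1:Failed]
}
```

Note that the initial loop decodes as iteration 0 because the field is absent from the JSON, which matches the omission described in the end event above.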

Follow-up Work (remove if N/A)

  • [ ] Testing with IDEs.

codecov bot commented Apr 28, 2020

Codecov Report

Merging #4037 into master will increase coverage by 0.03%.
The diff coverage is 84.50%.

| Impacted Files | Coverage Δ | |
|---|---|---|
| pkg/skaffold/runner/runner.go | 0.00% <ø> (ø) | |
| pkg/skaffold/runner/dev.go | 65.11% <62.50%> (-0.38%) | ⬇️ |
| pkg/skaffold/errors/errors.go | 66.66% <66.66%> (-33.34%) | ⬇️ |
| pkg/skaffold/event/event.go | 93.69% <100.00%> (+0.56%) | ⬆️ |

proto/skaffold.proto
@@ -344,11 +353,32 @@ enum ErrorCode {
STATUS_CHECK_NODE_NOT_READY= 406;

// Unknown Error Codes
ErrorCode_UNKNOWN = 501;
Member

How is this different from COULD_NOT_DETERMINE?

Contributor Author

COULD_NOT_DETERMINE, the 0 value, is the enum default. In cases where we don't send any error code, the value is 0. To differentiate between an unset error code and a case where we know it is an unknown error, I added this value.

Member

Can we not enforce that the error-code must be explicitly provided?
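
The ambiguity discussed in this thread can be sketched in plain Go. This is only an illustration under assumptions: `ErrorCode` here is a hand-written stand-in for the generated proto enum, `loop` is a hypothetical payload, and `omitempty` is used to imitate how zero-valued fields disappear from the JSON output. The values 0 and 501 come from the thread and hunk above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ErrorCode is a stand-in for the generated proto enum.
type ErrorCode int32

const (
	CouldNotDetermine ErrorCode = 0   // enum default (zero value)
	Unknown           ErrorCode = 501 // explicit "unknown error" code
)

// loop is a hypothetical event payload; omitempty imitates how
// zero-valued fields are dropped from the serialized output.
type loop struct {
	Iteration int       `json:"iteration,omitempty"`
	ErrCode   ErrorCode `json:"errCode,omitempty"`
}

// encode serializes a loop to JSON, panicking on the (unexpected) error.
func encode(l loop) string {
	b, err := json.Marshal(l)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// An unset code and an explicit zero code serialize identically.
	fmt.Println(encode(loop{}))                           // {}
	fmt.Println(encode(loop{ErrCode: CouldNotDetermine})) // {}
	// A non-zero code survives serialization and stays distinguishable.
	fmt.Println(encode(loop{Iteration: 1, ErrCode: Unknown})) // {"iteration":1,"errCode":501}
}
```

A consumer therefore cannot tell "no code was sent" from "the code is 0", which is the case the non-zero unknown code is meant to cover.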

- func ErrorCodeFromError(_ error, _ phase) proto.ErrorCode {
- 	return proto.ErrorCode_COULD_NOT_DETERMINE
+ func ErrorCodeFromError(_ error, _ Phase) proto.ErrorCode {
+ 	return proto.ErrorCode_ErrorCode_UNKNOWN
Member

COULD_NOT_DETERMINE seems more apt here?

It feels like we should have an error type that carries the ErrorCode inside, rather than returning an error-code and an error. Then we could get rid of STATUS_CHECK_NO_ERROR.

Contributor Author

@tejal29 tejal29 Apr 28, 2020

I think it would make sense to add a new proto ErrorDef with those 2 fields. However, we won't be able to change that for the other events (BuildEvent, DeployEvent, StatusCheckEvent, etc.) since IDEs are already consuming them.

Re: STATUS_CHECK_NO_ERR, it came out of a discussion about following the HTTP convention of having explicit codes for success. It would also be easier for us to interpret metrics if we define a success code rather than rely on "0", since 0 is the default value for an int-typed enum.

Member

Looking back through this, I'm confused by the intent of this function. It's a placeholder? Can we hook in the pattern-matching aspects of #4045 here?

Or we could define our own error type that carries an ErrorCode? Like the os.Err* variables and os.IsExist().

example
type ErrorWithCode interface {
   error
   ErrorCode() ErrorCode
}

type errorWithCode struct {
    errorCode ErrorCode
    err error
}

func (e errorWithCode) ErrorCode() ErrorCode {
  return e.errorCode
}
func (e errorWithCode) Error() string {
  return fmt.Sprintf("%d: %s", e.errorCode, e.err.Error())
}

func NewErrorCode(errorCode ErrorCode, err error) ErrorWithCode {
    return errorWithCode{errorCode: errorCode, err: err}
}

Deploy = Phase("Deploy")
StatusCheck = Phase("StatusCheck")
FileSync = Phase("FileSync")
Dev = Phase("Dev")
Member

Is Dev really a phase? (We don't actually use Dev from what I can tell?)

Contributor Author

It is not a phase, but dev-init is a stage or step in the dev loop where errors can happen.
I wanted to capture errors from this "init" step.

Would "DevInit" be better?

Contributor Author

Also, I can rename "Phase" to "Stage".

Member

Oh! So these are phases of the dev loop? That makes sense, and could you add a comment to that effect please? (It seems mildly odd for this to be defined in skaffold/errors.) I think DevInit or just Init makes sense, and perhaps we should have a TearDown too.

(I had wondered if we need to differentiate between dev and debug, but the dev loop is the same so I think it's good.)

Contributor Author

Change Dev -> DevInit
Add TearDown
Add comment saying these are phases in dev loop.

@tejal29 tejal29 force-pushed the add_skaffold_dev_loop_events branch from 581e583 to 6e99b4c Compare April 29, 2020 05:08
Contributor Author

tejal29 commented Apr 29, 2020

Thanks a lot for all your feedback @briandealwis. I have addressed most of the comments and refactored the method signatures. Defining a new ActionableErr struct makes sense for both skaffold stop (#3614) and the dev iteration end event.

I am wiring up actionable error messages in #4045 and will propagate suggestions in the events.
Please take another look!

@tejal29 tejal29 force-pushed the add_skaffold_dev_loop_events branch 3 times, most recently from 5dff70e to 6889a05 Compare April 30, 2020 23:47
Member

@briandealwis briandealwis left a comment

So metaEvent is the session start event, and endEvent is the session end? Are these supposed to be sent on every command or just the dev loop?

Comment on lines 100 to 101
// The actual error code will be passed in
// https://github.com/GoogleContainerTools/skaffold/pull/4045
Member

This doesn't seem like the right place given that the MetaEvent is raised in runner.NewForConfig(). I believe this will cause things like fix to throw session events too. (Not to mention there's ExactArgs() too.) Can we not have it be thrown at the end of the runner?

(Or if MetaEvent and EndEvent are ok for things like build, deploy, fix, etc. then could we move the start/stop here.)

Comment on lines 136 to 140
message ActionableErr {
ErrorCode errCode = 1; // error code representing the error
string message = 2; // message describing the error.
repeated string suggestions = 3; // a list of suggestions for the error.

Member

I don't think reporting messages here is the right thing to do. I'm wondering if it would be better to have a secondary set of possible action codes that could be returned instead. That would also allow i18n or more.

(I think it's also an opportunity to somehow segregate PII fields, like artifact names or image tags, so that we might be able to report the actual errors seen?)

Contributor Author

@tejal29 tejal29 May 4, 2020

Thanks. I should have made this clear. The suggestion text is just for IDEs to show to the user.
Having suggestions plumbed through Skaffold will reduce duplicated effort across the IDEs.
I will add that as a comment.
For metrics, we should probably report codes like you mentioned.
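
As a sketch of the IDE-side use described here, the snippet below mirrors the three fields of the ActionableErr message quoted above in a plain Go struct and renders them for display. The struct is hand-written rather than generated from the proto, `render` is a hypothetical helper, and the message and suggestion strings in `main` are invented sample data.

```go
package main

import (
	"fmt"
	"strings"
)

// ActionableErr mirrors the three fields of the proto message under review.
type ActionableErr struct {
	ErrCode     int32    // stands in for the ErrorCode enum
	Message     string   // message describing the error
	Suggestions []string // suggestions a client can show to the user
}

// render formats the error and its suggestions the way a client
// might present them; the layout is arbitrary.
func render(e ActionableErr) string {
	var b strings.Builder
	fmt.Fprintf(&b, "error %d: %s", e.ErrCode, e.Message)
	for _, s := range e.Suggestions {
		fmt.Fprintf(&b, "\n  try: %s", s)
	}
	return b.String()
}

func main() {
	// Invented sample values, for illustration only.
	fmt.Println(render(ActionableErr{
		ErrCode:     501,
		Message:     "build failed for artifact leeroy-app",
		Suggestions: []string{"check the compiler output", "re-run with a higher verbosity"},
	}))
}
```

A metrics pipeline would likely read only `ErrCode`, which keeps free-form text out of reported data, in line with the split suggested in this thread.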

logEntry.Entry = fmt.Sprintf("Dev Iteration %d in progress", de.Iteration)
case Succeeded:
logEntry.Entry = fmt.Sprintf("Dev Iteration %d successful", de.Iteration)
default:
Contributor Author

Change this to case Failed:

Contributor Author

done
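
For reference, the switch with an explicit Failed case could be sketched as below. This is a standalone approximation, not the code in pkg/skaffold/event/event.go: the status constants and the `devLoopEntry` signature are assumptions, while the message formats are taken from the sample event output in the PR description.

```go
package main

import "fmt"

// Status values as they appear in the sample events.
const (
	InProgress = "In Progress"
	Succeeded  = "Succeeded"
	Failed     = "Failed"
)

// devLoopEntry builds the log entry text for a dev loop event.
// An unrecognized status yields an empty entry instead of being
// silently reported as a failure.
func devLoopEntry(iteration int, status, errCode string) string {
	switch status {
	case InProgress:
		return fmt.Sprintf("Dev Iteration %d in progress", iteration)
	case Succeeded:
		return fmt.Sprintf("Dev Iteration %d successful", iteration)
	case Failed:
		return fmt.Sprintf("Dev Iteration %d failed with error code %s", iteration, errCode)
	default:
		return ""
	}
}

func main() {
	fmt.Println(devLoopEntry(0, InProgress, ""))
	fmt.Println(devLoopEntry(1, Failed, "ErrorCode_UNKNOWN"))
}
```

Matching Failed explicitly means a newly added status cannot accidentally fall through to the failure message.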

@tejal29 tejal29 force-pushed the add_skaffold_dev_loop_events branch from 5a540c2 to b467110 Compare May 4, 2020 22:54
@tejal29 tejal29 force-pushed the add_skaffold_dev_loop_events branch from b467110 to 87a56c4 Compare May 4, 2020 22:57
@tejal29 tejal29 force-pushed the add_skaffold_dev_loop_events branch from 87a56c4 to 9a2f0ff Compare May 4, 2020 23:13
},
{
"name": "event.devLoopEvent.err.errCode",
"description": " - UNKNOWN_ERROR: Could not determine error and phase\n - STATUSCHECK_SUCCESS: Status Check Success\n - STATUSCHECK_IMAGE_PULL_ERR: Container image pull error\n - STATUSCHECK_CONTAINER_CREATING: Container creating error\n - STATUSCHECK_RUN_CONTAINER_ERR: Container run error\n - STATUSCHECK_CONTAINER_TERMINATED: Container is already terminated\n - STATUSCHECK_CONTAINER_RESTARTING: Container restarting error\n - STATUSCHECK_NODE_MEMORY_PRESSURE: Node memory pressure error\n - STATUSCHECK_NODE_DISK_PRESSURE: Node disk pressure error\n - STATUSCHECK_NODE_NETWORK_UNAVAILABLE: Node network unavailable error\n - STATUSCHECK_NODE_PID_PRESSURE: Node PID pressure error\n - STATUSCHECK_NODE_UNSCHEDULABLE: Node unschedulable error\n - STATUSCHECK_NODE_UNREACHABLE: Node unreachable error\n - STATUSCHECK_NODE_NOT_READY: Node not ready error\n - STATUSCHECK_UNKNOWN: Status Check error unknown\n - STATUSCHECK_UNKNOWN_UNSCHEDULABLE: Container is unschedulable due to unknown reasons\n - STATUSCHECK_CONTAINER_WAITING_UNKNOWN: Container is waiting due to unknown reason\n - DEPLOY_UNKNOWN: Deploy failed due to unknown reason\n - SYNC_UNKNOWN: SYNC failed due to known reason\n - BUILD_UNKNOWN: Build failed due to unknown reason\n - DEVINIT_UNKNOWN: Dev Init failed due to unknown reason\n - CLEANUP_UNKNOWN: Cleanup failed due to unknown reason\n - SYNC_INIT_ERROR: File Sync Initialize failure\n - DEVINIT_REGISTER_BUILD_DEPS: Failed to configure watcher for build dependencies in dev loop\n - DEVINIT_REGISTER_TEST_DEPS: Failed to configure watcher for test dependencies in dev loop\n - DEVINIT_REGISTER_DEPLOY_DEPS: Failed to configure watcher for deploy dependencies in dev loop\n - DEVINIT_REGISTER_CONFIG_DEP: Failed to configure watcher for Skaffold configuration file.",
Member

Can we do something to make this a more manageable string?

Contributor Author

This was recently changed in #4083. I will follow up later.

@tejal29 tejal29 merged commit 53581a9 into GoogleContainerTools:master May 5, 2020
@tejal29 tejal29 mentioned this pull request May 5, 2020
@tejal29 tejal29 deleted the add_skaffold_dev_loop_events branch April 15, 2021 07:35