Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ship full log output to Cronitor #14

Merged
merged 9 commits into from
May 3, 2022
Merged

Ship full log output to Cronitor #14

merged 9 commits into from
May 3, 2022

Conversation

jdotjdot
Copy link
Contributor

@jdotjdot jdotjdot commented Apr 28, 2022

This PR adds full log-shipping capabilities without truncation, like are available in https://github.com/cronitorio/cronitor-kubernetes, to Cronitor-CLI.

Still remaining:

  • Ship logs via S3
  • Add CLI flag to turn log-shipping on or off

@jdotjdot jdotjdot requested a review from shaneharter April 28, 2022 03:06
@jdotjdot jdotjdot marked this pull request as ready for review April 28, 2022 03:06
cmd/exec.go Outdated
outputForLogs := gatherOutput(tempFile, false)
_, err = shipLogData(monitorCode, series, string(outputForLogs))
if err != nil {
fmt.Printf("%v", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch this to a log() function call and do the same on the other block.

Also since this inline goroutine is like 5+ LOC and used twice, i would probably just give it a name and call it

cmd/exec.go Outdated
outputForPing := gatherOutput(tempFile)
outputForPing := gatherOutput(tempFile, true)
var metrics map[string]int = nil
logLengthForPing, err := getFileSize(tempFile)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this err is overwriting the err on L180 that is used later on L203

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

Copy link
Collaborator

@shaneharter shaneharter Apr 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will rename the one on 180 -- not very defensive here. i'll do it in a branch i'm working on

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly you shouldn't have to be, in my opinion this is a failure of go structurally

cmd/exec.go Outdated
return &b
}

func getPresignedUrl(postBody []byte) ([]byte, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to move this into lib/cronitor to keep this exec command focused on its main job. That module currently uses our v3 api end points but I will eventually update them to be consistent with this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts on this:

  • As you said, currently lib/cronitor is built very differently than that bit is, including API versions (presign doesn't even have a version), etc.
  • Personally I think the presign API call actually acts quite a bit differently than lib/cronitor and the rest of the API. It is technically an API call, but only because we need to keep the AWS keys secret in order to presign. I think there's another valid argument that it actually belongs inside exec, because its sole purpose is to presign URLs for log uploading, which is very much an exec-related task.

Given the urgency I understand around shipping this, I suggest we leave this as is for now, and then if you disagree with what I wrote above and really want me to move it to lib/cronitor, I can refactor after.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I just pushed this, i was hung up at the end with a problem but the I realize it failed still when I stashed my changes.

Copy link
Contributor Author

@jdotjdot jdotjdot Apr 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a time thing--philosophically I genuinely think it should sit in exec. But I'm happy to move it.

Regarding shipping now vs later, I misunderstood what you were saying--I thought you were suggesting refactoring the methods to fit as methods of the CronitorApi struct that appears in lib/cronitor, which from looking through it seemed like it would take longer given the assumptions about the API it currently makes--which actually if we did it would be clean. cronitor-kubernetes has a very similar structure and it looks nice. And actually looking at what I did there, I actually did attach shipLogData to the main struct, so I'm probably disproving my own earlier point given how clean that looks, and maybe it should be refactored.

That said if you literally just meant copy/pasting it to lib/cronitor, obviously that will take like 5 seconds. I can do that now.

Copy link
Collaborator

@shaneharter shaneharter Apr 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the error currently:

mac: ~/0x0/cli $ ./cronitor-cli -v exec shane-testy2 'true && echo "test"'
Reading config from /etc/cronitor/cronitor.json
Running subcommand: true && echo "test"
Sending ping https://cronitor.link/ping/xxx/shane-testy2?state=run&try=1&stamp=1651207111.322&msg=true+%26%26+echo+%22test%22&host=Shanes-MacBook-Pro.local&series=1651207111.322
test
Sending ping https://cronitor.link/ping/xxx/shane-testy2?state=complete&try=1&stamp=1651207111.351&msg=test%0A&host=Shanes-MacBook-Pro.local&duration=0.029&series=1651207111.322&status_code=0&metric=length%3A5
error generating presign url for log uploading: error response code 400 returned

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the pings are correct, has the correct length, etc, but the log upload is not happy
https://cronitor.io/app/monitors/shane-testy2?env=production&time=7d

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure I figured out what's going on here and fixed in the latest commit. Your sample command works for me locally now, pull latest and try again

cmd/exec.go Outdated Show resolved Hide resolved
cmd/exec.go Outdated Show resolved Hide resolved
@jdotjdot jdotjdot requested a review from shaneharter April 29, 2022 03:10
@shaneharter
Copy link
Collaborator

Some failing tests here I will dig in and see what's up

mac: ~/0x0/cli/tests $ ./test-exec.sh

Exec uses bash when available.. OK
Exec runs command check.. OK
Exec runs command with complex args.. OK
Exec runs command with really complex args.. OK
Exec sends complete ping on success.. FAIL
Exec sends fail ping on failure.. FAIL
Exec sends status code on complete ping.. OK
Exec sends environment in pings.. OK
Exec sends run timestamp as complete ping series.. OK
Exec sends duration with complete ping.. OK
Exec sends command with run ping.. FAIL
Exec sends stdout with complete ping.. OK
Exec does not send stdout when suppressed.. OK
Exec passes stdout through to caller.. OK
Exec passes stdout through to caller with newline chars intact.. OK
Exec passes exitcode through to caller.. OK

@jdotjdot
Copy link
Contributor Author

jdotjdot commented May 3, 2022

Some failing tests here I will dig in and see what's up

mac: ~/0x0/cli/tests $ ./test-exec.sh

Exec uses bash when available.. OK
Exec runs command check.. OK
Exec runs command with complex args.. OK
Exec runs command with really complex args.. OK
Exec sends complete ping on success.. FAIL
Exec sends fail ping on failure.. FAIL
Exec sends status code on complete ping.. OK
Exec sends environment in pings.. OK
Exec sends run timestamp as complete ping series.. OK
Exec sends duration with complete ping.. OK
Exec sends command with run ping.. FAIL
Exec sends stdout with complete ping.. OK
Exec does not send stdout when suppressed.. OK
Exec passes stdout through to caller.. OK
Exec passes stdout through to caller with newline chars intact.. OK
Exec passes exitcode through to caller.. OK

Let me know what you find and I can look into it / fix.
Is it worth me spending time automating these tests in CI? I didn't realize there were tests that could be run.

@shaneharter
Copy link
Collaborator

Yes we have the beginnings of a docker-based test rig so it can be tested on different platforms that should be easy to adapt to Actions. But valuable right now is depending on how long it takes.

@shaneharter
Copy link
Collaborator

This might be a test update that didn't get committed when I migrated to the new ping URL format, verifying

@jdotjdot
Copy link
Contributor Author

jdotjdot commented May 3, 2022

Yes we have the beginnings of a docker-based test rig so it can be tested on different platforms that should be easy to adapt to Actions. But valuable right now is depending on how long it takes.

Let me know! We could work on it and decide on a timebox for it. Usually setting up CI isn't too bad

@shaneharter shaneharter merged commit a725fd4 into master May 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants