Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xcli error enhancements #504

Merged
merged 1 commit into from
Apr 11, 2017

Conversation

codenrhoden
Copy link
Contributor

This patch makes the executor binary emit more detailed error messages
to the stderr when they are encountered. It also modifies the libStorage
client to log the stderr of the lsx command when it sees an error.

@codenrhoden codenrhoden added this to the 2017.04 milestone Apr 7, 2017
@codenrhoden codenrhoden requested a review from akutz April 7, 2017 17:45
@codenrhoden
Copy link
Contributor Author

@akutz Here is a fun one for you that I'd like to get your opinion on. From what we saw from a user yesterday, I noticed that the error message coming back from a failed xcli command was not as detailed as I would have expected. So here is my idea of how to change that, along with a small example of what could be done from the RBD executor. Ultimately this just enhances the framework -- it would be up to the driver executors to send correctly formatted Error messages up. But this allows us to see the inner messages from Goof even at the CLI. Not sure if you agree that is worthwhile. For an example, here is a snippet from before, w/o this patch:

DEBU[0001] invoking executor cli                         args=[rbd instanceID] cmd=/var/lib/libstorage/lsx-linux host=unix:///var/run/libstorage/228061063.sock server=cute-witch-kp service=rbd storageDriver=libstorage time=1491582191203 txCR=1491582191 txID=1377a900-b79b-400e-50f5-8562f6111382
DEBU[0001] signalling executor lock                      host=unix:///var/run/libstorage/228061063.sock server=cute-witch-kp service=rbd storageDriver=libstorage time=1491582191233 txCR=1491582191 txID=1377a900-b79b-400e-50f5-8562f6111382
Oops, an error occured!

  error initializing instance ID cache

Here we got no indication of what is wrong.

If add the logging of the error from the libS client, we get a little more info:

DEBU[0001] invoking executor cli                         args=[rbd instanceID] cmd=/var/lib/libstorage/lsx-linux host=unix:///var/run/libstorage/009997218.sock server=noble-puppy-co service=rbd storageDriver=libstorage time=1491582575349 txCR=1491582575 txID=263b75ca-3ce7-4679-5159-03a68293b564
ERRO[0001] error from executor cli                       args=[rbd instanceID] cmd=/var/lib/libstorage/lsx-linux host=unix:///var/run/libstorage/009997218.sock server=noble-puppy-co service=rbd stderr=time="2017-04-07T16:29:35Z" level=debug msg="initializing configuration"
time="2017-04-07T16:29:35Z" level=debug msg="enter join" elem=[config.yml]
time="2017-04-07T16:29:35Z" level=debug msg="exit join" elem=[/etc/libstorage config.yml]
time="2017-04-07T16:29:35Z" level=debug msg="enter join" elem=[config.yaml]
time="2017-04-07T16:29:35Z" level=debug msg="exit join" elem=[/etc/libstorage config.yaml]
error: error getting instanceID: Unable to get Ceph monitors
 storageDriver=libstorage time=1491582575399 txCR=1491582575 txID=263b75ca-3ce7-4679-5159-03a68293b564
DEBU[0001] signalling executor lock                      host=unix:///var/run/libstorage/009997218.sock server=noble-puppy-co service=rbd storageDriver=libstorage time=1491582575399 txCR=1491582575 txID=263b75ca-3ce7-4679-5159-03a68293b564
Oops, an error occured!

  error initializing instance ID cache

Here at least I see the toplevel error returned by the executor, Unable to get Ceph monitors. But even then, it was missing the inner message that I know it has. so that's why I set the IncludedFieldsInError property to true. And then, we get:

DEBU[0002] invoking executor cli                         args=[rbd instanceID] cmd=/var/lib/libstorage/lsx-linux host=unix:///var/run/libstorage/551956131.sock server=button-shrieker-ir service=rbd storageDriver=libstorage time=1491585983166 txCR=1491585983 txID=1c820eee-5383-41e4-74e3-3cbf1307af19
ERRO[0002] error from executor cli                       args=[rbd instanceID] cmd=/var/lib/libstorage/lsx-linux host=unix:///var/run/libstorage/551956131.sock server=button-shrieker-ir service=rbd stderr=time="2017-04-07T17:26:23Z" level=debug msg="initializing configuration"
time="2017-04-07T17:26:23Z" level=debug msg="enter join" elem=[config.yml]
time="2017-04-07T17:26:23Z" level=debug msg="exit join" elem=[/etc/libstorage config.yml]
time="2017-04-07T17:26:23Z" level=debug msg="enter join" elem=[config.yaml]
time="2017-04-07T17:26:23Z" level=debug msg="exit join" elem=[/etc/libstorage config.yaml]
error: error getting instanceID: msg="unable to get Ceph monitors" stderr="2017-04-07 17:26:23.221788 7f9142c60780 -1 did not load config file, using default settings.\n"
 storageDriver=libstorage time=1491585983226 txCR=1491585983 txID=1c820eee-5383-41e4-74e3-3cbf1307af19
DEBU[0002] signalling executor lock                      host=unix:///var/run/libstorage/551956131.sock server=button-shrieker-ir service=rbd storageDriver=libstorage time=1491585983226 txCR=1491585983 txID=1c820eee-5383-41e4-74e3-3cbf1307af19
Oops, an error occured!

  error initializing instance ID cache

Now we get the actual stderr for the underlying command the executor ran. I think this is helpful.

I admit, I have no idea why the stderr includes the debug messages about the Joins for the config file....

@codecov-io
Copy link

codecov-io commented Apr 7, 2017

Codecov Report

Merging #504 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #504   +/-   ##
=======================================
  Coverage   30.21%   30.21%           
=======================================
  Files          34       34           
  Lines        1959     1959           
=======================================
  Hits          592      592           
  Misses       1303     1303           
  Partials       64       64

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0a45edf...e9f8a4b. Read the comment docs.

@akutz
Copy link
Collaborator

akutz commented Apr 7, 2017

Hi @codenrhoden,

I'm reposting your comment without the DEBUG logs to see if it helps me visualize the difference:

@akutz Here is a fun one for you that I'd like to get your opinion on. From what we saw from a user yesterday, I noticed that the error message coming back from a failed xcli command was not as detailed as I would have expected. So here is my idea of how to change that, along with a small example of what could be done from the RBD executor. Ultimately this just enhances the framework -- it would be up to the driver executors to send correctly formatted Error messages up. But this allows us to see the inner messages from Goof even at the CLI. Not sure if you agree that is worthwhile. For an example, here is a snippet from before, w/o this patch:

Oops, an error occured!

  error initializing instance ID cache

Here we got no indication of what is wrong.

If add the logging of the error from the libS client, we get a little more info:

ERRO[0001] error from executor cli                       args=[rbd instanceID] cmd=/var/lib/libstorage/lsx-linux host=unix:///var/run/libstorage/009997218.sock server=noble-puppy-co service=rbd stderr=time="2017-04-07T16:29:35Z" level=debug msg="initializing configuration"
time="2017-04-07T16:29:35Z" level=debug msg="enter join" elem=[config.yml]
time="2017-04-07T16:29:35Z" level=debug msg="exit join" elem=[/etc/libstorage config.yml]
time="2017-04-07T16:29:35Z" level=debug msg="enter join" elem=[config.yaml]
time="2017-04-07T16:29:35Z" level=debug msg="exit join" elem=[/etc/libstorage config.yaml]
error: error getting instanceID: Unable to get Ceph monitors
 storageDriver=libstorage time=1491582575399 txCR=1491582575 txID=263b75ca-3ce7-4679-5159-03a68293b564
Oops, an error occured!

  error initializing instance ID cache

Here at least I see the toplevel error returned by the executor, Unable to get Ceph monitors. But even then, it was missing the inner message that I know it has. so that's why I set the IncludedFieldsInError property to true. And then, we get:

ERRO[0002] error from executor cli                       args=[rbd instanceID] cmd=/var/lib/libstorage/lsx-linux host=unix:///var/run/libstorage/551956131.sock server=button-shrieker-ir service=rbd stderr=time="2017-04-07T17:26:23Z" level=debug msg="initializing configuration"
time="2017-04-07T17:26:23Z" level=debug msg="enter join" elem=[config.yml]
time="2017-04-07T17:26:23Z" level=debug msg="exit join" elem=[/etc/libstorage config.yml]
time="2017-04-07T17:26:23Z" level=debug msg="enter join" elem=[config.yaml]
time="2017-04-07T17:26:23Z" level=debug msg="exit join" elem=[/etc/libstorage config.yaml]
error: error getting instanceID: msg="unable to get Ceph monitors" stderr="2017-04-07 17:26:23.221788 7f9142c60780 -1 did not load config file, using default settings.\n"
 storageDriver=libstorage time=1491585983226 txCR=1491585983 txID=1c820eee-5383-41e4-74e3-3cbf1307af19
Oops, an error occured!

  error initializing instance ID cache

Now we get the actual stderr for the underlying command the executor ran. I think this is helpful.

I admit, I have no idea why the stderr includes the debug messages about the Joins for the config file....

I realize part of the issue I'm encountering is that you're running the executor in the above examples with debug logging as well. Which of course is being captured in the above output. Can you reproduce the above examples without debug logging enabled? Since the executor does not detect a terminal its logs are in machine format and make it harder to remove since they're a part of the STDERR you're trying to display here.

@codenrhoden
Copy link
Contributor Author

Ah, I see what you are saying @akutz. I don't think it's a big deal at all, then.

FWIW, here's what this patch looks like when running with the default warn loglevel:

# ./rexray -s rbd volume ls
ERRO[0001] error from executor cli                       args=[rbd instanceID] cmd=/var/lib/libstorage/lsx-linux host=unix:///var/run/libstorage/224270231.sock server=boulder-snout-sh service=rbd stderr=error: error getting instanceID: msg="unable to get Ceph monitors" stderr="2017-04-07 20:12:57.761171 7f371d455780 -1 did not load config file, using default settings.\n"
 storageDriver=libstorage time=1491595977764 txCR=1491595977 txID=500b8524-f9c0-41fb-4ca8-8f57b1fa43ef
Oops, an error occured!

  error initializing instance ID cache

It's verbose, but I think helps get to underlying problems faster.

@codenrhoden codenrhoden force-pushed the enhancement/xcli_stderr branch from 2bb0fb2 to 3662032 Compare April 7, 2017 20:48
@akutz
Copy link
Collaborator

akutz commented Apr 11, 2017

Hi @codenrhoden,

Do you mind rebasing this PR? Once that's done I'll happily merge it.

@codenrhoden codenrhoden force-pushed the enhancement/xcli_stderr branch from 3662032 to 793bd67 Compare April 11, 2017 16:11
@akutz
Copy link
Collaborator

akutz commented Apr 11, 2017

Hi @codenrhoden,

Ugh, I'm sorry. Rebase one more time? My PR screwed this up for you. I didn't think it would have any overlap.

@codenrhoden
Copy link
Contributor Author

@akutz no problem. It seems to be lately that GitHub has changed, such that any commit to master makes the branches out of date, even if they don't touch the same files. Maybe it's because they have the "rebase and merge" option now? I really don't know.

This patch makes the executor binary emit more detailed error messages
to the stderr when they are encountered. It also modifies the libStorage
client to log the stderr of the lsx command when it sees an error.
@codenrhoden codenrhoden force-pushed the enhancement/xcli_stderr branch from 793bd67 to e9f8a4b Compare April 11, 2017 16:27
@akutz
Copy link
Collaborator

akutz commented Apr 11, 2017

Hi @codenrhoden,

I agree that it seems this way but was avoiding saying that on purpose to not start a discussion about what that might mean for us :(

@akutz akutz merged commit 637e97f into thecodeteam:master Apr 11, 2017
@codenrhoden codenrhoden deleted the enhancement/xcli_stderr branch May 2, 2017 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants