Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in refresh_parser.rb when using startup params: -Dhawkular.agent.machine.id=machineA #617

Closed
mtho11 opened this issue Mar 7, 2017 · 13 comments

Comments

@mtho11
Copy link
Contributor

mtho11 commented Mar 7, 2017

When a hawkular provider refreshes its resources and one of those EAP servers is started with an explicit machine id (-Dhawkular.agent.machine.id=machineA) it causes the refresh_parser.rb to blow up.

Steps to Reproduce

  1. Start a hawkular server
  2. Add a Hawkular Provider
  3. Add manger EAP 7.0 server (with Hawkular Agent) to Hawkular Provider with parameter: -Dhawkular.agent.machine.id=machineA where machineA can be any string. For example:
bin/standalone.sh -Djboss.socket.binding.port-offset=100 -Dhawkular.agent.in-container=true -Dhawkular.agent.immutable=true -Dhawkular.agent.machine.id=machineA
  1. Examine the log/evm.log for the stacktrace

Here is the relevant bits of the stack trace

[----] E, [2017-03-07T11:01:29.880703 #24485:3fcdf843fa0c] ERROR -- : /Users/mtho11/projects/manageiq/app/models/manageiq/providers/hawkular/middleware_manager/refresh_parser.rb:118:in `swap_part'
/Users/mtho11/projects/manageiq/app/models/manageiq/providers/hawkular/middleware_manager/refresh_parser.rb:106:in `alternate_machine_id'
/Users/mtho11/projects/manageiq/app/models/manageiq/providers/hawkular/middleware_manager/refresh_parser.rb:35:in `block (2 levels) in fetch_middleware_servers'
/Users/mtho11/projects/manageiq/app/models/manageiq/providers/hawkular/middleware_manager/refresh_parser.rb:29:in `each'

Here is the relevant error line number:
https://github.com/mtho11/manageiq/blob/93e98d8ac0e5119e5e518bf3f3cf3fc4a559c94d/app/models/manageiq/providers/hawkular/middleware_manager/refresh_parser.rb#L118-L118

@mtho11
Copy link
Contributor Author

mtho11 commented Mar 7, 2017

CC @pilhuhn @rubenvp8510

@mtho11
Copy link
Contributor Author

mtho11 commented Mar 7, 2017

@miq-bot assign @abonas

@mtho11
Copy link
Contributor Author

mtho11 commented Mar 7, 2017

@miq-bot add_label bug, middleware

@rubenvp8510
Copy link
Contributor

I addressed this issue in this PR: ManageIQ/manageiq#14043 it is because the method alternate_machine_id assume that the string has a certain length.

I can separate that code in another PR for fix this.

@pilhuhn
Copy link
Contributor

pilhuhn commented Mar 8, 2017

We need some serious tests for this alternate_machine_id thing with real data @jpkrohling may still have some data from linking VMs

@jpkrohling
Copy link

Machine ID is a real thing, with a specification and all:
https://www.freedesktop.org/software/systemd/man/machine-id.html

If the given specified Machine ID is not on this format, it will break and will not match the Machine ID (or BIOS UUID) on the other side, like on RHEV-M guests.

If machineA is just test data, then I'd say the test data is incorrect and should use the format from the spec. If machineA is a real value, then it should be clear to the user that this machine ID will not match "the other side" of the linking.

@rubenvp8510
Copy link
Contributor

rubenvp8510 commented Mar 9, 2017

I have a problem with this, because in the case of linking containers, specifically in the case of Openshift containers, it takes the container name as the machine-id, for example if I have a container named hawkular-srv the machine id will be hawkular-srv, so in this particular case the machine-id doesn't compliant with the format. We need to change something there I guess.

@jpkrohling
Copy link

As discussed on IRC:

19:57:09          rubens | jpkroehling: pong 
19:57:17     jpkroehling | rubens: hola!      
19:57:22     jpkroehling | about the machine ID thing   
19:57:46     jpkroehling | do you *need* the container ID to be stored on the same field as the machine ID ?
19:58:12     jpkroehling | I think you could just add another fetcher method, to get hosts by their `container-id` 
20:00:13     jpkroehling | here: https://github.com/ManageIQ/manageiq/blob/master/app/models/manageiq/providers/hawkular/middleware_manager/refresh_parser.rb#L34                                
20:00:35     jpkroehling | you'd add a || find_host_by_container_id(container_id)                                                                                                                 
20:00:43     jpkroehling | rubens: makes sense ?                                                                     
20:01:26          rubens | it make sense, but I think we still need to fix or validate the alternate_machine_id method, because if not it will throw errors.                             
20:01:41     jpkroehling | well, feed it with bad data, it will throw bad data at you :)   
20:02:20     jpkroehling | but sure, something could catch this problem and tell the user about it                                                                             
20:02:50          rubens | jpkroehling: yes      

@jpkrohling
Copy link

We discussed a bit more on IRC, and there are a couple of things to note:

  1. hawkular/hawkular-agent@2205aec is not really correct. Either it should validate that the received value is an UUID, or it should be renamed to not be a machine-id. I would vote for renaming it, and storing it into the custom data map ( https://github.com/hawkular/hawkular-agent/blob/cfa468571e18f884ee1ac4db568f03607953d310/hawkular-wildfly-agent/src/main/java/org/hawkular/agent/monitor/extension/MonitorServiceConfiguration.java#L370 ) .

  2. The agent could possibly be enhanced to include a logic for reading the hostname (container ID), adding that to the custom data map mentioned above.

cc @jmazzitelli

@abonas
Copy link
Member

abonas commented Mar 19, 2017

I would like to understand better who is handling this issue and how.
I see here multiple things:

  1. suggestions to change the hawkular agent - issues should be opened in the relevant repo for that.
  2. issues with mapping machine id for containers use case - I'm unclear with how this is proposed to be handled?

@miq-bot miq-bot added the stale label Jan 3, 2018
@miq-bot
Copy link
Member

miq-bot commented Jan 3, 2018

This issue has been automatically marked as stale because it has not been updated for at least 6 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions!

@abonas
Copy link
Member

abonas commented Jan 4, 2018

@mtho11 can u close this pls

@mtho11
Copy link
Contributor Author

mtho11 commented Jan 10, 2018

@abonas closed

@mtho11 mtho11 closed this as completed Jan 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants