Skip to content
This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Replacing the static pod definition for etcd ip. #3100

Merged
merged 1 commit into from
Jun 1, 2018

Conversation

kwoodson
Copy link
Contributor

@kwoodson kwoodson commented May 29, 2018

With recent changes the etcd health check url IP was incorrect. This provides us a method to update it.

Also includes a fix for the latest 3.10 changes to subsequent roles.
@jim-minter @Kargakis @pweil-

@codecov
Copy link

codecov bot commented May 29, 2018

Codecov Report

Merging #3100 into master will decrease coverage by 0.39%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #3100     +/-   ##
=========================================
- Coverage   51.89%   51.49%   -0.4%     
=========================================
  Files          99       99             
  Lines       15160    15038    -122     
=========================================
- Hits         7867     7744    -123     
- Misses       6571     6584     +13     
+ Partials      722      710     -12
Impacted Files Coverage Δ
pkg/acsengine/engine.go 63.1% <0%> (-0.09%) ⬇️
pkg/api/types.go 32.88% <0%> (-6.2%) ⬇️
pkg/api/common/versions.go 83.51% <0%> (-5.91%) ⬇️
pkg/api/v20170701/types.go 30.55% <0%> (-4.17%) ⬇️
pkg/api/vlabs/validate.go 46.22% <0%> (-4.01%) ⬇️
pkg/api/vlabs/types.go 21.42% <0%> (-0.65%) ⬇️
pkg/acsengine/defaults-apiserver.go 97.56% <0%> (-0.4%) ⬇️
pkg/acsengine/addons.go 100% <0%> (ø) ⬆️
pkg/acsengine/k8s_versions.go 100% <0%> (ø) ⬆️
pkg/api/converterfromapi.go 9.67% <0%> (+0.06%) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 497e012...69bcd35. Read the comment docs.

@acs-bot acs-bot added size/S and removed size/XS labels May 29, 2018
@kwoodson kwoodson changed the title Replacing the static pod definition for etcd ip. [WIP] Replacing the static pod definition for etcd ip. May 29, 2018
@kwoodson
Copy link
Contributor Author

Still seeing errors with the template_service_broker. It requires this variable be set l_os_registry_url. I attempted to include the vars from openshift_facts but this variable was not updated as denoted by the debug statement:

TASK [debug] *******************************************************************
ok: [localhost] => {
    "l_os_registry_url": "VARIABLE IS NOT DEFINED!"
}

@@ -9,6 +9,7 @@ else
SERVICE_TYPE=origin
fi
VERSION="$(rpm -q $SERVICE_TYPE --queryformat %{VERSION})"
IP_ADDRESS="$(ip route get 1 | sed -n 's/.* src \([^ ]*\) .*/\1/p')"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would a dig +short on the hostname be easier here? Maybe with all the dns funkiness we've seen it isn't a good idea 😄

If this is the best way are the fields variable requiring the sed+regex or would a ip route get 1 | awk '{print $(NF-2);exit}' work? I'm a regex hater, they're hard to read, but ack that it may be the best thing so that's just a nit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pweil- I'm debated the different methods of finding the IP address. The issue I want to eliminate is hosts with multiple IP addresses. I want this to be a reliable way of returning it. I asked around and the ip route get 1 was recommended because it is very generic and worked with hosts that have multiple interfaces. This method chooses the IP address that is routing to the internet which is most likely the one we want to use.

In this case on my master this was what was needed with your suggestion. Does that change with multiple addresses?

 ip route get 1 | awk '{print $(NF); exit}'
10.0.0.11

I attempted this a few different ways that seemed to produce the answer we wanted but I'm not sure how these behave in a multi-ip environment:

hostname -i
ifconfig eth0 | awk '/inet/{ print $2 ; exit}'
ip a s eth0 | awk '/inet/{ split($2, a, "/"); print a[1]; exit}'
python -c 'import socket; print socket.gethostbyname(socket.gethostname())'

@pweil- @jim-minter @Kargakis @mjudeikis, thoughts here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty certain that hostname -i doesn’t guarantee order so that won’t work. Dig or what you have now I guess

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dig command didn't return any results.

[root@ocp-master-18884060-0 ~]# dig +short $(hostname)
[root@ocp-master-18884060-0 ~]# 

Copy link
Contributor

@mjudeikis mjudeikis May 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally Azure should give API to get this where we could curl and get the value. But I agree with ip route get 1 and awk over regexp as is the one I used in most of my scripts. If we can have boxes with multiple interfaces ToDo write a test for this case.

If no api, long term plan is to write separate function to get IP, do some reverse-lookups to check it responds back with what we want and inject it here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my vote for now is ip get route 1 . But most of this script must go... 🗡️

As a long term I would like to consider writing our own small golang binary which would do all the curl,wget,ip etc. Which would not relay on the underlying infrastructure/image binaries. it would do simple things like cli get ip, cli get version etc etc? This could enable us to have more control over how we construct metadata. Overkill?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

derp...why was I saying dig...we were using host $(hostname). Anyway, the default route or the metadata service is fine. If it is any comfort, the assumption of interface[0] and ipaddress[0] is documented. I'm fine with the ip get route.

Copy link
Contributor Author

@kwoodson kwoodson May 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my local machine(Fedora) the awk command returns different output so the version using NF wouldn't work.

If we are ok with the sed version I'll remove [WIP]. If you'd prefer the curl instead of sed then I'll swap it out.

Copy link
Contributor

@mjudeikis mjudeikis May 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be overprotective, but if we do both and compare? And fail if they dont match?
or just to check default interface and if it matches the IP we going to use?

#check default interface
default_interface=$(route | grep '^default' | grep -o '[^ ]*$')
if [[ "$(ip route get 1)" == *"$default_interface"* ]]; then echo "this is my ip"; else echo "exit 1"; fi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kwoodson we are guaranteed to know the master IP address in advance - it is in the apimodel. Template this into the masterscript.sh, don't look it up at all. Pass the value through here https://github.com/jim-minter/acs-engine/blob/master/pkg/acsengine/engine.go#L1907-L1917, you'll need to add something like MasterIP: cs.Properties.MasterProfile.FirstConsecutiveStaticIP,.

@kwoodson
Copy link
Contributor Author

The latest deploy here was successful.

INFO[0025] Starting ARM Deployment (kwoodsoncluster-1597550861). This will take some time... 
INFO[0764] Finished ARM Deployment (kwoodsoncluster-1597550861). Succeeded 

@jim-minter
Copy link
Member

@kwoodson
Copy link
Contributor Author

@jim-minter, I have templated the master script as requested.

Copy link
Collaborator

@pweil- pweil- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@jim-minter ftw 😄

@acs-bot acs-bot added the lgtm label May 31, 2018
@kwoodson kwoodson changed the title [WIP] Replacing the static pod definition for etcd ip. Replacing the static pod definition for etcd ip. May 31, 2018
@@ -2,7 +2,11 @@
hosts: localmaster
gather_facts: false

tasks:
# we need variables from openshift_facts to be in scope
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kwoodson why does this file change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jim-minter, inside of openshift-ansible the template service broker has the 'l_os_registry_url` variable defined.

https://github.com/openshift/openshift-ansible/blob/master/roles/template_service_broker/defaults/main.yml#L11

This variable is being set in openshift_facts and the expectation is that we include these facts to run these other roles:
https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_facts/defaults/main.yml#L15

I included the facts which brings those variables into scope.

Hope this helps clarify.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@jim-minter
Copy link
Member

/assign CecileRobertMichon

@CecileRobertMichon please merge

@CecileRobertMichon
Copy link
Contributor

/approve

@acs-bot
Copy link

acs-bot commented May 31, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@0xmichalis
Copy link
Contributor

openshift test failure is the etcd timeout: #2918

@0xmichalis
Copy link
Contributor

flake was #3114

@0xmichalis
Copy link
Contributor

@CecileRobertMichon should these codecov contexts show up here?

@CecileRobertMichon
Copy link
Contributor

@Kargakis it should. Was the PR rebased since cb48589? I'll merge this one anyway

@CecileRobertMichon CecileRobertMichon merged commit 25ebceb into Azure:master Jun 1, 2018
@kwoodson kwoodson deleted the azure_etcd_ip branch June 1, 2018 17:33
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants