AKS Periscope install failure #14

Closed
dstampfli opened this issue Mar 11, 2020 · 11 comments
Labels
bug 🐛 Something isn't working help wanted 📣 Extra attention is needed

Comments

@dstampfli

Hi, I'm trying to install AKS Periscope on a private cluster with egress through a firewall. Diagnostics settings for the cluster are configured and point to a storage account. When I try to run az aks kollect, I get the following error:

azureuser@jbl-vm1:~$ az aks kollect --name spoke1-aks --resource-group spoke1-vnet-rg
No storage account specified. Try getting storage account from diagnostic settings

This will deploy a daemon set to your cluster to collect logs and diagnostic information and save them to the storage account spoke1akslogs as outlined in http://aka.ms/AKSPeriscope.

If you share access to that storage account to Azure support, you consent to the terms outlined in http://aka.ms/DiagConsent.

Do you confirm? (y/N): y

Getting credentials for cluster spoke1-aks
Merged "spoke1-aks-admin" as current context in /tmp/tmpgfmwgfz_

Starts collecting diag info for cluster spoke1-aks

Cleaning up aks-periscope resources if existing
serviceaccount "aks-periscope-service-account" deleted
serviceaccount "default" deleted
configmap "containerlogs-config" deleted
configmap "kubeobjects-config" deleted
configmap "nodelogs-config" deleted
daemonset.extensions "aks-periscope" deleted
secret "azureblob-secret" deleted
secret "default-token-tgdh6" deleted
clusterrolebinding.rbac.authorization.k8s.io "aks-periscope-role-binding" deleted
clusterrolebinding.rbac.authorization.k8s.io "aks-periscope-role-binding-view" deleted
clusterrole.rbac.authorization.k8s.io "aks-periscope-role" deleted
diagnostic.aks-periscope.azure.github.com "aks-periscope-diagnostic-aks-nodepool1-36173997-vmss000000" deleted
diagnostic.aks-periscope.azure.github.com "aks-periscope-diagnostic-aks-nodepool1-36173997-vmss000002" deleted
customresourcedefinition.apiextensions.k8s.io "diagnostics.aks-periscope.azure.github.com" deleted

Deploying aks-periscope

The command failed with an unexpected error. Here is the traceback:

'NoneType' object has no attribute 'replace'
Traceback (most recent call last):
  File "/opt/az/lib/python3.6/site-packages/knack/cli.py", line 206, in invoke
    cmd_result = self.invocation.execute(args)
  File "/opt/az/lib/python3.6/site-packages/azure/cli/core/commands/__init__.py", line 608, in execute
    raise ex
  File "/opt/az/lib/python3.6/site-packages/azure/cli/core/commands/__init__.py", line 666, in _run_jobs_serially
    results.append(self._run_job(expanded_arg, cmd_copy))
  File "/opt/az/lib/python3.6/site-packages/azure/cli/core/commands/__init__.py", line 659, in _run_job
    six.reraise(*sys.exc_info())
  File "/opt/az/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/opt/az/lib/python3.6/site-packages/azure/cli/core/commands/__init__.py", line 636, in _run_job
    result = cmd_copy(params)
  File "/opt/az/lib/python3.6/site-packages/azure/cli/core/commands/__init__.py", line 306, in __call__
    return self.handler(*args, **kwargs)
  File "/opt/az/lib/python3.6/site-packages/azure/cli/core/__init__.py", line 493, in default_command_handler
    return op(**command_args)
  File "/home/azureuser/.azure/cliextensions/aks-preview/azext_aks_preview/custom.py", line 1442, in aks_kollect
    normalized_fqdn = mc.fqdn.replace('.', '-')
AttributeError: 'NoneType' object has no attribute 'replace'

@Tatsinnit Tatsinnit added the triage 💭 Need triage to replicate or identify right fix label Mar 15, 2020
@Tatsinnit
Member

Tatsinnit commented Mar 15, 2020

Cool, I tried to reproduce this but could not trigger the error above, unless you are running an old version of the az CLI. The error is not specific to Periscope; it is a failure in the underlying aks-preview extension. Looking a little deeper, the failure on your machine is caused by a NoneType object at this line: https://github.com/Azure/azure-cli-extensions/blob/master/src/aks-preview/azext_aks_preview/custom.py#L1442
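
To make the failure mode concrete, here is a minimal sketch of what the traceback shows: `replace` is called on an `fqdn` value that is `None`. The `FakeManagedCluster` class, the `normalize_fqdn` helper, the example host names, and the `private_fqdn` fallback are all assumptions for illustration; this is not the aks-preview code or its eventual fix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeManagedCluster:
    """Hypothetical stand-in for the cluster object the CLI calls `mc`."""
    fqdn: Optional[str] = None
    private_fqdn: Optional[str] = None  # assumed attribute, for illustration only


def normalize_fqdn(mc: FakeManagedCluster) -> str:
    """Return the FQDN with dots replaced by dashes, guarding against None."""
    # Assumption: fall back to a private FQDN when the public one is missing.
    fqdn = mc.fqdn if mc.fqdn is not None else mc.private_fqdn
    if fqdn is None:
        # Fail with a readable message instead of an AttributeError.
        raise ValueError("cluster has neither a public nor a private FQDN")
    return fqdn.replace('.', '-')


# Made-up host names; a public cluster and a private one.
public = FakeManagedCluster(fqdn="spoke1-aks.example.azmk8s.io")
private = FakeManagedCluster(private_fqdn="spoke1-aks.privatelink.example.azmk8s.io")

print(normalize_fqdn(public))
print(normalize_fqdn(private))
```

Calling `mc.fqdn.replace('.', '-')` directly on the `private` object above would raise the same `AttributeError` as in the report, because its `fqdn` is `None`.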

💡 Can you please share the specific steps how you are able to get the above error?

I have a feeling that updating the az CLI, if it is running an old version, might help, along with removing and re-adding the aks-preview extension as described in https://github.com/Azure/aks-periscope#user-guide.

cc: @JunSun17, for anything extra that can be added.

Thanks heaps,

@dstampfli
Author

I tested from the same VM using another cluster that is not private (public API server endpoint), and I was able to get Periscope installed. However, I still can't deploy to a private cluster with egress through a firewall. When I look at the pods in the aks-periscope namespace, they are in an error state. To reproduce, start with a private cluster and try installing Periscope.
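
For anyone reproducing this, a rough sketch of how the pod check could be scripted is below. It assumes `kubectl` is on the PATH and already pointed at the cluster; the helper names and the sample data are hypothetical. Filtering on the pod phase is a simplification, since an image pull failure typically shows up as a `Pending` phase with details in the container statuses.

```python
import json
import subprocess
from typing import Dict, List


def unhealthy_pods(pod_list: dict) -> List[Dict[str, str]]:
    """Return name and phase for every pod whose phase is not Running or Succeeded."""
    result = []
    for item in pod_list.get("items", []):
        phase = item.get("status", {}).get("phase", "Unknown")
        if phase not in ("Running", "Succeeded"):
            result.append({"name": item["metadata"]["name"], "phase": phase})
    return result


def fetch_pods(namespace: str = "aks-periscope") -> dict:
    """Run `kubectl get pods -n <namespace> -o json` and parse the output."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


# Hypothetical sample shaped like kubectl's JSON output, so the filter can be
# tried without a cluster.
sample = {
    "items": [
        {"metadata": {"name": "aks-periscope-aaaaa"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "aks-periscope-bbbbb"}, "status": {"phase": "Pending"}},
    ]
}

print(unhealthy_pods(sample))
```

Against a live cluster, `unhealthy_pods(fetch_pods())` would list the pods worth inspecting with `kubectl describe pod`.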

@Tatsinnit Tatsinnit added help wanted 📣 Extra attention is needed bug 🐛 Something isn't working and removed bug 🐛 Something isn't working labels Apr 16, 2020
@Tatsinnit
Member

Tatsinnit commented May 5, 2020

Update:

I think the CLI does not support private clusters in aks-preview, or has a gap in that support.

As far as I can see from experimenting, even though the private cluster is pingable from a jumpbox, neither Periscope nor the aks-preview command in the az CLI handles the managed cluster and its fully qualified domain name.

I will reach out to a few people for clarity and update the documentation if that turns out to be the case.

Essentially, from the CLI's point of view, this jumpbox scenario with a private DNS zone does not return the correct fully qualified name, so the CLI errors out. For a normal cluster this is not the case, so it works.

There is surely some network complexity in play here. I will keep this thread updated if I dig up any further information about the storage.

Thanks,

@dstampfli
Author

Thanks @Tatsinnit for checking on this. As part of my testing, I also noted that I needed to open my firewall to allow access to "aksrepos.azurecr.io" on port 443. This is not mentioned here - https://docs.microsoft.com/en-us/azure/aks/limit-egress-traffic .
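
A quick way to sanity-check an egress rule like this is a plain TCP connect from a machine behind the firewall. The sketch below is only a first-pass check under that assumption: the `can_reach` helper is hypothetical, and a firewall that filters at the TLS or application layer may still block traffic after the TCP connection succeeds.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and opens the socket; the
        # context manager closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False

# Example call, using the host named in the comment above:
#   can_reach("aksrepos.azurecr.io", 443)
```

Run the example call from inside the network whose egress rules you want to check; a `False` result points at DNS or the firewall before Periscope itself is involved.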

@Tatsinnit
Member

Tatsinnit commented May 11, 2020

Just keeping this thread up-to-date:

Cool, to dig a little deeper, I have also opened an associated bug against azure-cli-extensions to get a full understanding. We should be able to get to the bottom of this soon. Thanks.

Detail: Azure/azure-cli-extensions#1679

@dstampfli
Author

@Tatsinnit Thanks for your help. It looks like the fix for this was merged yesterday - Azure/azure-cli-extensions#1685. Not sure when a new release will be pushed but I will test as soon as it is available.

@Tatsinnit Tatsinnit added the bug 🐛 Something isn't working label May 19, 2020
@Tatsinnit
Member

Tatsinnit commented May 20, 2020

👍 All good @dstampfli. While experimenting with this, I was able to reproduce a couple of other issues related to private clusters, and the team here collaborated to fix all of them in one go.

Gist:

  • I have tested this internally with our test ACR, and it worked from my jumpbox in the same subnet as the private cluster.
  • Once we have an official ACR, I can share documentation for direct deployment, which you should be able to test.
  • FYI: I am unsure when the cli-extension repo's release cycle will make this usable via the kollect command (I will update later on this).

Thanks

@Tatsinnit Tatsinnit removed the triage 💭 Need triage to replicate or identify right fix label May 20, 2020
@Tatsinnit
Member

Tatsinnit commented May 21, 2020

#TLDR

💡 Note: The new ACR image for Periscope is out. As for the az CLI release, this will go into releases S-169 and S-170, which looks to me like a few weeks away. (I will check what that timeline means for the az CLI release.)

#DETAILED

To unblock any other users with private clusters: I did the following to test our Periscope-level changes, and it worked for me.

Use the direct kubectl apply deployment command:

You will see output from the Periscope run. Then check your storage account: I was able to see a new container along with the latest logs from my private cluster.

@Tatsinnit
Member

💡 Hiya @dstampfli, can you please check now? I think you need to update your az CLI version, and then the command should be available from the az CLI. Thanks.

@dstampfli
Author

@Tatsinnit I tested on 2 private clusters and it is working now!!! I tested using az aks kollect with a storage account configured in the RG's diagnostic settings. Thank you for your help!

@Tatsinnit
Member

Thank you so much for looping back, that sounds awesome!! Closing this issue. 😊🙏

sophsoph321 pushed a commit to sophsoph321/aks-periscope that referenced this issue May 7, 2021