Error deploying containers on VCH 0.6.0 #3031

Closed
vmwarelab opened this issue Nov 6, 2016 · 31 comments
Labels
area/appliance area/storage Storage-related functionality component/persona/docker source/customer Reported by a customer, directly or via an intermediary

Comments

@vmwarelab

// A self-contained demonstration of the problem follows...

when executing the following from the client machine against VCH
docker -H 192.168.0.69:2376 --tls  run -p 80:80 vmwarecna/nginx 

Expected behavior:

run the nginx container

Actual behavior:

I get the following error:
docker: Failed to write to image store: [POST /storage/{store_name}][500] WriteImage default &{Code:0xc42020d2d0 Message:parent (scratch) doesn't exist in http://VCH01/storage/images/421de96b-21dd-8489-9624-2e10936158c7: cannot stat '[MGMT-LocalDisk1] VCH01/VIC/421de96b-21dd-8489-9624-2e10936158c7/images/scratch/manifest': No such file}.

Any help is appreciated. Thank you

@vmwarelab vmwarelab changed the title Error deploying containers on VCH Error deploying containers on VCH 0.6.0 Nov 6, 2016
@mlh78750
Contributor

mlh78750 commented Nov 7, 2016

@vmwarelab Can you please collect the logs from the vic-admin page? And do you still happen to have the command line params you used to install the VCH?

@vmwarelab
Author

vmwarelab commented Nov 7, 2016

container-logs (1).zip

command i have used

 ./vic-machine-linux create --name VCH01 -t '[email protected]:[email protected]' --compute-resource 'Management CL' --external-network dvPortGroup-Management-Network --bridge-network dvPortGroup-Bridge-VCH --image-store MGMT-LocalDisk1

i also found out from the documentation that the VIC plugin is not functional in VCH 0.6.0 release
https://vmware.github.io/vic/assets/files/html/vic_installation/plugin_verify_deployment.html

@mlh78750
Contributor

mlh78750 commented Nov 7, 2016

From personality log:
time="2016-11-06T21:29:47Z" level=info msg="Creating image store"
time="2016-11-06T21:29:51Z" level=error msg="Failed to create image store"
time="2016-11-06T21:29:51Z" level=fatal msg="failed to initialize backend: [POST /storage][500] CreateImageStore default &{Code:0xc42036b248 Message:Put https://esx-01a.vmwarelab.local/folder/VCH01/VIC/421df1e6-9de7-d684-b8fb-384f2e2fc1c5/images/scratch/manifest?dsName=MGMT-LocalDisk1: dial tcp: lookup esx-01a.vmwarelab.local: no such host}"

cc: @jzt

@vmwarelab: can you try with the 0.7.0 release? We are going to root cause this for .6.0 but I'd like to see if you still have a problem with .7.0. If you do have the problem still in .7.0 will you please collect logs as well. .7.0 saw quite a few changes in the image storage code.

@vmwarelab vmwarelab added kind/defect Behavior that is inconsistent with what's intended area/appliance component/persona/docker area/storage Storage-related functionality source/customer Reported by a customer, directly or via an intermediary labels Nov 7, 2016
@vmwarelab
Author

vmwarelab commented Nov 7, 2016

@mlh78750 this is a bit different. Here is the command I'm using:

  ./vic-machine-linux create --name VCH01 --target  'https://[email protected]:*****@192.168.0.20/VMWareLab DC' --compute-resource 'Management CL' --external-network dvPortGroup-Management-Network --bridge-network dvPortGroup-Bridge-VCH --image-store MGMT-LocalDisk1 --no-tls

but i m getting a different issue now

Failed to verify certificate for target=192.168.0.20 (thumbprint=AA:A2:7F:E1:2E:98:00:F6:24:6B:8B:73:E8:DC:EF:65:F2:D3:40:5A)
Create cannot continue: failed to create validator

@fdawg4l
Contributor

fdawg4l commented Nov 7, 2016

@vmwarelab

I'm suspecting this isn't 0.7.0. What version of VIC did you download?

dial tcp: lookup esx-01a.vmwarelab.local: no such host

This also looks a bit suspect. When you log into the esx or VC UI, do you see any alerts related to this?

@vmwarelab
Author

@fdawg4l 0.7.0 downloaded from here
https://bintray.com/vmware/vic/Download/v0.7.0

@fdawg4l
Contributor

fdawg4l commented Nov 7, 2016

Yeah, that's definitely the right place.

@jzt
Contributor

jzt commented Nov 7, 2016

@vmwarelab You can fix this by either specifying the thumbprint, like so: -thumbprint=E1:E1:61:DF:72:8F:2B:EB:6B:85:B4:13:62:9C:2B:DF:A6:CC:9C:5B, or by using --force as part of the vic-machine arguments.
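For reference, a minimal sketch of one way to look up the thumbprint yourself, assuming `openssl` is installed on the client machine and the target serves TLS on port 443 (the port is an assumption; 192.168.0.20 is the target from the command above). The helper names are made up for illustration. Compare the result with the thumbprint printed in the vic-machine error before trusting it.

```shell
# Strip the "SHA1 Fingerprint=" label that `openssl x509 -fingerprint`
# prints in front of the colon-separated thumbprint.
fingerprint_value() {
    sed 's/^[^=]*=//'
}

# Print the SHA-1 thumbprint of the certificate served at host:port.
# (Hypothetical helper; it needs network access to the target.)
server_thumbprint() {
    openssl s_client -connect "$1" </dev/null 2>/dev/null \
        | openssl x509 -noout -fingerprint -sha1 \
        | fingerprint_value
}

# Example:
#   server_thumbprint 192.168.0.20:443
```

Using --force skips this check entirely, so passing the verified thumbprint is the more cautious of the two options.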

@vmwarelab vmwarelab added this to the VIC GA Release milestone Nov 7, 2016
@mhagen-vmware
Contributor

Pulling this into 0.8 for now, we need to further triage prior to release.

@vmwarelab
Author

vmwarelab commented Nov 7, 2016

the 0.7.0 seems harder to install even with --no-tls , --force

I'm using the following command now. It does deploy the VCH, but the script never finishes properly. I also noticed it doesn't create the vic folder within the VCH folder on the datastore, like it used to with VCH 0.6.0.

  ./vic-machine-linux create --name VCH01 --target  'https://[email protected]:*****@192.168.0.20/VMWareLab DC' --compute-resource 'Management CL' --external-network dvPortGroup-Management-Network --bridge-network dvPortGroup-Bridge-VCH --image-store MGMT-LocalDisk1 --no-tls --force

and it's timing out now, throwing this error:

ERRO[2016-11-07T14:35:37-05:00] Property collector error: context deadline exceeded
ERRO[2016-11-07T14:35:37-05:00] Unable to wait for extra config property guestinfo.vice..init.networks|client.assigned.IP: context deadline exceeded
INFO[2016-11-07T14:35:37-05:00] Unable to get vm config: context deadline exceeded
INFO[2016-11-07T14:35:37-05:00] Failed to retrieve IP for client interface
INFO[2016-11-07T14:35:37-05:00] State of all interfaces:
INFO[2016-11-07T14:35:37-05:00] "external" IP: "waiting for IP"
INFO[2016-11-07T14:35:37-05:00] "client" IP: "waiting for IP"
INFO[2016-11-07T14:35:37-05:00] "management" IP: "waiting for IP"
INFO[2016-11-07T14:35:37-05:00] "bridge" IP: "waiting for IP"
INFO[2016-11-07T14:35:37-05:00] State of components:
INFO[2016-11-07T14:35:37-05:00] "vicadmin": ""
INFO[2016-11-07T14:35:37-05:00] "docker-personality": ""
INFO[2016-11-07T14:35:37-05:00] "port-layer": ""
INFO[2016-11-07T14:35:37-05:00] Collecting C7FB0A62-9D41-4447-A619-5D298B91366C vpxd.log
ERRO[2016-11-07T14:35:37-05:00] Failed to collect C7FB0A62-9D41-4447-A619-5D298B91366C vpxd.log: context canceled
WARN[2016-11-07T14:35:37-05:00] No log data for C7FB0A62-9D41-4447-A619-5D298B91366C vpxd.log
ERRO[2016-11-07T14:35:37-05:00] --------------------
ERRO[2016-11-07T14:35:37-05:00] vic-machine-linux failed: Create timed out: use --timeout to add more time

@mlh78750
Contributor

mlh78750 commented Nov 7, 2016

@vmwarelab For your original issue, are you deploying to a cluster of more than one host for the compute resource but to local storage on only one host?

For the .7.0 issue do you have DHCP available? If you don't you'll need to specify an IP. It looks like you're not getting any addresses. Is this a nested environment?

@vmwarelab
Author

vmwarelab commented Nov 7, 2016

@mlh78750 Yes, I'm deploying to a cluster with one physical host (not nested) with local storage, and DHCP is available, as each of my deployments got an IP.

i have changed how i specify the target by using

 'https://[email protected]:****@192.168.0.20/VMWareLab DC'

instead of

--target 'https://192.168.0.20/VMWareLab DC' --user [email protected] --password ****

but again it still times out, with a different output:

INFO[2016-11-07T15:49:12-05:00] Waiting for IP information
INFO[2016-11-07T15:49:38-05:00] Waiting for major appliance components to launch
INFO[2016-11-07T15:51:50-05:00] Collecting C7FB0A62-9D41-4447-A619-5D298B91366C vpxd.log
ERRO[2016-11-07T15:51:50-05:00] Failed to collect C7FB0A62-9D41-4447-A619-5D298B91366C vpxd.log: context deadline exceeded
WARN[2016-11-07T15:51:50-05:00] No log data for C7FB0A62-9D41-4447-A619-5D298B91366C vpxd.log
ERRO[2016-11-07T15:51:50-05:00] --------------------
ERRO[2016-11-07T15:51:50-05:00] vic-machine-linux failed: Create timed out: use --timeout to add more time

@mlh78750
Contributor

mlh78750 commented Nov 8, 2016

@vmwarelab your PG dvPortGroup-Management-Network can reach the esx host as well? What is serving DHCP for you? it looks like your DNS settings from DHCP didn't make it into the appliance and I'm trying to figure out how that could happen.

@mlh78750
Contributor

mlh78750 commented Nov 8, 2016

cc @hmahmood

@vmwarelab
Author

vmwarelab commented Nov 8, 2016

@mlh78750 Yes, of course, it's one flat network. My DHCP server is my provider's router, and it has no awareness of any DNS. I don't think that's the problem though, because I had no issue deploying the VCH 0.6.0 release.

Is there a way to specify the DNS server manually in the vic-machine-linux command? To my knowledge you can no longer use the --dns-server parameter.

@hmahmood
Contributor

hmahmood commented Nov 8, 2016

@vmwarelab --dns-server should work if you specify a static IP for the external network (via --external-network-ip). I believe otherwise we get the DNS servers from DHCP. The change that does not request DNS servers from DHCP servers did not make it to 0.7.0, but should be present in the upcoming 0.8.0 release.

Can you give --dns-server a shot in any case?

@vmwarelab
Author

@hmahmood it accepted the --dns-server parameter but still timed out. If I check the datastore, it doesn't get as far as creating the vic folder with all its contents. I only see a new folder called kvStores.

@mlh78750
Contributor

mlh78750 commented Nov 8, 2016

@vmwarelab can you run the install with --debug=1 and post the output please.

@mlh78750
Contributor

mlh78750 commented Nov 8, 2016

@vmwarelab And my comment from an hour ago was trying to figure out what happened with .6.0. We can focus on getting your .7.0 setup.

@hmahmood
Contributor

hmahmood commented Nov 8, 2016

@vmwarelab as far as DNS behavior is concerned, that is expected with 0.6.0 and 0.7.0. As long as you don't specify a static IP for the external network, DNS servers will continue to be populated from DHCP, and --dns-server will have no effect. This will be fixed in 0.8.0.
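To make that rule concrete, here is a small pre-flight sketch that inspects a vic-machine argument list and warns when --dns-server is passed without --external-network-ip. It only encodes the 0.6.0/0.7.0 behavior described in this comment; the function name is hypothetical and nothing here is part of vic-machine itself.

```shell
# Return 0 if --dns-server can be expected to take effect, i.e. it is
# either absent or accompanied by a static external IP. Return 1 and
# print a warning otherwise. Both "--flag value" and "--flag=value"
# spellings are recognized.
dns_flag_effective() {
    has_dns=0
    has_static_ip=0
    for arg in "$@"; do
        case "$arg" in
            --dns-server|--dns-server=*) has_dns=1 ;;
            --external-network-ip|--external-network-ip=*) has_static_ip=1 ;;
        esac
    done
    if [ "$has_dns" -eq 1 ] && [ "$has_static_ip" -eq 0 ]; then
        echo "warning: --dns-server is ignored without --external-network-ip" >&2
        return 1
    fi
    return 0
}

# Example:
#   dns_flag_effective --dns-server 192.168.0.10 --external-network-ip 192.168.0.22
```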

@vmwarelab
Author

vmwarelab commented Nov 8, 2016

@mlh78750 here is the output at the end. There are tons of "connection refused" messages, so I just copied the end of the trail.

DEBU[2016-11-08T13:32:10-05:00] Components not yet initialized, retrying
DEBU[2016-11-08T13:32:10-05:00] connection refused
DEBU[2016-11-08T13:32:11-05:00] Components not yet initialized, retrying
DEBU[2016-11-08T13:32:11-05:00] connection refused
time=2016-11-08T13:32:11.742321376-05:00 level=debug msg=[ END ] [github.com/vmware/vic/lib/install/management.(*Dispatcher).CheckDockerAPI:678] [2m19.565902706s]
time=2016-11-08T13:32:11.742500723-05:00 level=debug msg=[BEGIN] [github.com/vmware/vic/lib/install/management.(*Dispatcher).CollectDiagnosticLogs:165]
INFO[2016-11-08T13:32:11-05:00] Collecting C7FB0A62-9D41-4447-A619-5D298B91366C vpxd.log
ERRO[2016-11-08T13:32:11-05:00] Failed to collect C7FB0A62-9D41-4447-A619-5D298B91366C vpxd.log: context deadline exceeded
WARN[2016-11-08T13:32:11-05:00] No log data for C7FB0A62-9D41-4447-A619-5D298B91366C vpxd.log
time=2016-11-08T13:32:11.744234572-05:00 level=debug msg=[ END ] [github.com/vmware/vic/lib/install/management.(*Dispatcher).CollectDiagnosticLogs:165] [1.734302ms]
ERRO[2016-11-08T13:32:11-05:00] --------------------
ERRO[2016-11-08T13:32:11-05:00] vic-machine-linux failed: Create timed out: use --timeout to add more time

@vmwarelab
Author

vmwarelab commented Nov 8, 2016

Looks like we have a winner

./vic-machine-linux create --name VCH01 -t '[email protected]:*****@192.168.0.20' --compute-resource 'Management CL' --external-network dvPortGroup-Management-Network --bridge-network dvPortGroup-Bridge-VCH --image-store MGMT-LocalDisk1 --force --no-tlsverify --dns-server 192.168.0.10 --external-network-ip 192.168.0.22 --external-network-gateway 192.168.0.1/24 --debug=1

This command deployed the VCH successfully, so now I can test pulling and running containers and get back to the original issue. It also seems that the VIC folder is now created on the datastore with the manifest file, which wasn't there in the 0.6.0 release. That is the file it was looking for when trying to pull and run a container.

@mlh78750
Contributor

mlh78750 commented Nov 8, 2016

@vmwarelab Looking through your original issue on 0.6.0, it looks like the VCH appliance was somehow trying to connect to an FQDN called esx-01a.vmwarelab.local and could not resolve it. However, your install command was only using an IP for the target. Hence the questions about your DHCP and DNS setup. If you target an FQDN, the VCH appliance needs to be able to resolve that name as well in order to connect to the vSphere infrastructure and manage the containers, storage, etc. It looks like in your original issue with 0.6.0 the host identifier was somehow just a name and the VCH couldn't resolve that name. Does your VC use names for the hosts and actually have DNS that resolves those names for the VC? Because the VCH needs to point to the same DNS server if you're set up like that.

And I suspect that is why the install on 0.7.0 worked with the specified DNS server in the command line. You said your DHCP was from your provider which I assume means that the DNS provided in the DHCP offer will also point to that provider.
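A quick way to spot this class of failure is to pull the unresolved names out of the logs. The sketch below assumes log lines shaped like the personality log excerpt earlier in this thread; the function name and the log file name in the usage comment are hypothetical.

```shell
# Read log lines on stdin and print each hostname that failed a DNS
# lookup, once. Matches the "lookup <host>: no such host" wording seen
# in the personality log above.
failed_lookups() {
    sed -n 's/.*lookup \([^ :]*\): no such host.*/\1/p' | sort -u
}

# Example (hypothetical file name from an extracted log bundle):
#   failed_lookups < docker-personality.log
# Each name printed should then resolve against the DNS server the VCH
# uses, for instance with: nslookup <host> 192.168.0.10
```

If a name shows up here, either give the VCH a DNS server that knows it or make sure vSphere reports hosts by addresses the VCH can reach.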

@mlh78750
Contributor

mlh78750 commented Nov 9, 2016

@vmwarelab Wanted you to be aware of the current networking limitations for the management interface on the VCH. It should work well if it is L2 adjacent to the vSphere endpoints, but if not, you will need to implement the workaround in this issue: #3081.

@vmwarelab
Author

@mlh78750 you're absolutely right. It would be a great idea to provide better feedback so that users know it's a DNS-related issue. Thank you for your help on this. I still have to test the initial issue.

@vmwarelab
Author

vmwarelab commented Nov 10, 2016

@mlh78750 so back to the original issue and testing running a simple container using the following command on the client machine against the vch host

 docker -H 192.168.0.22:2376 --tls run -d -p 80:80 vmwarecna/nginx

failed with a different error using 0.7.0:

docker: Error response from daemon: No volume store named (default) exists.

@jzt
Contributor

jzt commented Nov 10, 2016

@vmwarelab Please see this doc: https://github.com/vmware/vic/blob/master/doc/user_doc/vic_app_dev/using_volumes_with_vic.md#create-a-volume-in-a-volume-store

This is likely caused by the vmwarecna/nginx image specifying a volume to use when creating a container.

If you create a default volume store when deploying your VCH, vmwarecna/nginx should be able to find and use it without error.

Instructions are here: https://github.com/vmware/vic/blob/master/doc/user_doc/vic_installation/vch_installer_options.md#--volume-store

Tagging @matthewavery @fdawg4l in case I'm incorrect. :)

@jzt
Contributor

jzt commented Nov 10, 2016

Following up: I checked the vmwarecna/nginx metadata and was able to verify that it does use a volume:

    "Volumes": {
      "/var/cache/nginx": {}
    }
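For anyone who wants to repeat this check on another image, here is a hedged sketch. It assumes a Docker endpoint that already has the image and supports `docker inspect` with a Go template; the function name is made up for illustration.

```shell
# Succeed if the JSON read from stdin names at least one volume.
# Expects the output of:
#   docker inspect --format '{{json .Config.Volumes}}' <image>
# which is "null" or "{}" when the image declares no volumes.
declares_volumes() {
    volumes_json=$(cat)
    case "$volumes_json" in
        ""|null|"{}") return 1 ;;
        *) return 0 ;;
    esac
}

# Example:
#   if docker inspect --format '{{json .Config.Volumes}}' vmwarecna/nginx \
#           | declares_volumes; then
#       echo "image declares volumes; the VCH needs a volume store"
#   fi
```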

@vmwarelab
Author

vmwarelab commented Nov 10, 2016

@jzt thank you for the documentation you mentioned.

It says that if you only require one volume store, you can set the volume store label to default. If you set the volume store label to default, container developers do not need to specify the --opt VolumeStore=volume_store_label option when they run docker volume create.

So I added the --volume-store parameter with the label set to default, and running the nginx container now works.
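A rough sketch of what that looks like on the command line, assuming the `datastore/path:label` value format from the installer options doc linked above. The folder name `volumes` and the helper function are made-up examples, so check the format against the doc for your release.

```shell
# Build the value for --volume-store from a datastore name, a folder
# path on that datastore, and a label. With the label "default",
# containers and `docker volume create` can omit --opt VolumeStore=...
volume_store_arg() {
    printf '%s/%s:%s\n' "$1" "$2" "$3"
}

# Example (hypothetical folder "volumes" on the image-store datastore,
# appended to the other create options already in use):
#   --volume-store "$(volume_store_arg MGMT-LocalDisk1 volumes default)"
```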

i personally still don't get what is volume-store is for ? where its created ? and why would you need it or need multiple volumes

@fdawg4l
Contributor

fdawg4l commented Nov 10, 2016

i personally still don't get what is volume-store is for ? where its created ? and why would you need it or need multiple volumes

Generally, enterprises have storage devices that present as different datastores in vSphere. Imagine the following datastores: [ReallyFastButNotBackedUp], [MarginallyFastAndBackedUpNightly], [ReallySlowButBackedUpHourly]. One can create volumes according to the storage policy or backup policy required for a given set of containers. Volume stores expose the storage options that may be available in vSphere to container admins and users.
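As a sketch of how that might look from the container user's side, assume the VCH was deployed with one volume store per datastore, labelled `fast`, `nightly`, and `hourly` (hypothetical labels, as are the helper and the volume names below). The `--opt VolumeStore=` option is the one quoted from the docs earlier in this thread.

```shell
# Print the docker command that creates a named volume in the volume
# store with the given label. Printing instead of running keeps the
# sketch safe to try without a VCH.
volume_create_cmd() {
    store_label=$1
    volume_name=$2
    printf 'docker volume create --opt VolumeStore=%s --name %s\n' \
        "$store_label" "$volume_name"
}

# Scratch data can live on the fast, non-backed-up store, while data
# that must survive goes to a store with backups.
volume_create_cmd fast nginx-cache
volume_create_cmd hourly orders-db
```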

@mlh78750 mlh78750 removed the kind/defect Behavior that is inconsistent with what's intended label Nov 10, 2016
@mlh78750 mlh78750 modified the milestones: v0.9.0, VIC GA Release Nov 10, 2016
@mdubya66 mdubya66 removed this from the v0.9.0 milestone Jan 19, 2017
@mhagen-vmware
Contributor

this appears to be generally resolved and has strayed quite far from the original issue. Please open a new issue if you are seeing any other problems.
