Needs test: Context deadline during Pull causes an already exists error from govmomi on a subsequent pull #1409

matthewavery · 2016-07-07T19:37:42Z

During a pull busybox it looks like a context deadline occurred somewhere on the portlayer side during a write image causing an inconsistent state. On repeat of the pull the layer is found to already exists after trying a new right. On a subsequent create the busybox image is still not found and an attempt to pull it is made. The appropriate response should have been that the layer already exists on the second pull.

 % sudo ./vic-machine-linux create --target skinner.eng.vmware.com --user root --image-datastore Datastore --password ca\$hc0w --bridge-network vch-test --name mavery-vch-1 --compute-resource /Datacenter/host/Austin/Resources/mavery --volume-store=test-store:Datastore/testStore --generate-cert=false
INFO[2016-07-07T14:26:22-05:00] ### Installing VCH ####                      
WARN[2016-07-07T14:26:22-05:00] Configuring without TLS - to enable use -generate-cert or -key/-cert parameters 
INFO[2016-07-07T14:26:22-05:00] Validating supplied configuration            
INFO[2016-07-07T14:26:22-05:00] Firewall status: DISABLED on /Datacenter/host/Austin/patty.eng.vmware.com 
INFO[2016-07-07T14:26:22-05:00] Firewall configuration OK on hosts:          
INFO[2016-07-07T14:26:22-05:00]   /Datacenter/host/Austin/patty.eng.vmware.com 
INFO[2016-07-07T14:26:23-05:00] License check OK on hosts:                   
INFO[2016-07-07T14:26:23-05:00]   /Datacenter/host/Austin/patty.eng.vmware.com 
INFO[2016-07-07T14:26:23-05:00] DRS check OK on:                             
INFO[2016-07-07T14:26:23-05:00]   /Datacenter/host/Austin/Resources/mavery   
INFO[2016-07-07T14:26:27-05:00] Creating Resource Pool mavery-vch-1          
INFO[2016-07-07T14:26:27-05:00] Creating directory [Datastore] testStore     
INFO[2016-07-07T14:26:27-05:00] Datastore path is [Datastore] testStore      
INFO[2016-07-07T14:26:27-05:00] Creating appliance on target                 
INFO[2016-07-07T14:26:27-05:00] Network role client is sharing NIC with external 
INFO[2016-07-07T14:26:27-05:00] Network role management is sharing NIC with external 
INFO[2016-07-07T14:26:30-05:00] Uploading images for container               
INFO[2016-07-07T14:26:30-05:00]     bootstrap.iso                               
INFO[2016-07-07T14:26:30-05:00]     appliance.iso                               
INFO[2016-07-07T14:26:42-05:00] Registering VCH as a vSphere extension       
INFO[2016-07-07T14:26:51-05:00] Waiting for IP information                   
INFO[2016-07-07T14:28:51-05:00] Waiting for major appliance components to launch 
INFO[2016-07-07T14:28:51-05:00] Initialization of appliance successful       
INFO[2016-07-07T14:28:51-05:00]                                              
INFO[2016-07-07T14:28:51-05:00] SSH to appliance (default=root:password)     
INFO[2016-07-07T14:28:51-05:00] ssh [email protected]                        
INFO[2016-07-07T14:28:51-05:00]                                              
INFO[2016-07-07T14:28:51-05:00] Log server:                                  
INFO[2016-07-07T14:28:51-05:00] http://10.17.109.94:2378                     
INFO[2016-07-07T14:28:51-05:00]                                              
INFO[2016-07-07T14:28:51-05:00] DOCKER_HOST=10.17.109.94:2375                
INFO[2016-07-07T14:28:51-05:00] DOCKER_OPTS="-H 10.17.109.94:2375"           
INFO[2016-07-07T14:28:51-05:00]                                              
INFO[2016-07-07T14:28:51-05:00] Connect to docker:                           
INFO[2016-07-07T14:28:51-05:00] docker -H 10.17.109.94:2375 info             
INFO[2016-07-07T14:28:51-05:00] Installer completed successfully             
matthewavery@ubuntu ~/go/src/github.com/vmware/vic/bin
 % docker -H 10.17.109.94:2375 pull busybox
Using default tag: latest
latest: Pulling from library/busybox

a3ed95caeb02: Extracting [==================================================>]     32 B/32 B
8ddc19f16526: Pull complete 
Failed to write to image store: context deadline exceeded
matthewavery@ubuntu ~/go/src/github.com/vmware/vic/bin
 % docker -H 10.17.109.94:2375 pull busybox
Using default tag: latest
latest: Pulling from library/busybox

8ddc19f16526: Already exists 
a3ed95caeb02: Extracting [==================================================>]     32 B/32 B
Failed to write to image store: [POST /storage/{store_name}][500] WriteImage default  &{Code:0xc82039f368 Message:Cannot complete the operation because the file or folder [Datastore] VIC/4223eb68-0def-b8f7-2f5c-4d5da6568e9d/images/b05baf071fd542c3146f08e5f20ad63e76fa4b4bd91f274c4838ddc41f3409f8/b05baf071fd542c3146f08e5f20ad63e76fa4b4bd91f274c4838ddc41f3409f8.vmdk already exists}
matthewavery@ubuntu ~/go/src/github.com/vmware/vic/bin
 % 
matthewavery@ubuntu ~/go/src/github.com/vmware/vic/bin
 % docker -H 10.17.109.94:2375 create -v /testVol busybox /bin/sh
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox

8ddc19f16526: Already exists 
a3ed95caeb02: Already exists 
Digest: sha256:65ce39ce3eb0997074a460adfb568d0b9f0f6a4392d97b6035630c9d7bf92402
Status: Image is up to date for library/busybox:latest
08ac166ac3736a481bc32627e465a8140ce126617edb6d43185b05b83b4a9c3a
matthewavery@ubuntu ~/go/src/github.com/vmware/vic/bin
 % docker -H 10.17.109.94:2375 create -v /testVol busybox /bin/sh
8baba67575c9c26d4b90eb83e61f465629a498214494092730a34354ab3f7a7d
matthewavery@ubuntu ~/go/src/github.com/vmware/vic/bin
 % docker -H 10.17.109.94:2375 pull busybox                      
Using default tag: latest
latest: Pulling from library/busybox

8ddc19f16526: Already exists 
a3ed95caeb02: Already exists 
Digest: sha256:65ce39ce3eb0997074a460adfb568d0b9f0f6a4392d97b6035630c9d7bf92402
Status: Image is up to date for library/busybox:latest
matthewavery@ubuntu ~/go/src/github.com/vmware/vic/bin

The text was updated successfully, but these errors were encountered:

mdubya66 · 2016-07-19T13:47:32Z

assigning to @hickeng for triage

hickeng · 2016-07-26T21:13:15Z

Assigning @jzt to look into the context timeout - pulls could take a very long time with a slow network connection; this is not an operation that we should timeout (unless docker has a timeout here, in which case it should be on the personality side). Putting in initial estimate for investigation and fix of timeout.

#933 is the issue that will address the already exist/re-pull behaviour.

jzt · 2016-09-14T22:04:55Z

There are currently no context timeouts from the persona to the portlayer, and imagec uses a 3600 second timeout for pulls, so I don't think that's it. There is one in the disk package (5 seconds) that may be where this came from:

vic/pkg/vsphere/disk/util.go

Line 40 in 9609f26

ctx, _ = context.WithTimeout(ctx, timeout)

This is exercised during the WriteImage path. If you are still seeing these timeouts, this is probably the culprit.

Tagging @fdawg4l in case he has something to add.

fdawg4l · 2016-09-14T22:06:56Z

Sounds reasonable. We can modify the timeout in the VCH; the time it takes to create the vmdk and attach it to the VCH is completely up to vsphere. I don't know what the upper bound of that operation is on a busy system. @hickeng?

hickeng · 2016-09-16T22:53:56Z

@fdawg4l Almost unbounded I think. We're generally moving to a cancellation based approach in the PL, and the caller (in this case the personality) can furnish timeouts if that's required semantics.

fdawg4l · 2016-09-16T23:37:56Z

Sounds good. We'll move to cancellation and nuke the timeout in the disk pkg.

As far as this issue goes, we need 2 things
1 - We need to prevent more than 1 image being pulled at the same time
2 - We need to cleanup when a pull fails.

2 is done. We merged this a little over a month ago. We just need 1 now and to remove the timeout. That is not to say we aren't going to see a timeout somewhere else in the stack- does the docker client have a timeout? Do the handlers in the docker daemon have a default timeout per handler?

We can solve the things mentioned, but largely close the issue since subsquent pulls already work. Timeouts / cancellation in general need their own epics.

fdawg4l · 2016-09-19T23:07:22Z

We can solve the things mentioned, but largely close the issue since subsquent pulls already work. Timeouts / cancellation in general need their own epics.

#2385

mdubya66 · 2016-10-10T18:22:58Z

Workaround: Issue the command again and it should work.

matthewavery · 2016-11-01T22:10:51Z

This may involve a fair amount of code change, and since there is a workaround this may not be a big issue for 0.8.0. It also occurs rarely, so it might be best if we table this one for now. @mdubya66 @hickeng

fdawg4l · 2016-11-02T16:54:03Z

@matthewavery, @jzt's recent imagec changes prevent multiple pulls of the same layer id from happening. It's likely this is now fixed.

stuclem · 2016-11-07T18:01:18Z

Already in the release notes, so removing the kind/note tag.

stuclem · 2016-11-07T18:01:32Z

Already in the release notes, so removing the kind/note tag.

mhagen-vmware · 2017-02-01T16:47:02Z

Marking this a low priority est 2 issue now, with the assumption this is fixed already and now making this a test item to make sure we have a test that covers this scenario.

hickeng · 2017-04-13T13:20:04Z

@jzt can you confirm that this is fixed?

corrieb · 2017-07-19T19:27:51Z

@jzt ^^^ let's close if this is fixed

jzt · 2017-07-19T19:30:22Z

@corrieb @mhagen-vmware I'm down to close. I haven't seen/heard of this happening for quite a long time. @mhagen-vmware do you agree?

hickeng · 2017-07-19T20:40:52Z

#5372 is another instance of this - as per #5372 (comment) we should be deleting incomplete layers so they can be re-pulled.
We shouldn't close this without a test that confirms correctness.

mdubya66 · 2017-10-31T18:19:07Z

This looks to be resolved in 5372. Moving from bug to test issue. A test needs to be written to cover this situation.

matthewavery added kind/defect Behavior that is inconsistent with what's intended kind/investigation A scoped effort to learn the answers to a set of questions which may include prototyping labels Jul 7, 2016

mdubya66 assigned hickeng Jul 19, 2016

mdubya66 added this to the VIC GA Release milestone Jul 22, 2016

hickeng added component/imagec component/portlayer/storage labels Jul 26, 2016

hickeng assigned jzt and unassigned hickeng Jul 26, 2016

mhagen-vmware added the impact/doc/note Requires creation of or changes to an official release note label Aug 8, 2016

mdubya66 closed this as completed Oct 10, 2016

mdubya66 reopened this Oct 10, 2016

jzt mentioned this issue Oct 10, 2016

Imagec library enabling concurrent downloads #2587

Merged

mdubya66 modified the milestones: VIC GA Release, v1.0 Nov 2, 2016

matthewavery closed this as completed Nov 2, 2016

matthewavery reopened this Nov 2, 2016

matthewavery removed the impact/doc/note Requires creation of or changes to an official release note label Nov 7, 2016

mhagen-vmware unassigned jzt Feb 1, 2017

mhagen-vmware added priority/p4 component/test Tests not covered by a more specific component label labels Feb 1, 2017

sflxn added the team/container label Sep 26, 2017

mdubya66 added priority/p2 and removed priority/p4 kind/defect Behavior that is inconsistent with what's intended kind/investigation A scoped effort to learn the answers to a set of questions which may include prototyping labels Oct 31, 2017

zjs added severity/3-moderate Medium usability or functional impact. Potentially has an inconvenient workaround. kind/debt Problems that increase the cost of other work and removed priority/p2 team/container labels Mar 6, 2019

zjs changed the title ~~Context deadline during Pull causes an already exists error from govmomi on a subsequent pull~~ Needs test: Context deadline during Pull causes an already exists error from govmomi on a subsequent pull Mar 6, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Needs test: Context deadline during Pull causes an already exists error from govmomi on a subsequent pull #1409

Needs test: Context deadline during Pull causes an already exists error from govmomi on a subsequent pull #1409

matthewavery commented Jul 7, 2016 •

edited

Loading

mdubya66 commented Jul 19, 2016

hickeng commented Jul 26, 2016

jzt commented Sep 14, 2016

fdawg4l commented Sep 14, 2016

hickeng commented Sep 16, 2016

fdawg4l commented Sep 16, 2016

fdawg4l commented Sep 19, 2016

mdubya66 commented Oct 10, 2016

matthewavery commented Nov 1, 2016

fdawg4l commented Nov 2, 2016 •

edited

Loading

stuclem commented Nov 7, 2016

stuclem commented Nov 7, 2016

mhagen-vmware commented Feb 1, 2017

hickeng commented Apr 13, 2017

corrieb commented Jul 19, 2017

jzt commented Jul 19, 2017

hickeng commented Jul 19, 2017

mdubya66 commented Oct 31, 2017

Needs test: Context deadline during Pull causes an already exists error from govmomi on a subsequent pull #1409

Needs test: Context deadline during Pull causes an already exists error from govmomi on a subsequent pull #1409

Comments

matthewavery commented Jul 7, 2016 • edited Loading

mdubya66 commented Jul 19, 2016

hickeng commented Jul 26, 2016

jzt commented Sep 14, 2016

fdawg4l commented Sep 14, 2016

hickeng commented Sep 16, 2016

fdawg4l commented Sep 16, 2016

fdawg4l commented Sep 19, 2016

mdubya66 commented Oct 10, 2016

matthewavery commented Nov 1, 2016

fdawg4l commented Nov 2, 2016 • edited Loading

stuclem commented Nov 7, 2016

stuclem commented Nov 7, 2016

mhagen-vmware commented Feb 1, 2017

hickeng commented Apr 13, 2017

corrieb commented Jul 19, 2017

jzt commented Jul 19, 2017

hickeng commented Jul 19, 2017

mdubya66 commented Oct 31, 2017

matthewavery commented Jul 7, 2016 •

edited

Loading

fdawg4l commented Nov 2, 2016 •

edited

Loading