[Feature request] Windows container support #745

Open
BlueAndi opened this issue Aug 14, 2019 · 44 comments
Labels
feature a PR providing a new feature.

Comments

@BlueAndi

It would be great to have Windows container support as well.

@bverkron

What exactly is / isn't supported on the Windows side for this plugin?

We currently have ephemeral Jenkins build agents on the Linux side via this plugin and need to set one up for Windows as well.

@pjdarton pjdarton added the feature a PR providing a new feature. label Oct 1, 2019
@pjdarton
Member

pjdarton commented Oct 1, 2019

That's a good question, and I don't know the answer as I don't use Windows containers myself (and nobody who does has set out exactly what the issues are).

FYI, internally, the plugin doesn't care what OS you're using.
Internally, it's all Java (as is Jenkins as a whole) and it's talking to the docker daemon(s) via a Java library (docker-java) so if Microsoft's implementation of docker is a compliant implementation of docker (rather than something that is not docker but that Microsoft call docker, which can happen when corporations believe there's "no standard they can't improve on" 😠 ) then it should "just work" ...
... but I presume there must be at least one reason why it doesn't "just work" otherwise folks wouldn't be raising this kind of issue.

If anyone's willing to investigate and implement this (see CONTRIBUTING guidelines) then I'd be happy to review the code and, ultimately, merge things in.

@bverkron

bverkron commented Oct 1, 2019

Sounds like my assumption that it should work (but might have some quirks / problems) was correct.

We're attempting to use docker containers in our declarative pipelines but it's falling down in an odd place on the Windows slaves whereas it worked on the Linux side with the identical setup.

It's not clear to me if this is caused by the way the plugin behaves with Windows slaves or if it's some different behavior in the Windows implementation of Docker.

Pipeline script
Same syntax as working Linux jobs except for image and label tags.

pipeline {
    agent {
        docker {
            image 'iis'
            label 'dockerEnabledWindows'
        }
    }
    stages {
        stage('Example Build') {
            steps {
                sh 'hostname'
            }
        }
    }
}

Build Console Log

Running on Windows in C:/jenkins/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled Windows slave
[Pipeline] {
[Pipeline] withEnv
[Pipeline] {
[Pipeline] withDockerRegistry
Using the existing docker config file.
Removing blacklisted property: auths
$ docker login -u ****** -p ******** https://index.docker.io/v1/
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
Error response from daemon: Get https://registry-1.docker.io/v2/: unauthorized: incorrect username or password
[Pipeline] // withDockerRegistry
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
ERROR: docker login failed
Finished: FAILURE

Below is the first error we got, so I set up an account with Docker Hub and ran the docker login command to cache the login, hoping that would work. But it just produced a variation of the same error (above).

$ docker login -u ******* -p ******** https://index.docker.io/v1/
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
Error response from daemon: Get https://registry-1.docker.io/v2/: unauthorized: incorrect username or password

The docker login is the odd piece to me, as we're not making any explicit attempt to connect to a registry, and definitely not a private one, so IMO there should be no need for the login command or credentials. On the Linux side it happily connects to Docker Hub without the login command as far as I can tell, and we've never had to do anything with credentials.

I'm trying to determine where the problem lies so I know whether to dig into the Jenkins Docker plugin or the Windows Docker configuration.

@pjdarton
Member

pjdarton commented Oct 1, 2019

Ah, withDockerRegistry suggests that you're using the docker-workflow-plugin not the docker-plugin.
Different plugin, different way of working, different GitHub repo ... and not a plugin I know much about, aside from everyone confusing it for this one.

@bverkron

bverkron commented Oct 1, 2019

What would cause Jenkins to use the docker-workflow-plugin instead of the docker-plugin like the other job? They are configured the same way (literally just the declarative pipeline script above) and are just pointing at a Linux vs. Windows node / image.

This is what the Linux node job runs...

[Pipeline] {
[Pipeline] sh
+ docker inspect -f . maven
.
[Pipeline] withDockerContainer
Linux does not seem to be running inside a container
$ docker run -t -d -u 164263:164263 -w "/home/_jenkinsauto/remote/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave@2" -v "/home/_jenkinsauto/remote/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave@2:/home/_jenkinsauto/remote/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave@2:rw,z" -v "/home/_jenkinsauto/remote/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave@2@tmp:/home/_jenkinsauto/remote/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave@2@tmp:rw,z" -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** maven cat
$ docker top 489b5b3d2dc7327c787cfdf8044a945b9735737c690c3162abdb45e051138e71 -eo pid,comm
[Pipeline] {
[Pipeline] stage
[Pipeline] { (Example Build)
[Pipeline] sh
...

Edit: Re-running the linux job now returns the same docker login error as the Windows job. AFAIK nothing in the Jenkins master config has changed. Where / how do we specify which plugin to use? Uninstalling the docker-workflow-plugin (i.e. Docker Pipeline) doesn't seem to be an option, as it's inactive under pluginManager/installed and I can't interact with it.

@BlueAndi
Author

BlueAndi commented Oct 1, 2019

My test setup is a PC running Windows 10 with Docker Desktop CE installed and the Docker daemon running, plus a registry on the same machine. It provides several agents based on Windows Docker images. The Jenkins master runs on a different PC.

I configured a "windows" cloud in the Jenkins master (provided by the docker plugin) with a test agent.

When a job now requests this test agent, a corresponding Docker container is created, but accessing it then fails.

Tomorrow I can provide the error logs and more information about the configuration.

@bverkron

bverkron commented Oct 1, 2019

Edit: Re-running the linux job now returns the same docker login error as the Windows job. AFAIK nothing in the Jenkins master config has changed. Where / how do we specify which plugin to use?

Found the solution to the docker login problem. Under Manage Jenkins > Configure System > Pipeline Model Definition, a value had been selected under Registry credentials. Since the other two fields were blank, it was simply trying to force a login to the public Docker Hub with the selected credentials, and because it's a global setting it was affecting all Docker-related jobs.

This also indicates that the pipeline scripts were not actually using the docker-workflow-plugin (aka Docker Pipeline) as previously suggested.

Now the problem becomes the docker-plugin seemingly trying to treat the Windows host as a Linux host and trying to execute the nohup command.

Running on Windows in C:/jenkins/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave Windows
[Pipeline] {
[Pipeline] sh
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
...
Caused: java.io.IOException: Cannot run program "nohup" (in directory "C:\jenkins\workspace\Ephemeral Build Agent PoC\Ephemeral Build Agent PoC v6 Docker enabled slave Windows"): CreateProcess error=2, The system cannot find the file specified

This could be a configuration issue, but it also seems that the docker-plugin is trying to use the nohup command when running the docker steps, which of course isn't available on Windows by default.

I think this can be solved via a process like this (i.e. installing & configuring git-bash or similar) but is this the expected / correct setup for docker on Windows slave hosts?
https://stackoverflow.com/a/45151156

Will poke around but any guidance would be appreciated.

@pjdarton
Copy link
Member

pjdarton commented Oct 1, 2019

Disclaimer: on mobile, from home, going from memory and not looking stuff up as it's my bed time...

Take a look at the "advanced" connection properties, e.g. jnlp or direct or SSH. In there you may find the ability to override default "start slave" commands. The online help may even tell you what the defaults are.
That'll be in manage Jenkins -> configure system -> scroll down to "clouds" and look in the templates you've defined ... if you are using this plug-in to provide your executors and not the docker-workflow-plugin, that is ;-)

@bverkron

bverkron commented Oct 2, 2019

I will take a look. Away from the office ATM so will be tomorrow. In the meantime...

How do we tell which plugin is being used by the commands? I believe the pipeline script is being used, based on experimentation, syntax, and discussion in other threads for docker-plugin, but how do I confirm?

We don't have any clouds defined because, right now, we're dealing with specific Jenkins slaves w/Docker installed, so we can define the images in the declarative pipeline script itself (and thus in SCM) and/or in Dockerfile files rather than in the Jenkins UI. In our case I think the relevant connection settings would be on the slave itself, under the nodes section of the Jenkins master.

@BlueAndi
Copy link
Author

BlueAndi commented Oct 2, 2019

Using this plugin and not the docker-workflow plugin ;-), I get the following result:

Asked to provision 1 slave(s) for: win-agent
Oct 02, 2019 8:45:02 AM INFO io.jenkins.docker.client.DockerAPI$1 entryDroppedFromCache
Dropped connection io.jenkins.docker.client.DockerAPI$SharableDockerClient@1fe310cf to DockerClientParameters{dockerUri=tcp://lp13007:2375, credentialsId=null, readTimeoutInMsOrNull=300000, connectTimeoutInMsOrNull=60000}
Oct 02, 2019 8:45:02 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Provisioning 'localhost:5000/win-agent' number 1 (of 1) on 'windows cloud agents'; Total containers: 0 (of 4)
Oct 02, 2019 8:45:02 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Will provision 'localhost:5000/win-agent', for label: 'win-agent', in cloud: 'windows cloud agents'
Oct 02, 2019 8:45:02 AM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
Started provisioning Image of localhost:5000/win-agent from windows cloud agents with 1 executors. Remaining excess workload: 0
Oct 02, 2019 8:45:02 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate pullImage
Pulling image 'localhost:5000/win-agent:latest'. This may take awhile...
Oct 02, 2019 8:45:02 AM INFO io.jenkins.docker.client.DockerAPI getOrMakeClient
Cached connection io.jenkins.docker.client.DockerAPI$SharableDockerClient@5ea08dda to DockerClientParameters{dockerUri=tcp://lp13007:2375, credentialsId=null, readTimeoutInMsOrNull=300000, connectTimeoutInMsOrNull=60000}
Oct 02, 2019 8:45:03 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate pullImage
Finished pulling image 'localhost:5000/win-agent:latest', took 994 ms
Oct 02, 2019 8:45:03 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
Trying to run container for localhost:5000/win-agent
Oct 02, 2019 8:45:03 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
Trying to run container for node win-agent-0001r6amcl1rs from image: localhost:5000/win-agent
Oct 02, 2019 8:45:03 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
Started container ID 6e1c47d50b63636f580761fbe2b367ba53c902a07ad648eba7edd5c88c826d04 for node win-agent-0001r6amcl1rs from image: localhost:5000/win-agent
Oct 02, 2019 8:45:03 AM SEVERE com.github.dockerjava.core.async.ResultCallbackTemplate onError
Error during callback
com.github.dockerjava.api.exception.NotFoundException: {"message":"Could not find the file /root in container 6e1c47d50b63636f580761fbe2b367ba53c902a07ad648eba7edd5c88c826d04"}

	at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:103)
	at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:33)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
	at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	at java.lang.Thread.run(Thread.java:748)

Oct 02, 2019 8:45:03 AM SEVERE com.nirima.jenkins.plugins.docker.DockerCloud$1 run
Error in provisioning; template='DockerTemplate{configVersion=2, labelString='win-agent', connector=io.jenkins.docker.connector.DockerComputerSSHConnector@5f068bde, remoteFs='C:\Users\jenkins', instanceCap=1, mode=NORMAL, retentionStrategy=com.nirima.jenkins.plugins.docker.strategy.DockerOnceRetentionStrategy@4bcce0a5, dockerTemplateBase=DockerTemplateBase{image='localhost:5000/win-agent', pullCredentialsId='', registry=DockerRegistryEndpoint[null;credentialsId=null], dockerCommand='', hostname='', dnsHosts=[], network='', volumes=[], volumesFrom2=[], environment=[], bindPorts='', bindAllPorts=false, memoryLimit=null, memorySwap=null, cpuShares=null, shmSize=null, privileged=false, tty=false, macAddress='null', extraHosts=[]}, removeVolumes=false, pullStrategy=PULL_ALWAYS, nodeProperties=[], disabled=BySystem,0 ms,4 min 59 sec,Template provisioning failed.}' for cloud='windows cloud agents'
com.github.dockerjava.api.exception.NotFoundException: {"message":"Could not find the file /root in container 6e1c47d50b63636f580761fbe2b367ba53c902a07ad648eba7edd5c88c826d04"}

Judging by the message "Could not find the file /root in container", it looks like the plugin still assumes a Linux container.

@pjdarton
Member

pjdarton commented Oct 2, 2019

What would cause Jenkins to use the docker-workflow-plugin instead of docker-plugin

It all comes down to what pipeline syntax you use.
Different plugins provide different functionality with different words.
The docker-plugin provides very little pipeline functionality, just the dockerNode keyword.
The docker-workflow-plugin is what most people are using when they're doing pipelines with docker as that's what's documented in the Jenkins documentation - that's the plugin that provides pipeline keywords like withDockerRegistry or docker.image etc.
One other indication is that if your logs mention any docker command-line stuff then that's the docker-workflow-plugin - the docker-plugin doesn't use/need a docker command-line client as it uses a Java docker client to talk to docker daemons.
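The rule of thumb above (docker CLI lines or steps such as withDockerRegistry point to the docker-workflow-plugin; dockerNode or the plugin's own provisioning log messages point to the docker-plugin) can be sketched as a small check over a log excerpt. The class `DockerPluginGuesser` and its marker lists are hypothetical and not part of either plugin; the markers are simply taken from the logs in this thread, so treat the result as a hint rather than a verdict.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of docker-plugin or docker-workflow-plugin) that applies
 * the rules of thumb from this thread to a log excerpt and guesses which plugin produced it.
 */
final class DockerPluginGuesser {

    enum Plugin { DOCKER_WORKFLOW_PLUGIN, DOCKER_PLUGIN, UNKNOWN }

    // docker-workflow-plugin markers: its pipeline steps, plus docker CLI invocations
    // echoed into the build log (the docker-plugin never shells out to the CLI).
    private static final List<String> WORKFLOW_MARKERS = List.of(
            "withDockerRegistry", "withDockerContainer", "$ docker ");

    // docker-plugin markers: its only pipeline step, plus the Java packages that
    // appear as logger names in its provisioning messages.
    private static final List<String> DOCKER_PLUGIN_MARKERS = List.of(
            "dockerNode", "com.nirima.jenkins.plugins.docker", "io.jenkins.docker");

    private DockerPluginGuesser() {}

    static Plugin guess(String log) {
        boolean workflow = WORKFLOW_MARKERS.stream().anyMatch(log::contains);
        boolean dockerPlugin = DOCKER_PLUGIN_MARKERS.stream().anyMatch(log::contains);
        if (workflow && !dockerPlugin) {
            return Plugin.DOCKER_WORKFLOW_PLUGIN;
        }
        if (dockerPlugin && !workflow) {
            return Plugin.DOCKER_PLUGIN;
        }
        return Plugin.UNKNOWN; // no markers, or markers from both: check by hand
    }
}
```

Given the Windows console log from earlier (`withDockerRegistry`, `$ docker login ...`), it reports the docker-workflow-plugin, while the provisioning log with `com.nirima.jenkins.plugins.docker.DockerCloud` entries reports the docker-plugin.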

If you've defined some clouds and templates in Manage Jenkins -> Configure System -> docker clouds and then defined a pipeline to use a slave whose label matches one of the templates you've defined then your builds would be running on docker containers that are created by the docker-plugin. FYI that's the docker-plugin's primary use case.

...and, yes, I know it's confusing, which is why the README, ISSUE_TEMPLATE and CONTRIBUTING docs in this plugin all mention this issue and tell folks to be sure of what they're using so they can report things in the right place, because it confuses the hell out of everyone and makes it all too easy to make mistakes.

docker-plugin is trying to use the nohup command

FYI the docker-plugin knows nothing of the nohup command; the word "nohup" is not in its code (it's not in the docker-workflow-plugin's code either).
However, nohup is what the Jenkins durable-task-plugin's sh step adds when it's not running on a Darwin (Mac) OS (on Windows, it assumes you're using Cygwin and will have nohup).
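A simplified sketch of that decision, assuming it reduces to "wrap the shell in nohup unless the OS is macOS", shows why a Windows agent without Cygwin fails. `ShellLaunchSketch` and `shellCommand` are hypothetical names; this is not the durable-task-plugin's real code, and its actual platform detection may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Simplified, hypothetical sketch of the launch decision described above; NOT the
 * durable-task-plugin's actual code. It only shows why a shell step on a Windows agent
 * asks the OS to start a program called "nohup".
 */
final class ShellLaunchSketch {

    private ShellLaunchSketch() {}

    /**
     * Builds the argv a Bourne-shell step would hand to the OS.
     *
     * @param osName     the agent's OS name, e.g. System.getProperty("os.name")
     * @param scriptPath path of the generated script file (hypothetical parameter)
     */
    static List<String> shellCommand(String osName, String scriptPath) {
        List<String> cmd = new ArrayList<>();
        boolean darwin = osName.toLowerCase(Locale.ROOT).startsWith("mac");
        if (!darwin) {
            // Everywhere except macOS the shell is wrapped in nohup. Windows is not
            // special-cased: Cygwin (or similar) is assumed to supply nohup and sh.
            cmd.add("nohup");
        }
        cmd.add("sh");
        cmd.add(scriptPath);
        return cmd;
    }
}
```

On Windows the first element is `nohup`, so if no `nohup` executable is on the agent's PATH, starting the process fails with the `Cannot run program "nohup" ... CreateProcess error=2` error shown in the logs in this thread.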

I'd suggest that, when doing Windows pipelines, you use something like the bat pipeline command.
Or, alternatively, use the echo pipeline command as that's platform agnostic, or run a groovy command to list all the environment variables etc.

Could not find the file /root in container

Aha! Yes, now we're getting somewhere 😁
These logs did come from the docker-plugin.
I've checked the plugin's code and, sure enough, /root/ is in the code - if your template is defined to use the SSH connector and you're using an injected key then it'll start the container with the command /usr/sbin/sshd -D -p <port> -o AuthorizedKeysCommand=/root/authorized_key -o AuthorizedKeysCommandUser=root (see DockerComputerSSHConnector.java line 180) ... and would also try to run a /bin/sh script in the container to inject the key when the container starts too, so that's not going to work on Windows.
However, even if you don't inject a key, the SSH connector would still tell the container to start with the command /usr/sbin/sshd -D -p <port>, and that's unlikely to work on Windows either.
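To make the Unix assumptions in that start command explicit, it can be sketched as follows. `SshStartCommandSketch` is a hypothetical illustration that only reproduces the command line quoted from DockerComputerSSHConnector.java; it is not the plugin's code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the SSH connector's container start command as quoted above;
 * it mirrors the quoted command line, not the plugin's actual implementation.
 */
final class SshStartCommandSketch {

    private SshStartCommandSketch() {}

    static List<String> startCommand(int port, boolean injectKey) {
        // Always starts OpenSSH from its usual Linux location, in the foreground (-D).
        List<String> cmd = new ArrayList<>(
                List.of("/usr/sbin/sshd", "-D", "-p", Integer.toString(port)));
        if (injectKey) {
            // Injected keys are served by a helper under /root, run as root.
            cmd.addAll(List.of(
                    "-o", "AuthorizedKeysCommand=/root/authorized_key",
                    "-o", "AuthorizedKeysCommandUser=root"));
        }
        return cmd;
    }

    /** True if any argument is, or contains, an absolute Unix path. */
    static boolean usesUnixPaths(List<String> cmd) {
        return cmd.stream().anyMatch(arg -> arg.startsWith("/") || arg.contains("=/"));
    }
}
```

Whether or not a key is injected, the command starts with `/usr/sbin/sshd`, which a typical Windows image will not provide.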

I think you need to get your Windows containers to use the JNLP or Attach connection method instead. It looks like the SSH method has a lot of hard-coded unixy stuff in it at present.
...and, to be frank, I wouldn't recommend relying on SSH on Windows - an SSH environment in Windows is tricky to make work and be secure; you usually end up having to choose between "secure" and "lets you do everything your build needs to do".

It looks like the Attach method runs the following command on the container: java -jar <remoteFs>/slave.jar -noReconnect -noKeepAlive -slaveLog <remoteFs>/agent.log
So, as long as you ensure that java is on the %PATH%, that you're setting the template's remoteFs correctly and that slave.jar is already present there, it's likely to "just work".
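As a sketch of what the Attach method asks the container to run, the quoted command can be filled in for a given remoteFs. `AttachCommandSketch` is a hypothetical name, and it mirrors the quoted command line rather than the plugin's implementation.

```java
import java.util.List;

/**
 * Hypothetical sketch that fills in the Attach command quoted above for a given remoteFs.
 * It mirrors the quoted command line; it is not the plugin's actual code.
 */
final class AttachCommandSketch {

    private AttachCommandSketch() {}

    static List<String> attachCommand(String remoteFs) {
        // remoteFs is used verbatim and joined with '/', so a Windows remoteFs yields
        // mixed separators (generally accepted on Windows, but worth checking).
        return List.of(
                "java", "-jar", remoteFs + "/slave.jar",
                "-noReconnect", "-noKeepAlive",
                "-slaveLog", remoteFs + "/agent.log");
    }
}
```

With the template's remoteFs of C:\Users\jenkins from the provisioning log above, this produces `java -jar C:\Users\jenkins/slave.jar -noReconnect -noKeepAlive -slaveLog C:\Users\jenkins/agent.log`, so the image needs slave.jar in C:\Users\jenkins and java on its PATH.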

The JNLP method provides more customisation capabilities (hidden in its "advanced" bits) so you can specify exactly what command the container should run, so if Attach doesn't work then JNLP can be forced to work.

@bverkron

bverkron commented Oct 2, 2019

The docker-plugin provides very little pipeline functionality, just the dockerNode keyword.

This is the critical differentiator and sadly this wasn't mentioned anywhere in the extensive reading I've done on these plugins.

...and, yes, I know it's confusing, which is why the README, ISSUE_TEMPLATE and CONTRIBUTING docs in this plugin all mention this issue and tell folks to be sure of what they're using so they can report things in the right place, because it confuses the hell out of everyone and makes it all too easy to make mistakes.

I have read those and do apologize for letting this thread slip from a clarification request for the original FR into a Q&A. I did try the mailing list and even Reddit, and it's mostly dead air out there for these kinds of questions; zero responses elsewhere, sadly. Additionally, even those docs don't clearly state how to tell the difference at the level of the pipeline syntax you actually write. We are using just the docker {} and dockerfile {} syntax without directly calling any of the underlying steps such as withDockerRegistry, so until those started bubbling up in some of these Windows jobs it was not clear to us that the docker-workflow (aka Docker Pipeline) plugin was actually being used, and there seemed to be evidence to the contrary. Furthermore, even in the detailed conversation in this thread, despite detailed examples of what we were trying to do (which I would have thought made it obvious which plugin was actually being used), nothing was said about the difference between docker-plugin and docker-workflow-plugin in our context.

Since we want to "codify" everything directly in the declarative pipeline scripts and/or Dockerfile files (via SCM), it seems we're restricted to using the Docker Pipeline plugin. It's unclear to me whether this docker-plugin may be able to suit our needs via dockerNode in a declarative pipeline if/when PR 681 is eventually released. Hopefully one day things will be merged / deprecated / documented as necessary to make this all clearer.

I'd suggest that, when doing Windows pipelines, you use something like the bat pipeline command. Or, alternatively, use the echo pipeline command as that's platform agnostic, or run a groovy command to list all the environment variables etc.

Attempted to use both echo and bat but the nohup failure is occurring before those lines are even reached. It's failing at the first step of creating the container as far as I can tell from the log. I will investigate the cygwin approach and try to pursue this elsewhere in the context of the docker pipeline plugin.

Greatly appreciate you taking the time to respond here and apologies again for derailing this thread.

@pjdarton
Member

pjdarton commented Oct 2, 2019

sadly this wasn't mentioned anywhere

Yes; sadly a lack of docs is not an uncommon issue with free software - the folks who write code to make it do things don't need documentation telling them what they did.
Where I've enhanced the docker-plugin, I've tried to also enhance the help text that's built into the UI, but that doesn't affect the official documentation (which is mostly telling folks to use stuff that's provided by the docker-workflow-plugin ... which is why I believed that this plugin would be superseded by it until the discussion in #681 said otherwise).

if/when PR 681 is eventually released

FYI you can try out PR 681 right now - or any PR - go to the "checks" bit at the bottom, click "show checks" and follow the link to the build (the pr-merge bit) "details" to take you to the Jenkins ci server that built it, and then to the "Artifacts" from the build - there you'll find a .hpi file you can download and install (via manage jenkins -> manage plugins -> advanced -> upload).
To be honest, I could really do with a 2nd-opinion on that PR as it's totally outside my knowhow, so please give it a test and let me know if it's worth having.

nohup

I'm not sure where it's coming from, as github.com can't find "nohup" in docker-plugin or docker-workflow-plugin. If you can figure out what parameters are being used when the container is created then you'll be able to see if it's something coming from Jenkins or something built into the container image itself (maybe docker inspect can help here too).

taking the time to respond

FYI I'd like to have things working on Windows too; at present, where I work, all our Windows-based stuff is on VMs (which take an age to boot up) and docker containers are lighter-weight and more efficient, which would mean I get more builds done on existing hardware.
If I can un-stick you, maybe I'll learn how to do the same myself...
i.e. it's not all altruism - I want it too ;-)

@bverkron

bverkron commented Oct 2, 2019

Yes; sadly a lack of docs is not an uncommon issue with free software - the folks who write code to make it do things don't need documentation telling them what they did.

Indeed. I totally get the struggle. Developers' time is precious, especially when, as is often the case, the work is done 'side of the desk' to a real job or other commitments. The efforts are greatly appreciated and valuable to so many people. Documentation falls to the back burner 90% of the time, and I've seen full-blown commercial, enterprise (and expensive) software with worse documentation than open-source projects maintained by a single person. Docs are often the trade-off for free software.

if/when PR 681 is eventually released

FYI you can try out PR 681 right now

I would love to do this but unsure if I will be able to near-term. I have some other things like this Windows build agent I need to squash first. Granted the docker-plugin might help solve that or work around it but it seems like there are some underlying issues here that need sorting first.

nohup

I'm not sure where it's coming from

The Jenkins job build log doesn't give any visibility and the main Jenkins master log has no entries related to this job. Is there additional logging that can be enabled that's relevant to this?

Running on Windows in C:/jenkins/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave Windows
[Pipeline] {
[Pipeline] sh
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
...
Caused: java.io.IOException: Cannot run program "nohup" 

I ran another test pipeline job that uses the same Windows node without the docker {} syntax, and it ran fine. The docker-based job still tries to run an sh command at the start of the job, despite sh not being used anywhere in the pipeline script explicitly. Hence my assumption that it's related to the docker plugin (though I guess the docker pipeline plugin in this case); that's the only difference between the pipeline scripts. Perhaps, as you said earlier, sh (and by proxy nohup) is being called from durable-task-plugin?

Successful non-docker job

pipeline {
    agent {
        label 'dockerEnabledWindowsSlave'
    }
    stages {
        stage('Example Build') {
            steps {
                echo 'test'
            }
        }
    }
}

Unsuccessful docker job

pipeline {
    agent {
        docker {
            image 'iis'
            label 'dockerEnabledWindowsSlave'
        }
    }
    stages {
        stage('Example Build') {
            steps {
                echo 'test'
            }
        }
    }
}

i.e. it's not all altruism - I want it too ;-)

Maybe we can figure something out together and even update the doc :D

At this point I may try to get nohup working via this method mentioned earlier to see if we can at least get to the next step and see what it's trying to do unless you have another suggestion.

Edit: I successfully set up git-bash tools via this suggestion and it resolved the sh/nohup issue. Now I am running into the "invalid volume specification" error, which I see you've discussed in #666 and which is clearly related to the docker pipeline plugin, not this one (as we've already established). Looking there, it seems like this error might be a dead end for docker pipeline on Windows. I'll keep digging elsewhere.

Suggestion: maybe include something at the bottom of the README.md for docker-plugin mentioning that dockerNode and the Jenkins UI configuration are specific to this plugin, so if you're not using those, it's probably the docker-workflow-plugin that's being used?

@bverkron

bverkron commented Oct 2, 2019

Looking here it seems like this error might be a dead end for docker pipeline on Windows.

Assuming this is true, we may need to migrate to this docker-plugin, at least for Windows stuff if not everything. That being said, can docker-plugin support declarative pipelines AND Dockerfiles referenced in the declarative pipeline script? It doesn't seem like it, based on the documentation and my experimenting thus far.

@BlueAndi
Author

BlueAndi commented Oct 4, 2019

I think you need to get your Windows containers to use the JNLP or Attach connection method instead. It looks like the SSH method has a lot of hard-coded unixy stuff in it at present.
...and, to be frank, I wouldn't recommend relying on SSH on Windows - an SSH environment in Windows is tricky to make work and be secure; you usually end up having to choose between "secure" and "lets you do everything your build needs to do".

You are right, it's configured with SSH key injection. OpenSSH is already running in the container and works ;-) I can connect to it via SSH. Therefore the idea was to use the same mechanism as for the Linux containers.

But I will try the JNLP approach, as the Windows containers are running on a different machine than the Jenkins master.

@Heneman

Heneman commented Nov 7, 2019

I've got our entire build setup working on Windows based images. Your configuration is incorrect, this should not be an issue for this plugin.

@BlueAndi
Author

BlueAndi commented Nov 7, 2019

With ssh key injection configured?

@pjdarton
Member

Update: I recently had to go delving in this area in order to fix the SSH unit-tests and so I took a good long look at the code.
In the process of trying to figure out why the SSH-connector unit tests had stopped working, I coded up a connector that avoids specifying /bin/sshd as the CMD; it just passes the SSH-key as the sole argument (i.e. exactly as the standard Jenkins SSH-slave image wants).

Disclaimer: This code is not finished. It's not as polished as it could be; at the very least, it'll need improvements to the online help to explain the difference between the connection methods (as it's obvious that this plugin needs better docs in this area!). It's also not tested - the only testing I've done is run the unit-tests, and only on linux (I have no Windows docker resource at present).
However, the new SSH connection method (if it works at all) might well work for Windows docker folks where the others do not; it might be worth your while trying it out.

You can find this code in PR #763 and you can find a .hpi file here - that .hpi file is built from the master branch (i.e. latest bleeding-edge code, aka release 1.1.9 right now) plus that PR's changes.
If these changes are well received then it'd be worthwhile improving them to the point where they're fit for merge...

@BlueAndi
Author

BlueAndi commented Dec 5, 2019

@pjdarton This sounds good. I hope I can try it today and give some feedback.

@BlueAndi
Author

BlueAndi commented Dec 5, 2019

The same error happened:

Provisioning 'lp13007:5000/docker-ssh-slave:win-1903' number 1 (of 1) on 'windows cloud agents'; Total containers: 0 (of 4)
Dec 05, 2019 11:03:32 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Will provision 'lp13007:5000/docker-ssh-slave:win-1903', for label: 'win-agent', in cloud: 'windows cloud agents'
Dec 05, 2019 11:03:32 AM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
Started provisioning Image of lp13007:5000/docker-ssh-slave:win-1903 from windows cloud agents with 1 executors. Remaining excess workload: 0
Dec 05, 2019 11:03:32 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate pullImage
Pulling image 'lp13007:5000/docker-ssh-slave:win-1903'. This may take awhile...
Dec 05, 2019 11:03:32 AM INFO io.jenkins.docker.client.DockerAPI getOrMakeClient
Cached connection io.jenkins.docker.client.DockerAPI$SharableDockerClient@1c14058f to DockerClientParameters{dockerUri=tcp://lp13007:2375, credentialsId=null, readTimeoutInMsOrNull=300000, connectTimeoutInMsOrNull=60000}
Dec 05, 2019 11:03:33 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate pullImage
Finished pulling image 'lp13007:5000/docker-ssh-slave:win-1903', took 1058 ms
Dec 05, 2019 11:03:33 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
Trying to run container for lp13007:5000/docker-ssh-slave:win-1903
Dec 05, 2019 11:03:33 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
Trying to run container for node win-agent-0002nvuz6oiow from image: lp13007:5000/docker-ssh-slave:win-1903
Dec 05, 2019 11:03:33 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
Started container ID 96ef9a8f7f43d232a75c52e4ad350ce66d457da9ddb93798cef62afc6fc290a8 for node win-agent-0002nvuz6oiow from image: lp13007:5000/docker-ssh-slave:win-1903
Dec 05, 2019 11:03:33 AM SEVERE com.github.dockerjava.core.async.ResultCallbackTemplate onError
Error during callback
com.github.dockerjava.api.exception.NotFoundException: {"message":"Could not find the file /root in container 96ef9a8f7f43d232a75c52e4ad350ce66d457da9ddb93798cef62afc6fc290a8"}

=> "Could not find the file /root in container ...

@pjdarton
Member

pjdarton commented Dec 5, 2019

The only place /root happens in the plugin is when it's using the InjectSSHKey connection method, which (with this PR's code installed) shows up in the WebUI configuration page as "Inject SSH key using SSH AuthorizedKeysCommand option" (previously, this option was simply called "Inject SSH key").

You need to switch to the new InjectSSHKeyAsContainerArgument connection method which will show up in the WebUI configuration page as "Inject SSH key as 1st container argument".

@BlueAndi
Author

BlueAndi commented Dec 5, 2019

Argh ... I forgot to switch to InjectSSHKeyAsContainerArgument. I will try again and come back.

@BlueAndi
Author

BlueAndi commented Dec 6, 2019

The result looks better now. The container itself starts and, according to the logs, it looks like an SSH connection was established (SSH port is open on lp13007:55137).

The Jenkins pipeline script should now just call a PowerShell 'dir' command, but this doesn't happen.

According to the logs, another agent is requested, and so on.

Logs:

Started container ID 5d61972f90fc76d696fad74efb9866eeaf1e598143878c348f506f3d9d597196 for node win-agent-0003fn8t8zrkc from image: lp13007:5000/docker-ssh-slave:win-1903
Dec 06, 2019 8:49:03 AM INFO com.nirima.jenkins.plugins.docker.utils.PortUtils$ConnectionCheckSSH executeOnce
SSH port is open on lp13007:55147
Dec 06, 2019 8:49:04 AM INFO hudson.slaves.NodeProvisioner lambda$update$6
Image of lp13007:5000/docker-ssh-slave:win-1903 provisioning successfully completed. We have now 3 computer(s)
Dec 06, 2019 8:49:04 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Asked to provision 1 slave(s) for: win-agent
Dec 06, 2019 8:49:04 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'
Dec 06, 2019 8:49:04 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Asked to provision 1 slave(s) for: win-agent
Dec 06, 2019 8:49:04 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'
Dec 06, 2019 8:49:14 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Asked to provision 1 slave(s) for: win-agent
Dec 06, 2019 8:49:14 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'
Dec 06, 2019 8:49:14 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Asked to provision 1 slave(s) for: win-agent
Dec 06, 2019 8:49:14 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'
Dec 06, 2019 8:49:24 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Asked to provision 1 slave(s) for: win-agent
Dec 06, 2019 8:49:24 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'
Dec 06, 2019 8:49:24 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Asked to provision 1 slave(s) for: win-agent
Dec 06, 2019 8:49:24 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'

This repeats after container watchdog is triggered.

@pjdarton
Member

pjdarton commented Dec 6, 2019

Hmm.
Those logs show that the container started and its SSH port opened.
They don't show much more than that 😟 ... but at least they're not showing an exception 😁

The fact that the docker plugin is still being asked to provision a node 20 seconds later implies that the slave failed to come online (i.e. the container exists, but Jenkins wasn't able to connect to it and run the Jenkins slave.jar code on it), which would imply that the SSH connection process didn't complete ... but that wouldn't show up here.

I think that the next place to look would be the log for the slave node itself.

i.e. you should see the docker slave node appearing in Jenkins' list of executors/slaves and that WebUI page has a "log" page on it - check what that's reporting as that's where any SSHConnector issues will be shown.
(For example, when I was debugging why the plugin's ssh-connector unit-tests were failing, I eventually found the "we can't find where java is on this container" error in the slave's log page)

@BlueAndi
Author

BlueAndi commented Dec 9, 2019

Checking the agent's log was a good hint:

SSHLauncher{host='lp13007', port=55267, credentialsId='InstanceIdentity', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=30, retryWaitTime=2, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
[12/09/19 11:51:48] [SSH] Opening SSH connection to lp13007:55267.
[12/09/19 11:51:49] [SSH] WARNING: SSH Host Keys are not being verified. Man-in-the-middle attacks may be possible against this connection.
ERROR: Server rejected the 1 private key(s) for root (credentialId:InstanceIdentity/method:publickey)
[12/09/19 11:51:49] [SSH] Authentication failed.
Authentication failed.
[12/09/19 11:51:49] Launch failed - cleaning up connection
[12/09/19 11:51:49] [SSH] Connection closed.

The node itself uses an image (Windows container) based on https://github.com/jenkinsci/docker-ssh-slave

@pjdarton
Member

OK, so the WARNING: SSH Host Keys are not being verified is a good sign...
...but the ERROR: Server rejected the 1 private key(s) for root is not.

However, the cause is revealed right there - you're trying to login to a Windows container as "root".
I believe that the username should be jenkins (for both the Windows and Linux docker-ssh-slave images).
i.e. it must not be root for a Windows image 😁

Once you've sorted that out, if it still isn't working then the next step of the investigation is to use docker inspect on the container (which you'll have to do soon after it's created otherwise it'll be cleaned up).
What we're looking for there is an indication of the argument(s) provided to the container (by the docker plugin's new InjectSSHKeyAsContainerArgument method).
If that looks good then that implies that the fault is in the container ... if it doesn't look right then the fault would be in my code.

FYI what I'm expecting is that you'll tell the docker-plugin code that it needs to log in as "jenkins" (and probably have to also tell it the home directory is C:/Users/jenkins too), the plugin will tell the container the public key it should accept (which will show up in docker inspect), and then the SSH connection code should try to connect as user "jenkins" using the private key matching the public key the container was told to accept (which will show up in the slave's log) ... and it should work...

@BlueAndi
Author

The "Remote File System Root" configuration parameter is set to "C:\Users\jenkins". The user inside the container should be jenkins, that's true. But I cannot configure the user with InjectSSHKeyAs1stParameter, only with a different configuration. This may be the problem.

@pjdarton
Member

Ah... yes, that would be a problem - it definitely needs to be jenkins for the public Windows docker image (unless you've built one yourself and passed in USER=root during the image build process).
I probably missed out a config.jelly file allowing this to be configured (I've never fully understood the Jenkins jelly/binding process so some "trial and error" always seems necessary...)

Before I go rummaging in the code ... can you confirm that you're able to configure the username for the original key-injection method:
...but you can't for the new one (meaning that it stays with the default value of root).
If that's correct then I'm pretty sure I know what I'll need to do to fix this... I just don't have time right this minute...

In the meantime: if you are able to manually edit your Jenkins server's config.xml file then you need not wait for me to get that done - you can use a text editor to fiddle with the cloud configuration even though the Jenkins WebUI is missing that field.
Use the Jenkins WebUI to configure Jenkins to use the new connection method (so that'll tell it to use the default root username), and save that (which will save the data to the file config.xml in your Jenkins home directory).
Edit the config.xml file and look for the <connector class="io.jenkins.docker.connector.DockerComputerSSHConnector"> section.
Find the <sshKeyStrategy class="io.jenkins.docker.connector.DockerComputerSSHConnector$InjectSSHKeyAsContainerArgument"> element - that should have a <user>root</user> element (or might not have that element at all).
Change "root" to "jenkins":

<sshKeyStrategy class="io.jenkins.docker.connector.DockerComputerSSHConnector$InjectSSHKeyAsContainerArgument">
  <user>jenkins</user>
</sshKeyStrategy>

Save the file.
Tell Jenkins to "Reload configuration from disk".

That should let you manually do what the WebUI doesn't currently let you do, and should let you get further with testing while you wait for an updated plugin (which might take a while as I'm busy on other things at present - I can spare the time to type out advice here, but coding will take longer...)
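If you would rather script the hand-edit described above than repeat it in a text editor, here is a minimal Java sketch using only the JDK's built-in DOM and XPath APIs. It assumes the config.xml layout quoted in this thread (an sshKeyStrategy element whose class attribute is the InjectSSHKeyAsContainerArgument class); SetSshKeyStrategyUser and setUser are hypothetical names, not part of docker-plugin. It works on an XML string only: reading and writing the real file (and keeping its XML declaration) is left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of docker-plugin): sets the <user> of every
// InjectSSHKeyAsContainerArgument key strategy found in a config.xml string.
public class SetSshKeyStrategyUser {
    static final String STRATEGY_CLASS =
        "io.jenkins.docker.connector.DockerComputerSSHConnector$InjectSSHKeyAsContainerArgument";

    public static String setUser(String configXml, String user) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(configXml)));
            NodeList strategies = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "//sshKeyStrategy[@class='" + STRATEGY_CLASS + "']", doc, XPathConstants.NODESET);
            for (int i = 0; i < strategies.getLength(); i++) {
                Element strategy = (Element) strategies.item(i);
                NodeList users = strategy.getElementsByTagName("user");
                Element userElement;
                if (users.getLength() > 0) {
                    userElement = (Element) users.item(0);    // e.g. an existing <user>root</user>
                } else {
                    userElement = doc.createElement("user");  // element absent: add it
                    strategy.appendChild(userElement);
                }
                userElement.setTextContent(user);
            }
            // Serialize back to a string; the XML declaration is deliberately omitted here.
            StringWriter out = new StringWriter();
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not rewrite config XML", e);
        }
    }

    public static void main(String[] args) {
        String before = "<connector class=\"io.jenkins.docker.connector.DockerComputerSSHConnector\">"
            + "<sshKeyStrategy class=\"" + STRATEGY_CLASS + "\"/><port>22</port></connector>";
        System.out.println(setUser(before, "jenkins"));
    }
}
```

Either way, back up config.xml first and still use "Reload configuration from disk" afterwards.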

@BlueAndi
Author

BlueAndi commented Dec 12, 2019

There are three options to choose:

  • Inject SSH key as 1st container argument
  • Inject SSH key using SSH AuthorizedKeysCommand option
  • Use configured SSH credentials

With the 2nd one, it's possible to enter the username. I guess it's the old "Inject SSH key" option and you changed the text, right?

I will follow your suggestion, changing the username in the xml file and come back with feedback.

BTW, before I forget: thanks for your support, it's very much appreciated! It's a bit like pair programming: you advise and I test. ;-)

@BlueAndi
Author

A short look into the config.xml:

      <name>windows cloud agents</name>
      <templates>
        <com.nirima.jenkins.plugins.docker.DockerTemplate>
          <configVersion>2</configVersion>
          <labelString>win-agent</labelString>
          <connector class="io.jenkins.docker.connector.DockerComputerSSHConnector">
            <sshKeyStrategy class="io.jenkins.docker.connector.DockerComputerSSHConnector$InjectSSHKeyAsContainerArgument"/>
            <port>22</port>
            <maxNumRetries>30</maxNumRetries>
            <retryWaitTime>2</retryWaitTime>
          </connector>
          <remoteFs>C:\Users\jenkins</remoteFs>
          <instanceCap>1</instanceCap>
          <mode>NORMAL</mode>

shows that the user tag is missing. I will try to put it in manually and see what happens.

@BlueAndi
Author

BlueAndi commented Dec 12, 2019

I was too fast :-) ... it looks better now. Here is the log from the agent:

SSHLauncher{host='lp13007', port=65431, credentialsId='InstanceIdentity', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=30, retryWaitTime=2, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
[12/12/19 16:05:11] [SSH] Opening SSH connection to lp13007:65431.
[12/12/19 16:05:11] [SSH] WARNING: SSH Host Keys are not being verified. Man-in-the-middle attacks may be possible against this connection.
[12/12/19 16:05:11] [SSH] Authentication successful.
[12/12/19 16:05:13] [SSH] The remote user's environment is:
Set-Variable : Cannot process command because of one or more missing mandatory parameters: Name.
At line:1 char:1
+ set
+ ~~~
    + CategoryInfo          : InvalidArgument: (:) [Set-Variable], ParameterBindingException
    + FullyQualifiedErrorId : MissingMandatoryParameter,Microsoft.PowerShell.Commands.SetVariable 
   Command
 
[12/12/19 16:05:16] [SSH] Checking java version of C:\Users\jenkins/jdk/bin/java
Couldn't figure out the Java version of C:\Users\jenkins/jdk/bin/java
C:\Users\jenkins/jdk/bin/java : The term 'C:\Users\jenkins/jdk/bin/java' is not recognized as the 
name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or 
if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ C:\Users\jenkins/jdk/bin/java  -version
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (C:\Users\jenkins/jdk/bin/java:String) [], CommandN 
   otFoundException
    + FullyQualifiedErrorId : CommandNotFoundException
 

[12/12/19 16:05:18] [SSH] Checking java version of java
[12/12/19 16:05:20] [SSH] java -version returned 1.8.0_222.
[12/12/19 16:05:20] [SSH] Starting sftp client.
[12/12/19 16:05:22] [SSH] Copying latest remoting.jar...
[12/12/19 16:05:22] [SSH] Copied 872,440 bytes.
Expanded the channel window size to 4MB
[12/12/19 16:05:22] [SSH] Starting agent process: cd "C:\Users\jenkins" && java  -jar remoting.jar -workDir C:\Users\jenkins -jar-cache C:\Users\jenkins/remoting/jarCache
At line:1 char:23
+ cd "C:\Users\jenkins" && java  -jar remoting.jar -workDir C:\Users\je ...
+                       ~~
The token '&&' is not a valid statement separator in this version.
    + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : InvalidEndOfLine
 
Slave JVM has terminated. Exit code=1
[12/12/19 16:05:24] Launch failed - cleaning up connection
[12/12/19 16:05:24] [SSH] Connection closed.

@pjdarton
Member

Oooh! That looks like the docker plugin part is working! 😁

...but then the SSH-connection bit is not 😞
I believe that the command that's failing comes from the SSHLauncher itself, not the docker plugin: SSHLauncher.startSlave has code saying:

        String cmd = "cd \"" + workingDirectory + "\" && " + java + " " + getJvmOptions() + " -jar slave.jar";

        //This will wrap the cmd with prefix commands and suffix commands if they are set.
        cmd = getPrefixStartSlaveCmd() + cmd + getSuffixStartSlaveCmd();

        listener.getLogger().println(Messages.SSHLauncher_StartingSlaveProcess(getTimestamp(), cmd));
        session.execCommand(cmd);

...and, as you can see, the " && " bit is hard-coded, so if the Windows docker-ssh-slave image can't cope with that then our options may be somewhat limited...

You might find that experimenting with the SSH-connector's "advanced" options gives you parameters you can edit that might allow you to workaround this problem ... but I have my doubts 😢

To do that, it'd be best if you manually ran one of these containers and told Jenkins to add a slave node that it should connect to via SSH - this way, you can experiment with different SSH connection options until you find some that work (otherwise you'll have to hand-edit the config.xml file over and over again).
FYI IME debugging docker-container connection issues is usually done best by first removing the docker-plugin from the mix (by starting the container manually and adding the slave to Jenkins via the Jenkins WebUI), fiddling around until it works, and then providing the working config to the docker plugin.

However, first I'd recommend you go googling and see if anyone anywhere has managed to get the Jenkins SSHLauncher working with the Windows docker ssh container image. It might be that there are bugs in the SSHLauncher that are fixed in later (possibly unreleased) versions and/or forks of the plugin.
It may also be well worth pinging the maintainer(s) of the ssh-slaves-plugin and/or the maintainer(s) of the Windows docker-ssh-slave image to establish a dialog there.

@pjdarton
Member

...FYI the error text The token '&&' is not a valid statement separator in this version. makes me wonder if there might be a version where it is a valid statement separator. If you can find (or create) one where that's the case...

@BlueAndi
Author

There is a PR for PowerShell to support && and similar operators:
PowerShell/PowerShell#9849
It seems to have been merged into PowerShell 7 Preview 5, so this may solve the problem later on.

A similar discussion is on the jenkins JIRA: https://issues.jenkins-ci.org/browse/JENKINS-42856?focusedCommentId=355486&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-355486

I will search a little bit more, talk to the maintainers, and post the results here.

@slide How did you manage the SSHLauncher part (see above)?

@BlueAndi
Author

BlueAndi commented Dec 13, 2019

In the linked JIRA issue comment, Mark Waite showed the workaround.
Configure:

  • "Prefix Start Slave Command": powershell -Command "cd C:\Users\jenkins ; java -jar remoting.jar -workDir C:\Users\jenkins -jar-cache C:\Users\jenkins/remoting/jarCache" ; exit 0 ; rem '
  • "Suffix Start Slave Command": '

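The trick is the rem ' at the end of the prefix: together with the ' suffix, it wraps the SSHLauncher's own command in a single-quoted string, so its && is never parsed as an operator, and exit 0 ends the session before that string is ever reached. ;-)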
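The effect of the prefix/suffix can be checked offline. Below is a minimal Java sketch that only models SSHLauncher's prefix + cmd + suffix concatenation (the startSlave snippet quoted earlier in this thread); it does not call the ssh-slaves plugin. It assumes the prefix/suffix values above and the built-in command exactly as it appears in the earlier agent log; StartCommandSketch, compose and chainOperatorsQuoted are hypothetical names.

```java
// Hypothetical model (not SSHLauncher itself) of how the agent start command is assembled.
public class StartCommandSketch {
    // The command SSHLauncher builds on its own, copied from the agent log in this thread.
    // Its "&&" is hard-coded, and the container's PowerShell rejects it at parse time.
    static final String BUILT_IN = "cd \"C:\\Users\\jenkins\" && java  -jar remoting.jar"
        + " -workDir C:\\Users\\jenkins -jar-cache C:\\Users\\jenkins/remoting/jarCache";

    // "Prefix Start Slave Command": starts the agent using ';' separators, exits,
    // then opens a single-quoted string...
    static final String PREFIX = "powershell -Command \"cd C:\\Users\\jenkins ; java -jar remoting.jar"
        + " -workDir C:\\Users\\jenkins -jar-cache C:\\Users\\jenkins/remoting/jarCache\""
        + " ; exit 0 ; rem '";

    // "Suffix Start Slave Command": ...which this closes.
    static final String SUFFIX = "'";

    // Same concatenation as the quoted SSHLauncher code: prefix + cmd + suffix.
    static String compose(String prefix, String cmd, String suffix) {
        return prefix + cmd + suffix;
    }

    // Naive check (ignores escaping): does every "&&" sit between the first and last single quote?
    static boolean chainOperatorsQuoted(String cmd) {
        int open = cmd.indexOf('\'');
        int close = cmd.lastIndexOf('\'');
        for (int i = cmd.indexOf("&&"); i >= 0; i = cmd.indexOf("&&", i + 2)) {
            if (open < 0 || i < open || i > close) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String full = compose(PREFIX, BUILT_IN, SUFFIX);
        System.out.println(full);
        System.out.println("built-in only, && quoted? " + chainOperatorsQuoted(BUILT_IN));
        System.out.println("with workaround, && quoted? " + chainOperatorsQuoted(full));
    }
}
```

The composed string is identical to the "Starting agent process:" line in the log that follows, with the hard-coded && sitting inside the single quotes.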

And voila:

SSHLauncher{host='lp13007', port=58981, credentialsId='InstanceIdentity', jvmOptions='', javaPath='', prefixStartSlaveCmd='powershell -Command "cd C:\Users\jenkins ; java -jar remoting.jar -workDir C:\Users\jenkins -jar-cache C:\Users\jenkins/remoting/jarCache" ; exit 0 ; rem '', suffixStartSlaveCmd=''', launchTimeoutSeconds=210, maxNumRetries=30, retryWaitTime=2, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
[12/13/19 12:08:00] [SSH] Opening SSH connection to lp13007:58981.
[12/13/19 12:08:00] [SSH] WARNING: SSH Host Keys are not being verified. Man-in-the-middle attacks may be possible against this connection.
[12/13/19 12:08:00] [SSH] Authentication successful.
[12/13/19 12:08:02] [SSH] The remote user's environment is:
Set-Variable : Cannot process command because of one or more missing mandatory parameters: Name.
At line:1 char:1
+ set
+ ~~~
    + CategoryInfo          : InvalidArgument: (:) [Set-Variable], ParameterBindingException
    + FullyQualifiedErrorId : MissingMandatoryParameter,Microsoft.PowerShell.Commands.SetVariable 
   Command
 
[12/13/19 12:08:05] [SSH] Checking java version of C:\Users\jenkins/jdk/bin/java
Couldn't figure out the Java version of C:\Users\jenkins/jdk/bin/java
C:\Users\jenkins/jdk/bin/java : The term 'C:\Users\jenkins/jdk/bin/java' is not recognized as the 
name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or 
if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ C:\Users\jenkins/jdk/bin/java  -version
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (C:\Users\jenkins/jdk/bin/java:String) [], CommandN 
   otFoundException
    + FullyQualifiedErrorId : CommandNotFoundException
 

[12/13/19 12:08:07] [SSH] Checking java version of java
[12/13/19 12:08:09] [SSH] java -version returned 1.8.0_222.
[12/13/19 12:08:09] [SSH] Starting sftp client.
[12/13/19 12:08:11] [SSH] Copying latest remoting.jar...
[12/13/19 12:08:11] [SSH] Copied 872,440 bytes.
Expanded the channel window size to 4MB
[12/13/19 12:08:11] [SSH] Starting agent process: powershell -Command "cd C:\Users\jenkins ; java -jar remoting.jar -workDir C:\Users\jenkins -jar-cache C:\Users\jenkins/remoting/jarCache" ; exit 0 ; rem 'cd "C:\Users\jenkins" && java  -jar remoting.jar -workDir C:\Users\jenkins -jar-cache C:\Users\jenkins/remoting/jarCache'
Dec 13, 2019 5:08:16 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using C:\Users\jenkins\remoting as a remoting work directory
Dec 13, 2019 5:08:16 AM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to C:\Users\jenkins\remoting
<===[JENKINS REMOTING CAPACITY]===>channel started
Remoting version: 3.33
This is a Windows agent
Agent successfully connected and online

@slide
Member

slide commented Dec 13, 2019

I launched sshd directly in the powershell script for the docker ssh agent repo, similar to what was being done on the Linux side.

@pjdarton
Member

FYI I've pushed another commit that (if I've got things right) should add the missing username field.
Disclaimer: I've not tested it myself.

@BlueAndi
Author

Looks good:

And the config.xml is updated accordingly.

@BlueAndi
Author

@pjdarton Happy Christmas, Peter, and thanks for your support!

@pjdarton
Member

You're welcome. I hope to make use of this myself where I work (we need Windows and, right now, only have those through VMs, which aren't as efficient as docker containers), so it's of mutual interest to get this all working ;-)

What I'd suggest is that you ping me again on this (or on the PR) next year, and we can figure out where to go from here.
e.g. the code I wrote isn't exactly top-notch stuff and could do with some renames ... and naming things is something where having a 2nd opinion is really useful 😉

@pvshewale

Any further updates on this? Is it available in latest release?

@pjdarton
Member

No, it isn't; progress is currently stalled.
However, what might be useful is the ability to override/redefine the entire command string sent to the docker container for the "attach" connection method.
At present, the SSH method has unix-specific stuff hard-coded (see above) which requires nasty workarounds (see above) ... but since that discussion took place, there have been enhancements (#790) made to the "attach" connection method that [I think] could be used to provide a windows-specific command for a windows container.

...however, none of that's been released (or tested enough that I'd be comfortable releasing it) yet, and I've got other (high priority) work stuff right now that means I can't work on this at present.

@pvshewale

Thanks @pjdarton for the update.
