[Feature request] Windows container support #745
What exactly is / isn't supported on the Windows side for this plugin? We currently have ephemeral Jenkins build agents on the Linux side via this plugin and need to set them up for Windows as well. |
That's a good question, and I don't know the answer as I don't use Windows containers myself (and nobody who does has set out exactly what the issues are). FYI, internally, the plugin doesn't care what OS you're using. If anyone's willing to investigate and implement this (see CONTRIBUTING guidelines) then I'd be happy to review the code and, ultimately, merge things in. |
Sounds like my assumption that it should work (but might have some quirks / problems) was correct. We're attempting to use docker containers in our declarative pipelines but it's falling down in an odd place on the Windows slaves whereas it worked on the Linux side with the identical setup. It's not clear to me if this is caused by the way the plugin behaves with Windows slaves or if it's some different behavior in the Windows implementation of Docker. Pipeline script
Build Console Log
Below is the first error we got, so I set up an account with Docker Hub and ran the docker login command to cache the login, hoping that would work. But it just produced a variation of the same error (above).
The docker login is the odd piece to me, as we're not making any explicit attempts to connect to a registry, and definitely not a private one, so IMO there should be no need for the login command or credentials. On the Linux side it happily connects to Docker Hub without the login command as far as I can tell, and we've never had to do anything with credentials. I'm trying to determine where the problem might lie so I know whether to dig down the Jenkins Docker plugin path or the Windows Docker configuration path. |
Ah, |
What would cause Jenkins to use the docker-workflow-plugin instead of docker-plugin like the other job? They are configured the same way (literally just the declarative pipeline script above), just pointing at a Linux vs Windows node / image. This is what the Linux node job runs...
Edit: Re-running the linux job now returns the same |
My test setup was a PC with Windows 10, Docker Desktop CE installed and the Docker daemon running, as well as a registry on it. It provides several agents, based on Windows docker images. The Jenkins master runs on a different PC. I configured a "windows" cloud in the Jenkins master (provided by the docker plugin) with a test agent. Calling this test agent in a job now results in a corresponding docker container being created, but access to it then fails. Tomorrow I can provide the error logs and more information about the configuration. |
Found the solution to the This also indicates that the pipeline scripts were not actually using the docker-workflow-plugin (aka Docker Pipeline) plugin as previously suggested. Now the problem becomes the docker-plugin seemingly trying to treat the Windows host as a Linux host and trying to execute the
Which could be a configuration thing but also seems that docker-plugin is trying to use the nohup command when doing the docker steps which of course isn't available on Windows by default. I think this can be solved via a process like this (i.e. installing & configuring git-bash or similar) but is this the expected / correct setup for docker on Windows slave hosts? Will poke around but any guidance would be appreciated. |
Disclaimer: on mobile, from home, going from memory and not looking stuff up as it's my bed time... Take a look at the "advanced" connection properties, e.g. jnlp or direct or SSH. In there you may find the ability to override default "start slave" commands. The online help may even tell you what the defaults are. |
I will take a look. Away from the office ATM so it will be tomorrow. In the meantime... How do we differentiate between which plugin is being used by the commands? I believe the pipeline script is being used, based on experimentation, syntax, and discussion in other threads for docker-plugin, but how do I confirm? We don't have any clouds defined as, right now, we're dealing with specific Jenkins slaves w/Docker installed so we can define the images in the declarative pipeline script itself (and thus in SCM) and/or Dockerfile files rather than the Jenkins UI. In our case I think the relevant connection settings would be under the slave itself, under the nodes section of Jenkins master. |
Using this plugin and not the docker-workflow plugin ;-), I get the following result:
According to the message |
It all comes down to what pipeline syntax you use. If you've defined some clouds and templates in Manage Jenkins -> Configure System -> docker clouds and then defined a pipeline to use a slave whose label matches one of the templates you've defined then your builds would be running on docker containers that are created by the ...and, yes, I know it's confusing, which is why the README, ISSUE_TEMPLATE and CONTRIBUTING docs in this plugin all mention this issue and tell folks to be sure of what they're using so they can report things in the right place, because it confuses the hell out of everyone and makes it all too easy to make mistakes.
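As a rough illustration of "it comes down to the syntax", here is a minimal heuristic sketch in Python. It is not an official detection API; the function name and patterns are made up for illustration. It assumes that `agent { docker ... }`, `agent { dockerfile ... }` and scripted `docker.image(...)`-style calls belong to the Docker Pipeline (docker-workflow) plugin, whereas a plain `agent { label ... }` or `node('...')` only involves the docker-plugin if that label matches a template defined in a docker cloud.

```python
import re

# Heuristic patterns only; these are assumptions for illustration.
DOCKER_PIPELINE_PATTERNS = [
    r"agent\s*\{\s*docker\b",                      # declarative: agent { docker { image '...' } }
    r"agent\s*\{\s*dockerfile\b",                  # declarative: agent { dockerfile true }
    r"\bdocker\.(image|build|withRegistry)\s*\(",  # scripted: docker.image('...').inside { ... }
]
LABEL_PATTERNS = [
    r"agent\s*\{\s*label\b",  # declarative: agent { label 'win-agent' }
    r"\bnode\s*\(\s*['\"]",   # scripted: node('win-agent') { ... }
]


def guess_plugin(pipeline_text: str) -> str:
    """Guess which plugin a pipeline script is exercising, based on its syntax."""
    if any(re.search(p, pipeline_text) for p in DOCKER_PIPELINE_PATTERNS):
        return "docker-workflow-plugin (Docker Pipeline)"
    if any(re.search(p, pipeline_text) for p in LABEL_PATTERNS):
        return "label-based agent (docker-plugin only if the label matches a cloud template)"
    return "unknown"
```

A label match alone proves nothing: the label could equally belong to a static node, so the cloud configuration still has to be checked by hand.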
FYI the docker-plugin knows nothing of the nohup command; the word "nohup" is not in its code (it's not in the docker-workflow-plugin's code either). I'd suggest that, when doing Windows pipelines, you use something like the
Aha! Yes, now we're getting somewhere 😁 I think you need to get your Windows containers to use the JNLP or Attach connection method instead. It looks like the SSH method has a lot of hard-coded unixy stuff in it at present. It looks like the Attach method runs the following command on the container: The JNLP method provides more customisation capabilities (hidden in its "advanced" bits) so you can specify exactly what command the container should run, so if Attach doesn't work then JNLP can be forced to work. |
This is the critical differentiator and sadly this wasn't mentioned anywhere in the extensive reading I've done on these plugins.
I have read those and do apologize for letting this thread slip from a clarification request for the original FR into a Q&A. I did try the mailing list and even Reddit and it's mostly dead air out there for these kinds of questions. Zero responses elsewhere, sadly. Additionally even in those links there isn't anything that clearly states how to tell the difference at the top of the layer of the syntax (so to speak). We are using just Since we want to "codify" everything directly in the declarative pipeline scripts and/or Dockerfile files (via SCM) it seems we're restricted to using the docker pipeline plugin. It's unclear to me if this
Attempted to use both Greatly appreciate you taking the time to respond here and apologies again for derailing this thread. |
Yes; sadly a lack of docs is not an uncommon issue with free software - the folks who write code to make it do things don't need documentation telling them what they did.
FYI you can try out PR 681 right now - or any PR - go to the "checks" bit at the bottom, click "show checks" and follow the link to the build (the pr-merge bit) "details" to take you to the Jenkins ci server that built it, and then to the "Artifacts" from the build - there you'll find a .hpi file you can download and install (via manage jenkins -> manage plugins -> advanced -> upload).
I'm not sure where it's coming from, as github.com can't find "nohup" in docker-plugin or docker-workflow-plugin. If you can figure out what parameters are being used when the container is created then you'll be able to see if it's something coming from Jenkins or something built into the container image itself (maybe docker inspect can help here too).
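For the `docker inspect` suggestion, a small Python sketch like the one below could help narrow things down. It assumes you have saved the JSON output of `docker inspect <container>` to a string; the helper names are hypothetical. It only reads fields that container inspect output normally carries (`Path`, `Args`, and `Config.Image`, `Config.User`, `Config.Entrypoint`, `Config.Cmd`).

```python
import json


def summarize_container(inspect_json: str) -> dict:
    """Pull the launch-related fields out of `docker inspect <container>` output."""
    data = json.loads(inspect_json)
    container = data[0]  # docker inspect prints a JSON array, one entry per object
    config = container.get("Config", {})
    return {
        "image": config.get("Image"),
        "user": config.get("User"),
        "entrypoint": config.get("Entrypoint"),
        "cmd": config.get("Cmd"),
        "path": container.get("Path"),
        "args": container.get("Args"),
    }


def mentions(summary: dict, word: str) -> bool:
    """Return True if any launch-related field contains the given word."""
    return word in json.dumps(summary)
```

If `mentions(summary, "nohup")` is false for the container Jenkins created, the command is more likely being issued by a pipeline step at run time than baked into the image or the container's start parameters.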
FYI I'd like to have things working on Windows too; at present, where I work, all our Windows-based stuff is on VMs (which take an age to boot up) and docker containers are lighter-weight and more efficient, which would mean I get more builds done on existing hardware. |
Indeed. I totally get the struggle. Developers' time is precious, especially when, as is often the case, the work is done 'side of the desk' to a real job or other commitments. The efforts are greatly appreciated and valuable to so many people. Documentation falls to the back burner 90% of the time, and I've seen full-blown commercial, enterprise (and expensive) software with worse documentation than open-source projects maintained by a single person. Docs are often the trade-off for free software.
I would love to do this but unsure if I will be able to near-term. I have some other things like this Windows build agent I need to squash first. Granted the docker-plugin might help solve that or work around it but it seems like there are some underlying issues here that need sorting first.
The Jenkins job build log doesn't give any visibility and the main Jenkins master log has no entries related to this job. Is there additional logging that can be enabled that's relevant to this?
I ran another test pipeline job that uses the same Windows node without the Successful non-docker
Unsuccessful docker job
Maybe we can figure something out together and even update the doc :D At this point I may try to get nohup working via this method mentioned earlier to see if we can at least get to the next step and see what it's trying to do, unless you have another suggestion. Edit: I successfully set up git-bash tools via this suggestion and it resolved the sh/nohup issue. Now I am running into the "invalid volume specification" error, which I see you've discussed here and is clearly related to the docker pipeline plugin, not this one (as we've already established). Suggestion: maybe include something at the bottom of the readme.md for |
Assuming this is true we may need to migrate to this |
You are right, it's configured with ssh key injection. In the container, OpenSSH is already running and works. ;-) I can connect to it via ssh. Therefore the idea was to use the same mechanism as for the Linux containers. But I will try the JNLP approach, as the Windows containers are running on a different machine than the Jenkins master. |
I've got our entire build setup working on Windows-based images. Your configuration is incorrect; this should not be an issue with this plugin. |
With ssh key injection configured? |
Update: I recently had to go delving in this area in order to fix the SSH unit-tests and so I took a good long look at the code. Disclaimer: This code is not finished. It's not as polished as it could be; at the very least, it'll need improvements to the online help to explain the difference between the connection methods (as it's obvious that this plugin needs better docs in this area!). It's also not tested - the only testing I've done is run the unit-tests, and only on linux (I have no Windows docker resource at present). You can find this code in PR #763 and you can find a |
@pjdarton This sounds good. I hope I can try it today and give some feedback. |
The same error happened:
=> "Could not find the file /root in container ... |
The only place You need to switch to the new |
Argh ... I forgot to switch to InjectSSHKeyAsContainerArgument. I will try again and come back. |
The result looks better now. The container itself is started and, according to the logs, an SSH connection was established (the SSH port is open on lp13007:55137). The Jenkins pipeline script should now just call a PowerShell 'dir' command, but this doesn't happen. According to the logs, another agent is requested instead, and so on. Logs:
This repeats after container watchdog is triggered. |
Hmm. The fact that the docker plugin is still being asked to provision a node 20 seconds later implies that the slave failed to come online (i.e. the container exists, but Jenkins wasn't able to connect to it and run the Jenkins slave.jar code on it), which would imply that the SSH connection process didn't complete ... but that wouldn't show up here, I think that the next place to look would be the log for the slave node itself. i.e. you should see the docker slave node appearing in Jenkins' list of executors/slaves and that WebUI page has a "log" page on it - check what that's reporting as that's where any SSHConnector issues will be shown. |
Checking the agent's log was a good hint:
The node itself uses an image (a Windows container) based on https://github.com/jenkinsci/docker-ssh-slave |
OK, so the However, the cause is revealed right there - you're trying to login to a Windows container as "root". Once you've sorted that out, if it still isn't working then the next step of the investigation is to use FYI what I'm expecting is that you'll tell the docker-plugin code that it needs to log in as "jenkins" (and probably have to also tell it the home directory is |
The "Remote File System Root" configuration parameter is set to "C:\Users\jenkins". The user inside the container should be jenkins, that's true. But I cannot configure the user with InjectSSHKeyAs1stParameter, only with a different configuration. This may be the problem. |
There are three options to choose:
Choosing the 2nd one, it's possible to enter the username. I guess it's the old "Inject SSH key" option, but you changed the text, didn't you? I will follow your suggestion, changing the username in the xml file, and come back with feedback. BTW, before I forget, thanks for your support; it's very much appreciated! We're keeping it similar to pair programming: you advise and I test. ;-) |
A short look into the config.xml:
<name>windows cloud agents</name>
<templates>
<com.nirima.jenkins.plugins.docker.DockerTemplate>
<configVersion>2</configVersion>
<labelString>win-agent</labelString>
<connector class="io.jenkins.docker.connector.DockerComputerSSHConnector">
<sshKeyStrategy class="io.jenkins.docker.connector.DockerComputerSSHConnector$InjectSSHKeyAsContainerArgument"/>
<port>22</port>
<maxNumRetries>30</maxNumRetries>
<retryWaitTime>2</retryWaitTime>
</connector>
<remoteFs>C:\Users\jenkins</remoteFs>
<instanceCap>1</instanceCap>
<mode>NORMAL</mode>
This shows that the user tag is missing. I will try to put it in manually and see what happens. |
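A quick way to confirm whether a user is configured is to parse the connector element rather than eyeball it. The Python sketch below is only an illustration: it uses a well-formed copy of the connector fragment from the excerpt above, and it assumes the missing tag would be called `<user>` and would sit somewhere under the connector element. The exact element name and position in the real config.xml are assumptions, so the search is deliberately loose.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the <connector> fragment quoted in the thread.
CONNECTOR_XML = """
<connector class="io.jenkins.docker.connector.DockerComputerSSHConnector">
  <sshKeyStrategy class="io.jenkins.docker.connector.DockerComputerSSHConnector$InjectSSHKeyAsContainerArgument"/>
  <port>22</port>
  <maxNumRetries>30</maxNumRetries>
  <retryWaitTime>2</retryWaitTime>
</connector>
""".strip()


def find_ssh_user(connector_xml: str):
    """Return the text of the first <user> element under the connector, or None."""
    root = ET.fromstring(connector_xml)
    # Assumption: the tag is named "user"; search at any depth below the root.
    user = root.find(".//user")
    return user.text if user is not None else None


print(find_ssh_user(CONNECTOR_XML))
```

For the fragment above this finds no user element, which matches the observation that the tag is missing.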
I was too fast :-) ... it looks better now. Here is the log from the agent:
|
Oooh! That looks like the docker plugin part is working! 😁 ...but then the SSH-connection bit is not 😞
...and, as you can see, the `" && "` bit is hard-coded, so if the Windows docker-ssh-slave image can't cope with that then our options may be somewhat limited... You might find that experimenting with the SSH-connector's "advanced" options gives you parameters you can edit that might allow you to work around this problem ... but I have my doubts 😢 To do that, it'd be best if you manually ran one of these containers and told Jenkins to add a slave node that it should connect to via SSH - this way, you can experiment with different SSH connection options until you find some that work (otherwise you'll have to hand-edit the config.xml file over and over again). However, first I'd recommend you go googling and see if anyone anywhere has managed to get the Jenkins SSHLauncher working with the Windows docker ssh container image. It might be that there are bugs in the SSHLauncher that are fixed in later (possibly unreleased) versions and/or forks of the plugin. |
...FYI the error text |
There is a PR for the powershell to support A similar discussion is on the jenkins JIRA: https://issues.jenkins-ci.org/browse/JENKINS-42856?focusedCommentId=355486&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-355486 I will search a little bit more and talk to the maintainers. Posting the results here again. @slide How did you manage the SSHLauncher part (see above)? |
In the linked JIRA issue comment, Mark Waite showed the workaround.
The trick is the remark keyword at the end. With this, the SSHLauncher's own command is wrapped and treated as nothing more than a comment. ;-) And voila:
|
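To make the "remark keyword" trick concrete, here is a toy Python model of how the launch command gets assembled. It is a sketch under assumptions, not the SSHLauncher's actual code: it assumes the launcher concatenates a configurable prefix, a hard-coded `cd "<dir>" && java -jar <jar>` part, and a configurable suffix, and that the remote shell is cmd.exe, where `rem` starts a comment. The script path and jar name are hypothetical.

```python
def compose_launch_command(prefix: str, workdir: str, java: str, jar: str, suffix: str = "") -> str:
    """Toy model of the launch command: prefix + hard-coded part + suffix."""
    # The middle part, including " && ", is what the thread describes as hard-coded.
    hard_coded = f'cd "{workdir}" && {java} -jar {jar}'
    return f"{prefix}{hard_coded}{suffix}"


# Hypothetical prefix: run our own start script, then end with `rem ` so that
# everything appended after it is read as a comment by cmd.exe.
PREFIX = "C:\\Users\\jenkins\\start-agent.cmd & rem "

command = compose_launch_command(PREFIX, "C:\\Users\\jenkins", "java", "remoting.jar")
print(command)
```

In this model the custom script does the real work of starting the agent, and the unix-style tail is still sent but never executed.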
I launched sshd directly in the powershell script for the docker ssh agent repo, similar to what was being done on the Linux side. |
FYI I've pushed another commit that (if I've got things right) should add the missing username field. |
@pjdarton Happy christmas Peter and thanks for your support! |
You're welcome. I hope to make use of this myself where I work (we need Windows and, right now, only have those through VMs, which aren't as efficient as docker containers), so it's of mutual interest to get this all working ;-) What I'd suggest is that, next year, ping me again on this (or on the PR) and we can figure out where to go from here. |
Any further updates on this? Is it available in latest release? |
No, it isn't; progress is currently stalled. ...however, none of that's been released (or tested enough that I'd be comfortable releasing) yet and I've got other (high priority) work stuff right now that means I can't work on this at present. |
Thanks @pjdarton for the update. |
Would be great to have windows container support as well.