Travis CI errors with EMACS_BINARY=emacs-git-snapshot-travis #2120

Closed
gonewest818 opened this issue Dec 4, 2017 · 18 comments
@gonewest818
Contributor

gonewest818 commented Dec 4, 2017

Expected behavior

Travis tests should succeed

Actual behavior

Travis tests are failing in the "before_script".

Steps to reproduce the problem

See a recent build, for example
https://travis-ci.org/clojure-emacs/cider/builds/307426738

I noticed this issue when submitting #2111. Copying some of that discussion into a new issue to keep the other ticket clean.

Analysis

Regarding the TLS error in Travis: I think the difference is that Emacs versions <= 25.2 fall back to s_client when gnutls-cli fails, but Emacs 26 does not (see the discussion in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=766397 and commit emacs-mirror/emacs@6e45de6).

Even in the failed Travis jobs, there's a point where evm successfully bootstraps itself using emacs-24.3-travis: https://travis-ci.org/clojure-emacs/cider/jobs/307715504#L485-L492
where you see it's the s_client attempt that actually succeeds.

The build failure occurs later, as cask is installing dependencies for emacs-git-snapshot-travis:
https://travis-ci.org/clojure-emacs/cider/jobs/307715504#L879-L885
and you can see there is no s_client fallback attempt.
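
The fallback behavior described above can be sketched in shell. This is only an illustration of "try one TLS client, then the next", not Emacs' actual implementation. The function name `try_in_order` is hypothetical, and the `openssl s_client` invocation in the commented usage is an assumption about what such an attempt might look like.

```shell
#!/bin/sh
# Hypothetical helper: run each command string in order and stop at the
# first one that exits successfully. Prints the command that succeeded.
# This mimics a "try gnutls-cli, then fall back to s_client" strategy;
# it is NOT how Emacs itself implements the fallback.
try_in_order() {
    for cmd in "$@"; do
        # stdin comes from /dev/null so interactive clients exit promptly
        if sh -c "$cmd" </dev/null >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            return 0
        fi
    done
    return 1
}

# Example usage (needs network access, so it is left commented out):
# try_in_order \
#     "gnutls-cli --x509cafile /etc/ssl/certs/ca-certificates.crt -p 443 elpa.gnu.org" \
#     "openssl s_client -connect elpa.gnu.org:443 -CAfile /etc/ssl/certs/ca-certificates.crt"
```

With only the first command in the list, a gnutls-cli failure becomes a hard failure, which matches what the emacs-git-snapshot-travis job shows.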

I poked around a little and discovered that elpa.gnu.org and melpa.org are using certs from Let's Encrypt. However, the Travis CI workers are running Ubuntu 14.04, and although I haven't checked, it seems likely they do not have the CA certs required to complete the chain of trust for those Let's Encrypt certs. If so, the solution would be to do whatever incantations are necessary in the .travis.yml to make that happen.

Links

For posterity here are some references.

https://bugs.launchpad.net/ubuntu/+source/gnutls26/+bug/1373422
https://tools.ietf.org/html/rfc5246
https://www.ssllabs.com/ssltest/analyze.html?d=elpa.gnu.org
https://cryptoreport.websecurity.symantec.com/checker/

@bbatsov bbatsov added the bug label Dec 5, 2017
@bbatsov
Member

bbatsov commented Dec 5, 2017

Thanks for looking into this and opening the ticket. Working against the broken CI is not fun.

@gonewest818
Contributor Author

I'm taking some time to look at this (if that's okay?)

I'm starting by pulling the Docker image they use for Travis builds (they've published the images), with the intent of reproducing the issue locally. If that works, debugging will go much faster than branching and pushing .travis.yml over and over.

@bbatsov
Member

bbatsov commented Dec 10, 2017

It's more than OK - it's much appreciated!

@gonewest818
Contributor Author

gonewest818 commented Dec 10, 2017

I’m not 100% certain, but the problem “feels” intermittent, in particular item 2 below:

  1. The gnutls-cli and ca-certificates packages are missing from that Docker image. That’s easy to fix.

  2. Jobs are randomly failing, but if restarted they eventually succeed. It's as if there are multiple MELPA servers behind a load balancer and one of them is misconfigured.

  3. Something about the TLS cert for elpa.gnu.org causes the gnutls handshake to fail every time.

@gonewest818
Contributor Author

gonewest818 commented Dec 10, 2017

The cert for elpa.gnu.org has extra certs: https://www.ssllabs.com/ssltest/analyze.html?d=elpa.gnu.org

Certificates provided | 3 (3732 bytes)
Chain issues | Incorrect order, Extra certs

where certificate[0] and certificate[1] are identical.

# gnutls-cli elpa.gnu.org
Resolving 'elpa.gnu.org'...
Connecting to '208.118.235.89:443'...
- Ephemeral Diffie-Hellman parameters
 - Using prime: 2048 bits
 - Secret key: 2046 bits
 - Peer's public key: 2046 bits
- Certificate type: X.509
 - Got a certificate list of 3 certificates.
 - Certificate[0] info:
  - subject `CN=elpa.gnu.org', issuer `C=US,O=Let's Encrypt,CN=Let's Encrypt Authority X3', RSA key 2048 bits, signed using RSA-SHA256, activated `2017-12-02 10:00:36 UTC', expires `2018-03-02 10:00:36 UTC', SHA-1 fingerprint `dd8020dc5cdb1d4e9c331f9044ec57a0928e7b97'
 - Certificate[1] info:
  - subject `CN=elpa.gnu.org', issuer `C=US,O=Let's Encrypt,CN=Let's Encrypt Authority X3', RSA key 2048 bits, signed using RSA-SHA256, activated `2017-12-02 10:00:36 UTC', expires `2018-03-02 10:00:36 UTC', SHA-1 fingerprint `dd8020dc5cdb1d4e9c331f9044ec57a0928e7b97'
 - Certificate[2] info:
  - subject `C=US,O=Let's Encrypt,CN=Let's Encrypt Authority X3', issuer `O=Digital Signature Trust Co.,CN=DST Root CA X3', RSA key 2048 bits, signed using RSA-SHA256, activated `2016-03-17 16:40:46 UTC', expires `2021-03-17 16:40:46 UTC', SHA-1 fingerprint `e6a3b45b062d509b3382282d196efe97d5956ccb'
- The hostname in the certificate matches 'elpa.gnu.org'.
- Peer's certificate issuer is unknown
- Peer's certificate is NOT trusted
- Version: TLS1.2
- Key Exchange: DHE-RSA
- Cipher: AES-128-CBC
- MAC: SHA1
- Compression: NULL
- Handshake was completed

- Simple Client Mode:

- Peer has closed the GnuTLS connection
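
The duplicated entry in the certificate list can also be spotted mechanically. Below is a minimal sketch, assuming the output format shown above, in which each certificate line carries a SHA-1 fingerprint wrapped in a backtick and a single quote. The function name `duplicate_fingerprints` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read gnutls-cli output on stdin and print every
# SHA-1 fingerprint that occurs more than once in the certificate list.
duplicate_fingerprints() {
    grep -o "SHA-1 fingerprint \`[0-9a-f]*'" \
        | tr -d "\`'" \
        | awk '{ print $3 }' \
        | sort \
        | uniq -d
}

# Example usage (needs network access, so it is left commented out):
# gnutls-cli elpa.gnu.org </dev/null | duplicate_fingerprints
```

Fed the session above, this would report only the fingerprint shared by Certificate[0] and Certificate[1]; the intermediate certificate appears once and is not listed.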

Is the duplication considered "out of order"? I ask because I've seen references to out-of-order certificates in the chain not being supported by gnutls-cli. And indeed,

# gnutls-cli -v
gnutls-cli (GnuTLS) 2.12.23
Packaged by Debian (2.12.23-12ubuntu2.8)
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Nikos Mavrogiannopoulos.
#
# gnutls-cli --x509cafile /etc/ssl/certs/ca-certificates.crt -p 443 elpa.gnu.org
Processed 148 CA certificate(s).
Resolving 'elpa.gnu.org'...
Connecting to '208.118.235.89:443'...
*** Verifying server certificate failed...
*** Fatal error: Error in the certificate.
*** Handshake has failed
GnuTLS error: Error in the certificate.

@bbatsov
Member

bbatsov commented Dec 10, 2017

It might be best to post this to the Emacs dev mailing list, as the maintainers of GNU ELPA are all there and are probably far more knowledgeable on the subject than I am.

@gonewest818
Contributor Author

gonewest818 commented Dec 10, 2017 via email

@gonewest818
Contributor Author

... and success on Ubuntu 16.04, which happens to ship gnutls-cli 3.4.10. That suggests how we're going to fix this.

root@6f7cef996b3a:/# gnutls-cli -v
gnutls-cli 3.4.10
Copyright (C) 2000-2016 Free Software Foundation, and others, all rights reserved.
This is free software. It is licensed for use, modification and
redistribution under the terms of the GNU General Public License,
version 3 or later <http://gnu.org/licenses/gpl.html>


Please send bug reports to:  <[email protected]>
root@6f7cef996b3a:/# 
root@6f7cef996b3a:/# gnutls-cli --x509cafile /etc/ssl/certs/ca-certificates.crt -p 443 elpa.gnu.org
Processed 148 CA certificate(s).
Resolving 'elpa.gnu.org'...
Connecting to '208.118.235.89:443'...
- Certificate type: X.509
- Got a certificate list of 3 certificates.
- Certificate[0] info:
 - subject `CN=elpa.gnu.org', issuer `C=US,O=Let's Encrypt,CN=Let's Encrypt Authority X3', RSA key 2048 bits, signed using RSA-SHA256, activated `2017-12-02 10:00:36 UTC', expires `2018-03-02 10:00:36 UTC', SHA-1 fingerprint `dd8020dc5cdb1d4e9c331f9044ec57a0928e7b97'
	Public Key ID:
		a055226618cb098619db153e7d847d0f2637b836
	Public key's random art:
		+--[ RSA 2048]----+
		|++.o*..oo.       |
		|+=.B o.++ *      |
		|. = o + .* +     |
		|     + oE   .    |
		|    .  .S.       |
		|                 |
		|                 |
		|                 |
		|                 |
		+-----------------+

- Certificate[1] info:
 - subject `CN=elpa.gnu.org', issuer `C=US,O=Let's Encrypt,CN=Let's Encrypt Authority X3', RSA key 2048 bits, signed using RSA-SHA256, activated `2017-12-02 10:00:36 UTC', expires `2018-03-02 10:00:36 UTC', SHA-1 fingerprint `dd8020dc5cdb1d4e9c331f9044ec57a0928e7b97'
- Certificate[2] info:
 - subject `C=US,O=Let's Encrypt,CN=Let's Encrypt Authority X3', issuer `O=Digital Signature Trust Co.,CN=DST Root CA X3', RSA key 2048 bits, signed using RSA-SHA256, activated `2016-03-17 16:40:46 UTC', expires `2021-03-17 16:40:46 UTC', SHA-1 fingerprint `e6a3b45b062d509b3382282d196efe97d5956ccb'
- Status: The certificate is trusted. 
- Description: (TLS1.2)-(ECDHE-RSA-SECP256R1)-(AES-128-GCM)
- Session ID: 04:88:EA:6F:E9:C3:76:AC:F0:13:D3:B9:38:43:68:87:08:B7:1A:BC:9B:B2:B4:C1:98:B8:6D:AB:0F:7D:74:9E
- Ephemeral EC Diffie-Hellman parameters
 - Using curve: SECP256R1
 - Curve size: 256 bits
- Version: TLS1.2
- Key Exchange: ECDHE-RSA
- Server Signature: RSA-SHA256
- Cipher: AES-128-GCM
- MAC: AEAD
- Compression: NULL
- Options: safe renegotiation,
- Handshake was completed

- Simple Client Mode:

@gonewest818
Contributor Author

I'm getting close to a solution. Basically the idea is to install a newer version of gnutls-cli for the builds, because the default version that ships in Ubuntu 14.04 is out of date.

The actual implementation is a little more involved, unfortunately, because I'm having to build gnutls-cli and one of its dependencies from source, and I'm trying to cache those builds to avoid excessive CI turnaround times.
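
A build could guard against an outdated client with a check like the one below. This is a sketch under stated assumptions: both function names are hypothetical, the threshold is whatever version you pass in (the thread only establishes that 2.12.23 failed and 3.4.10 worked), and `sort -V` assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: read the output of "gnutls-cli -v" on stdin and
# print the version number, taken as the last field of the first line.
banner_version() {
    awk 'NR == 1 { print $NF }'
}

# Hypothetical helper: succeed when version $1 is at least version $2.
# Uses "sort -V" (version sort) from GNU coreutils.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example usage (the 3.4.10 threshold is an assumption based on the
# version that worked in this thread, not a documented minimum):
# if ! version_at_least "$(gnutls-cli -v | banner_version)" 3.4.10; then
#     echo "gnutls-cli is too old; build a newer one from source" >&2
# fi
```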

gonewest818 added a commit to gonewest818/cider that referenced this issue Dec 11, 2017
bbatsov pushed a commit that referenced this issue Dec 11, 2017
Fixes the Travis CI build errors reported in #2120.

Diffs to .travis.yml are as follows:

(1) I've added some explicit package dependencies into addons.apt.packages. This is basically the compiler toolchain needed to build gnutls from source

(2) Configured $HOME/local to be cached by Travis between subsequent builds. This is done with the setting cache.directories.

See the Travis documentation for further information about caching. Basically, the contents of that folder are rolled up into a tarball and saved to an S3 bucket whenever the contents change. The tarball is retrieved and unpacked prior to the start of your CI job.

(3) Added a utility script named travis-ci/travis-gnutls.sh which handles the downloading and compiling of the sources.

The script is organized so that the version of gnutls can be easily updated whenever needed. Just edit the version numbers in that script and push the change. The script is able to detect when it doesn't have the specific version requested, and when that happens it deletes the cache and rebuilds gnutls.

(4) While I was at it, I added emacs 25.3-travis and 26-pretest-travis to the build matrix. Note that git-snapshot-travis now reports itself as being version 27. (Also note, emacs 26 and 27 are still broken builds, but at least now it's not because of the build script itself.)
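
The cache-invalidation idea in item (3) can be sketched as follows. This is not the contents of the real script: the marker file name, the function name `ensure_cached_build`, and the build callback are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a version-keyed cache check. The marker file
# name and the build callback are assumptions, not taken from the real
# travis-ci/travis-gnutls.sh.
ensure_cached_build() {
    prefix=$1      # cached install prefix, e.g. "$HOME/local"
    wanted=$2      # version requested by the script
    build_fn=$3    # command that builds and installs into $prefix
    marker="$prefix/.installed-version"

    # Refuse to run "rm -rf" on an empty path.
    [ -n "$prefix" ] || return 1

    if [ -f "$marker" ] && [ "$(cat "$marker")" = "$wanted" ]; then
        echo "cache hit: $wanted"
        return 0
    fi

    # Wrong or missing version: discard the cache and rebuild.
    rm -rf "$prefix"
    mkdir -p "$prefix"
    "$build_fn" "$prefix" "$wanted" || return 1
    printf '%s\n' "$wanted" > "$marker"
    echo "rebuilt: $wanted"
}
```

Keying the cache on a version marker means that bumping the version number in the script is enough to force a clean rebuild on the next CI run.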
@bbatsov
Member

bbatsov commented Dec 11, 2017

Btw, another approach would be to use our own Docker image: https://docs.travis-ci.com/user/docker/ That might be better in the long run, and perhaps simpler.

@timvisher

On Ubuntu 14.04 you may also be able to run sudo update-ca-certificates -f if the cert validation is what's failing.

@gonewest818
Contributor Author

Hi @timvisher, it was a combination of factors, including ca-certificates and also the version of gnutls-cli. See #2128 for details.

@timvisher

Cool. Never mind me. Just lurking and wishing I could help more. :)

@gonewest818
Contributor Author

@bbatsov, ok to close this?

@bbatsov bbatsov closed this as completed Dec 12, 2017
@bbatsov
Member

bbatsov commented Dec 12, 2017

Sure.

@xiongtx
Member

xiongtx commented Dec 12, 2017

ok to close this?

Is this solved now or...?

@gonewest818
Contributor Author

Well yes, the TLS certificate / trust issue is fixed. That was a combination of a missing ca-certificates.crt file, and an older version of gnutls-cli being installed.

However there are still intermittent network issues causing builds to fail. These issues manifest themselves as disconnects during the download (or update) of other dependencies in the build. That's ticket #2129, and I'm working on that too.

@gonewest818
Contributor Author

For historical reference:

According to this email the format of the TLS cert on elpa.gnu.org was fixed by the FSF server administrators. In principle this means we could attempt to remove the custom "install-gnutls.sh" from the Travis CI builds.

But that's probably irrelevant if we move to CircleCI, as I suspect we will.
