Wrong probe_ssl_earliest_cert_expiry value for certificates with multiple trust chains #340
Comments
The metric purposefully works off the cert that expires first, so that you don't get caught out by something in your chain expiring before the main cert. It doesn't understand multiple trust chains, and it's not clear to me if we should change this. |
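To make the behavior described above concrete, here is a minimal Go sketch of an "earliest expiry" calculation over every certificate a server presents. It is an illustration under assumptions, not the exporter's actual code: `earliestCertExpiry` and `fakeChain` are hypothetical names, and the timestamps are illustrative values rather than real certificates.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"time"
)

// earliestCertExpiry returns the smallest NotAfter among all certificates the
// server presented, regardless of which trust chain each one belongs to.
// It returns the zero time for an empty list.
func earliestCertExpiry(certs []*x509.Certificate) time.Time {
	var earliest time.Time
	for _, cert := range certs {
		if earliest.IsZero() || cert.NotAfter.Before(earliest) {
			earliest = cert.NotAfter
		}
	}
	return earliest
}

// fakeChain is a hypothetical helper: it builds certificates that carry only
// a NotAfter value (Unix seconds) so the example runs without a network.
func fakeChain(notAfterUnix ...int64) []*x509.Certificate {
	certs := make([]*x509.Certificate, 0, len(notAfterUnix))
	for _, ts := range notAfterUnix {
		certs = append(certs, &x509.Certificate{NotAfter: time.Unix(ts, 0).UTC()})
	}
	return certs
}

func main() {
	// Illustrative timestamps: a leaf, a long-lived root, and an older
	// cross-signed certificate that expires before the leaf does.
	peer := fakeChain(1603756799, 2145916800, 1590796800)
	fmt.Println(earliestCertExpiry(peer).Unix())
}
```

With this approach a single soon-to-expire certificate anywhere in the presented list determines the result, even if another valid path to a trusted root exists.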
We have the same issue: Chrome and OpenSSL show one date, but blackbox shows an earlier one. |
It would be great to have another metric, e.g. probe_ssl_cert_expiry, that returns the same date we currently see in all browsers and openssl. |
So turns out this is definitely still an issue - has there been any discussion in terms of how to move forward? The behavior as-is is basically useless. Edit: I understand the use case for maybe a private CA, but for people trying to monitor externally issued certs, this probably isn't helpful. I'm very interested in a solution that just adds a new value for the end of the chain, and would be willing to work on a patch if one isn't started and that seems a reasonable path forward that would be accepted. |
For those of you interested in only alerting on specific certificates in the chain, that can be achieved with: https://github.com/ribbybibby/ssl_exporter. |
…data Extend information about last cert of chain
https://thesslonline.com/blog/sectigo-addtrust-external-ca-root-expiring-may-30-2020
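A lot of certificates use Sectigo AddTrust External CA in their chain and it's going to expire next month. Is there a way to disable the checks on these kind of intermediate certs? |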
No, sadly
…On Thu, Apr 30, 2020, 05:48 Karan Sharma ***@***.***> wrote:
https://thesslonline.com/blog/sectigo-addtrust-external-ca-root-expiring-may-30-2020
Sectigo operates a root certificate named the AddTrust External CA Root
used to establish cross-certificates to Sectigo’s modern root certificates,
the COMODO RSA Certification Authority and USERTrust RSA Certification
Authority. Until 2038, those roots do not expire.
The AddTrust External CA Root, however, expires on May 30th 2020.
After this date, clients and browsers will be chaining back to the modern
roots used to cross sign with the older AddTrust. No errors will be shown
on any patched, existing or modified system or network.
A lot of certificates use Sectigo AddTrust External CA in their chain and
it's going to expire next month. Is there a way to disable the checks on
these kind of intermediate certs?
|
From the blackbox exporter standpoint that is a true positive, and you should look at either updating that cert or removing it if it's no longer relevant. |
That isn't always an option, though. It even says in the documentation:
That doesn't mean we should have to upgrade our (still valid) certs. This is a shortcoming, and frankly an issue, with blackbox at this point. |
If you're sure it's not a problem then I suggest silencing the alert for a month, and then removing it at that point. |
It isn't just going to be a month, though, and it still means that other certs in the chain could be missed. I just don't understand your resistance to just adding some functionality to only report on the end of the chain, which a lot of people have asked for, provided patches for, etc. If this many people are asking, it's clearly sought after and would be helpful. As a sysadmin, I can tell you, this is a HUGE limiting factor in needing to run blackbox vs. having to export our own SSL metrics. It's not possible to always control certs in the way you describe. They are issued to us, often on a per-cert basis, and we aren't looking to renew 200 certs just because blackbox is going to yell at us for another year about them. That isn't a good use of money. |
I have yet to receive a patch which would avoid the situation where "other certs in the chain could be missed.", as all proposed patches thus far only looked at the last cert in the file.
It's not something I've done myself, but I believe you can remove the particular certs from the set returned to clients as each individual cert is self-contained - which will also reduce bandwidth usage. |
Right, and in 90% of cases, that's what we want to monitor as sysadmins - the last cert in the file or at the end of the trust chain. I don't care if another cert in the chain is going to expire, so I would disable alerting for that entirely, because they're outside of my purview or control. Certs expire in certain parts of chains, especially when bridging is occurring, just like in this scenario, all the time when talking about certs that come from huge places like Sectigo/Comodo. Some people would still find it useful, so I don't propose getting rid of the functionality at all. Manual cert manipulation isn't worth it to save the kilobytes of bandwidth, load time, etc. It's more risky that you will break something or return something a client will find invalid. We get certs delivered to us as a bundle that we stick into place and call it good until it's time to renew. That's the industry behavior. Edit: it just seems unreasonable to not ADD functionality to an already great tool that would make it better for a lot of users and completely not impact people who don't want to use it vs. asking people to change their entire config management and cert management paradigms. Truly, not trying to be argumentative. I want to find a solution to this, but I don't understand your opposition, and I would like to. |
I'm not willing to accept a feature that enables users to purposefully ignore a definitive upcoming breakage of their application. To be clear I have rejected PRs that only look at the last cert in the list, what no one has sent me is a PR that provides the time when the last chain will expire. |
In the case of the presented issue this morning, it will NOT break almost anyone using the certs. Are you saying if I submit a PR that actually looks at the time when the last chain will expire you will consider it? I am more than willing to put in the time. It is worth noting, though, that a majority of server certs are the last in the list because most software requires that to be the order. Root -> intermediates -> server cert. |
Yes. |
Do you have a specific test case where the last cert in the list is NOT the end of the chain for reference so I have something to test against? Would you agree that the following code will return the entire chain as presented by the server, the equivalent of doing an openssl s_client -connect. In many cases, this is going to mirror the same data as what people have presented in PRs before:
|
My point was more that those PRs weren't considering intermediaries. Based on your responses above it sounds like you already have a good selection of potential test cases available to you. I'd have to dig into the code, but I expect the challenge here will be what to do after you have all the certs that were returned. |
Any intermediates would be accounted for in the already existing metric,
though, right? The expiry of every cert in the chain with the snippet
provided is available, and the chain is in order so one can easily
determine which cert is truly last.
…On Thu, Apr 30, 2020 at 5:50 PM Brian Brazil ***@***.***> wrote:
My point was more that those PRs weren't considering intermediaries. Based
on your responses above it sounds like you already have a good selection of
potential test cases available to you.
I'd have to dig into the code, but I expect the challenge here will be
what to do after you have all the certs that were returned.
|
All potential intermediates will be accounted for. What you're looking to implement is to only check one chain of intermediates. Remember that there's no requirement that the last cert in a chain be the first one to expire, which is what the blackbox exporter is trying to protect you against. |
I think what needs to happen is that we report on expiration of certificate paths, rather than the certificate itself, and have an option whether the expiration of a path triggers an alert. Right now I have to remove the alert because the legacy chain cert is expiring, but the 2nd path is still going to be fine. Here's some more info for the Sectigo certs: https://support.sectigo.com/Com_KnowledgeDetailPage?Id=kA03l00000117LT I'm surprised this hasn't been a larger issue for people. |
Looking at the docs, Go already derives the chains for us in VerifiedChains so this shouldn't be too difficult to code from there. |
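As a rough illustration of the chain-based approach suggested above: Go's `tls.ConnectionState` exposes `VerifiedChains` as a `[][]*x509.Certificate`, so one could take the earliest expiry within each verified chain and then the latest of those values across chains. The sketch below assumes that reading of the proposal; `earliestInChain`, `latestVerifiedChainExpiry`, and `fakeChains` are hypothetical names, and the timestamps are illustrative.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"time"
)

// earliestInChain returns the first NotAfter to be reached within one chain.
// A chain stops being valid as soon as any certificate in it expires.
func earliestInChain(chain []*x509.Certificate) time.Time {
	var earliest time.Time
	for _, cert := range chain {
		if earliest.IsZero() || cert.NotAfter.Before(earliest) {
			earliest = cert.NotAfter
		}
	}
	return earliest
}

// latestVerifiedChainExpiry returns the moment the last remaining chain
// expires, i.e. the maximum over chains of each chain's earliest expiry.
// In a real probe, chains would come from ConnectionState().VerifiedChains.
func latestVerifiedChainExpiry(chains [][]*x509.Certificate) time.Time {
	var latest time.Time
	for _, chain := range chains {
		if expiry := earliestInChain(chain); expiry.After(latest) {
			latest = expiry
		}
	}
	return latest
}

// fakeChains is a hypothetical helper that builds chains from NotAfter
// values (Unix seconds) so the example needs no network or real certificates.
func fakeChains(expiries ...[]int64) [][]*x509.Certificate {
	chains := make([][]*x509.Certificate, 0, len(expiries))
	for _, chainExpiries := range expiries {
		chain := make([]*x509.Certificate, 0, len(chainExpiries))
		for _, ts := range chainExpiries {
			chain = append(chain, &x509.Certificate{NotAfter: time.Unix(ts, 0).UTC()})
		}
		chains = append(chains, chain)
	}
	return chains
}

func main() {
	// Chain A runs through a certificate that expires soon; chain B does not.
	chains := fakeChains(
		[]int64{1603756799, 1590796800},
		[]int64{1603756799, 2145916800},
	)
	fmt.Println(latestVerifiedChainExpiry(chains).Unix())
}
```

In this example chain A becomes invalid first, but the result is driven by chain B, whose limiting certificate is the leaf.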
I think an important topic is missing from this discussion: availability of trust anchors.
The thing is that the With C as a valid trust anchor, an SSL client that has to investigate the validity of the certificate chain, will look at The fact that So what would be correct behavior? What could be done is to only check the certificate tree up to and including the level of a trust root that is available on the system. In the example case from above: only check Because it might be hard to make sure that blackbox has the same perspective on valid trust anchors as the visiting clients have, a better option might be some exclude list to flag those certificates that should not be considered in the check. Blackbox could even provide this list for well-known cases like the mentioned My $0.02, I hope they can help this discussion forward. |
I had thought a bit about this, and you can already specify which CAs you'd like Go to use which seems to me like it'd cover this sort of case (in both directions). |
Is that an answer to my post? I'm not talking about whitelisting CA certificates there. In fact, my proposal was to allow blacklisting CA certificates, for which it's known that they are going to expire, but which will not disappear quickly since they were used for cross signing like With the example chain that I mentioned in mind: about everybody in the world has the trust anchor for |
It is. The blackbox exporter already allows you to specify exactly the set of CA roots you want to apply to a tls connection. |
I would like to be able to specify exactly what set of CA roots not to consider when computing the minimum expiry date for a certificate chain. By specifying that I want to apply the COMODO root certificate, and by not specifying the AddTrust CA root, the But well, it seems one of us is missing the point here. |
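The exclude-list idea proposed in this comment could look roughly like the Go sketch below. It is purely hypothetical: blackbox_exporter has no such option according to this thread, `earliestExpiryExcluding`, `certSpec`, and `fakeCerts` are invented names, and matching on the subject common name is an assumption made for brevity (a certificate fingerprint would be a stricter key).

```go
package main

import (
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"time"
)

// earliestExpiryExcluding returns the earliest NotAfter among certs, skipping
// any certificate whose subject common name is listed in excluded.
func earliestExpiryExcluding(certs []*x509.Certificate, excluded map[string]bool) time.Time {
	var earliest time.Time
	for _, cert := range certs {
		if excluded[cert.Subject.CommonName] {
			continue // operator has declared this certificate irrelevant
		}
		if earliest.IsZero() || cert.NotAfter.Before(earliest) {
			earliest = cert.NotAfter
		}
	}
	return earliest
}

// certSpec and fakeCerts are hypothetical helpers that build minimal
// certificates (name and expiry only) so the example runs offline.
type certSpec struct {
	CommonName   string
	NotAfterUnix int64
}

func fakeCerts(specs ...certSpec) []*x509.Certificate {
	certs := make([]*x509.Certificate, 0, len(specs))
	for _, spec := range specs {
		certs = append(certs, &x509.Certificate{
			Subject:  pkix.Name{CommonName: spec.CommonName},
			NotAfter: time.Unix(spec.NotAfterUnix, 0).UTC(),
		})
	}
	return certs
}

func main() {
	certs := fakeCerts(
		certSpec{CommonName: "leaf.example", NotAfterUnix: 1603756799},
		certSpec{CommonName: "Legacy Cross-Signed Root", NotAfterUnix: 1590796800},
	)
	excluded := map[string]bool{"Legacy Cross-Signed Root": true}
	fmt.Println(earliestExpiryExcluding(certs, excluded).Unix())
}
```

The trade-off raised elsewhere in the thread still applies: an exclude list hides an expiry that may matter to clients that only trust the excluded certificate's chain.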
I really like this idea. @brian-brazil, what are your thoughts in the context of what we discussed last week? |
That's exactly the sort of new metric I indicated I'd accept a PR for. |
New metric? I'd say this is the fixed version of the existing metric @brian-brazil I think if you'd poll your users "what do you expect to gain from this metric? A) the first upcoming expiry of any of the currently valid certificate chains B) the expiry time at which my users will run into problems connecting", that they will go massively for B). |
Yes, a new metric. |
It describes what the metric does, it does not do what people expected from it. "Earliest" was likely chosen as a description for "the expiry date for the first certificate in the full certificate chain that will expire" (that is what the code reflects as well). The thing is that there are multiple chains to consider and your code disregards this. However, you now seem to argue that "earliest" is all of a sudden the semantic for "the expiry date for the first certificate that will expire in the first certificate chain that will expire". The semantics have been modified to match the current (unexpected) behavior. If you want to go for your strict semantics, then it's time to fix your code I'd say, since you're only considering a single certificate chain and not both chains to see which one wins the "earliest expiry" prize. Once you've done that, you'll have produced a rather useless metric, since nobody cares that for example end of May one of the chains will expire, while actual problems will only occur in 2038. Your users are interested in 2038 here. If you don't care about what your users expect from this metric, then have fun with that. |
As a general thing metric names should describe what they do. If a metric name correctly describes what it does but doesn't do what a user expects, then that's a case for finding/adding a metric that does - not creating a confusing situation where the metric doesn't match its name. In this particular case the current metric name says that it is the earliest cert that expires, which is what it does. If there is to be a metric that looks at chains instead of certs, and looking for the latest rather than the earliest, then that needs a new metric name as |
I get your point, don't worry. It's just a very bureaucratic point. If I would follow your line of strict reasoning, then the current metric should be fully dropped, in favor of a new metric Following strict semantics, you cannot state that I'm sure you will not agree. |
I'd suggest that if you want a resolution to this issue that preparing a PR along the lines discussed above would be more productive than debating metric naming best practices. |
I'm really not the one starting semantics debates here. If @giganteous wants to finish a pull request to provide a different metric altogether, that would be great. But could you please state what would be the one and only possible correct name for such metric in your world? I'm afraid that otherwise a PR might simply lead to yet another discussion about metric naming, instead of an actual fix for this issue. |
What about making blackbox_exporter.yml
default behavior for I'm not a developer, therefore I cannot provide any code/PRs. Just an idea. |
A given metric should only have one meaning, and adding configurability would only make things worse as now you have to guess what a given metric means (in addition to the metric now being sometimes misleadingly named). If you want a different meaning you have to use a different metric name. |
The idea of making the behavior configurable seems a good way of handling things to me. The proposed naming doesn't cover the issue at hand though I think. The case here is that there are multiple valid chains, all with valid certificates in them, but one of those chains has a certificate in it that is going to expire soon. Chain A: expires by the end of this month "expires" here means "earliest expiry date", since we're looking at all certificates in each of these two chains to decide which one is going to expire the soonest within each chain. It doesn't matter if it's a root certificate, intermediate certificate or a host certificate. Once one level of the chain expires, the whole chain becomes invalid. Hence, the issue was created because we are running into warnings that the certificate chain A is soon to expire. However, the real expiry date for the available chains is in 2038, so the sky is blue and all is good for now. No need for warnings. Back to the proposal of configuring the behavior Based on the stuff I wrote above, the problem does not lie within the logic that determines "earliest expiry date for a chain". The full chain is checked and that is fine. The problem lies within the absence of logic when there are multiple chains available. The configuration option would be able to tell the exporter to either:
A good key and value for configuring the behavior doesn't pop up right away. That doesn't matter too much as I highly doubt that Brian will accept the idea of changing behavior for determining a metric. |
Yet, something like I am starting to understand why this project has way more forks than pull requests. |
If neither configuration nor changing the behavior of
This one only shows the expiry of the first certificate in the chain. That is the one that most of us can control like mentioned in this thread some time ago. |
Ignoring part of the chain and only looking at the host certificate would be possible, however it's cleaner to actually validate all chains and use the expiry date of the chain that will expire last (since the user's browser does the same thing). The term |
I was affected by this issue. What I did as a workaround was to edit the PEM file to remove the intermediate certificates expiring on 2020-05-30. (Using the BEGIN CERTIFICATE / END CERTIFICATE separators and (Just in case somebody finds this useful after a google search). |
Moved to SSL exporter to validate SSL connection. Thank you @ribbybibby for the link. |
Resolves prometheus#340 Based on the discussion in the issue above, this metric will help determine when the SSL/TLS certificate expiration error actually happens on clients like a browser that attempts to verify certificates by building one or more chains from peer certificates. Signed-off-by: Takuya Kosugiyama <[email protected]>
* Add new probe_ssl_latest_verified_chain_expiry metric Resolves #340 Based on the discussion in the issue above, this metric will help determine when the SSL/TLS certificate expiration error actually happens on clients like a browser that attempts to verify certificates by building one or more chains from peer certificates. Signed-off-by: Takuya Kosugiyama <[email protected]>
Thanks a lot for this :) Can we expect a release with this commit sometime soon? |
I'm planning on getting #635 in and then releasing, so hopefully in the next week or so. |
We are trying to monitor the certificate expiration date for this host: tvonline.swb-gruppe.de
With openssl and the Google Chrome browser we get the expected value (Oct 26 23:59:59 2019).
echo | openssl s_client -servername tvonline.swb-gruppe.de -connect tvonline.swb-gruppe.de:443 2>/dev/null | openssl x509 -noout -dates
notBefore=Oct 26 00:00:00 2017 GMT
notAfter=Oct 26 23:59:59 2019 GMT
But with blackbox_exporter we get:
probe_ssl_earliest_cert_expiry 1.534824e+09
which is GMT: Tuesday, 21 August 2018 04:00:00.
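For readers who want to check the conversion, the metric value is a Unix timestamp in seconds. A small Go sketch (the function name `expiryString` is just for illustration) turns it into a readable UTC date:

```go
package main

import (
	"fmt"
	"time"
)

// expiryString converts a metric value holding Unix seconds into an
// RFC 1123 formatted UTC timestamp.
func expiryString(value float64) string {
	return time.Unix(int64(value), 0).UTC().Format(time.RFC1123)
}

func main() {
	// Value reported by probe_ssl_earliest_cert_expiry in this issue.
	fmt.Println(expiryString(1.534824e+09)) // prints "Tue, 21 Aug 2018 04:00:00 UTC"
}
```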
It seems a similar problem is described here:
https://security.stackexchange.com/questions/66487/what-happens-when-certificates-further-up-the-chain-expires-before-mine-equifa
The answer there recommends updating openssl to v1.0.2, which has a fix for this issue, but I guess golang/blackbox_exporter uses some other mechanism to work with SSL. Is there any workaround or fix for this issue?