Bugfix for domain_enforces_https() logic #192

Merged
jsf9k merged 4 commits into develop from bugfix/domain_enforces_https_logic
May 21, 2019

Conversation

Member

@jsf9k jsf9k commented May 3, 2019

Remove an extraneous call to is_http_redirect_domain(domain). It's unclear to me why that function call is there, but it is not needed. When I originally simplified this logic in #180, the existing logic was:

return is_domain_supports_https(domain) and (
    is_strictly_forces_https(domain) and
    (
        is_defaults_to_https(domain) or
        is_redirect(domain)
    ) or (
        (not is_strictly_forces_https(domain)) and
        is_defaults_to_https(domain)
    )
)

For some reason the extra call to is_http_redirect_domain(domain) was added in commit a65544b. With this bugfix, the logic is now reverted to:

return is_domain_supports_https(domain) and (
    is_defaults_to_https(domain) or (
        is_strictly_forces_https(domain) and is_redirect_domain(domain)
    )
)

A quick-but-tedious logic test using the eight possible values of A = is_strictly_forces_https(domain), B = is_defaults_to_https(domain), and C = is_redirect(domain) shows that the old and new expressions are now in agreement.

Another way to look at this is to notice that the old logic can be written as (remembering that and has higher precedence than or):

(A and (B or C)) or (not A and B)

Note that if B is True, the statement becomes (A and True) or (not A and True), which reduces to A or not A and is therefore True regardless of the value of A. If B is False, then (not A and B) evaluates to False, and the larger statement can only be True if A and (False or C) evaluates to True, that is, if A and C is True. Therefore the original statement must be equivalent to

B or (A and C)
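
The equivalence can also be checked by brute force. This is a minimal sketch, not code from pshtt: old_logic and new_logic are hypothetical helper names, and A, B, and C are plain booleans standing in for the three function results.

```python
from itertools import product


def old_logic(a: bool, b: bool, c: bool) -> bool:
    # (A and (B or C)) or (not A and B)
    return (a and (b or c)) or ((not a) and b)


def new_logic(a: bool, b: bool, c: bool) -> bool:
    # B or (A and C)
    return b or (a and c)


# Collect every combination of truth values on which the two expressions disagree.
mismatches = [
    (a, b, c)
    for a, b, c in product([False, True], repeat=3)
    if old_logic(a, b, c) != new_logic(a, b, c)
]
print(mismatches)  # an empty list means the expressions agree on all eight inputs
```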

This was noticed thanks to a comment by @climber-girl.

jsf9k added 2 commits May 3, 2019 11:08
It's unclear to me why that function call is there, but it is not
needed.  When this code was last modified, the existing logic was:

    return is_domain_supports_https(domain) and (
        is_strictly_forces_https(domain) and
        (
            is_defaults_to_https(domain) or
            is_redirect(domain)
        ) or (
            (not is_strictly_forces_https(domain)) and
            is_defaults_to_https(domain)
        )
    )

With this bugfix, the logic is now:

    return is_domain_supports_https(domain) and (
        is_defaults_to_https(domain) or (
            is_strictly_forces_https(domain) and is_redirect_domain(domain)
        )
    )

A quick logic test using the eight possible values of
is_strictly_forces_https(domain), is_defaults_to_https(domain), and
is_redirect(domain) shows that the old and new expressions are now in
agreement.

This was noticed thanks to a comment by @climber-girl.
@jsf9k jsf9k requested review from h-m-f-t, IanLee1521, bjb28, climber-girl, echudow and a team May 3, 2019 15:45
@jsf9k jsf9k self-assigned this May 3, 2019
@jsf9k
Member Author

jsf9k commented May 3, 2019

@echudow, if there is a reason for including the call to is_http_redirect_domain(domain) that I am missing, please chime in.

Member

@dav3r dav3r left a comment


This makes sense to me, but let's hear from @echudow in case we are missing out on some particular case that he had in mind.

@echudow
Collaborator

echudow commented May 3, 2019

I don't remember all the specifics at the moment, but I think there was a case along these lines: the site's http endpoint redirected to HTTPS on a different site, but the site also had a live https endpoint of its own that did not redirect, so the domain was failing the enforce HTTPS check because is_redirect_domain() was False. So I added is_http_redirect_domain() instead. Logically, as long as that's OK, is_redirect_domain() could be removed and simply replaced with is_http_redirect_domain(), and the refactor that @jsf9k did should work fine with that replacement. I don't have time now, but I'll try to dig up the specific details over the weekend or next week.

@jsf9k
Member Author

jsf9k commented May 3, 2019

I don't remember all the specifics at the moment, but I think there was a case along these lines: the site's http endpoint redirected to HTTPS on a different site, but the site also had a live https endpoint of its own that did not redirect, so the domain was failing the enforce HTTPS check because is_redirect_domain() was False. So I added is_http_redirect_domain() instead. Logically, as long as that's OK, is_redirect_domain() could be removed and simply replaced with is_http_redirect_domain(), and the refactor that @jsf9k did should work fine with that replacement. I don't have time now, but I'll try to dig up the specific details over the weekend or next week.

Thanks for the input, @echudow!

We probably need @h-m-f-t to weigh in as to whether we should count such a case as "enforcing HTTPS". In the case that caused me to investigate this, the site in question redirected all four endpoints to HTTPS except for the http://www endpoint. That endpoint was being redirected to an HTTP site, but was being given credit for "enforces HTTPS" (because of the presence of is_http_redirect_domain()). In that case I don't think the code was doing the correct thing.

It sounds like in your case the host did not default to HTTPS, in which case I don't think we want to count it as "enforcing HTTPS."

@echudow
Collaborator

echudow commented May 7, 2019

Thanks for the input, @echudow!

We probably need @h-m-f-t to weigh in as to whether we should count such a case as "enforcing HTTPS". In the case that caused me to investigate this, the site in question redirected all four endpoints to HTTPS except for the http://www endpoint. That endpoint was being redirected to an HTTP site, but was being given credit for "enforces HTTPS" (because of the presence of is_http_redirect_domain()). In that case I don't think the code was doing the correct thing.

It sounds like in your case the host did not default to HTTPS, in which case I don't think we want to count it as "enforcing HTTPS."

One of the sites that caused me to add the http_redirect was cloudservices.disa.mil. The http endpoint immediately redirects to https://www.milcloud.mil/, but the https endpoint is live and doesn't redirect, and both www endpoints are down, so it isn't a "redirect_domain". It seems to me like it enforces HTTPS though since both live endpoints are either https or redirect to https, but fails the old check because the http endpoint is live so it doesn't default_to_https and it isn't a redirect_domain.
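
To make that case concrete, here is a rough sketch using plain booleans. The flag values are assumptions read off the description above, not actual pshtt scan output, and enforces_https is a hypothetical stand-in for the refactored expression with the redirect check passed in as a parameter.

```python
def enforces_https(
    supports_https: bool,
    defaults_to_https: bool,
    strictly_forces_https: bool,
    redirect_flag: bool,
) -> bool:
    # Same shape as the refactored expression; redirect_flag stands in for
    # either is_redirect_domain() or is_http_redirect_domain().
    return supports_https and (
        defaults_to_https or (strictly_forces_https and redirect_flag)
    )


# Assumed values for the site described above (not scan output):
# the https endpoint is live, the domain does not default to https,
# and the only live http endpoint redirects immediately to https.
site = dict(supports_https=True, defaults_to_https=False, strictly_forces_https=True)

# is_redirect_domain() is assumed False; is_http_redirect_domain() is assumed True.
with_redirect_domain = enforces_https(**site, redirect_flag=False)
with_http_redirect_domain = enforces_https(**site, redirect_flag=True)
print(with_redirect_domain, with_http_redirect_domain)
```

Under those assumptions the only thing that changes the outcome is which redirect check is used.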

@jsf9k, for the case that you were looking into, shouldn't the site still fail to enforce_https because it doesn't strictly_forces_https since the www endpoint redirects to an HTTP site? strictly_forces_https is in an "and" with the (http_)redirect_domain, so it should still fail the overall domain_enforces_https check.

@jsf9k, sorry for losing your refactor. I was working on an earlier version of the code and then neglected to merge in all the upstream updates. The refactor looks good to me, but I think it could be okay with is_http_redirect_domain rather than is_redirect_domain.

@jsf9k
Member Author

jsf9k commented May 7, 2019

@echudow - I'm trying to compare our two cases, but the site that caused me to investigate this (malware.us-cert.gov) is now returning a 503 instead of a redirect on the www endpoint. I think that's possibly related to a power/internet issue at one of their physical locations.

I'm told things will be back up tomorrow.

@jsf9k
Member Author

jsf9k commented May 10, 2019

@jsf9k, for the case that you were looking into, shouldn't the site still fail to enforce_https because it doesn't strictly_forces_https since the www endpoint redirects to an HTTP site? strictly_forces_https is in an "and" with the (http_)redirect_domain, so it should still fail the overall domain_enforces_https check.

I think you are correct, @echudow. The presence of is_http_redirect_domain() vs is_redirect_domain() does not affect my malware.us-cert.gov example. Digging into that example in more detail, I found the following.

The domain malware.us-cert.gov still results in a domain_enforces_https value of True with the code in develop. This is actually because the canonical domain is an https domain, and therefore is_defaults_to_https() is True. (See here).

Why is the canonical domain an https domain in this case? Well, because it satisfies these rules.

The issue I have is that malware.us-cert.gov is emerging as BOD 18-01 compliant even though it redirects one of the http endpoints to an http site. Maybe this is OK, but @climber-girl and I both think it is suspect. I think either (1) those canonical domain rules need to be updated, or (2) the domain_enforces_https logic needs to be changed, or (3) we need to update the logic for BOD 18-01 compliance.

Right now, for BOD 18-01 compliance we require "supports_https", "domain_enforces_https", "domain_uses_strong_hsts", and "no crappy crypto". Should we change "domain_enforces_https" to "strictly_forces_https" when calculating BOD 18-01 compliance?

@h-m-f-t, can you weigh in here?

@jsf9k
Member Author

jsf9k commented May 10, 2019

I think the thing to do is to (1) keep my refactor, but (2) revert back to using is_http_redirect_domain(), so that the code becomes:

return is_domain_supports_https(domain) and (
    is_defaults_to_https(domain) or (
        is_strictly_forces_https(domain) and is_http_redirect_domain(domain)
    )
)

This will have the simplification from the refactor but will give the same results as what is currently in develop. The issue I raised in my previous comment can then be taken up in a separate pull request.

Are you OK with this, @echudow and @h-m-f-t?

@echudow
Collaborator

echudow commented May 10, 2019

@jsf9k, @h-m-f-t, I think that would be great. That's what I suggested above and I already made that change in my fork a few days ago.

For the malware.us-cert.gov issue, I think the actual issue isn't is_defaults_to_https() but actually is_domain_supports_https() which checks for downgrades using is_downgrades_https() that only checks the canonical endpoint. I'm not sure that I fully understand the intent of is_domain_enforces_https() and how it is supposed to be different from is_strictly_forces_https(). Is it meant to do the is_strictly_forces_https() check and also check that the https endpoints are valid and don't downgrade? If so, then I think the code should require that by being:

    return is_domain_supports_https(domain) and is_strictly_forces_https(domain) and (
        is_defaults_to_https(domain) or is_http_redirect_domain(domain)
    )
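
For comparison, here is a quick sketch of where this stricter form differs from the refactored form that uses is_http_redirect_domain(). The booleans s, d, f, and h are stand-ins for is_domain_supports_https, is_defaults_to_https, is_strictly_forces_https, and is_http_redirect_domain; the function names are made up for illustration.

```python
from itertools import product


def refactor_variant(s: bool, d: bool, f: bool, h: bool) -> bool:
    # supports and (defaults or (strictly_forces and http_redirect))
    return s and (d or (f and h))


def strict_variant(s: bool, d: bool, f: bool, h: bool) -> bool:
    # supports and strictly_forces and (defaults or http_redirect)
    return s and f and (d or h)


# Every (s, d, f, h) combination on which the two forms disagree.
differences = [
    combo
    for combo in product([False, True], repeat=4)
    if refactor_variant(*combo) != strict_variant(*combo)
]
print(differences)
# → [(True, True, False, False), (True, True, False, True)]
```

The two forms disagree only when the domain supports and defaults to HTTPS but does not strictly force it: the refactored form passes such a domain and the stricter form fails it.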

@h-m-f-t
Member

h-m-f-t commented May 13, 2019

Hmm, yeah, malware.us-cert.gov should not be True for Domain Enforces HTTPS because the httpwww endpoint doesn't eventually land on an https hostname.

IIRC, the difference between Domain Enforces HTTPS and Strictly Enforces HTTPS was one of the immediacy of redirects.

  • Domain Enforces HTTPS should be False when there are endpoints that don't eventually force the user onto HTTPS, which could be on a hostname different than the one being scanned
  • Strictly Enforces HTTPS is when the two HTTP endpoints are down or immediately redirect to HTTPS.

@echudow
Collaborator

echudow commented May 15, 2019

So, should we change Domain Enforces HTTPS to require Strictly Forces HTTPS so that the check for immediate redirects to HTTPS will apply to all endpoints rather than how it is now where it just applies to the canonical endpoint? What about if further in the redirect chain an endpoint downgrades the HTTPS to HTTP, should that cause it to fail Domain Enforces HTTPS? Should Domain Enforces HTTPS fail if any of the endpoints redirect externally to non-HTTPS sites?

Are we good with the change from is_redirect_domain(domain) to is_http_redirect_domain(domain)?

@jsf9k
Member Author

jsf9k commented May 16, 2019

So, should we change Domain Enforces HTTPS to require Strictly Forces HTTPS so that the check for immediate redirects to HTTPS will apply to all endpoints rather than how it is now where it just applies to the canonical endpoint? What about if further in the redirect chain an endpoint downgrades the HTTPS to HTTP, should that cause it to fail Domain Enforces HTTPS? Should Domain Enforces HTTPS fail if any of the endpoints redirect externally to non-HTTPS sites?

Are we good with the change from is_redirect_domain(domain) to is_http_redirect_domain(domain)?

@echudow, I think changing Domain Enforces HTTPS to require Strictly Forces HTTPS sounds like a good idea. I'm also good with changing is_redirect_domain(domain) to is_http_redirect_domain(domain). I'd like to get @h-m-f-t's opinion, though, since most of this logic was put in place before my time. I'm not sure if there are specific corner cases they were trying to handle.

Member

@h-m-f-t h-m-f-t left a comment


LGTM. I'd recommend a test run to check for surprises.

@jsf9k
Member Author

jsf9k commented May 16, 2019

LGTM. I'd recommend a test run to check for surprises.

@h-m-f-t, what do you think about changing Domain Enforces HTTPS to require Strictly Forces HTTPS? That change hasn't been made yet.

@echudow
Collaborator

echudow commented May 16, 2019

@jsf9k, can you run a test using 46954e3 to see which sites would now fail Domain Enforces HTTPS? Then we can look into whether those results make sense, in which case we should require Strictly Forces HTTPS, or whether they don't, in which case we shouldn't.

@jsf9k
Member Author

jsf9k commented May 19, 2019

@echudow, I will kick that test off on Monday. I should have results Tuesday.

@jsf9k
Member Author

jsf9k commented May 21, 2019

Looking at the results of the full test run, I still think changing Domain Enforces HTTPS to require Strictly Forces HTTPS makes sense. The sites whose score changes appear to be mostly ones where, for example, the http://example.gov endpoint redirects to http://www.example.gov, which in turn redirects to https://www.example.gov. I think it's OK to require folks to tighten up those redirects and redirect immediately to an HTTPS endpoint.
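
As an illustration of the redirect pattern described above, here is a minimal sketch that contrasts "redirects immediately to HTTPS" with "eventually lands on HTTPS" for a single endpoint. The helper names and the list-of-URLs representation of a redirect chain are assumptions made for this example; this is not how pshtt models endpoints, and it ignores the case where an endpoint is down.

```python
from typing import Sequence
from urllib.parse import urlsplit


def is_https(url: str) -> bool:
    return urlsplit(url).scheme == "https"


def immediately_redirects_to_https(chain: Sequence[str]) -> bool:
    # chain[0] is the requested URL; chain[1:] are the successive redirect targets.
    return len(chain) > 1 and is_https(chain[1])


def eventually_lands_on_https(chain: Sequence[str]) -> bool:
    return len(chain) > 1 and is_https(chain[-1])


# The example chain from the comment above.
chain = [
    "http://example.gov",
    "http://www.example.gov",
    "https://www.example.gov",
]
print(immediately_redirects_to_https(chain))  # → False
print(eventually_lands_on_https(chain))       # → True
```

A site with this chain ends up on HTTPS but takes an extra HTTP hop to get there, which is the kind of redirect the stricter check would flag.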

@jsf9k jsf9k merged commit 7880f2b into develop May 21, 2019
@jsf9k jsf9k deleted the bugfix/domain_enforces_https_logic branch May 21, 2019 21:36
@h-m-f-t
Member

h-m-f-t commented May 22, 2019

Time, how does it work 🕦

Layer 8 comment: frustration could be expressed by those whose scores this negatively changes. "I haven't changed anything; why don't I pass now?"

Prior communication (sent generally) about the change can help lessen frustration and give notice that action needs to be taken.

@climber-girl

Another example that can be used to check logic changes during tests is studentsabroad.state.gov, which should be failing Supports HTTPS because its https-www endpoint should be flagging for BadHostname (cert is issued to *.state.gov, which wouldn't cover www.studentsabroad.state.gov).
However, a manual pshtt scan through docker that I just ran shows it as meeting Supports HTTPS because the plain domain endpoints are good to go.

$ curl --head https://www.studentsabroad.state.gov
curl: (51) SSL: no alternative certificate subject name matches target host name 'www.studentsabroad.state.gov'
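
The hostname mismatch follows from the fact that a wildcard in a certificate name matches only a single leftmost label. Here is a simplified sketch of that rule; wildcard_covers is a hypothetical helper written for this example (real TLS libraries also handle partial wildcards, internationalized names, and other cases that are ignored here).

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """Simplified check: '*' may only replace the whole leftmost label."""
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    # A wildcard stands for exactly one label, so the label counts must match.
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        return pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


print(wildcard_covers("*.state.gov", "studentsabroad.state.gov"))      # → True
print(wildcard_covers("*.state.gov", "www.studentsabroad.state.gov"))  # → False
```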

@DOS-cyber

DOS-cyber commented May 22, 2019 via email

jsf9k added a commit to cisagov/lambda_functions that referenced this pull request May 24, 2019
The latest version of pshtt tightens up the is_domain_supports_https()
logic.  See cisagov/pshtt#192 for details.
jsf9k added a commit to 18F/domain-scan that referenced this pull request May 24, 2019
The latest version of pshtt tightens up the logic in the
is_domain_supports_https() method. See cisagov/pshtt#192 for details.
cisagovbot pushed a commit that referenced this pull request Jan 31, 2025