Treating hostnames as regexp is not that safe #7

amaembo · 2014-05-21T01:30:15Z

Probably this should be an optional feature (turned off by default). Otherwise there's a big risk of misconfiguration. Imagine that I whitelisted the only domain "myapp.mydomain.com". Then attacker registers a new domain, say, "myapp1mydomain.com", which passes domain check and allows cURL to download from there. It can be properly whitelisted using "myapp.mydomain.com", but I'm sure many users will not do this.

wkcaj · 2014-05-21T12:55:11Z

Hey,

Cheers for raising this. There have been a few issues with the use of regexes, so I think a new strategy should be used.

One idea is to do straight string matching:

//Normalise URL
$url = trim(strtolower($url));
$domain = 'safecurl.fin1te.net';

if ($url == $domain || $url == $domain . '.') {
    //Invalid URL
}

However, this means that a developer would have specify every domain they want blacklisted. This maybe a problem if they have say 100+ domains. Also, there would be issues with UTF-8.

We could generate a regex for the user, which might work:

$domain = 'safecurl.fin1te.net';
$domain = str_replace('.', '\.', $domain); //safecurl\.fin1te\.net

$regex = '/' . $domain . '\.?/';

if (preg_match($regex, $url)) {
    //Invalid URL
}

What do you think of this plan?

Cheers,
Jack

amaembo · 2014-05-21T14:35:12Z

I guess, it would be better to support both ways: either regexp or direct match (including trailing dot as you suggested). But for regexp mode it should be up to user to specify proper regexp. Regexp may start normally with '/' as domain name doesn't contain '/' for sure. So configuration might be like this:

$options->setList('whitelist', ['mydomain.com', '/.+\.mydomain\.com\.?/'], 'domain');

So in regexp mode it's user's responsibility to make everything correct, as this is somewhat advanced feature. I don't like the idea to "patch" user's regexp as there are cases when using regexp syntax might be actually useful. For example if user has many subdomains like sub1.mydomain.com, sub2.mydomain.com, he may want to specify '/sub..mydomain.com.?/'. In this case replacing every dot would be bad idea.

An alternative to regexp would be wildcards syntax using fnmatch PHP function ( http://php.net/fnmatch ). It has simpler syntax and may be more convenient for some users. Also it doesn't treat dots specially. So the above example would look like this (it's assumed that trailing dot possibility is also checked):

$options->setList('whitelist', ['mydomain.com', '*.mydomain.com'], 'domain');

justinsteven · 2019-10-04T09:37:14Z

A fix for the regex whitelist/blacklist bypass bug is in #25

A non-regex solution for the subdomain matching would be to mimic the CSP syntax.

A specification of company.com would match exactly company.com.

A specification of *.company.com would match all subdomains (and subdomains of subdomains etc.) of company.com but not company.com itself.

See https://stackoverflow.com/a/44857848

We could do:

        if ($type == 'domain') {
            // For domains, a case-insensitive match and subdomain check is needed
            foreach ($this->{$list}[$type] as $domain) {
                if (substr($domain, 0, 2) === '*.') {
                    // Listed domain starts with '*.' so we do a subdomain match

                    // Remove the leading '*'
                    $domain_suffix = substr($domain, 1);

                    $match = true;

                    if (strlen($value) < strlen($domain_suffix)) {
                        # this catches company.com not being a subdomain of *.company.com
                        $match = false;
                    }

                    // do a case-insensitive ends-with check
                    if (substr_compare(
                        $value,
                        $domain_suffix,
                        -strlen($domain_suffix),
                        max(strlen($domain_suffix), strlen($value) - 1),
                        True) !== 0) {
                            $match = false;
                    };

                    if ($match) {
                        // Both checks passed. Subdomain match found.
                        return true;
                    }

                } else {
                    // Listed domain doesn't start with '*.' so we do a case-insensitive domain check
                    if (!strcasecmp($domain, $value)) {
                        return true;
                    }
                }
            }

            return false;
        } else {
            // For non-domains we can do a simple in_array check
            return (in_array($value, $this->{$list}[$type]));
        }
    }

(This got a bit out of hand)

It causes the following tests to pass:

    public function testUrlSubdomainWhitelistMatchesSubdomain() {
        $options = new Options();
        $options->setList('whitelist', array('*.google.com'), 'domain');
        $this->expectNotToPerformAssertions();  // As long as this doesn't throw we're happy
        URL::validateUrl('http://accounts.google.com', $options);
    }

    public function testUrlSubdomainWhitelistDoesntMatchDomain() {
        $options = new Options();
        $options->setList('whitelist', array('*.google.com'), 'domain');
        $this->expectException(InvalidDomainException::Class);
        URL::validateUrl('http://google.com', $options);
    }

This would not satisfy the sub1.mydomain.com through sub9.mydomain.com case - but I think it strikes a nice balance without having to drag regex into this :^)

It would also allow a developer to specify *company.com which is probably a bad idea, but if it's good enough for CSP it's good enough for us?

Thoughts?

Edit: I don't know if this handles UTF-8

This was referenced May 21, 2014

Bypass host blacklist check by adding dot in the end of the hostname #6

Closed

If url schema in capital letters it wont work #9

Open

wkcaj added the bug label May 21, 2014

justinsteven mentioned this issue Oct 4, 2019

Vulnerability: Domain whitelist/blacklist checking is done via regex, can be bypassed #24

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Treating hostnames as regexp is not that safe #7

Treating hostnames as regexp is not that safe #7

amaembo commented May 21, 2014

wkcaj commented May 21, 2014

amaembo commented May 21, 2014

justinsteven commented Oct 4, 2019 •

edited

Loading

Treating hostnames as regexp is not that safe #7

Treating hostnames as regexp is not that safe #7

Comments

amaembo commented May 21, 2014

wkcaj commented May 21, 2014

amaembo commented May 21, 2014

justinsteven commented Oct 4, 2019 • edited Loading

justinsteven commented Oct 4, 2019 •

edited

Loading