Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treating hostnames as regexp is not that safe #7

Open
amaembo opened this issue May 21, 2014 · 3 comments
Open

Treating hostnames as regexp is not that safe #7

amaembo opened this issue May 21, 2014 · 3 comments
Labels

Comments

@amaembo
Copy link

amaembo commented May 21, 2014

Probably this should be an optional feature (turned off by default). Otherwise there's a big risk of misconfiguration. Imagine that I whitelisted the only domain "myapp.mydomain.com". Then attacker registers a new domain, say, "myapp1mydomain.com", which passes domain check and allows cURL to download from there. It can be properly whitelisted using "myapp.mydomain.com", but I'm sure many users will not do this.

@wkcaj
Copy link
Owner

wkcaj commented May 21, 2014

Hey,

Cheers for raising this. There have been a few issues with the use of regexes, so I think a new strategy should be used.

One idea is to do straight string matching:

//Normalise URL
$url = trim(strtolower($url));
$domain = 'safecurl.fin1te.net';

if ($url == $domain || $url == $domain . '.') {
    //Invalid URL
}

However, this means that a developer would have specify every domain they want blacklisted. This maybe a problem if they have say 100+ domains. Also, there would be issues with UTF-8.

We could generate a regex for the user, which might work:

$domain = 'safecurl.fin1te.net';
$domain = str_replace('.', '\.', $domain); //safecurl\.fin1te\.net

$regex = '/' . $domain . '\.?/';

if (preg_match($regex, $url)) {
    //Invalid URL
}

What do you think of this plan?

Cheers,
Jack

@wkcaj wkcaj added the bug label May 21, 2014
@amaembo
Copy link
Author

amaembo commented May 21, 2014

I guess, it would be better to support both ways: either regexp or direct match (including trailing dot as you suggested). But for regexp mode it should be up to user to specify proper regexp. Regexp may start normally with '/' as domain name doesn't contain '/' for sure. So configuration might be like this:

$options->setList('whitelist', ['mydomain.com', '/.+\.mydomain\.com\.?/'], 'domain');

So in regexp mode it's user's responsibility to make everything correct, as this is somewhat advanced feature. I don't like the idea to "patch" user's regexp as there are cases when using regexp syntax might be actually useful. For example if user has many subdomains like sub1.mydomain.com, sub2.mydomain.com, he may want to specify '/sub..mydomain.com.?/'. In this case replacing every dot would be bad idea.

An alternative to regexp would be wildcards syntax using fnmatch PHP function ( http://php.net/fnmatch ). It has simpler syntax and may be more convenient for some users. Also it doesn't treat dots specially. So the above example would look like this (it's assumed that trailing dot possibility is also checked):

$options->setList('whitelist', ['mydomain.com', '*.mydomain.com'], 'domain');

@justinsteven
Copy link

justinsteven commented Oct 4, 2019

A fix for the regex whitelist/blacklist bypass bug is in #25

A non-regex solution for the subdomain matching would be to mimic the CSP syntax.

A specification of company.com would match exactly company.com.

A specification of *.company.com would match all subdomains (and subdomains of subdomains etc.) of company.com but not company.com itself.

See https://stackoverflow.com/a/44857848

We could do:

        if ($type == 'domain') {
            // For domains, a case-insensitive match and subdomain check is needed
            foreach ($this->{$list}[$type] as $domain) {
                if (substr($domain, 0, 2) === '*.') {
                    // Listed domain starts with '*.' so we do a subdomain match

                    // Remove the leading '*'
                    $domain_suffix = substr($domain, 1);

                    $match = true;

                    if (strlen($value) < strlen($domain_suffix)) {
                        # this catches company.com not being a subdomain of *.company.com
                        $match = false;
                    }

                    // do a case-insensitive ends-with check
                    if (substr_compare(
                        $value,
                        $domain_suffix,
                        -strlen($domain_suffix),
                        max(strlen($domain_suffix), strlen($value) - 1),
                        True) !== 0) {
                            $match = false;
                    };

                    if ($match) {
                        // Both checks passed. Subdomain match found.
                        return true;
                    }

                } else {
                    // Listed domain doesn't start with '*.' so we do a case-insensitive domain check
                    if (!strcasecmp($domain, $value)) {
                        return true;
                    }
                }
            }

            return false;
        } else {
            // For non-domains we can do a simple in_array check
            return (in_array($value, $this->{$list}[$type]));
        }
    }

(This got a bit out of hand)

It causes the following tests to pass:

    public function testUrlSubdomainWhitelistMatchesSubdomain() {
        $options = new Options();
        $options->setList('whitelist', array('*.google.com'), 'domain');
        $this->expectNotToPerformAssertions();  // As long as this doesn't throw we're happy
        URL::validateUrl('http://accounts.google.com', $options);
    }

    public function testUrlSubdomainWhitelistDoesntMatchDomain() {
        $options = new Options();
        $options->setList('whitelist', array('*.google.com'), 'domain');
        $this->expectException(InvalidDomainException::Class);
        URL::validateUrl('http://google.com', $options);
    }

This would not satisfy the sub1.mydomain.com through sub9.mydomain.com case - but I think it strikes a nice balance without having to drag regex into this :^)

It would also allow a developer to specify *company.com which is probably a bad idea, but if it's good enough for CSP it's good enough for us?

Thoughts?

Edit: I don't know if this handles UTF-8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants